From bugzilla-daemon at lists.openfabrics.org Thu Feb 1 00:40:18 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu, 1 Feb 2007 00:40:18 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa
In-Reply-To:
Message-ID: <20070201084018.6FDD3E607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334

------- Comment #2 from erezz at voltaire.com 2007-02-01 00:40 -------
Created an attachment (id=71)
 --> (https://bugs.openfabrics.org/attachment.cgi?id=71&action=view)
ofed.conf

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at lists.openfabrics.org Thu Feb 1 00:40:39 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu, 1 Feb 2007 00:40:39 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa
In-Reply-To:
Message-ID: <20070201084039.988DBE607F8@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334

------- Comment #3 from erezz at voltaire.com 2007-02-01 00:40 -------
Created an attachment (id=72)
 --> (https://bugs.openfabrics.org/attachment.cgi?id=72&action=view)
ofed_net.conf

From bugzilla-daemon at lists.openfabrics.org Thu Feb 1 00:50:19 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu, 1 Feb 2007 00:50:19 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa
In-Reply-To:
Message-ID: <20070201085019.D3E05E607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334

erezz at voltaire.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |erezz at voltaire.com
          Component|IB Core                     |iSER

------- Comment #4 from erezz at voltaire.com 2007-02-01 00:50 -------
I wasn't able to reproduce this behavior. I made the cma fix:

diff -ru openib-1.1/drivers/infiniband/core/cma.c openib-1.1-cma-fix/drivers/infiniband/core/cma.c
--- openib-1.1/drivers/infiniband/core/cma.c 2006-12-13 00:36:17.000000000 +0200
+++ openib-1.1-cma-fix/drivers/infiniband/core/cma.c 2007-02-01 09:57:47.000000000 +0200
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include

 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("Generic RDMA CM Agent");

Before installing OFED, I installed the open-iscsi package that was shipped
with SLES 10 (open-iscsi-0.5.545-9.12).
Then, the installation was successful:

thyme:/tmp/OFED-1.1.1-ib_local_sa # ./install.sh -c ofed.conf -net ofed_net.conf
Removing previous InfiniBand Software installations
Installing OFED software into /usr/local/ofed
Running /bin/rpm -ihv --force --nodeps
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/kernel-ib-1.1-2.6.16.21_0.8_smp.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/kernel-ib-devel-1.1-2.6.16.21_0.8_smp.x86_64.rpm
Running /bin/rpm -ihv
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibcm-0.9.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibcm-devel-0.9.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibcommon-1.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibcommon-devel-1.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibmad-1.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibmad-devel-1.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibumad-1.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibumad-devel-1.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibverbs-1.0.4-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibverbs-devel-1.0.4-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibverbs-utils-1.0.4-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libmthca-1.0.3-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libmthca-devel-1.0.3-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libopensm-2.0.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libosmcomp-2.0.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libosmvendor-2.0.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/librdmacm-0.9.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/librdmacm-devel-0.9.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/librdmacm-utils-0.9.0-0.x86_64.rpm
    /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/openib-diags-1.1.0-0.x86_64.rpm
Running /bin/rpm -Uhv /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/oiscsi-iser-support-1-1.x86_64.rpm
Running /bin/rpm -Uhv /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/ofed-docs-1.1.1-0.noarch.rpm
Running /bin/rpm -Uhv /tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/ofed-scripts-1.1.1-0.noarch.rpm

IPoIB configuration for ib0:
IPADDR=192.168.10.58
NETMASK=255.255.255.0
NETWORK=192.168.10.0
BROADCAST=192.168.10.255
ONBOOT=yes

IPoIB configuration for ib1:
IPADDR=195.168.10.58
NETMASK=255.255.10.0
NETWORK=195.168.10.0
BROADCAST=195.168.10.255
ONBOOT=no

Installation finished successfully...

thyme:/tmp/OFED-1.1.1-ib_local_sa # rpm -qa|grep kernel-ib
kernel-ib-1.1-2.6.16.21_0.8_smp
kernel-ib-devel-1.1-2.6.16.21_0.8_smp

thyme:/tmp/OFED-1.1.1-ib_local_sa # rpm -ql kernel-ib-1.1-2.6.16.21_0.8_smp|grep iser
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/ulp/iser
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/ulp/iser/ib_iser.ko

For some reason, on your machine scsi/libiscsi.h was missing. On my machine it
is located here (this is where SLES 10 puts it):

thyme:/tmp/OFED-1.1.1-ib_local_sa # find /usr/src/linux-2.6.16.21-0.8 -name libiscsi.h
/usr/src/linux-2.6.16.21-0.8/drivers/scsi/libiscsi.h

If you take a look at
openib-1.1/kernel_patches/backport/2.6.16_sles10/include_libiscsi.patch, you
will see that iSER will look for it in the right place. Therefore, I don't
understand what happened on your machine.

Please check the following:
1. rpm -q open-iscsi
2. find /usr/src/linux-2.6.16.21-0.8 -name libiscsi.h
3. Check that kernel_patches/backport/2.6.16_sles10/include_libiscsi.patch was
   applied successfully.

From ogerlitz at voltaire.com Thu Feb 1 00:58:56 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 01 Feb 2007 10:58:56 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <1170275331.14294.1.camel@stevo-desktop>
References: <1170275331.14294.1.camel@stevo-desktop>
Message-ID: <45C1ABD0.5090404@voltaire.com>

Steve Wise wrote:
> where can I find this symbol? I can't load rdma_cm on rhel4u4...
> rdma_cm: Unknown symbol ip_ib_mc_map

Sean, OK, sorry for not mentioning the rh4u4 issue once you did the push to
OFED 1.2 ...

For a reason that no one at RH can trace... someone went and removed
all the support for ARPHRD_INFINIBAND multicast from u4, where it exists
perfectly fine in u3 and hopefully in u5 as well (Doug, can you update?),
see https://bugs.openfabrics.org/show_bug.cgi?id=2661

Specifically, the snippet below from the patch means that on rh4 u4 all
IPv4 ARPHRD_INFINIBAND multicast goes on the broadcast group !!!

> Index: linux-2.6.9/net/ipv4/arp.c
> ===================================================================
> --- linux-2.6.9.orig/net/ipv4/arp.c 2004-10-18 23:55:06.000000000 +0200
> +++ linux-2.6.9/net/ipv4/arp.c 2006-09-20 14:43:59.000000000 +0300
> @@ -213,6 +213,9 @@
>       case ARPHRD_IEEE802_TR:
>               ip_tr_mc_map(addr, haddr);
>               return 0;
> +     case ARPHRD_INFINIBAND:
> +             ip_ib_mc_map(addr, haddr);
> +             return 0;
>       default:
>               if (dir) {
>                       memcpy(haddr, dev->broadcast, dev->addr_len);

Anyway, OFED-wise, I see two ways to solve this:

1) adding a backport to the rdma_cm containing ip_ib_mc_map, period.

This means that apps offloading multicast traffic through the rdma cm
would use the correct group, whereas apps working through the net stack
would use the broadcast group.

2) having the rdma cm follow the net stack and make its consumers use the
broadcast group.

Or.

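For readers unfamiliar with what option 1 would have to backport, here is a
minimal user-space sketch of the IPv4-multicast to IPoIB hardware-address
mapping, following the layout in RFC 4391 (all-ones multicast QPN, MGID prefix
ff12:401b, low 28 bits of the group address at the end). The name
sketch_ip_ib_mc_map and the host-byte-order argument are assumptions made for
illustration; this is not the kernel's ip_ib_mc_map() and it leaves the P_Key
bytes zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IB_HADDR_LEN 20  /* 4 bytes flags/QPN + 16 bytes MGID */

/*
 * Hypothetical helper: map an IPv4 multicast address (host byte order)
 * to a 20-byte IPoIB link-layer multicast address.
 */
static void sketch_ip_ib_mc_map(uint32_t addr_host,
                                uint8_t buf[SKETCH_IB_HADDR_LEN])
{
    memset(buf, 0, SKETCH_IB_HADDR_LEN);

    buf[1] = 0xff;              /* multicast QPN: 0xffffff */
    buf[2] = 0xff;
    buf[3] = 0xff;

    buf[4] = 0xff;              /* MGID: multicast prefix */
    buf[5] = 0x12;              /* flags and link-local scope */
    buf[6] = 0x40;              /* IPv4 signature 0x401b */
    buf[7] = 0x1b;
    /* buf[8..9] would carry the P_Key; left zero in this sketch */

    buf[16] = (addr_host >> 24) & 0x0f;   /* only the low 28 bits are used */
    buf[17] = (addr_host >> 16) & 0xff;
    buf[18] = (addr_host >> 8) & 0xff;
    buf[19] = addr_host & 0xff;
}
```

With option 2, no such per-group mapping is needed: every group would simply
resolve to the device broadcast address, as the default: branch in the quoted
arp.c hunk does.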
From swise at opengridcomputing.com Thu Feb 1 01:01:24 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Thu, 01 Feb 2007 03:01:24 -0600
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <45C1480C.1020600@ichips.intel.com>
References: <000101c74576$fedc81f0$8698070a@amr.corp.intel.com> <1170275680.14294.5.camel@stevo-desktop> <45C1480C.1020600@ichips.intel.com>
Message-ID: <1170320484.654.6.camel@linux-q667.site>

On Wed, 2007-01-31 at 17:53 -0800, Sean Hefty wrote:
> Steve Wise wrote:
> > Perhaps there's no backport for this to rhel4u4?
>
> I would have thought so, but I really don't know. The function is called from
> net/ipv4/arp.c, and not directly by ipoib. So, I don't know how the backport
> patches typically handle this.
>
> - Sean

Here's what I see:

ip_ib_mc_map() is called directly from cma_join_ib_multicast(), which is
added to the ofed_1_2 cma.c via patch file:
kernel_patches/fixes/sean_multicast_1.patch

So when I compiled ofed_1_2 on rhel4u4, the cma wouldn't load because
there is no ip_ib_mc_map() in rhel4u4.

So you need a backport patch for this to work on rhel4u4, and probably on
many of the older kernels.

Steve.

From mst at mellanox.co.il Thu Feb 1 01:06:28 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 11:06:28 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <45C1ABD0.5090404@voltaire.com>
References: <1170275331.14294.1.camel@stevo-desktop> <45C1ABD0.5090404@voltaire.com>
Message-ID: <20070201090628.GC14189@mellanox.co.il>

> For a reason that no one at RH can trace... someone went and removed
> all the support for ARPHRD_INFINIBAND multicast from u4 where it exists
> perfectly fine in u3 and hopefully on u5 as well (Doug can you update?),
> see https://bugs.openfabrics.org/show_bug.cgi?id=2661
>
> Specifically, the below snip from the patch means that on rh4 u4 all
> IPv4 ARPHRD_INFINIBAND multicast goes on the broadcast group !!!
>
> > Index: linux-2.6.9/net/ipv4/arp.c
> > ===================================================================
> > --- linux-2.6.9.orig/net/ipv4/arp.c 2004-10-18 23:55:06.000000000 +0200
> > +++ linux-2.6.9/net/ipv4/arp.c 2006-09-20 14:43:59.000000000 +0300
> > @@ -213,6 +213,9 @@
> >       case ARPHRD_IEEE802_TR:
> >               ip_tr_mc_map(addr, haddr);
> >               return 0;
> > +     case ARPHRD_INFINIBAND:
> > +             ip_ib_mc_map(addr, haddr);
> > +             return 0;
> >       default:
> >               if (dir) {
> >                       memcpy(haddr, dev->broadcast, dev->addr_len);
>
> anyway, OFED wise, i see two ways to solve this:
>
> 1) adding a backport to the rdma_cm containing ip_ib_mc_map, period.
>
> This means that apps offloading multicast traffic through the rdma cm
> would use the correct group where apps working through the net stack
> use the broadcast group.
>
> 2) having the rdma cm follow the net stack and make its consumer use the
> broadcast group.

Correct. Since multicast is broken in other respects on U4
(sockets can't join multicast groups), I think 2 is the simplest approach.
Anyone who wants IPoIB multicast should just stay away from U4.

--
MST

From mst at mellanox.co.il Thu Feb 1 01:09:58 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 11:09:58 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <1170320484.654.6.camel@linux-q667.site>
References: <000101c74576$fedc81f0$8698070a@amr.corp.intel.com> <1170275680.14294.5.camel@stevo-desktop> <45C1480C.1020600@ichips.intel.com> <1170320484.654.6.camel@linux-q667.site>
Message-ID: <20070201090958.GD14189@mellanox.co.il>

> Quoting Steve WIse :
> Subject: Re: ip_ib_mc_map?
>
> On Wed, 2007-01-31 at 17:53 -0800, Sean Hefty wrote:
> > Steve Wise wrote:
> > > Perhaps there's no backport for this to rhel4u4?
> >
> > I would have thought so, but I really don't know. The function is called from
> > net/ipv4/arp.c, and not directly by ipoib. So, I don't know how the backport
> > patches typically handle this.
> >
> > - Sean
>
> Here's what I see:
>
> ip_ib_mc_map() is called directly from cma_join_ib_multicast(), which is
> added to the ofed_1_2 cma.c via patch file:
> kernel_patches/fixes/sean_multicast_1.patch
>
> So when I compiled ofed_1_2 on rhel4u4, the cma wouldn't load because
> there is no ip_ib_mc_map() in rhel4u4.
>
> So you need a backport patch for this to work on rhel4u4. Probably many
> of the older kernels.

I think this breakage is U4 specific.
Someone at RH went to the trouble to rip all of the IB-related stuff
out of the U4 kernel.

I think just calling ip_tr_mc_map on U4 instead will be enough.

--
MST

From ogerlitz at voltaire.com Thu Feb 1 01:17:53 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 01 Feb 2007 11:17:53 +0200
Subject: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails
In-Reply-To: <45C0662A.7050203@dev.mellanox.co.il>
References: <45BF0575.9020507@dev.mellanox.co.il> <45BF1866.3010807@voltaire.com> <45C0662A.7050203@dev.mellanox.co.il>
Message-ID: <45C1B041.4000000@voltaire.com>

Dotan Barak wrote:
> I think that now, when implementation of IPoIB CM is available and SRQ
> is being used, one may
> need to use a SRQ with more than 16K WRs.

IPoIB UD uses SRQ by nature (since RX from all peers consumes buffers
from the --only-- RQ) and lives fine with 32 buffers (or 64; you can
look in the code).

Moreover, my assumption is that pps(RC) <= pps(UC) <= pps(UD). This
means that whatever number of RX buffers for UD/2K MTU is "enough" to
have no (or close to zero) packet loss under some traffic pattern, the
same pattern can be served with IPoIB CM using an SRQ of the same size.

Or.

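The "> 16K" figure in this thread's subject is consistent with a simple size
calculation, sketched below under two assumptions that are not stated in the
message above: that the driver keeps one 64-bit work-request ID per queue
entry in a single contiguous array, and that the kernel allocator tops out at
128 KB (the figure quoted elsewhere in this thread). The helper name and macro
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed allocation ceiling: 128 KB, as discussed in this thread. */
#define SKETCH_KMALLOC_MAX_BYTES (128u * 1024u)

/*
 * Hypothetical helper: largest number of work requests whose per-entry
 * bookkeeping array still fits in one allocation of limit_bytes.
 */
static size_t sketch_max_wr_for_limit(size_t limit_bytes, size_t entry_bytes)
{
    if (entry_bytes == 0)
        return 0;               /* avoid dividing by zero */
    return limit_bytes / entry_bytes;
}
```

Under those assumptions, 128 KB / 8 bytes gives 16384 entries, so a request
for more than 16K WRs would need an allocation above the ceiling.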
From swise at opengridcomputing.com Thu Feb 1 01:37:50 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Thu, 01 Feb 2007 03:37:50 -0600
Subject: [openib-general] [PATCH] RE: regression in ofed 1.2
In-Reply-To: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com>
References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com>
Message-ID: <1170322670.654.23.camel@linux-q667.site>

> Okay - I _think_ the problem is that OFED 1.2 pulled code from my git tree
> before I created an ofed_1_2 branch (which contains the fix), and didn't update
> to match my ofed_1_2 branch. The crash that you reported occurring over iWarp
> should also happen over IB for the same reason, so both are likely broken atm...
>
> Vlad, can you please update the ofed build by pulling from the ofed_1_2 branches
> of my rdma-dev.git and librdmacm.git trees?

I looked at your rdma-dev ofed_1_2 branch and see that the cma.c changes
you made there will resolve this issue. It just needs to be pulled into
ofed_1_2.

Thanks!

Steve.

From ogerlitz at voltaire.com Thu Feb 1 01:38:46 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 01 Feb 2007 11:38:46 +0200
Subject: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails
In-Reply-To:
References: <45BF0575.9020507@dev.mellanox.co.il> <45BF1866.3010807@voltaire.com>
Message-ID: <45C1B526.30101@voltaire.com>

Roland Dreier wrote:
> > anyway, the solution that comes into my mind is to disable creating a
> > QP/SRQ for which > 128KB allocations are needed. So
> > mthca_query_device() will set the max_qp_wr and max_srq_wr attributes
> > to values whose derived size still allows to use kmalloc.
>
> But that will limit the size of the queues that userspace can create
> too. I guess we could allocate kernel wrid arrays with vmalloc(), but
> I wonder if anyone actually cares about this limit...

Hmm, I would avoid vmalloc if possible. Allocating up to 128K bytes for a
kernel resource sounds fine.

As for user space sharing the same limitation, how about adding to the
--kernel-- struct ib_device_attr "for user space" buddy fields for
max_qp_wr, max_srq_wr and max_cqe, such that each hw driver sets both
values: the actual hw limitation in the "user space" field, and a value
which would pass kmalloc in the "kernel space" field.

Kernel ULPs calling ibv_device_query would use the original fields, so
there is no need to patch them. The same goes for user space ULPs: no
need to patch them. However, when the call is made from user space,
uverbs_query_device copies the "user space" attr to the resp struct.

Or.

From mst at mellanox.co.il Thu Feb 1 01:50:03 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 11:50:03 +0200
Subject: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails
In-Reply-To: <45C1B526.30101@voltaire.com>
References: <45BF0575.9020507@dev.mellanox.co.il> <45BF1866.3010807@voltaire.com> <45C1B526.30101@voltaire.com>
Message-ID: <20070201095003.GA15505@mellanox.co.il>

> As for the user space sharing of the same limitation, how about adding
> to the --kernel-- struct ib_device_attr "for user space" buddy fields to
> max_qp_wr max_srq_wr and max_cqe such that each hw driver set both
> values: for the "user space" field the actual hw limitation and for
> "kernel space" field a value which would pass kmalloc.

We could do that I guess, but no one so far has used query in the kernel,
and the userspace values are currently good.

--
MST

From dledford at redhat.com Thu Feb 1 02:17:32 2007
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 01 Feb 2007 05:17:32 -0500
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <45C1ABD0.5090404@voltaire.com>
References: <1170275331.14294.1.camel@stevo-desktop> <45C1ABD0.5090404@voltaire.com>
Message-ID: <1170325052.2716.229.camel@fc6.xsintricity.com>

On Thu, 2007-02-01 at 10:58 +0200, Or Gerlitz wrote:
> Steve Wise wrote:
> > where can I find this symbol? I can't load rdma_cm on rhel4u4...
> > rdma_cm: Unknown symbol ip_ib_mc_map
>
> Sean, OK, sorry not to mention the rh4u4 issue once you did the push to
> OFED 1.2 ...
>
> For a reason that no one at RH can trace... someone went and removed
> all the support for ARPHRD_INFINIBAND multicast from u4 where it exists
> perfectly fine in u3 and hopefully on u5 as well (Doug can you update?),
> see https://bugs.openfabrics.org/show_bug.cgi?id=2661

Yes. It's been fixed for U5. It wasn't that the patch got removed; it's
that between U3 and U4 I did a complete rebase, which means that all the
patches from U3 were tossed out the window and a complete new set was made
for U4. I just missed re-adding this one in U4.

> Specifically, the below snip from the patch means that on rh4 u4 all
> IPv4 ARPHRD_INFINIBAND multicast goes on the broadcast group !!!
>
> > Index: linux-2.6.9/net/ipv4/arp.c
> > ===================================================================
> > --- linux-2.6.9.orig/net/ipv4/arp.c 2004-10-18 23:55:06.000000000 +0200
> > +++ linux-2.6.9/net/ipv4/arp.c 2006-09-20 14:43:59.000000000 +0300
> > @@ -213,6 +213,9 @@
> >       case ARPHRD_IEEE802_TR:
> >               ip_tr_mc_map(addr, haddr);
> >               return 0;
> > +     case ARPHRD_INFINIBAND:
> > +             ip_ib_mc_map(addr, haddr);
> > +             return 0;
> >       default:
> >               if (dir) {
> >                       memcpy(haddr, dev->broadcast, dev->addr_len);
>
> anyway, OFED wise, i see two ways to solve this:
>
> 1) adding a backport to the rdma_cm containing ip_ib_mc_map, period.
>
> This means that apps offloading multicast traffic through the rdma cm
> would use the correct group where apps working through the net stack
> use the broadcast group.
>
> 2) having the rdma cm follow the net stack and make its consumer use the
> broadcast group.
>
> Or.

--
Doug Ledford
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband

From bugzilla-daemon at lists.openfabrics.org Thu Feb 1 02:22:40 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu, 1 Feb 2007 02:22:40 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa
In-Reply-To:
Message-ID: <20070201102241.38A69E607F8@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334

------- Comment #5 from dmitry.yulov at intel.com 2007-02-01 02:22 -------
Created an attachment (id=73)
 --> (https://bugs.openfabrics.org/attachment.cgi?id=73&action=view)
The file configuration for OFED

From vlad at lists.openfabrics.org Thu Feb 1 02:23:03 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Thu, 1 Feb 2007 02:23:03 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070201-0200 daily build status
Message-ID: <20070201102303.B082FE607FA@openfabrics.org>

This email was generated automatically, please do not reply

Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.13
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.12

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------

From bugzilla-daemon at lists.openfabrics.org Thu Feb 1 02:30:18 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu, 1 Feb 2007 02:30:18 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa
In-Reply-To:
Message-ID: <20070201103018.4DA5EE607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334

dmitry.yulov at intel.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

------- Comment #6 from dmitry.yulov at intel.com 2007-02-01 02:30 -------
Hi,

Thanks a lot for the explanation. I have some comments for you:

First of all, I need to run the build script to make the RPMs. I use the
build.sh script to do this. I also need to build all packages from
sources. I have attached my configuration file for building the RPMs,
and I see some differences from your file.

> rpm -q open-iscsi
It was present before I ran the RPM build.

> find /usr/src/linux-2.6.16.21-0.8 -name libiscsi.h
The file is present on my machine.

> Check that kernel_patches/backport/2.6.16_sles10/include_libiscsi.patch was applied successfully.
I checked it, and the patch applied successfully.

Could you please try to build OFED-1.1.1-ib_local_sa from source using
my configuration file instead of yours? I got OFED-1.1.1-ib_local_sa from
https://svn.openfabrics.org/svn/openib/gen2/branches/1.1/ofed/releases/OFED-1.1.1-ib_local_sa.tgz.

From kliteyn at dev.mellanox.co.il Thu Feb 1 02:35:01 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 01 Feb 2007 12:35:01 +0200
Subject: [openib-general] [PATCH] osm: trivial casting for compilation on windows
Message-ID: <45C1C255.4060405@dev.mellanox.co.il>

Trivial casting for compilation on windows

Signed-off-by: Yevgeny Kliteynik
---
 osm/opensm/osm_subnet.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c
index f2e909b..e4e69c0 100644
--- a/osm/opensm/osm_subnet.c
+++ b/osm/opensm/osm_subnet.c
@@ -562,7 +562,7 @@ __osm_subn_opts_unpack_uint16(

   if (!strcmp(p_req_key, p_key))
   {
-    val = strtoul(p_val_str, NULL, 0);
+    val = (uint16_t)strtoul(p_val_str, NULL, 0);
     if (val != *p_val)
     {
       char buff[128];
--
1.4.4.1.GIT

From dotanb at dev.mellanox.co.il Thu Feb 1 02:41:25 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 01 Feb 2007 12:41:25 +0200
Subject: [openib-general] IB/mthca: question about HCA profile module parameters
Message-ID: <45C1C3D5.1050301@dev.mellanox.co.il>

Hi Moni.

I tried to use the mthca module parameters: for example, I tried to change
the number of QPs.

I got several failures when I used the HCA 25204:
* sometimes I got the following error message (when using big values, for
  example 512K QPs):
    ib_mthca: 0000:0c: INIT_HCA command failed aborting.
    ib_mthca: probe of 0000:0c: failed with error -16
* when I tried to use a small number of QPs (1024), the machine just hung
  and I noticed a kernel oops message on the console

Did you verify the HCA profile module parameter feature?
Is there any known limitation on the values that should be used?
(for example: only values which are power of two)

thanks
Dotan

From swise at opengridcomputing.com Thu Feb 1 02:53:25 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Thu, 01 Feb 2007 04:53:25 -0600
Subject: [openib-general] [PATCH] RE: regression in ofed 1.2
In-Reply-To: <1170322670.654.23.camel@linux-q667.site>
References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com> <1170322670.654.23.camel@linux-q667.site>
Message-ID: <1170327205.654.34.camel@linux-q667.site>

On Thu, 2007-02-01 at 03:37 -0600, Steve WIse wrote:
> > Okay - I _think_ the problem is that OFED 1.2 pulled code from my git tree
> > before I created an ofed_1_2 branch (which contains the fix), and didn't update
> > to match my ofed_1_2 branch. The crash that you reported occurring over iWarp
> > should also happen over IB for the same reason, so both are likely broken atm...
> >
> > Vlad, can you please update the ofed build by pulling from the ofed_1_2 branches
> > of my rdma-dev.git and librdmacm.git trees?
>
> I looked at your rdma-dev ofed_1_2 branch and see that the cma.c changes
> you made there will resolve this issue. It just needs to be pulled into
> ofed_1_2.
>

Also, I just pulled down and built the latest ofed_1_2 kernel and user
code against 2.6.20-rc7, and the ucma abi is 4. So rdma_create_qp()
will still crash even with the librdmacm code to avoid the call to
rdma_init_qp_attr for ABI 3 kernels.

Steve.

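The point about ABI 3 versus ABI 4 can be illustrated with a minimal sketch of
a version gate. Everything here is hypothetical: the function names, the
accepted range and the gate condition are assumptions chosen to mirror the
description above, not the actual librdmacm code.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse an ABI version string such as "4\n".
 * Returns the version, or -1 if the text does not start with a
 * plausible non-negative number.
 */
static int sketch_parse_abi(const char *text)
{
    char *end;
    long v = strtol(text, &end, 10);

    if (end == text || v < 0 || v > 1000)
        return -1;
    return (int)v;
}

/*
 * Hypothetical gate matching the workaround as described: the user-space
 * library skips the kernel QP-attribute call only when the kernel reports
 * ABI 3. Any other version, including 4, takes the normal path.
 */
static int sketch_skips_kernel_qp_attr(int abi)
{
    return abi == 3;
}
```

A gate of this shape does nothing for a kernel that reports ABI 4 but lacks
the kernel-side fix, which is the situation described in the message above.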
From bugzilla-daemon at lists.openfabrics.org Thu Feb 1 03:04:17 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu, 1 Feb 2007 03:04:17 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa
In-Reply-To:
Message-ID: <20070201110417.93A48E607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334

dmitry.yulov at intel.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

------- Comment #7 from dmitry.yulov at intel.com 2007-02-01 03:04 -------
I tried to build the product again and I saw that the patches from the
kernel_patches/backport/2.6.16_sles10/ directory were not applied
automatically. When I apply these patches manually, everything builds.
How can I run the build process so that the patches are applied
automatically?

From ogerlitz at voltaire.com Thu Feb 1 03:10:48 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 01 Feb 2007 13:10:48 +0200
Subject: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails
In-Reply-To: <20070201095003.GA15505@mellanox.co.il>
References: <45BF0575.9020507@dev.mellanox.co.il> <45BF1866.3010807@voltaire.com> <45C1B526.30101@voltaire.com> <20070201095003.GA15505@mellanox.co.il>
Message-ID: <45C1CAB8.2080806@voltaire.com>

Michael S. Tsirkin wrote:
>> As for the user space sharing of the same limitation, how about adding
>> to the --kernel-- struct ib_device_attr "for user space" buddy fields to
>> max_qp_wr max_srq_wr and max_cqe such that each hw driver set both
>> values: for the "user space" field the actual hw limitation and for
>> "kernel space" field a value which would pass kmalloc.

> We could do that I guess but no one so far used query in kernel,
> and userspace values are currently good.

srp calls ibv_device_query but does not care about these fields. As for
IPoIB CM, if you see things as in my other email, I guess you don't need
to query either.

However, as this is a fairly easy change to implement, which does not
break the user/kernel ABI and allows kernel consumers to count on the
query results they get from the hw driver, I think that longer term we
do want to have it done.

Or.

From kliteyn at dev.mellanox.co.il Thu Feb 1 03:48:48 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 01 Feb 2007 13:48:48 +0200
Subject: [openib-general] [PATCH] osm: some trivial chages in the osm_ucast_lash for compilation on windows
Message-ID: <45C1D3A0.7060201@dev.mellanox.co.il>

Hi Hal,

This patch has some trivial changes in osm_ucast_lash.c for compilation
on Windows.

In general, this file needs a major cosmetic (and not only cosmetic) patch
to fit better into the OSM code. I will get back to it at some point in
the future.
-- Yevgeny Signed-off-by: Yevgeny Kliteynik --- osm/opensm/osm_ucast_lash.c | 80 ++++++++++++++++++++++-------------------- 1 files changed, 42 insertions(+), 38 deletions(-) diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c index 70e5cbe..95f3ec9 100644 --- a/osm/opensm/osm_ucast_lash.c +++ b/osm/opensm/osm_ucast_lash.c @@ -217,6 +217,8 @@ static uint8_t find_port_from_lid(IN con uint8_t port_count = 0; uint8_t i=0; osm_physp_t *p_current_physp, *p_remote_physp = NULL; + ib_port_info_t *port_info; + ib_net16_t port_lid; uint8_t egress_port = 255; @@ -227,8 +229,8 @@ static uint8_t find_port_from_lid(IN con // process management port first p_current_physp = osm_node_get_physp_ptr(p_sw->p_node, 0); - ib_port_info_t *port_info = &p_current_physp->port_info; - ib_net16_t port_lid = port_info->base_lid; + port_info = &p_current_physp->port_info; + port_lid = port_info->base_lid; if (port_lid == lid_no) { egress_port = 0; goto Exit; @@ -294,15 +296,15 @@ static int cycle_exists(cdg_vertex_t * s } else { if(current == NULL) { current = start; - assert(prev == NULL); + CL_ASSERT(prev == NULL); } current->visiting_number = visit_num; if(prev != NULL) { prev->next = current; - assert(prev->to == current->from); - assert(prev->visiting_number > 0); + CL_ASSERT(prev->to == current->from); + CL_ASSERT(prev->visiting_number > 0); } new_visit_num = visit_num + 1; @@ -346,7 +348,7 @@ static void remove_semipermanent_depend_ while(sw != dest_switch){ v = cdg_vertex_matrix[lane][sw][i_next_switch]; - assert(v != NULL); + CL_ASSERT(v != NULL); if(v->num_using_vertex == 1) { @@ -366,7 +368,7 @@ static void remove_semipermanent_depend_ depend = i; } - assert(found); + CL_ASSERT(found); if(v->num_using_this_depend[depend] == 1) { for(i=depend; inum_dependencies-1; i++) { @@ -403,7 +405,7 @@ static void enqueue(lash_t *p_lash, int switch_t **switches = p_lash->switches; q_item_t *q_head; - assert(switches[sw]->q_member == 0); + CL_ASSERT(switches[sw]->q_member == 
0); switches[sw]->q_member = 1; switches[sw]->dist = dist; switches[sw]->prev = prev; @@ -454,7 +456,7 @@ static void dequeue(lash_t *p_lash, int *dist = switches[q_min->sw]->dist; *prev = switches[q_min->sw]->prev; - assert(switches[q_min->sw]->q_member == 1 && !switches[q_min->sw]->mst_member); + CL_ASSERT(switches[q_min->sw]->q_member == 1 && !switches[q_min->sw]->mst_member); switches[q_min->sw]->q_member = 0; free(q_min); } @@ -468,12 +470,11 @@ static void dequeue(lash_t *p_lash, int static int get_phys_connection(switch_t **switches, int switch_from, int switch_to) { - int i = 0; + unsigned int i = 0; for (i = 0; i < switches[switch_from]->num_connections; i++) if(switches[switch_from]->phys_connections[i] == switch_to) return i; - assert(1==1); return i; } @@ -557,7 +558,7 @@ static void generate_routing_func_for_ms i_dest = i_dest->next; } - assert(prev->next == NULL); + CL_ASSERT(prev->next == NULL); prev->next = concat_dest; concat_dest = dest; } @@ -590,10 +591,9 @@ static void generate_cdg_for_sp(lash_t*p while(sw != dest_switch) { if(cdg_vertex_matrix[lane][sw][next_switch] == NULL) { + unsigned i; v = create_cdg_vertex(num_switches); - int i; - for(i=0; idependency[i] = NULL; v->num_using_this_depend[i] = 0; @@ -630,7 +630,7 @@ static void generate_cdg_for_sp(lash_t*p prev->num_using_this_depend[prev->num_dependencies]++; prev->num_dependencies++; - assert(prev->num_dependencies < num_switches); + CL_ASSERT(prev->num_dependencies < (int)num_switches); if(prev->temp==0) prev->num_temp_depend++; @@ -642,7 +642,7 @@ static void generate_cdg_for_sp(lash_t*p output_link = switches[sw]->routing_table[dest_switch].out_link; if(sw != dest_switch) { - assert(output_link != NONE); + CL_ASSERT(output_link != NONE); next_switch = switches[sw]->phys_connections[output_link]; } @@ -670,7 +670,7 @@ static void set_temp_depend_to_permanent while(sw != dest_switch) { v = cdg_vertex_matrix[lane][sw][next_switch]; - assert(v != NULL); + CL_ASSERT(v != NULL); if(v->temp 
== 1) { v->temp = 0; @@ -706,13 +706,13 @@ static void remove_temp_depend_for_sp(la while(sw != dest_switch) { v = cdg_vertex_matrix[lane][sw][next_switch]; - assert(v != NULL); + CL_ASSERT(v != NULL); if(v->temp==1) { cdg_vertex_matrix[lane][sw][next_switch] = NULL; free(v); } else { - assert(v->num_temp_depend <= v->num_dependencies); + CL_ASSERT(v->num_temp_depend <= v->num_dependencies); v->num_dependencies = v->num_dependencies - v->num_temp_depend; v->num_temp_depend = 0; v->num_using_vertex--; @@ -744,7 +744,8 @@ static void balance_virtual_lanes(lash_t int *num_mst_in_lane = p_lash->num_mst_in_lane; int ***virtual_location = p_lash->virtual_location; int min_filled_lane, max_filled_lane, medium_filled_lane, trials; - int old_min_filled_lane, old_max_filled_lane, i, j, new_num_min_lane, new_num_max_lane; + int old_min_filled_lane, old_max_filled_lane, new_num_min_lane, new_num_max_lane; + unsigned int i, j; int src, dest, start, next_switch, output_link; int stop = 0, cycle_found; @@ -788,7 +789,7 @@ static void balance_virtual_lanes(lash_t output_link = p_lash->switches[src]->routing_table[dest].out_link; next_switch = p_lash->switches[src]->phys_connections[output_link]; - assert(cdg_vertex_matrix[min_filled_lane][src][next_switch] != NULL); + CL_ASSERT(cdg_vertex_matrix[min_filled_lane][src][next_switch] != NULL); cycle_found = cycle_exists(cdg_vertex_matrix[min_filled_lane][src][next_switch], NULL, NULL, 1); for(i=0; inum_switches; switch_t *sw; - int i; + unsigned int i; sw = malloc(sizeof(*sw)); if (!sw) @@ -926,7 +927,7 @@ static void switch_delete(switch_t *sw) static void free_lash_structures(lash_t *p_lash) { - int i,j,k; + unsigned int i,j,k; unsigned num_switches = p_lash->num_switches; osm_log_t *p_log = &p_lash->p_osm->log; @@ -988,12 +989,11 @@ static int init_lash_structures(lash_t * unsigned vl_min = p_lash->vl_min; unsigned num_switches = p_lash->num_switches; osm_log_t *p_log = &p_lash->p_osm->log; + int status = IB_SUCCESS; + unsigned int 
i, j, k; OSM_LOG_ENTER( p_log, init_lash_structures); - int status = IB_SUCCESS; - int i, j, k; - // initialise cdg_vertex_matrix[num_switches][num_switches][num_switches] p_lash->cdg_vertex_matrix = (cdg_vertex_t****)malloc(vl_min * sizeof(cdg_vertex_t ****)); for (i = 0; i < vl_min; i++) { @@ -1084,10 +1084,11 @@ static int lash_core(lash_t *p_lash) unsigned num_switches = p_lash->num_switches; switch_t **switches = p_lash->switches; unsigned lanes_needed = 1; - int i, j, k, dest_switch = 0; + unsigned int i, j, k, dest_switch = 0; reachable_dest_t * dests, * idest; int cycle_found = 0; - int v_lane, stop = 0, output_link, i_next_switch; + unsigned v_lane; + int stop = 0, output_link, i_next_switch; int status = IB_SUCCESS; OSM_LOG_ENTER( p_log, lash_core); @@ -1113,7 +1114,7 @@ static int lash_core(lash_t *p_lash) output_link = switches[i]->routing_table[dest_switch].out_link; i_next_switch = switches[i]->phys_connections[output_link]; - assert(p_lash->cdg_vertex_matrix[v_lane][i][i_next_switch] != NULL); + CL_ASSERT(p_lash->cdg_vertex_matrix[v_lane][i][i_next_switch] != NULL); cycle_found = cycle_exists(p_lash->cdg_vertex_matrix[v_lane][i][i_next_switch], NULL, NULL, 1); for(j=0; jsw_guid_tbl )) { + uint64_t current_guid; + switch_t *sw; p_sw = p_next_sw; p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); max_lid_ho = osm_switch_get_max_lid_ho(p_sw); - uint64_t current_guid = p_sw->p_node->node_info.port_guid; - switch_t *sw = p_sw->priv; + current_guid = p_sw->p_node->node_info.port_guid; + sw = p_sw->priv; memset(p_osm->sm.ucast_mgr.lft_buf, 0xff, IB_LID_UCAST_END_HO + 1); @@ -1244,8 +1247,8 @@ static void populate_fwd_tbls(lash_t *p_ cl_ntoh64(current_guid), -1, egress_port); } else { unsigned dst_lash_switch_id = get_lash_id(p_dst_sw); - uint8_t lash_egress_port = sw->routing_table[dst_lash_switch_id].out_link; - uint8_t physical_egress_port = sw->virtual_physical_port_table[lash_egress_port]; + uint8_t lash_egress_port = 
(uint8_t)sw->routing_table[dst_lash_switch_id].out_link; + uint8_t physical_egress_port = (uint8_t)sw->virtual_physical_port_table[lash_egress_port]; p_osm->sm.ucast_mgr.lft_buf[lid] = physical_egress_port; osm_log(p_log, OSM_LOG_DEBUG, @@ -1366,7 +1369,7 @@ static void lash_cleanup(lash_t *p_lash) if (p_lash->switches) { unsigned id; - for (id = 0; id < p_lash->num_switches ; id++) + for (id = 0; ((int)id) < p_lash->num_switches ; id++) if (p_lash->switches[id]) switch_delete(p_lash->switches[id]); free(p_lash->switches); @@ -1400,6 +1403,7 @@ static int discover_network_properties(l p_next_sw = (osm_switch_t*)cl_qmap_head( &p_subn->sw_guid_tbl ); while(p_next_sw != (osm_switch_t*)cl_qmap_end( &p_subn->sw_guid_tbl ) ) { + uint16_t port_count; p_sw = p_next_sw; p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); @@ -1408,7 +1412,7 @@ static int discover_network_properties(l return -1; id++; - uint16_t port_count = osm_node_get_num_physp (p_sw->p_node); + port_count = osm_node_get_num_physp (p_sw->p_node); // Note, ignoring port 0. 
management port for (i=1; ip_remote_physp) { ib_port_info_t *p_port_info = &p_current_physp->port_info; - int port_vl_min = ib_port_info_get_op_vls(p_port_info); + uint8_t port_vl_min = ib_port_info_get_op_vls(p_port_info); if (port_vl_min && port_vl_min < vl_min) vl_min = port_vl_min; } @@ -1508,7 +1512,7 @@ static void lash_delete(void *context) lash_t *p_lash = context; if (p_lash->switches) { unsigned id; - for (id = 0; id < p_lash->num_switches ; id++) + for (id = 0; ((int)id) < p_lash->num_switches ; id++) if (p_lash->switches[id]) switch_delete(p_lash->switches[id]); free(p_lash->switches); @@ -1534,7 +1538,7 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_ if (!p_sw || !p_sw->priv) return OSM_DEFAULT_SL; - return ((switch_t *)p_sw->priv)->routing_table[dst_id].lane; + return (uint8_t)((switch_t *)p_sw->priv)->routing_table[dst_id].lane; } int osm_ucast_lash_setup(osm_opensm_t *p_osm) -- 1.4.4.1.GIT From vlad at dev.mellanox.co.il Thu Feb 1 03:58:16 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 01 Feb 2007 13:58:16 +0200 Subject: [openib-general] MVAPICH2 SRPM and install file patches In-Reply-To: <45C14344.9010602@cse.ohio-state.edu> References: <45C14344.9010602@cse.ohio-state.edu> Message-ID: <1170331096.6114.4.camel@vladsk-laptop> On Wed, 2007-01-31 at 20:32 -0500, Shaun Rowland wrote: > I've placed the MVAPICH2 SRPM on the OFA server in ~rowland/ofed_1_2, > and it is linked to here: > > http://www.openfabrics.org/~rowland/ofed_1_2/ ofed_1_2_scripts.patch applied. Thanks, -- Vladimir Sokolovsky Mellanox Technologies Ltd. From ogerlitz at voltaire.com Thu Feb 1 04:09:11 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 01 Feb 2007 14:09:11 +0200 Subject: [openib-general] ip_ib_mc_map? In-Reply-To: <20070201090628.GC14189@mellanox.co.il> References: <1170275331.14294.1.camel@stevo-desktop> <45C1ABD0.5090404@voltaire.com> <20070201090628.GC14189@mellanox.co.il> Message-ID: <45C1D867.4030208@voltaire.com> Michael S. 
Tsirkin wrote: >> 1) adding a backport to the rdma_cm containing ip_ib_mc_map, period. >> 2) having the rdma cm follow the net stack and make its consumer use the >> broadcast group. > Correct. Since multicast is broken in other respects on U4 > (sockets can't join multicast groups), I think 2 is the simplest approach. The situation in U4 is somewhat more involved: sockets doing IP_ADD_MEMBERSHIP to some multicast group are actually sending and receiving traffic over the IPoIB broadcast group, which makes this cluster IPoIB kind of hell. > Anyone who wants IPoIB multicast should just stay away from U4. We are still interested in being able to run our multicast app over the RDMA CM, and we want it to be done over the correct multicast group and not over a broadcast group. So option 2 is a real problem for us. Or. From mst at mellanox.co.il Thu Feb 1 04:10:08 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Feb 2007 14:10:08 +0200 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <20070125191321.30934.74542.stgit@dell3.ogc.int> References: <20070125191321.30934.74542.stgit@dell3.ogc.int> Message-ID: <20070201121008.GA20789@mellanox.co.il> > Quoting Steve Wise : > Subject: [PATCH 00/12] ofed_1_2 - Neighbour update support > > > Michael/Vlad: > > Here are the backports for snooping arp packets to generate neighbour > update netevents. Also included is the addr.c patch to act on all valid > neigh update events. If this series looks good to you then I'll push > this up and you all can pull it from my git tree. These patches seem to have created a reference leak on each neighbour; as a result, the ipoib interface could not be brought down. It also seems that the RHAS U2 backport was missing code. I pushed out the following: commit d140398db0da0beb3172e0ccf14ef3023cafec9c Author: Michael S. Tsirkin Date: Thu Feb 1 12:21:34 2007 +0200 Fix neighbour reference leak in netevent.c Signed-off-by: Michael S.
Tsirkin diff --git a/kernel_addons/backport/2.6.11/include/src/netevent.c b/kernel_addons/backport/2.6.11/include/src/netevent.c index 6a8df29..0d26662 100644 --- a/kernel_addons/backport/2.6.11/include/src/netevent.c +++ b/kernel_addons/backport/2.6.11/include/src/netevent.c @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } diff --git a/kernel_addons/backport/2.6.12/include/src/netevent.c b/kernel_addons/backport/2.6.12/include/src/netevent.c index 6a8df29..0d26662 100644 --- a/kernel_addons/backport/2.6.12/include/src/netevent.c +++ b/kernel_addons/backport/2.6.12/include/src/netevent.c @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } diff --git a/kernel_addons/backport/2.6.13/include/src/netevent.c b/kernel_addons/backport/2.6.13/include/src/netevent.c index 6a8df29..0d26662 100644 --- a/kernel_addons/backport/2.6.13/include/src/netevent.c +++ b/kernel_addons/backport/2.6.13/include/src/netevent.c @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } diff --git a/kernel_addons/backport/2.6.14/include/src/netevent.c b/kernel_addons/backport/2.6.14/include/src/netevent.c index 188283c..17a12ff 100644 --- a/kernel_addons/backport/2.6.14/include/src/netevent.c +++ b/kernel_addons/backport/2.6.14/include/src/netevent.c @@ 
-38,8 +38,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } diff --git a/kernel_addons/backport/2.6.15/include/src/netevent.c b/kernel_addons/backport/2.6.15/include/src/netevent.c index 188283c..17a12ff 100644 --- a/kernel_addons/backport/2.6.15/include/src/netevent.c +++ b/kernel_addons/backport/2.6.15/include/src/netevent.c @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } diff --git a/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c b/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c index 188283c..17a12ff 100644 --- a/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c +++ b/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } diff --git a/kernel_addons/backport/2.6.16/include/src/netevent.c b/kernel_addons/backport/2.6.16/include/src/netevent.c index 188283c..17a12ff 100644 --- a/kernel_addons/backport/2.6.16/include/src/netevent.c +++ b/kernel_addons/backport/2.6.16/include/src/netevent.c @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { 
call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } diff --git a/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c b/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c index 188283c..17a12ff 100644 --- a/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c +++ b/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } diff --git a/kernel_addons/backport/2.6.17/include/src/netevent.c b/kernel_addons/backport/2.6.17/include/src/netevent.c index 26a0920..4c67de1 100644 --- a/kernel_addons/backport/2.6.17/include/src/netevent.c +++ b/kernel_addons/backport/2.6.17/include/src/netevent.c @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } diff --git a/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c b/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c index 57a23ab..90fce0c 100644 --- a/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c +++ b/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c @@ -39,8 +39,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } diff --git a/kernel_addons/backport/2.6.9_U2/include/src/netevent.c b/kernel_addons/backport/2.6.9_U2/include/src/netevent.c index 
5ffadd1..1589300 100644 --- a/kernel_addons/backport/2.6.9_U2/include/src/netevent.c +++ b/kernel_addons/backport/2.6.9_U2/include/src/netevent.c @@ -13,10 +13,59 @@ * Fixes: */ -#include -#include #include #include +#include +#include +#include +#include + +#include +#include +#include +#include + +static DEFINE_MUTEX(lock); +static int count; + +static void destructor(struct sk_buff *skb) +{ + struct neighbour *n; + u8 *arp_ptr; + __be32 gw; + + /* Pull the SPA */ + arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; + memcpy(&gw, arp_ptr, 4); + n = neigh_lookup(&arp_tbl, &gw, skb->dev); + if (n) { + call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } + return; +} + +static int arp_recv(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pkt) +{ + struct arphdr *arp_hdr; + u16 op; + + arp_hdr = (struct arphdr *) skb->nh.raw; + op = ntohs(arp_hdr->ar_op); + + if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor) + skb->destructor = destructor; + + kfree_skb(skb); + return 0; +} + +static struct packet_type arp = { + .type = __constant_htons(ETH_P_ARP), + .func = arp_recv, + .af_packet_priv = (void *)1, +}; static struct notifier_block *netevent_notif_chain; @@ -34,6 +83,12 @@ int register_netevent_notifier(struct notifier_block *nb) int err; err = notifier_chain_register(&netevent_notif_chain, nb); + if (!err) { + mutex_lock(&lock); + if (count++ == 0) + dev_add_pack(&arp); + mutex_unlock(&lock); + } return err; } @@ -49,7 +104,16 @@ int register_netevent_notifier(struct notifier_block *nb) int unregister_netevent_notifier(struct notifier_block *nb) { - return notifier_chain_unregister(&netevent_notif_chain, nb); + int err; + + err = notifier_chain_unregister(&netevent_notif_chain, nb); + if (!err) { + mutex_lock(&lock); + if (--count == 0) + dev_remove_pack(&arp); + mutex_unlock(&lock); + } + return err; } /** diff --git a/kernel_addons/backport/2.6.9_U3/include/src/netevent.c 
b/kernel_addons/backport/2.6.9_U3/include/src/netevent.c index 5ffadd1..1589300 100644 --- a/kernel_addons/backport/2.6.9_U3/include/src/netevent.c +++ b/kernel_addons/backport/2.6.9_U3/include/src/netevent.c @@ -13,10 +13,59 @@ * Fixes: */ -#include -#include #include #include +#include +#include +#include +#include + +#include +#include +#include +#include + +static DEFINE_MUTEX(lock); +static int count; + +static void destructor(struct sk_buff *skb) +{ + struct neighbour *n; + u8 *arp_ptr; + __be32 gw; + + /* Pull the SPA */ + arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; + memcpy(&gw, arp_ptr, 4); + n = neigh_lookup(&arp_tbl, &gw, skb->dev); + if (n) { + call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } + return; +} + +static int arp_recv(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pkt) +{ + struct arphdr *arp_hdr; + u16 op; + + arp_hdr = (struct arphdr *) skb->nh.raw; + op = ntohs(arp_hdr->ar_op); + + if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor) + skb->destructor = destructor; + + kfree_skb(skb); + return 0; +} + +static struct packet_type arp = { + .type = __constant_htons(ETH_P_ARP), + .func = arp_recv, + .af_packet_priv = (void *)1, +}; static struct notifier_block *netevent_notif_chain; @@ -34,6 +83,12 @@ int register_netevent_notifier(struct notifier_block *nb) int err; err = notifier_chain_register(&netevent_notif_chain, nb); + if (!err) { + mutex_lock(&lock); + if (count++ == 0) + dev_add_pack(&arp); + mutex_unlock(&lock); + } return err; } @@ -49,7 +104,16 @@ int register_netevent_notifier(struct notifier_block *nb) int unregister_netevent_notifier(struct notifier_block *nb) { - return notifier_chain_unregister(&netevent_notif_chain, nb); + int err; + + err = notifier_chain_unregister(&netevent_notif_chain, nb); + if (!err) { + mutex_lock(&lock); + if (--count == 0) + dev_remove_pack(&arp); + mutex_unlock(&lock); + } + return err; } /** diff --git 
a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c index 6a8df29..0d26662 100644 --- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c +++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; memcpy(&gw, arp_ptr, 4); n = neigh_lookup(&arp_tbl, &gw, skb->dev); - if (n) + if (n) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); + neigh_release(n); + } return; } -- MST From mst at mellanox.co.il Thu Feb 1 04:19:30 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Feb 2007 14:19:30 +0200 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <20070125191321.30934.74542.stgit@dell3.ogc.int> References: <20070125191321.30934.74542.stgit@dell3.ogc.int> Message-ID: <20070201121930.GB20789@mellanox.co.il> > Here are the backports for snooping arp packets to generate neighbour > update netevents. OK, I went (somewhat belatedly) over this code in more depth and I see a couple of issues that I'd like you to address: - There's some trailing whitespace in some netevent.c files. Could you clean these please? - I see: $ diff ./kernel_addons/backport/2.6.9_U4/include/src/netevent.c kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c > #include Shouldn't the redhat backports include skbuff.h too? They do use the skbuff struct, so it seems cleaner to include it directly, and we would get identical code for redhat and suse. - What is the reason for: if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor) skb->destructor = destructor; kfree_skb(skb); Could we miss events because the skb already has a destructor? Can we just call the destructor function directly (this is what addr.c did previously, and this apparently worked fine)? Steve, could you pls clone ofed git and address these?
-- MST From glebn at voltaire.com Thu Feb 1 04:42:30 2007 From: glebn at voltaire.com (glebn at voltaire.com) Date: Thu, 1 Feb 2007 14:42:30 +0200 Subject: [openib-general] [RFC/BUG] libibverbs: DMA vs. CQ race In-Reply-To: References: Message-ID: <20070201124230.GA23354@minantech.com> On Mon, Jan 29, 2007 at 01:49:04PM -0800, Roland Dreier wrote: > Even with that resolved this all seems rather unfortunate to me. I > don't like the idea of having the kernel keep all these buffers around > and then have the userspace library have to map the right buffer. It > leads to awkwardness like the fact that mthca_resize_cq() seems to be > totally screwed if ibv_cmd_resize_cq() fails for some reason -- it > already munmap'ed the original buffer, and it can't map the new > buffer, and so the CQ is dead with no chance to recover. I looked through the ehca driver and it looks as if it is doing exactly this: "keep all these buffers around and then have the userspace library have to map the right buffer". ehca doesn't support resize_cq though, but let's say this issue is also resolved: would this approach be acceptable? This is how ehca works after all, so we are not inventing something new here. > > The really strange thing about this is that this Altix > coherent/consistent memory really isn't about the memory itself, but > about the relationship of that memory with DMA elsewhere -- as I > understand the code, doing dma_alloc_coherent() returns normal memory > with a special DMA address that tells the system to flush other DMAs > before doing DMA to the coherent region. Which isn't really what most > people understand coherent memory to be, but it has the magic property > of making most drivers work. Yes. It seems Altix abuses dma_alloc_coherent() for this. > > So I'd really like a better solution, but I don't have one in mind > unfortunately. Maybe we can all meditate on this and try to come up > with something cleaner -- I really hope there is a better way to > handle this.
> Another approach may be to add another verb (or we can make ibv_reg_mr do this with a special flag) for coherent memory allocation. This verb will allocate coherent memory in the kernel and mmap it from user space. Then the CQ will be created as usual by providing the lkey to the create_cq verb. The resize will work exactly as it does now, i.e. allocate a new CQ buffer, call resize_cq with the new buffer's lkey, copy the CQEs, unregister the old buffer. -- Gleb. From mst at mellanox.co.il Thu Feb 1 04:42:11 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Feb 2007 14:42:11 +0200 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <20070201121930.GB20789@mellanox.co.il> References: <20070125191321.30934.74542.stgit@dell3.ogc.int> <20070201121930.GB20789@mellanox.co.il> Message-ID: <20070201124211.GD20789@mellanox.co.il> > - There's some trailing whitespace in some netevent.c files. > Could you clean these please? OK, fixed the trailing whitespace and pushed out. -- MST From bugzilla-daemon at lists.openfabrics.org Thu Feb 1 05:02:09 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Thu, 1 Feb 2007 05:02:09 -0800 (PST) Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa In-Reply-To: Message-ID: <20070201130209.CF235E607F7@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=334 ------- Comment #8 from erezz at voltaire.com 2007-02-01 05:02 ------- (In reply to comment #7) > I tried to build the product again and saw that the patches from the > kernel_patches/backport/2.6.16_sles10/ directory are not applied automatically. When > I apply these patches manually, everything builds. How can I run the build process so that > the patches are applied automatically? > What is the output of uname -a ?
On my machine: thyme:/tmp/ofed_sa_test/OFED-1.1.1-ib_local_sa # uname -a Linux thyme 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux Try the following: Edit ofed_scripts/configure and add the line: "echo ${KVERSION}" where the switch starts in line 214. See what happens in case 2.6.16*. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From wombat2 at us.ibm.com Thu Feb 1 05:21:36 2007 From: wombat2 at us.ibm.com (Bernard King-Smith) Date: Thu, 1 Feb 2007 08:21:36 -0500 Subject: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails In-Reply-To: Message-ID: > ----- Message from "Or Gerlitz" on Thu, 01 Feb 2007 11:17:53 +0200 ----- > > Dotan Barak wrote: > > I think that now, when implementation of IPoIB CM is available and SRQ > > is being used, one may > > need to use a SRQ with more than 16K WRs. > > IPoIB UD uses SRQ by nature (since RX from all peers consume buffers > from the --only-- RQ) and lives fine with 32 buffers (or 64 you can look > in the code). Moreover, my assumption is that > > pps(RC) <= pps(UC) <= pps(UD) > > this means that what ever number of RX buffer for UD/2K MTU which is > "enough" to have no (or close to zero) packet loss under some traffic > pattern, the same pattern can be served with IPoIB CM using SRQ of the > same size. I would expect that you will need more than 32 or 64 buffers using RC and SRQ. With larger packets it takes longer to do receive processing on each packet under RC. Larger packets mean it takes more time to do the checksum and the copy to the socket, because there is up to 60K of data vs. 2K. The residency time on the receive queue will be longer. In the traffic pattern where one adapter is receiving from many adapters over the fabric, there will be a larger imbalance between the sender rate and the receiving rate out of the queue.
Given a large enough TCP send and receive window for a single socket to get peak bandwidth, multiple sockets will have more packets in flight for a single destination at the same time in this pattern. > > Or. > > > Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner From mst at mellanox.co.il Thu Feb 1 05:55:22 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Feb 2007 15:55:22 +0200 Subject: [openib-general] [PATCH] RE: regression in ofed 1.2 In-Reply-To: <1170322670.654.23.camel@linux-q667.site> References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com> <1170322670.654.23.camel@linux-q667.site> Message-ID: <20070201135522.GA27688@mellanox.co.il> > Quoting Steve WIse : > Subject: Re: [PATCH] RE: regression in ofed 1.2 > > > Okay - I _think_ the problem is that OFED 1.2 pulled code from my git tree > > before I created an ofed_1_2 branch (which contains the fix), and didn't update > > to match my ofed_1_2 branch. The crash that you reported occurring over iWarp > > should also happen over IB for the same reason, so both are likely broken atm... > > > > Vlad, can you please update the ofed build by pulling from the ofed_1_2 branches > > of my rdma-dev.git and librdmacm.git trees? > > I looked at your rdma-dev ofed_1_2 branch and see that the cma.c changes > you made there will resolve this issue. It just needs to be pulled into > ofed_1_2. OK, I've updated ofed to code from rdma-dev ofed_1_2 branch. Some notes: - Sean, please base your branches on a specific -rc from Linus (OFED 1.2 is now -rc7).
- Now that we are entering feature freeze, we should not do full replaces anymore. So Sean, please post incremental patches, clearly labeled ofed-1.2. -- MST From bugzilla-daemon at lists.openfabrics.org Thu Feb 1 05:57:29 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Thu, 1 Feb 2007 05:57:29 -0800 (PST) Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa In-Reply-To: Message-ID: <20070201135729.C10E3E607F7@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=334 ------- Comment #9 from dmitry.yulov at intel.com 2007-02-01 05:57 ------- > Edit ofed_scripts/configure and add the line: "echo ${KVERSION}" where the > switch starts in line 214. See what happens in case 2.6.16*. When I run build.sh I see the following in the log file: Applying patches for 2.6.16.21-0.8-smp kernel: /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.16/addr_1_netevents_revert_to_2_6_17.patch patching file drivers/infiniband/core/addr.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.16/ipath-backport.patch patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S patching file drivers/infiniband/hw/ipath/ipath_backport.h patching file drivers/infiniband/hw/ipath/ipath_diag.c patching file drivers/infiniband/hw/ipath/ipath_driver.c As I understand it, the 2.6.16 directory is used in this case, not 2.6.16_suse10. I tried adding the following option to the build.sh script: configure_options="$configure_options --with-patchdir=/root/install/OFED-1.1.1-ib_local_sa/2.6.16_sles10" But then the build process breaks. I don't know how to add a step to build.sh that patches the cma.c file and the kernel. Do you have any ideas? -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From swise at opengridcomputing.com Thu Feb 1 05:57:28 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 1 Feb 2007 07:57:28 -0600 Subject: [openib-general] [PATCH] RE: regression in ofed 1.2 References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com> <1170322670.654.23.camel@linux-q667.site> <1170327205.654.34.camel@linux-q667.site> <20070201135619.GB27688@mellanox.co.il> Message-ID: <000e01c74608$e9b4a040$020010ac@haggard> >> > >> >> Also, I just pulled down and built the latest ofed_1_2 kernel and >> user >> code against 2.6.20-rc7, and the ucma abi is 4. So rdma_create_qp() >> will still crash even with the librdmacm code to avoid the call to >> rdma_init_qp_attr for ABI 3 kernels. >> >> >> Steve. > > I'm a bit confused. Can you please try with latest code I've just > pushed out? > Will do. This was before you pulled in sean's code. From mst at mellanox.co.il Thu Feb 1 05:56:19 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Feb 2007 15:56:19 +0200 Subject: [openib-general] [PATCH] RE: regression in ofed 1.2 In-Reply-To: <1170327205.654.34.camel@linux-q667.site> References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com> <1170322670.654.23.camel@linux-q667.site> <1170327205.654.34.camel@linux-q667.site> Message-ID: <20070201135619.GB27688@mellanox.co.il> > Quoting Steve WIse : > Subject: Re: [PATCH] RE: regression in ofed 1.2 > > On Thu, 2007-02-01 at 03:37 -0600, Steve WIse wrote: > > > Okay - I _think_ the problem is that OFED 1.2 pulled code from my git tree > > > before I created an ofed_1_2 branch (which contains the fix), and didn't update > > > to match my ofed_1_2 branch. The crash that you reported occurring over iWarp > > > should also happen over IB for the same reason, so both are likely broken atm... > > > > > > Vlad, can you please update the ofed build by pulling from the ofed_1_2 branches > > > of my rdma-dev.git and librdmacm.git trees? 
> > > > I looked at your rdma-dev ofed_1_2 branch and see that the cma.c changes > > you made there will resolve this issue. It just needs to be pulled into > > ofed_1_2. > > > > Also, I just pulled down and built the latest ofed_1_2 kernel and user > code against 2.6.20-rc7, and the ucma abi is 4. So rdma_create_qp() > will still crash even with the librdmacm code to avoid the call to > rdma_init_qp_attr for ABI 3 kernels. > > > Steve. I'm a bit confused. Can you please try with latest code I've just pushed out? -- MST From bugzilla-daemon at lists.openfabrics.org Thu Feb 1 06:15:18 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Thu, 1 Feb 2007 06:15:18 -0800 (PST) Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa In-Reply-To: Message-ID: <20070201141518.A7561E607F7@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=334 ------- Comment #10 from erezz at voltaire.com 2007-02-01 06:15 ------- (In reply to comment #9) > > Edit ofed_scripts/configure and add the line: "echo ${KVERSION}" where the > > switch starts in line 214. See what happens in case 2.6.16*. > When I try to run build.sh I see in log file: > Applying patches for 2.6.16.21-0.8-smp kernel: What is the output of uname -r ? This is VERY important. Also, can you run `cat /etc/issue` and send the results? > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.16/addr_1_netevents_revert_to_2_6_17.patch > patching file drivers/infiniband/core/addr.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.16/ipath-backport.patch > patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S > patching file drivers/infiniband/hw/ipath/ipath_backport.h > patching file drivers/infiniband/hw/ipath/ipath_diag.c > patching file drivers/infiniband/hw/ipath/ipath_driver.c > > As I understand in this case used directory 2.6.16 not 2.6.16_suse10. This is not good. 
Try to debug ofed_scripts/configure and see what happens in the switch in apply_backport_patches. > I try to add in build.sh script the option > configure_options="$configure_options > --with-patchdir=/root/install/OFED-1.1.1-ib_local_sa/2.6.16_sles10" Don't do that. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From halr at voltaire.com Thu Feb 1 06:16:59 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Feb 2007 09:16:59 -0500 Subject: [openib-general] [PATCH] osm: some trivial chages in the osm_ucast_lash for compilation on windows In-Reply-To: <45C1D3A0.7060201@dev.mellanox.co.il> References: <45C1D3A0.7060201@dev.mellanox.co.il> Message-ID: <1170339359.15660.265762.camel@hal.voltaire.com> Hi Yevgeny, On Thu, 2007-02-01 at 06:48, Yevgeny Kliteynik wrote: > Hi Hal, > > This patch has some trivial changes in the osm_ucast_lash.c > for compilation on windows. > > In general, this file needs a major cosmetic (and not only) > patch to fit better into the OSM code. There will shortly be some work to improve this. This is one of the next items on the list for this. > Will get back to it at some point in the future. Sure; this is not your problem but if you get to it first that will help. > -- Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. -- Hal From halr at voltaire.com Thu Feb 1 06:32:35 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Feb 2007 09:32:35 -0500 Subject: [openib-general] [PATCH] osm: trivial casting for compilation on windows In-Reply-To: <45C1C255.4060405@dev.mellanox.co.il> References: <45C1C255.4060405@dev.mellanox.co.il> Message-ID: <1170339465.15660.265845.camel@hal.voltaire.com> On Thu, 2007-02-01 at 05:35, Yevgeny Kliteynik wrote: > Trivial casting for compilation on windows > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. 
-- Hal From tziporet at mellanox.co.il Thu Feb 1 07:40:26 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 01 Feb 2007 17:40:26 +0200 Subject: [openib-general] components that have not opend the ofed_1_2 branch Message-ID: <45C209EA.1040207@mellanox.co.il> The following components have not opened ofed_1_2 branch: * libibverbs - Roland * libmthca - Roland * libipathverbs - Bryan * tvflash - Roland * srptools - Ishai * management - Hal Please open the branch today or tomorrow at the latest . Thanks, Tziporet From kliteyn at dev.mellanox.co.il Thu Feb 1 07:57:42 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 01 Feb 2007 17:57:42 +0200 Subject: [openib-general] [PATCH 10/10] osm: QoS in OpenSM In-Reply-To: <1170344724.15660.271079.camel@hal.voltaire.com> References: <45BF6548.80104@dev.mellanox.co.il> <1170264561.15660.189494.camel@hal.voltaire.com> <45C115D8.6070504@dev.mellanox.co.il> <1170344724.15660.271079.camel@hal.voltaire.com> Message-ID: <45C20DF6.6060809@dev.mellanox.co.il> Hi Hal, Hal Rosenstock wrote: > Hi again Yevgeny, > > On Wed, 2007-01-31 at 17:19, Yevgeny Kliteynik wrote: > > [snip...] 
> >>>> + for (i = 0; i < IB_MAX_NUM_VLS; i++) >>>> + { >>>> + if (valid_sls[i]) >>>> + { >>>> + vl = ib_slvl_table_get(p_slvl_tbl,i); >>>> + if (vl == IB_DROP_VL) >>> Does vl > Operational VLs need checking here or is it never set this way >>> ? >> I think that it would be better if the "setup" part would check it when >> configuring sl2vl tables, and when VL > Operational VL it should set >> some default value instead (VL15 looks as a good option). > > OK; but why scan all VLs if they are not supported ? Agree, adding it to my ToDo list of improvements in QoS. >>>> + valid_sls[i] = FALSE; >>>> + } >>>> + } >>>> + >>>> + /* >>>> + * now get pointer to the destination port (same as above) >>>> + */ >>>> + p_node = osm_physp_get_node_ptr( p_dest_physp ); >>>> + >>>> + if( p_node->sw ) >>>> + { >>>> + p_dest_physp = osm_switch_get_route_by_lid( p_node->sw, cl_ntoh16( dest_lid_ho ) ); >>>> + if ( p_dest_physp == 0 ) >>>> + { >>>> + osm_log( p_rcv->p_log, OSM_LOG_ERROR, >>>> + "__osm_pr_rcv_get_path_parms_qos: ERR 1F03: " >>>> + "Cannot find routing to LID 0x%X from switch for GUID 0x%016" PRIx64 "\n", >>>> + dest_lid_ho, >>>> + cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); >>>> + status = IB_ERROR; >>>> + goto Exit; >>>> + } >>>> + } >>>> + >>>> + /* >>>> + * Now go through the path step by step >>>> + */ >>>> + >>>> + while( p_physp != p_dest_physp ) >>>> + { >>>> + p_physp = osm_physp_get_remote( p_physp ); >>>> + if ( p_physp == 0 ) >>>> + { >>>> + osm_log( p_rcv->p_log, OSM_LOG_ERROR, >>>> + "__osm_pr_rcv_get_path_parms_qos: ERR 1F04: " >>>> + "Cannot find remote phys port when routing to LID 0x%X from node GUID 0x%016" PRIx64 "\n", >>>> + dest_lid_ho, >>>> + cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); >>>> + status = IB_ERROR; >>>> + goto Exit; >>>> + } >>>> + >>>> + in_port_num = osm_physp_get_port_num(p_physp); >>>> + >>>> + /* this is point to point case (no switch in between) */ >>>> + if( p_physp == p_dest_physp ) >>>> + break; >>> >>> Ordering of 
check for switch and point to point case are different here >>> and original routine. Should they be the same ? If so, which should >>> change ? (Any reason why this was moved in this routine ?) >> Not sure I'm following. >> The order of check for switch and point to point case looks the same >> to me (am I missing something?). The difference that I see is that >> the mtu and rate in the original function are adjusted after the >> check for switch, and in the new function they are adjusted before the >> check, which I think is the same. > > That could have been what I was seeing. Shouldn't the two functions be > indentical in order (assuming these are to be separated) ? I wouldn't > want to see them diverge further. The order in the new function can be changed to match the order in the old one - I have no problem with that. > [snip...] > >>>> +/********************************************************************** >>>> + **********************************************************************/ >>>> static void >>>> __osm_pr_rcv_build_pr( >>>> IN osm_pr_rcv_t* const p_rcv, >>>> @@ -774,7 +1569,8 @@ __osm_pr_rcv_build_pr( >>>> #endif >>>> >>>> p_pr->pkey = p_parms->pkey; >>>> - p_pr->sl = cl_hton16(p_parms->sl); >>>> + ib_path_rec_set_qos_class(p_pr,p_parms->class); >>>> + ib_path_rec_set_sl(p_pr,p_parms->sl); >>>> p_pr->mtu = (uint8_t)(p_parms->mtu | 0x80); >>>> p_pr->rate = (uint8_t)(p_parms->rate | 0x80); >>>> >>>> @@ -832,10 +1628,14 @@ __osm_pr_rcv_get_lid_pair_path( >>>> goto Exit; >>>> } >>>> >>>> - status = __osm_pr_rcv_get_path_parms( p_rcv, p_pr, p_src_port, >>>> - p_dest_port, dest_lid_ho, >>>> - comp_mask, &path_parms ); >>>> - >>>> + if (p_rcv->p_subn->opt.no_qos) >>> Shouldn't this be based on p_rcv->p_subn.opt.qos_policy_file rather than >>> no_qos ? I think there are cases where the QoS will be used without the >>> QoS policy (higher level QoS support). 
>> By totally ignoring sl2vl tables the original function may return >> path that isn't a "real" path - it may lead to VL15 at some point. >> So the new function takes care of this problem. > > So it's a bug fix (missing functionality) in the existing QoS support. Right. Hopefully, the new function will replace the old one, and there won't be a need to add this functionality to the old function as a separate task. >> When there's no policy file, the policy parse tree is empty, and then >> the ports would not have any qos-level to be applied on the examined path. >> In that case the new function does whatever the old one did, plus checking >> the path for sl2vl "consistency". > > Got it. Thanks. > > -- Hal > >> -- Yevgeny > > From halr at voltaire.com Thu Feb 1 07:45:34 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Feb 2007 10:45:34 -0500 Subject: [openib-general] [PATCH 10/10] osm: QoS in OpenSM In-Reply-To: <45C115D8.6070504@dev.mellanox.co.il> References: <45BF6548.80104@dev.mellanox.co.il> <1170264561.15660.189494.camel@hal.voltaire.com> <45C115D8.6070504@dev.mellanox.co.il> Message-ID: <1170344724.15660.271079.camel@hal.voltaire.com> Hi again Yevgeny, On Wed, 2007-01-31 at 17:19, Yevgeny Kliteynik wrote: [snip...] > >> + for (i = 0; i < IB_MAX_NUM_VLS; i++) > >> + { > >> + if (valid_sls[i]) > >> + { > >> + vl = ib_slvl_table_get(p_slvl_tbl,i); > >> + if (vl == IB_DROP_VL) > > > > Does vl > Operational VLs need checking here or is it never set this way > > ? > I think that it would be better if the "setup" part would check it when > configuring sl2vl tables, and when VL > Operational VL it should set > some default value instead (VL15 looks as a good option). OK; but why scan all VLs if they are not supported ? 
> >> + valid_sls[i] = FALSE; > >> + } > >> + } > >> + > >> + /* > >> + * now get pointer to the destination port (same as above) > >> + */ > >> + p_node = osm_physp_get_node_ptr( p_dest_physp ); > >> + > >> + if( p_node->sw ) > >> + { > >> + p_dest_physp = osm_switch_get_route_by_lid( p_node->sw, cl_ntoh16( dest_lid_ho ) ); > >> + if ( p_dest_physp == 0 ) > >> + { > >> + osm_log( p_rcv->p_log, OSM_LOG_ERROR, > >> + "__osm_pr_rcv_get_path_parms_qos: ERR 1F03: " > >> + "Cannot find routing to LID 0x%X from switch for GUID 0x%016" PRIx64 "\n", > >> + dest_lid_ho, > >> + cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); > >> + status = IB_ERROR; > >> + goto Exit; > >> + } > >> + } > >> + > >> + /* > >> + * Now go through the path step by step > >> + */ > >> + > >> + while( p_physp != p_dest_physp ) > >> + { > >> + p_physp = osm_physp_get_remote( p_physp ); > >> + if ( p_physp == 0 ) > >> + { > >> + osm_log( p_rcv->p_log, OSM_LOG_ERROR, > >> + "__osm_pr_rcv_get_path_parms_qos: ERR 1F04: " > >> + "Cannot find remote phys port when routing to LID 0x%X from node GUID 0x%016" PRIx64 "\n", > >> + dest_lid_ho, > >> + cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); > >> + status = IB_ERROR; > >> + goto Exit; > >> + } > >> + > >> + in_port_num = osm_physp_get_port_num(p_physp); > >> + > >> + /* this is point to point case (no switch in between) */ > >> + if( p_physp == p_dest_physp ) > >> + break; > > > > > > Ordering of check for switch and point to point case are different here > > and original routine. Should they be the same ? If so, which should > > change ? (Any reason why this was moved in this routine ?) > Not sure I'm following. > The order of check for switch and point to point case looks the same > to me (am I missing something?). The difference that I see is that > the mtu and rate in the original function are adjusted after the > check for switch, and in the new function they are adjusted before the > check, which I think is the same. 
That could have been what I was seeing. Shouldn't the two functions be identical in order (assuming these are to be separated)? I wouldn't want to see them diverge further. [snip...] > >> +/********************************************************************** > >> + **********************************************************************/ > >> static void > >> __osm_pr_rcv_build_pr( > >> IN osm_pr_rcv_t* const p_rcv, > >> @@ -774,7 +1569,8 @@ __osm_pr_rcv_build_pr( > >> #endif > >> > >> p_pr->pkey = p_parms->pkey; > >> - p_pr->sl = cl_hton16(p_parms->sl); > >> + ib_path_rec_set_qos_class(p_pr,p_parms->class); > >> + ib_path_rec_set_sl(p_pr,p_parms->sl); > >> p_pr->mtu = (uint8_t)(p_parms->mtu | 0x80); > >> p_pr->rate = (uint8_t)(p_parms->rate | 0x80); > >> > >> @@ -832,10 +1628,14 @@ __osm_pr_rcv_get_lid_pair_path( > >> goto Exit; > >> } > >> > >> - status = __osm_pr_rcv_get_path_parms( p_rcv, p_pr, p_src_port, > >> - p_dest_port, dest_lid_ho, > >> - comp_mask, &path_parms ); > >> - > >> + if (p_rcv->p_subn->opt.no_qos) > > > > Shouldn't this be based on p_rcv->p_subn.opt.qos_policy_file rather than > > no_qos ? I think there are cases where the QoS will be used without the > > QoS policy (higher level QoS support). > > By totally ignoring sl2vl tables the original function may return > path that isn't a "real" path - it may lead to VL15 at some point. > So the new function takes care of this problem. So it's a bug fix (missing functionality) in the existing QoS support. > When there's no policy file, the policy parse tree is empty, and then > the ports would not have any qos-level to be applied on the examined path. > In that case the new function does whatever the old one did, plus checking > the path for sl2vl "consistency". Got it. Thanks. 
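The sl2vl "consistency" check discussed in this thread can be pictured with a small stand-alone sketch. It is not the OpenSM code: `hop_t`, `filter_valid_sls`, `NUM_SLS`, `DROP_VL` and `max_data_vl` are hypothetical names, each hop is modeled as a plain 16-entry table, and VL15 is assumed to be the drop VL as in the quoted patch. The extra "VL above the operational range" test reflects the question raised in the review, not something the patch is known to do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLS  16   /* assumed: one SL2VL entry per service level */
#define DROP_VL  15   /* assumed: mapping an SL to VL15 means "drop" */

/* Hypothetical model of one hop: its SL2VL table and highest usable data VL. */
typedef struct {
    uint8_t sl2vl[NUM_SLS];
    uint8_t max_data_vl;   /* e.g. 3 when VL0-VL3 are operational */
} hop_t;

/*
 * Clear valid_sls[sl] for every SL that some hop maps to the drop VL or to
 * a VL above that hop's operational range. Returns the number of SLs that
 * remain usable along the whole path.
 */
static size_t filter_valid_sls(const hop_t *hops, size_t n_hops,
                               bool valid_sls[NUM_SLS])
{
    size_t remaining = 0;

    for (size_t h = 0; h < n_hops; h++) {
        for (size_t sl = 0; sl < NUM_SLS; sl++) {
            uint8_t vl;

            if (!valid_sls[sl])
                continue;   /* already ruled out by an earlier hop */
            vl = hops[h].sl2vl[sl];
            if (vl == DROP_VL || vl > hops[h].max_data_vl)
                valid_sls[sl] = false;
        }
    }
    for (size_t sl = 0; sl < NUM_SLS; sl++)
        if (valid_sls[sl])
            remaining++;
    return remaining;
}
```

A caller would start with every entry of `valid_sls` set to true, pass the hops in path order, and treat a return value of zero as "no usable SL on this path".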
-- Hal > -- Yevgeny From monil at voltaire.com Thu Feb 1 08:17:54 2007 From: monil at voltaire.com (Moni Levy) Date: Thu, 1 Feb 2007 18:17:54 +0200 Subject: [openib-general] OFED 1.2 release - to be reviewed in the meeting today In-Reply-To: <45C08E47.2040506@mellanox.co.il> References: <45BDFF11.9080901@mellanox.co.il> <45BFF296.8000908@cse.ohio-state.edu> <45C08E47.2040506@mellanox.co.il> Message-ID: <6a122cc00702010817j52958d85n1d141316e29a7ebf@mail.gmail.com> Tziporet, On 1/31/07, Tziporet Koren wrote: > Shaun Rowland wrote: > > > > Hi. I am not exactly sure where the ofed_1_2 directory for MPI SRPMs is > > supposed to go. I assume from previous meetings this is just a > > filesystem directory. Should it be a directory in my home directory on > > staging.openfabrics.org, in ~/public_html, or is there something else I > > need to do to put this into place? I think from the previous MPI > > specific meeting, this was supposed to be done in a web directory. Since > > I am unclear, I wanted to ask here. > > Please place your SRPM under your home directory at ofed_1_2 directory. > Then you can make this directory accessible to the web in this way: > 1. mkdir public_html > 2. 
chmod 755 public_html > > Now you can put any stuff under public_html (also symbolic links) and it > will be available via web > www.openfabrics.org/~/ I have put the ib-bonding SRPM in ~monis/ofed_1_2 --Moni > > Tziporet > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From swise at opengridcomputing.com Thu Feb 1 09:12:01 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Feb 2007 11:12:01 -0600 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <20070201121008.GA20789@mellanox.co.il> References: <20070125191321.30934.74542.stgit@dell3.ogc.int> <20070201121008.GA20789@mellanox.co.il> Message-ID: <1170349921.16637.1.camel@stevo-desktop> Looks good. Thanks, Steve. On Thu, 2007-02-01 at 14:10 +0200, Michael S. Tsirkin wrote: > > Quoting Steve Wise : > > Subject: [PATCH 00/12] ofed_1_2 - Neighbour update support > > > > > > Michael/Vlad: > > > > Here are the backports for snooping arp packets to generate neighbour > > update netevents. Also included is the addr.c patch to act on all valid > > neigh update events. If this series looks good to you then I'll push > > this up and you all can pull it from my git tree. > > This patches seems to have created a reference leak on each neighbour > as a result ipoib interface could not be brought down. > It also seems that RHASU2 backport was missing code. > I pushed out the following: > > > commit d140398db0da0beb3172e0ccf14ef3023cafec9c > Author: Michael S. Tsirkin > Date: Thu Feb 1 12:21:34 2007 +0200 > > Fix neighbour reference leak in netevent.c > > Signed-off-by: Michael S. 
Tsirkin > > diff --git a/kernel_addons/backport/2.6.11/include/src/netevent.c b/kernel_addons/backport/2.6.11/include/src/netevent.c > index 6a8df29..0d26662 100644 > --- a/kernel_addons/backport/2.6.11/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.11/include/src/netevent.c > @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > diff --git a/kernel_addons/backport/2.6.12/include/src/netevent.c b/kernel_addons/backport/2.6.12/include/src/netevent.c > index 6a8df29..0d26662 100644 > --- a/kernel_addons/backport/2.6.12/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.12/include/src/netevent.c > @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > diff --git a/kernel_addons/backport/2.6.13/include/src/netevent.c b/kernel_addons/backport/2.6.13/include/src/netevent.c > index 6a8df29..0d26662 100644 > --- a/kernel_addons/backport/2.6.13/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.13/include/src/netevent.c > @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > diff --git a/kernel_addons/backport/2.6.14/include/src/netevent.c b/kernel_addons/backport/2.6.14/include/src/netevent.c > index 188283c..17a12ff 100644 > --- 
a/kernel_addons/backport/2.6.14/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.14/include/src/netevent.c > @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > diff --git a/kernel_addons/backport/2.6.15/include/src/netevent.c b/kernel_addons/backport/2.6.15/include/src/netevent.c > index 188283c..17a12ff 100644 > --- a/kernel_addons/backport/2.6.15/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.15/include/src/netevent.c > @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > diff --git a/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c b/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c > index 188283c..17a12ff 100644 > --- a/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c > @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > diff --git a/kernel_addons/backport/2.6.16/include/src/netevent.c b/kernel_addons/backport/2.6.16/include/src/netevent.c > index 188283c..17a12ff 100644 > --- a/kernel_addons/backport/2.6.16/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.16/include/src/netevent.c > @@ -38,8 +38,10 @@ static 
void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > diff --git a/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c b/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c > index 188283c..17a12ff 100644 > --- a/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c > @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > diff --git a/kernel_addons/backport/2.6.17/include/src/netevent.c b/kernel_addons/backport/2.6.17/include/src/netevent.c > index 26a0920..4c67de1 100644 > --- a/kernel_addons/backport/2.6.17/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.17/include/src/netevent.c > @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > diff --git a/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c b/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c > index 57a23ab..90fce0c 100644 > --- a/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c > @@ -39,8 +39,10 @@ static void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, 
arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > diff --git a/kernel_addons/backport/2.6.9_U2/include/src/netevent.c b/kernel_addons/backport/2.6.9_U2/include/src/netevent.c > index 5ffadd1..1589300 100644 > --- a/kernel_addons/backport/2.6.9_U2/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.9_U2/include/src/netevent.c > @@ -13,10 +13,59 @@ > * Fixes: > */ > > -#include > -#include > #include > #include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > + > +static DEFINE_MUTEX(lock); > +static int count; > + > +static void destructor(struct sk_buff *skb) > +{ > + struct neighbour *n; > + u8 *arp_ptr; > + __be32 gw; > + > + /* Pull the SPA */ > + arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > + memcpy(&gw, arp_ptr, 4); > + n = neigh_lookup(&arp_tbl, &gw, skb->dev); > + if (n) { > + call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > + return; > +} > + > +static int arp_recv(struct sk_buff *skb, struct net_device *dev, > + struct packet_type *pkt) > +{ > + struct arphdr *arp_hdr; > + u16 op; > + > + arp_hdr = (struct arphdr *) skb->nh.raw; > + op = ntohs(arp_hdr->ar_op); > + > + if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor) > + skb->destructor = destructor; > + > + kfree_skb(skb); > + return 0; > +} > + > +static struct packet_type arp = { > + .type = __constant_htons(ETH_P_ARP), > + .func = arp_recv, > + .af_packet_priv = (void *)1, > +}; > > static struct notifier_block *netevent_notif_chain; > > @@ -34,6 +83,12 @@ int register_netevent_notifier(struct notifier_block *nb) > int err; > > err = notifier_chain_register(&netevent_notif_chain, nb); > + if (!err) { > + mutex_lock(&lock); > + if (count++ == 0) > + dev_add_pack(&arp); > + mutex_unlock(&lock); > + } > return err; > } > > @@ -49,7 +104,16 @@ 
int register_netevent_notifier(struct notifier_block *nb) > > int unregister_netevent_notifier(struct notifier_block *nb) > { > - return notifier_chain_unregister(&netevent_notif_chain, nb); > + int err; > + > + err = notifier_chain_unregister(&netevent_notif_chain, nb); > + if (!err) { > + mutex_lock(&lock); > + if (--count == 0) > + dev_remove_pack(&arp); > + mutex_unlock(&lock); > + } > + return err; > } > > /** > diff --git a/kernel_addons/backport/2.6.9_U3/include/src/netevent.c b/kernel_addons/backport/2.6.9_U3/include/src/netevent.c > index 5ffadd1..1589300 100644 > --- a/kernel_addons/backport/2.6.9_U3/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.9_U3/include/src/netevent.c > @@ -13,10 +13,59 @@ > * Fixes: > */ > > -#include > -#include > #include > #include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > + > +static DEFINE_MUTEX(lock); > +static int count; > + > +static void destructor(struct sk_buff *skb) > +{ > + struct neighbour *n; > + u8 *arp_ptr; > + __be32 gw; > + > + /* Pull the SPA */ > + arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > + memcpy(&gw, arp_ptr, 4); > + n = neigh_lookup(&arp_tbl, &gw, skb->dev); > + if (n) { > + call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > + return; > +} > + > +static int arp_recv(struct sk_buff *skb, struct net_device *dev, > + struct packet_type *pkt) > +{ > + struct arphdr *arp_hdr; > + u16 op; > + > + arp_hdr = (struct arphdr *) skb->nh.raw; > + op = ntohs(arp_hdr->ar_op); > + > + if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor) > + skb->destructor = destructor; > + > + kfree_skb(skb); > + return 0; > +} > + > +static struct packet_type arp = { > + .type = __constant_htons(ETH_P_ARP), > + .func = arp_recv, > + .af_packet_priv = (void *)1, > +}; > > static struct notifier_block *netevent_notif_chain; > > @@ -34,6 +83,12 @@ int register_netevent_notifier(struct 
notifier_block *nb) > int err; > > err = notifier_chain_register(&netevent_notif_chain, nb); > + if (!err) { > + mutex_lock(&lock); > + if (count++ == 0) > + dev_add_pack(&arp); > + mutex_unlock(&lock); > + } > return err; > } > > @@ -49,7 +104,16 @@ int register_netevent_notifier(struct notifier_block *nb) > > int unregister_netevent_notifier(struct notifier_block *nb) > { > - return notifier_chain_unregister(&netevent_notif_chain, nb); > + int err; > + > + err = notifier_chain_unregister(&netevent_notif_chain, nb); > + if (!err) { > + mutex_lock(&lock); > + if (--count == 0) > + dev_remove_pack(&arp); > + mutex_unlock(&lock); > + } > + return err; > } > > /** > diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > index 6a8df29..0d26662 100644 > --- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb) > arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len; > memcpy(&gw, arp_ptr, 4); > n = neigh_lookup(&arp_tbl, &gw, skb->dev); > - if (n) > + if (n) { > call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); > + neigh_release(n); > + } > return; > } > > From swise at opengridcomputing.com Thu Feb 1 09:29:24 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Feb 2007 11:29:24 -0600 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <20070201121930.GB20789@mellanox.co.il> References: <20070125191321.30934.74542.stgit@dell3.ogc.int> <20070201121930.GB20789@mellanox.co.il> Message-ID: <1170350964.16637.18.camel@stevo-desktop> On Thu, 2007-02-01 at 14:19 +0200, Michael S. Tsirkin wrote: > > Here are the backports for snooping arp packets to generate neighbour > > update netevents. 
> > OK, I went (somewhat belatedly) over this code in more depth and I see > a couple of issues that I'd like you to address: > > - There's some trailing whitespace in some netevet.c files. > Could you clean these please? > You took care of these I assume based on your followup email. > - I see: > $ diff ./kernel_addons/backport/2.6.9_U4/include/src/netevent.c > kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c > > #include > > Should not redhat backports include skbuff.h too? > They do use skbuff struct so it seems it is cleaner to include > directly, and we would get identical code for redhat and suse. > Yup. > - What is the reason for: > if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor) > skb->destructor = destructor; > > kfree_skb(skb); > > Could we miss events because skb has a desctructor? Yes. I looked through the ethernet drivers and didn't see anyone using destructors. I thought perhaps this is ok for backports. There are ways to address this issue: 1) Enhance the current code to save off the original destructor function if it exists and put in ours. Then when our function is called, we do our processing, then call the original destructor function. We would need to save the original function ptr somewhere. 2) schedule the function to happen at a later time and hope the ARP subsystem has already updated the neigh table. I opted against this approach because it doesn't ensure that the neigh entry was updated before we act on it. > Can we just call the descructor function directly (this is what addr.c > did previously, and this apparently worked fine). The original addr.c snoop code worked fine for IB address resolution and for the initial ARP resolution for iWARP devices, but not for notifying iWARP devices when a neighbour changes. For instance, if the neighbour mac address changes, then the iWARP device needs to be notified so it can update its L2 table maintained in the device. 
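
A minimal sketch of option 1 above (save the original destructor, run ours, then chain to the original). This is a userspace model, not the backport code: struct fake_skb, the saved field, hook_destructor() and fake_free() are hypothetical stand-ins for struct sk_buff, wherever the original pointer would be kept, and the final kfree_skb().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only what the sketch needs. */
struct fake_skb {
	void (*destructor)(struct fake_skb *skb);
	void (*saved)(struct fake_skb *skb);	/* original destructor, if any */
	int events;				/* neighbour callouts made */
	int orig_calls;				/* calls to the original destructor */
};

/* Some pre-existing destructor that must not be lost. */
static void orig_destructor(struct fake_skb *skb)
{
	skb->orig_calls++;
}

/* Our hook: do the callout first, then chain to the saved destructor. */
static void snoop_destructor(struct fake_skb *skb)
{
	skb->events++;
	if (skb->saved)
		skb->saved(skb);
}

/* Install the hook without dropping an existing destructor. */
static void hook_destructor(struct fake_skb *skb)
{
	if (skb->destructor == snoop_destructor)
		return;				/* already hooked */
	skb->saved = skb->destructor;
	skb->destructor = snoop_destructor;
}

/* Model of the last free: runs the destructor once, if one is set. */
static void fake_free(struct fake_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
}
```

With this shape the "!skb->destructor" test is no longer needed, at the cost of finding somewhere safe to keep the saved pointer, which is the open question in option 1.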
We need to defer calling the destructor function until the ARP subsystem has processed this ARP packet. Through testing, I saw that our snoop function gets called _before_ the ARP subsystem processes the ARP packet. So the neighbour entry hasn't been updated yet. Hooking via destructor calls our function _after_ the ARP subsystem has updated the neighbour. So we can then lookup the neigh entry and do the callouts. From mshefty at ichips.intel.com Thu Feb 1 09:55:10 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Feb 2007 09:55:10 -0800 Subject: [openib-general] new IB CM reject reason In-Reply-To: <20070201062431.GB4499@mellanox.co.il> References: <000201c74585$a0bc7260$8698070a@amr.corp.intel.com> <20070201062431.GB4499@mellanox.co.il> Message-ID: <45C2297E.9050306@ichips.intel.com> > No, I don't think "application crashed" makes sense as an element of wire protocol. > I think an optional logging of errors in kernel CM would be a much better > solution. I know I had to add some printks it each time I was debugging SDP. The "application crashed" scenario is what high-lighted the issue. The problem is that the CM must provide a reject reason. Which reject reason do you use? My suggestion was for a reject reason of other/unknown/none given (pick one). > 2. Another objection is that this feature seems to invite misuse where applications > will use REJ reason as a hint on whether remote side crashed. But REJ could be > lost. Wouldn't this confuse the remote side? Currently, the CM issues the reject using "consumer defined", since nothing else maps any better under this condition. But the reject isn't consumer defined... By doing this, an application that expects specific private data in the reject message won't find it, which is just as likely to confuse the remote side. This is why I think an unknown/unspecified reject reason is needed. 
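
The receiver-side consequence can be sketched as follows. This is an illustration under assumptions, not CM code: REJ_OTHER and its value are hypothetical (no such code exists in the spec), and rej_has_app_private_data() is a made-up helper. The values 8 and 28 are the Invalid Service ID and consumer reject codes.

```c
#include <assert.h>
#include <stddef.h>

enum rej_reason {
	REJ_INVALID_SERVICE_ID = 8,
	REJ_CONSUMER_DEFINED   = 28,
	REJ_OTHER              = 0xffff,	/* hypothetical, not in the spec */
};

/*
 * Return 1 if the receiver may parse the REJ private data using its own
 * application format, 0 if the reject was generated by the CM and the
 * private data carries no application meaning.
 */
static int rej_has_app_private_data(enum rej_reason reason, size_t len)
{
	return reason == REJ_CONSUMER_DEFINED && len > 0;
}
```

If the CM sends "consumer defined" on behalf of a dead application, a check like this tells the receiver to parse private data that was never filled in; a distinct reason code would make it return 0.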
How an application interprets a reject with 'unknown' reason is up to the application, but I do think this is better than the application trying to guess whether 'consumer defined' really does mean consumer defined. - Sean From mshefty at ichips.intel.com Thu Feb 1 10:01:01 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Feb 2007 10:01:01 -0800 Subject: [openib-general] [PATCH] RE: regression in ofed 1.2 In-Reply-To: <20070201135522.GA27688@mellanox.co.il> References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com> <1170322670.654.23.camel@linux-q667.site> <20070201135522.GA27688@mellanox.co.il> Message-ID: <45C22ADD.40108@ichips.intel.com> > - Sean, please base your branches on specific -rc from linus > (OFED 1.2 is now -rc7). My branches should be in sync with rc6. The original branches were built from an earlier rc version, and updated by pulling in the latest rc from Linus through my master branch. Are you wanting the history of the branches reworked so that they get completely rebuilt off of the latest kernel? > - Now that we are entering feature freeze, we should not do full replaces anymore. > So Sean, please post incremental patches, labeled ofed-1.2 clearly. Additional patches will be posted to my ofed_1-2 branch, which you should be able to pull. Do you see a problem with this process? I don't understand why you would need to do a full replace. - Sean From mst at mellanox.co.il Thu Feb 1 10:39:22 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Feb 2007 20:39:22 +0200 Subject: [openib-general] new IB CM reject reason In-Reply-To: <45C2297E.9050306@ichips.intel.com> References: <45C2297E.9050306@ichips.intel.com> Message-ID: <20070201183922.GB15115@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [openib-general] new IB CM reject reason > > > No, I don't think "application crashed" makes sense as an element of wire protocol. 
> > I think an optional logging of errors in kernel CM would be a much better
> > solution. I know I had to add some printks it each time I was debugging SDP.
>
> The "application crashed" scenario is what high-lighted the issue. The problem
> is that the CM must provide a reject reason. Which reject reason do you use?
> My suggestion was for a reject reason of other/unknown/none given (pick one).

I'm actually happy with what existing code does (consumer reject).
I would like to highlight the lack of ability to make CM errors go to system log
as a weakness in current CM code, which hinders debugging.

Would you be interested in a patch making it possible to enable logging CM errors
and/or all CM events?

> > 2. Another objection is that this feature seems to invite misuse where applications
> > will use REJ reason as a hint on whether remote side crashed. But REJ could be
> > lost. Wouldn't this confuse the remote side?
>
> Currently, the CM issues the reject using "consumer defined", since nothing else
> maps any better under this condition. But the reject isn't consumer defined...
> By doing this, an application that expects specific private data in the reject
> message won't find it, which is just as likely to confuse the remote side. This
> is why I think an unknown/unspecified reject reason is needed.
>
> How an application interprets a reject with 'unknown' reason is up to the
> application, but I do think this is better than the application trying to guess
> whether 'consumer defined' really does mean consumer defined.

Are we talking about code 28? My spec lists it as "consumer reject".
The meaning of *private data* is consumer defined.

    The consumer decided to reject the communication or EE context setup
    establishment attempt for reasons other than those listed in the other
    REJ codes. Typically this happens based upon information being conveyed
    in the PrivateData field of a message. It can also happen because the
    Consumer decided for reasons unrelated to any CM message it received to
    terminate the communication or EE context setup establishment attempt.
    This would therefore be the appropriate Reason code to use if the
    Consumer decided to destroy the QP or EEC in the midst of the
    communication or EE context setup establishment attempt.

So this really *does* seem to be what spec intended for exactly our case.

Now, I do not really object to inventing new rejection reasons: for example,
maybe we can invent one that lets us stick the errno value in private data
somehow - but it's not like there's no solution inside the spec,
and inventing a whole new reject reason just for userspace consumers
seems like a narrow approach to me.

--
MST

From swise at opengridcomputing.com Thu Feb 1 10:42:11 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 12:42:11 -0600
Subject: [openib-general] [PATCH] RE: regression in ofed 1.2
In-Reply-To: <20070201135522.GA27688@mellanox.co.il>
References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com> <1170322670.654.23.camel@linux-q667.site> <20070201135522.GA27688@mellanox.co.il>
Message-ID: <1170355331.16637.25.camel@stevo-desktop>

Um, now on rhel4u4 we crash creating the mcast workqueue.

The name is "ib_mcast_wq" which is too long for older kernels.

Did we lose a backport patch?

On Thu, 2007-02-01 at 15:55 +0200, Michael S. Tsirkin wrote:
> > Quoting Steve WIse :
> > Subject: Re: [PATCH] RE: regression in ofed 1.2
> >
> > > Okay - I _think_ the problem is that OFED 1.2 pulled code from my git tree
> > > before I created an ofed_1_2 branch (which contains the fix), and didn't update
> > > to match my ofed_1_2 branch. The crash that you reported occurring over iWarp
> > > should also happen over IB for the same reason, so both are likely broken atm...
> > > > > > Vlad, can you please update the ofed build by pulling from the ofed_1_2 branches > > > of my rdma-dev.git and librdmacm.git trees? > > > > I looked at your rdma-dev ofed_1_2 branch and see that the cma.c changes > > you made there will resolve this issue. It just needs to be pulled into > > ofed_1_2. > > OK, I've updated ofed to code from rdma-dev ofed_1_2 branch. Some notes: > > - Sean, please base your branches on specific -rc from linus > (OFED 1.2 is now -rc7). > - Now that we are entering feature freeze, we should not do full replaces anymore. > So Sean, please post incremental patches, labeled ofed-1.2 clearly. > From sean.hefty at intel.com Thu Feb 1 10:55:20 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 1 Feb 2007 10:55:20 -0800 Subject: [openib-general] new IB CM reject reason In-Reply-To: <20070201183922.GB15115@mellanox.co.il> Message-ID: <000101c74632$85b37bf0$8698070a@amr.corp.intel.com> >Would you be interested in a patch making it possible to enable logging CM >errors >and/or all CM events? A patch for this would be fine with me. >Are we talking about code 28? My spec lists it as "consumer reject". >The meaning of *private data* is consumer defined. > > The consumer decided to reject the communica- > tion or EE context setup establishment attempt for > reasons other than those listed in the other REJ > codes. Typically this happens based upon infor- > mation being conveyed in the PrivateData field of > a message. It can also happen because the Con- > sumer decided for reasons unrelated to any CM > message it received to terminate the communica- > tion or EE context setup establishment attempt. > This would therefore be the appropriate Reason > code to use if the Consumer decided to destroy > the QP or EEC in the midst of the communication > or EE context setup establishment attempt. > >So this really *does* seem to be what spec intended for exactly our case. I disagree. This is for the CM consumer, not the CM itself. 
In this case, the CM must issue a reject that will be delivered to the remote application. The CM has no idea what private data format the remote application expects. >Now, I do not really object to inventing new rejection reasons: for example, >maybe we can invent one that lets us stick the errno value in private data >somehow - but it's not like there's no solution inside the spec, >and inventing a whole new reject reason just for userspace consumers >seems like a narrow approach to me. Unless we start enforcing a policy that kernel consumers must issue a reject before destroying a cm_id (while in the connecting phase), they have this problem. My claim is that the reject reasons are insufficient to cover all possible conditions, and adding a generic 'other' reject reason solves this. Using consumer defined, which is what is done today, is incorrect. As an alternate solution, we could also not send any reject and just let the connection time out on the remote side. - Sean From mst at mellanox.co.il Thu Feb 1 11:00:49 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Feb 2007 21:00:49 +0200 Subject: [openib-general] [PATCH] RE: regression in ofed 1.2 In-Reply-To: <45C22ADD.40108@ichips.intel.com> References: <45C22ADD.40108@ichips.intel.com> Message-ID: <20070201190049.GC15115@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [openib-general] [PATCH] RE: regression in ofed 1.2 > > > - Sean, please base your branches on specific -rc from linus > > (OFED 1.2 is now -rc7). > > My branches should be in sync with rc6. If you check, they are not. ofed 1 2 branch has an extra commit on top of -rc6. But I figured it out already. > so that they get completely rebuilt off of the latest kernel? No need to do anything at this point. > > - Now that we are entering feature freeze, we should not do full replaces anymore. > > So Sean, please post incremental patches, labeled ofed-1.2 clearly. 
> >
> Additional patches will be posted to my ofed_1-2 branch, which you should be
> able to pull.

First, please post patches on list as well.
We can then just take the patch from git or from mail and add it under fixes.

> Do you see a problem with this process?

Yes. I had to jump through some hoops to first get a patch I can put in OFED
due to the issue outlined above, and then get the diff I got to apply without
conflicts, since port randomization code conflicted with the QoS patches.

All solved now - just put your patch before QoS one - but these conflicts
should be figured out by whoever submits patches.

> I don't understand why you would need to do a full replace.

We won't do a full replace, just add patches in fixes directory.

What I expect everyone to do however, to get patches put in OFED, is to test
that patches one posts work in OFED git tree, not just against upstream based
git trees.

This currently includes testing for build against older kernels on various
architectures (me and Vlad put a cross-build setup for this at staging, it now
has kernel.org kernels but we will be adding distro kernels) and testing on at
least one of the main supported enterprise distros (RHEL/SLES).

I simply can't take untested patches - I have nightly tests but no time
to test all ULPs before I apply.

--
MST

From mst at mellanox.co.il Thu Feb 1 11:06:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 21:06:24 +0200
Subject: [openib-general] new IB CM reject reason
In-Reply-To: <000101c74632$85b37bf0$8698070a@amr.corp.intel.com>
References: <20070201183922.GB15115@mellanox.co.il> <000101c74632$85b37bf0$8698070a@amr.corp.intel.com>
Message-ID: <20070201190624.GB6473@mellanox.co.il>

> >Are we talking about code 28? My spec lists it as "consumer reject".
> >The meaning of *private data* is consumer defined.
> > > > The consumer decided to reject the communica- > > tion or EE context setup establishment attempt for > > reasons other than those listed in the other REJ > > codes. Typically this happens based upon infor- > > mation being conveyed in the PrivateData field of > > a message. It can also happen because the Con- > > sumer decided for reasons unrelated to any CM > > message it received to terminate the communica- > > tion or EE context setup establishment attempt. > > This would therefore be the appropriate Reason > > code to use if the Consumer decided to destroy > > the QP or EEC in the midst of the communication > > or EE context setup establishment attempt. > > > >So this really *does* seem to be what spec intended for exactly our case. > > I disagree. This is for the CM consumer, not the CM itself. In this case, the > CM must issue a reject that will be delivered to the remote application. The CM > has no idea what private data format the remote application expects. Since we disagree about spec reading, would you raise this in the relevant WG? > >Now, I do not really object to inventing new rejection reasons: for example, > >maybe we can invent one that lets us stick the errno value in private data > >somehow - but it's not like there's no solution inside the spec, > >and inventing a whole new reject reason just for userspace consumers > >seems like a narrow approach to me. > > Unless we start enforcing a policy that kernel consumers must issue a reject > before destroying a cm_id (while in the connecting phase), they have this > problem. > > My claim is that the reject reasons are insufficient to cover all possible > conditions, and adding a generic 'other' reject reason solves this. Using > consumer defined, which is what is done today, is incorrect. As an alternate > solution, we could also not send any reject and just let the connection time out > on the remote side. 
And my claim is that you should define private data format to go with this other reason otherwise you are not really solving the problem. -- MST From mst at mellanox.co.il Thu Feb 1 11:11:28 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Feb 2007 21:11:28 +0200 Subject: [openib-general] [PATCH] RE: regression in ofed 1.2 In-Reply-To: <1170355331.16637.25.camel@stevo-desktop> References: <1170355331.16637.25.camel@stevo-desktop> Message-ID: <20070201191128.GB17617@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [PATCH] RE: regression in ofed 1.2 > > Um, now on rhel4u4 we crash creating the mcast workqueue. > > The name is "ib_mcast_wq" which is too long for older kernels. > > Did we loose a backport patch? Sean, please rename the multicast wq to ib_mcast as we agreed. I just pushed the following out: commit efedfe57a21a134a65d951bcca73af46da609c5e Author: Michael S. Tsirkin Date: Thu Feb 1 21:09:16 2007 +0200 Make multicast WQ name shorter. Signed-off-by: Michael S. Tsirkin diff --git a/kernel_patches/fixes/merged_sean_rdma_dev_ofed_1_2.patch b/kernel_patches/fixes/merged_sean_rdma_dev_ofed_1_2.patch index e70d4da..4b968db 100644 --- a/kernel_patches/fixes/merged_sean_rdma_dev_ofed_1_2.patch +++ b/kernel_patches/fixes/merged_sean_rdma_dev_ofed_1_2.patch @@ -2225,7 +2225,7 @@ index 0000000..039f1eb +{ + int ret; + -+ mcast_wq = create_singlethread_workqueue("ib_mcast_wq"); ++ mcast_wq = create_singlethread_workqueue("ib_mcast"); + if (!mcast_wq) + return -ENOMEM; + -- MST From mst at mellanox.co.il Thu Feb 1 11:22:21 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Feb 2007 21:22:21 +0200 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <1170350964.16637.18.camel@stevo-desktop> References: <1170350964.16637.18.camel@stevo-desktop> Message-ID: <20070201192221.GD17617@mellanox.co.il> > > Could we miss events because skb has a desctructor? > > Yes. 
I looked through the ethernet drivers and didn't see anyone using > destructors. I thought perhaps this is ok for backports. There are > ways to address this issue: > > 1) Enhance the current code to save off the original destructor function > if it exists and put in ours. Then when our function is called, we do > our processing, then call the original destructor function. We would > need to save the original function ptr somewhere. > > 2) schedule the function to happen at a later time and hope the ARP > subsystem has already updated the neigh table. I opted against this > approach because it doesn't ensure that the neigh entry was updated > before we act on it. > > > Can we just call the descructor function directly (this is what addr.c > > did previously, and this apparently worked fine). > > The original addr.c snoop code worked fine for IB address resolution and > for the initial ARP resolution for iWARP devices, but not for notifying > iWARP devices when a neighbour changes. For instance, if the neighbour > mac address changes, then the iWARP device needs to be notified so it > can update its L2 table maintained in the device. > > We need to defer calling the destructor function until the ARP subsystem > has processed this ARP packet. Through testing, I saw that our snoop > function gets called _before_ the ARP subsystem processes the ARP > packet. So the neighbour entry hasn't been updated yet. Hooking via > destructor calls our function _after_ the ARP subsystem has updated the > neighbour. So we can then lookup the neigh entry and do the callouts. Not sure how do you mean all this. You do kfree_skb immediately in the arp processing function. Will this not call the destructor directly? Anyway, it seems too risky to change the code a lot now. what I am concerned is that this could have broken working code. 
To reduce the risk of problems for existing code, I'd like to see something like the following: if (someone asked for notification on neighbour changes) do the destructor trick if (someone asked for notification on address resolution) call the destructor directly Could you code this up please? -- MST From mst at mellanox.co.il Thu Feb 1 11:29:24 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Feb 2007 21:29:24 +0200 Subject: [openib-general] IPoIB CM for merge? Message-ID: <20070201192924.GE17617@mellanox.co.il> Roland, 2.6.20 is nearly done. Could you please spend some time reviewing IPoIB CM code? I am concerned about missing the 2.6.21 merge window. -- MST From swise at opengridcomputing.com Thu Feb 1 12:01:11 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Feb 2007 14:01:11 -0600 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <20070201192221.GD17617@mellanox.co.il> References: <1170350964.16637.18.camel@stevo-desktop> <20070201192221.GD17617@mellanox.co.il> Message-ID: <1170360071.16637.39.camel@stevo-desktop> On Thu, 2007-02-01 at 21:22 +0200, Michael S. Tsirkin wrote: > > > Could we miss events because skb has a desctructor? > > > > Yes. I looked through the ethernet drivers and didn't see anyone using > > destructors. I thought perhaps this is ok for backports. There are > > ways to address this issue: > > > > 1) Enhance the current code to save off the original destructor function > > if it exists and put in ours. Then when our function is called, we do > > our processing, then call the original destructor function. We would > > need to save the original function ptr somewhere. > > > > 2) schedule the function to happen at a later time and hope the ARP > > subsystem has already updated the neigh table. I opted against this > > approach because it doesn't ensure that the neigh entry was updated > > before we act on it. 
> > > > > Can we just call the descructor function directly (this is what addr.c > > > did previously, and this apparently worked fine). > > > > The original addr.c snoop code worked fine for IB address resolution and > > for the initial ARP resolution for iWARP devices, but not for notifying > > iWARP devices when a neighbour changes. For instance, if the neighbour > > mac address changes, then the iWARP device needs to be notified so it > > can update its L2 table maintained in the device. > > > > We need to defer calling the destructor function until the ARP subsystem > > has processed this ARP packet. Through testing, I saw that our snoop > > function gets called _before_ the ARP subsystem processes the ARP > > packet. So the neighbour entry hasn't been updated yet. Hooking via > > destructor calls our function _after_ the ARP subsystem has updated the > > neighbour. So we can then lookup the neigh entry and do the callouts. > > Not sure how do you mean all this. You do kfree_skb immediately in the > arp processing function. Will this not call the destructor directly? > No because the skb refcnt gets bumped by the dev packet code before passing it up to each snoop function. So the destructor fn will get called only when the _last_ user of this skbuf frees it. If by some reason we are the last ref, then yes, we'd get called immediately. But that's not what happens because the snoopers get added to the end of the list of users who want any given ethertype packet. Hope that makes sense. > Anyway, it seems too risky to change the code a lot now. > what I am concerned is that this could have broken working code. > I tested it with IB and iWARP. > To reduce the risk of problems for existing code, > I'd like to see something like the following: > > if (someone asked for notification on neighbour changes) > do the destructor trick > > if (someone asked for notification on address resolution) > call the destructor directly > > Could you code this up please? 

There's no easy way to tell who asked for notifications. And
particularly why they asked for notification.

I think we should leave it as-is. If we have problems, we'll fix it.

Or you could put your arp snoop code back in addr.c and address
translation will not use netevents. But I still think we should leave it...

From mshefty at ichips.intel.com Thu Feb 1 12:05:34 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 01 Feb 2007 12:05:34 -0800
Subject: [openib-general] new IB CM reject reason
In-Reply-To: <20070201190624.GB6473@mellanox.co.il>
References: <20070201183922.GB15115@mellanox.co.il> <000101c74632$85b37bf0$8698070a@amr.corp.intel.com> <20070201190624.GB6473@mellanox.co.il>
Message-ID: <45C2480E.2000904@ichips.intel.com>

> And my claim is that you should define private data format to go with this
> other reason otherwise you are not really solving the problem.

This is not a consumer issued reject. It is a CM issued reject, so the private
data is ignored. This is no different than several other reject reasons (like
invalid service ID). At best we could define the ARI, but if we knew what the
contents of the ARI should be, then we should use a more specific reject reason
than 'other'.

- Sean

From swise at opengridcomputing.com Thu Feb 1 12:07:21 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 14:07:21 -0600
Subject: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 netevent backport]
Message-ID: <1170360441.16637.41.camel@stevo-desktop>

From: Steve Wise

Add skbuff.h to include list for RHEL4U4 netevent.c file. This makes
it identical to the SLES9SP3 file.
Signed-off-by: Steve Wise --- .../backport/2.6.9_U4/include/src/netevent.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c index 1589300..87fb55c 100644 --- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c +++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c @@ -13,6 +13,7 @@ * Fixes: */ +#include #include #include #include From swise at opengridcomputing.com Thu Feb 1 12:09:03 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Feb 2007 14:09:03 -0600 Subject: [openib-general] [PATCH] ofed_1_2 Chelsio ethernet driver updates. Message-ID: <1170360543.16637.45.camel@stevo-desktop> From: Steve Wise This patch updates the ofed_1_2 cxgb3 module to the latest queued for 2.6.21. Signed-off-by: Steve Wise --- drivers/net/cxgb3/firmware_exports.h | 2 +- drivers/net/cxgb3/sge.c | 21 +++++++++------------ drivers/net/cxgb3/t3_cpl.h | 3 --- 3 files changed, 10 insertions(+), 16 deletions(-) diff --git a/drivers/net/cxgb3/firmware_exports.h b/drivers/net/cxgb3/firmware_exports.h index 4538377..6a835f6 100755 --- a/drivers/net/cxgb3/firmware_exports.h +++ b/drivers/net/cxgb3/firmware_exports.h @@ -129,7 +129,7 @@ #define FW_OFLD_NUM 8 #define FW_OFLD_SGEEC_START 0 /* - * + * */ #define FW_RI_NUM 1 #define FW_RI_SGEEC_START 65527 diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c index 6b053bf..3f2cf8a 100755 --- a/drivers/net/cxgb3/sge.c +++ b/drivers/net/cxgb3/sge.c @@ -601,17 +601,16 @@ static struct sk_buff *get_packet(struct if (len <= SGE_RX_COPY_THRES) { skb = alloc_skb(len, GFP_ATOMIC); if (likely(skb != NULL)) { - struct rx_desc *d = &fl->desc[fl->cidx]; - dma_addr_t mapping = - (dma_addr_t)((u64) be32_to_cpu(d->addr_hi) << 32 | - be32_to_cpu(d->addr_lo)); - __skb_put(skb, len); - pci_dma_sync_single_for_cpu(adap->pdev, mapping, len, - PCI_DMA_FROMDEVICE); + 
pci_dma_sync_single_for_cpu(adap->pdev, + pci_unmap_addr(sd, + dma_addr), + len, PCI_DMA_FROMDEVICE); memcpy(skb->data, sd->skb->data, len); - pci_dma_sync_single_for_device(adap->pdev, mapping, len, - PCI_DMA_FROMDEVICE); + pci_dma_sync_single_for_device(adap->pdev, + pci_unmap_addr(sd, + dma_addr), + len, PCI_DMA_FROMDEVICE); } else if (!drop_thres) goto use_orig_buf; recycle: @@ -1667,7 +1666,7 @@ #endif credits = G_RSPD_TXQ0_CR(flags); if (credits) qs->txq[TXQ_ETH].processed += credits; - + credits = G_RSPD_TXQ2_CR(flags); if (credits) qs->txq[TXQ_CTRL].processed += credits; @@ -2220,14 +2219,12 @@ static irqreturn_t t3b_intr_napi(int irq if (likely(map & 1)) { dev = adap->sge.qs[0].netdev; - BUG_ON(napi_is_scheduled(dev)); if (likely(__netif_rx_schedule_prep(dev))) __netif_rx_schedule(dev); } if (map & 2) { dev = adap->sge.qs[1].netdev; - BUG_ON(napi_is_scheduled(dev)); if (likely(__netif_rx_schedule_prep(dev))) __netif_rx_schedule(dev); } diff --git a/drivers/net/cxgb3/t3_cpl.h b/drivers/net/cxgb3/t3_cpl.h index 96b2f36..b7a1a31 100755 --- a/drivers/net/cxgb3/t3_cpl.h +++ b/drivers/net/cxgb3/t3_cpl.h @@ -184,9 +184,6 @@ #define V_OPCODE(x) ((x) << S_OPCODE) #define G_OPCODE(x) (((x) >> S_OPCODE) & 0xFF) #define G_TID(x) ((x) & 0xFFFFFF) -#define S_QNUM 0 -#define G_QNUM(x) (((x) >> S_QNUM) & 0xFFFF) - /* tid is assumed to be 24-bits */ #define MK_OPCODE_TID(opcode, tid) (V_OPCODE(opcode) | (tid)) From swise at opengridcomputing.com Thu Feb 1 12:19:43 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Feb 2007 14:19:43 -0600 Subject: [openib-general] ip_ib_mc_map? In-Reply-To: <20070201090958.GD14189@mellanox.co.il> References: <000101c74576$fedc81f0$8698070a@amr.corp.intel.com> <1170275680.14294.5.camel@stevo-desktop> <45C1480C.1020600@ichips.intel.com> <1170320484.654.6.camel@linux-q667.site> <20070201090958.GD14189@mellanox.co.il> Message-ID: <1170361183.16637.47.camel@stevo-desktop> On Thu, 2007-02-01 at 11:09 +0200, Michael S. 
Tsirkin wrote: > > Quoting Steve WIse : > > Subject: Re: ip_ib_mc_map? > > > > On Wed, 2007-01-31 at 17:53 -0800, Sean Hefty wrote: > > > Steve Wise wrote: > > > > Perhaps there's no backport for this to rhel4u4? > > > > > > I would have thought so, but I really don't know. The function is called from > > > net/ipv4/arp.c, and not directly by ipoib. So, I don't know how the backport > > > patches typically handle this. > > > > > > - Sean > > > > Here's what I see: > > > > ip_ib_mc_map() is called directly from cma_join_ib_multicast(), which is > > added to the ofed_1_2 cma.c via patch file: > > kernel_patches/fixes/sean_multicast_1.patch > > > > So when I compiled ofed_1_2 on rhel4u4, the cma wouldn't load because > > there is no ip_ib_mc_map() in rhel4u4. > > > > So you need a backport patch for this to work on rhel4u4. Probably many > > of the older kernels. > > I think this breakage is U4 specific. Someone at RH went to the trouble to > rip all of IB related stuff out of the U4 kernel. > > I think just calling ip_tr_mc_map on U4 instead will be enough. > I changed cma.c to call ip_tr_mc_map() and I can at least load rdma_cm now. I didn't test any mcast, but the rdma-cm is working over iwarp... Steve. From jlentini at netapp.com Thu Feb 1 12:29:00 2007 From: jlentini at netapp.com (James Lentini) Date: Thu, 1 Feb 2007 15:29:00 -0500 (EST) Subject: [openib-general] new IB CM reject reason In-Reply-To: <45C2480E.2000904@ichips.intel.com> References: <20070201183922.GB15115@mellanox.co.il> <000101c74632$85b37bf0$8698070a@amr.corp.intel.com> <20070201190624.GB6473@mellanox.co.il> <45C2480E.2000904@ichips.intel.com> Message-ID: On Thu, 1 Feb 2007, Sean Hefty wrote: > > And my claim is that you should define private data format to go with this > > other reason otherwise you are not really solving the problem. > > This is not a consumer issued reject. It is a CM issued reject, so > the private data is ignored. 
This is no different than several > other reject reasons (like invalid service ID). At best we could > define the ARI, but if we knew what the contents of the ARI should > be, then we should use a more specific reject reason than 'other'. Invalid Service ID (8) appears to be an appropriate Reason value for the case when a REQ is received for a service ID that is not registered with the CM (either because the application crashed or exited on its own accord). I agree that if the reason codes are insufficient we should take this up in the IBTA. From or.gerlitz at gmail.com Thu Feb 1 12:40:57 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Thu, 1 Feb 2007 22:40:57 +0200 Subject: [openib-general] ip_ib_mc_map? In-Reply-To: <1170325052.2716.229.camel@fc6.xsintricity.com> References: <1170275331.14294.1.camel@stevo-desktop> <45C1ABD0.5090404@voltaire.com> <1170325052.2716.229.camel@fc6.xsintricity.com> Message-ID: <15ddcffd0702011240l3c427bfcx6fcc7f7968fcf8b9@mail.gmail.com> On 2/1/07, Doug Ledford wrote: > On Thu, 2007-02-01 at 10:58 +0200, Or Gerlitz wrote: > > From a reason that no one at RH can trace... someone went and removed > > all the support for ARPHRD_INFINIBAND multicast from u4 where it exists > > perfectly fine in u3 and hopefully on u5 as well (Doug can you update?), > > see https://bugs.openfabrics.org/show_bug.cgi?id=2661 > Yes. It's been fixed for U5. It wasn't that the patch got removed, > it's that between U3 and U4 I did a complete rebase, which means that > all the patches from U3 were tossed out the window and a complete new > set made for U4. I just missed re-adding this one in U4. thanks for fixing this for U5 (which i understand is not out yet, correct?). As of the importance for us to have IP multicast working fine with IPoIB over RH4... do you have an IB setup to test that? Or. 
From swise at opengridcomputing.com Thu Feb 1 13:05:34 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Feb 2007 15:05:34 -0600 Subject: [openib-general] [Fwd: Re: [PATCH 1/10] cxgb3 - main header files] In-Reply-To: References: <1169216896.15842.6.camel@stevo-desktop> Message-ID: <1170363934.16637.58.camel@stevo-desktop> On Fri, 2007-01-19 at 09:07 -0800, Roland Dreier wrote: > > Jeff has pulled in the Chelsio Ethernet driver. If you are ready to > > merge in the RDMA driver, you can pull it from > > Yes, I saw that... OK, I'll get serious about reviewing the RDMA stuff. Hey Roland, Have you had a chance to review this? Thanks, Steve. From dledford at redhat.com Thu Feb 1 14:19:21 2007 From: dledford at redhat.com (Doug Ledford) Date: Thu, 01 Feb 2007 17:19:21 -0500 Subject: [openib-general] ip_ib_mc_map? In-Reply-To: <15ddcffd0702011240l3c427bfcx6fcc7f7968fcf8b9@mail.gmail.com> References: <1170275331.14294.1.camel@stevo-desktop> <45C1ABD0.5090404@voltaire.com> <1170325052.2716.229.camel@fc6.xsintricity.com> <15ddcffd0702011240l3c427bfcx6fcc7f7968fcf8b9@mail.gmail.com> Message-ID: <1170368361.2716.239.camel@fc6.xsintricity.com> On Thu, 2007-02-01 at 22:40 +0200, Or Gerlitz wrote: > On 2/1/07, Doug Ledford wrote: > > On Thu, 2007-02-01 at 10:58 +0200, Or Gerlitz wrote: > > > > From a reason that no one at RH can trace... someone went and removed > > > all the support for ARPHRD_INFINIBAND multicast from u4 where it exists > > > perfectly fine in u3 and hopefully on u5 as well (Doug can you update?), > > > see https://bugs.openfabrics.org/show_bug.cgi?id=2661 > > > Yes. It's been fixed for U5. It wasn't that the patch got removed, > > it's that between U3 and U4 I did a complete rebase, which means that > > all the patches from U3 were tossed out the window and a complete new > > set made for U4. I just missed re-adding this one in U4. > > thanks for fixing this for U5 (which i understand is not out yet, correct?). Correct. 
Although I can get people the packages slated for U5 if they want to test/check them out. > As of the importance for us to have IP multicast working fine with > IPoIB over RH4... > do you have an IB setup to test that? Yeah, I've got a setup, I just don't have any multicast tests that I run. Any test programs you have for multicast in particular would be helpful. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mst at mellanox.co.il Thu Feb 1 14:24:05 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 2 Feb 2007 00:24:05 +0200 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <1170360071.16637.39.camel@stevo-desktop> References: <1170360071.16637.39.camel@stevo-desktop> Message-ID: <20070201222405.GG17617@mellanox.co.il> > There's no easy way to tell who asked for notifications. And > particularly why they asked for notification. > > I think we should leave it as-is. If we have problems, we'll fix it. > > Or you could put your arp snoop code back in addr.c and address > translation will not use netevents. But still thing we should leave > it... I think the issues need to be addressed in some way. I think I see another issue with the destructor approach: ib_core could be unloaded while skb with destructor pointing to our code is still around. This will lead to nasty crashes without clear backtrace on screen if text segment memory gets over-written and the destructor gets called afterwards. It currently seems that invoking the callback function directly rather than sticking it in skb->destructor is the lesser of evils at this point. 
But I'll think all this over, and I'd like to ask you to do this too, and post some suggestions. I can think of some more complicated approaches that might work better for iwarp. Off the top of my head, our netevents implementation could keep a reference on the skb, start a timer, check the users counter on skb and call the notifier chain when it drops to 1. Let's sleep on it. -- MST From mst at mellanox.co.il Thu Feb 1 14:25:57 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 2 Feb 2007 00:25:57 +0200 Subject: [openib-general] ip_ib_mc_map? In-Reply-To: <1170361183.16637.47.camel@stevo-desktop> References: <1170361183.16637.47.camel@stevo-desktop> Message-ID: <20070201222557.GH17617@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: ip_ib_mc_map? > > On Thu, 2007-02-01 at 11:09 +0200, Michael S. Tsirkin wrote: > > > Quoting Steve WIse : > > > Subject: Re: ip_ib_mc_map? > > > > > > On Wed, 2007-01-31 at 17:53 -0800, Sean Hefty wrote: > > > > Steve Wise wrote: > > > > > Perhaps there's no backport for this to rhel4u4? > > > > > > > > I would have thought so, but I really don't know. The function is called from > > > > net/ipv4/arp.c, and not directly by ipoib. So, I don't know how the backport > > > > patches typically handle this. > > > > > > > > - Sean > > > > > > Here's what I see: > > > > > > ip_ib_mc_map() is called directly from cma_join_ib_multicast(), which is > > > added to the ofed_1_2 cma.c via patch file: > > > kernel_patches/fixes/sean_multicast_1.patch > > > > > > So when I compiled ofed_1_2 on rhel4u4, the cma wouldn't load because > > > there is no ip_ib_mc_map() in rhel4u4. > > > > > > So you need a backport patch for this to work on rhel4u4. Probably many > > > of the older kernels. > > > > I think this breakage is U4 specific. Someone at RH went to the trouble to > > rip all of IB related stuff out of the U4 kernel. > > > > I think just calling ip_tr_mc_map on U4 instead will be enough. 
> > > > I changed cma.c to call ip_tr_mc_map() and I can at least load rdma_cm > now. I didn't test any mcast, but the rdma-cm is working over iwarp... So this could be a macro in kernel_addons, unless someone from Voltaire is willing to step up with a more elaborate implementation. -- MST From swise at opengridcomputing.com Thu Feb 1 14:41:56 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Feb 2007 16:41:56 -0600 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <20070201222405.GG17617@mellanox.co.il> References: <1170360071.16637.39.camel@stevo-desktop> <20070201222405.GG17617@mellanox.co.il> Message-ID: <1170369716.16637.69.camel@stevo-desktop> On Fri, 2007-02-02 at 00:24 +0200, Michael S. Tsirkin wrote: > > There's no easy way to tell who asked for notifications. And > > particularly why they asked for notification. > > > > I think we should leave it as-is. If we have problems, we'll fix it. > > > > Or you could put your arp snoop code back in addr.c and address > > translation will not use netevents. But still thing we should leave > > it... > > I think the issues need to be addressed in some way. > > I think I see another issue with the destructor approach: ib_core could > be unloaded while skb with destructor pointing to our code is still around. > This will lead to nasty crashes without clear backtrace on screen if text > segment memory gets over-written and the destructor gets called afterwards. > Yes...hmm... We could reference the module in the snoop function and deref it in the destructor function. > It currently seems that invoking the callback function directly rather than > sticking it in skb->destructor is the lesser of evils at this point. > But I'll think all this over, and I'd like to ask you to do this too, > and post some suggestions. > Ok. > I can think of some more complicated approaches that might work better > for iwarp. 
Off the top of my head, our netevents implementation could > keep a reference on the skb, start a timer, check the users counter on skb and > call the notifier chain when it drops to 1. Let's sleep on it. > Ok. I'll ponder it some more. But we could solve the module unload issue via module refs methinks. Steve. From mst at mellanox.co.il Thu Feb 1 14:43:04 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 2 Feb 2007 00:43:04 +0200 Subject: [openib-general] new IB CM reject reason In-Reply-To: <45C2480E.2000904@ichips.intel.com> References: <20070201183922.GB15115@mellanox.co.il> <000101c74632$85b37bf0$8698070a@amr.corp.intel.com> <20070201190624.GB6473@mellanox.co.il> <45C2480E.2000904@ichips.intel.com> Message-ID: <20070201224304.GI17617@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: new IB CM reject reason > > > And my claim is that you should define private data format to go with this > > other reason otherwise you are not really solving the problem. > > This is not a consumer issued reject. It is a CM issued reject, so the private > data is ignored. This is no different than several other reject reasons (like > invalid service ID). At best we could define the ARI, but if we knew what the > contents of the ARI should be, then we should use a more specific reject reason > than 'other'. I still don't really buy this, and I think you don't see my point. The difference between ib_cm module and consumer is an artificial one - the consumer just uses ib_cm as a convenience module. In particular, as a feature, he gets automatic REJ generation when CM ID is destroyed. In this case private data is all 0s. So a custom protocol on top of ib_cm module that has its own consumer rejects for some reason, would be wise to put something other than all 0s in its private data if it wants to differentiate between the two kinds of consumer reject. Most likely no one cares much about reject reasons so all this is unnecessary. 
But adding "other" reason just moves the problem up one level - what if the actual consumer is using some library on top of CM? Consider for example cma. It might generate rejects on its own too. So now, there is cm, cma as a cm consumer, and the cma consumer. So do we need yet another reject reason for cma generated rejects? Do you see my point now? -- MST From mst at mellanox.co.il Thu Feb 1 14:48:41 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 2 Feb 2007 00:48:41 +0200 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <1170369716.16637.69.camel@stevo-desktop> References: <1170369716.16637.69.camel@stevo-desktop> Message-ID: <20070201224841.GJ17617@mellanox.co.il> > > I can think of some more complicated approaches that might work better > > for iwarp. Off the top of my head, our netevents implementation could > > keep a reference on the skb, start a timer, check the users counter on skb and > > call the notifier chain when it drops to 1. Let's sleep on it. > > > > Ok. I'll ponder it some more. But we could solve the module unload > issue via module refs methinks. This almost never works cleanly - module can't reference itself without races: module can get unloaded after it drops the reference to itself and before the function exits. But I agree such a race is mostly theoretical. And we still have the case where destructor != NULL. Certainly something to think about. -- MST From mst at mellanox.co.il Thu Feb 1 14:57:54 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 2 Feb 2007 00:57:54 +0200 Subject: [openib-general] new IB CM reject reason In-Reply-To: References: Message-ID: <20070201225754.GK17617@mellanox.co.il> > Invalid Service ID (8) appears to be an appropriate Reason value for > the case when a REQ is received for a service ID that is not > registered with the CM (either because the application crashed or > exited on its own accord). 
No, we are actually speaking about the reject to generate when the application cancels the communication establishment (e.g. by exiting), not as a response to any CM message. -- MST From or.gerlitz at gmail.com Thu Feb 1 15:18:26 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 2 Feb 2007 01:18:26 +0200 Subject: [openib-general] ip_ib_mc_map? In-Reply-To: <1170368361.2716.239.camel@fc6.xsintricity.com> References: <1170275331.14294.1.camel@stevo-desktop> <45C1ABD0.5090404@voltaire.com> <1170325052.2716.229.camel@fc6.xsintricity.com> <15ddcffd0702011240l3c427bfcx6fcc7f7968fcf8b9@mail.gmail.com> <1170368361.2716.239.camel@fc6.xsintricity.com> Message-ID: <15ddcffd0702011518qf115aaey862ef168784e81ca@mail.gmail.com> On 2/2/07, Doug Ledford wrote: > > As of the importance for us to have IP multicast working fine with > > IPoIB over RH4... > > do you have an IB setup to test that? > > Yeah, I've got a setup, I just don't have any multicast tests that I > run. Any test programs you have for multicast in particular would be > helpful. This is fairly simple to do: have some multicast traffic routed over an IPoIB subnet on two nodes, e.g. using
$ route add -net 224.0.0.0 netmask 255.0.0.0 dev ib0
and then
server $ iperf -usB 224.5.5.5 -i 1
client $ iperf -uc 224.5.5.5 -l 100 -b 50M -t 30 -i 1
Or. From swise at opengridcomputing.com Thu Feb 1 15:23:37 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Feb 2007 17:23:37 -0600 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <20070201224841.GJ17617@mellanox.co.il> References: <1170369716.16637.69.camel@stevo-desktop> <20070201224841.GJ17617@mellanox.co.il> Message-ID: <1170372217.16637.87.camel@stevo-desktop> On Fri, 2007-02-02 at 00:48 +0200, Michael S. Tsirkin wrote: > > > I can think of some more complicated approaches that might work better > > > for iwarp.
Off the top of my head, our netevents implementation could > > > keep a reference on the skb, start a timer, check the users counter on skb and > > > call the notifier chain when it drops to 1. Let's sleep on it. > > > Remembering which skbs to check later requires more complication. Here is one method to handle this and do what you suggest above. In the snoop function: Clone the skb and save the original skb ptr in the new skb->cb area. This area is ours to use on a freshly cloned skbuff. Add this new skb ptr to a linked list of outstanding netevents to be processed later. Don't free the original skb passed in. This keeps the reference on it like you proposed above. Schedule a delayed work handler for a few ticks in the future. In the delayed work handler: Walk the pending netevents skb list. For each pending skb, get the original skb ptr from the cloned skb->cb area, and if the user count is now 1 then do the current destructor() logic, remove the skb from the pending list, and free both skbs. If the list is not empty reschedule the delayed work handler for a few ticks later. In the module unload function: cancel any delayed work handling walk the pending list and free the skbs and the original snooped skbs. This solves the destructor issue and the rmmod issue, but is more complicated. If you're worried about regressing straight rdma address translation, then you can call the address translation timer function synchronously in the snoop function like before and change the addr_trans module to not use netevents... Steve. From mst at mellanox.co.il Thu Feb 1 15:33:18 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Fri, 2 Feb 2007 01:33:18 +0200 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <1170372217.16637.87.camel@stevo-desktop> References: <1170372217.16637.87.camel@stevo-desktop> Message-ID: <20070201233318.GO17617@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [PATCH 00/12] ofed_1_2 - Neighbour update support > > On Fri, 2007-02-02 at 00:48 +0200, Michael S. Tsirkin wrote: > > > > I can think of some more complicated approaches that might work better > > > > for iwarp. Off the top of my head, our netevents implementation could > > > > keep a reference on the skb, start a timer, check the users counter on skb and > > > > call the notifier chain when it drops to 1. Let's sleep on it. > > > > > > Remembering which skbs to check later requires more complication. Here > is one method to handle this and do what you suggest above. > > In the snoop function: > > Clone the skb and save the original skb ptr in the new skb->cb area. > This area is ours to use on a freshly cloned skbuff. Add this new skb > ptr to a linked list of outstanding netevents to be processed later. > Don't free the original skb passed in. This keeps the reference on it > like you proposed above. Schedule a delayed work handler for a few > ticks in the future. > > In the delayed work handler: > > Walk the pending netevents skb list. For each pending skb, get the > original skb ptr from the cloned skb->cb area, and if the user count is > now 1 then do the current destructor() logic, remove the skb from the > pending list, and free both skbs. If the list is not empty reschedule > the delayed work handler for a few ticks later. > > In the module unload function: > > cancel any delayed work handling > walk the pending list and free the skbs and the original snooped skbs. > > This solves the destructor issue and the rmmod issue, but is more > complicated. 
If you're worried about regressing straight rdma address > translation, then you can call the address translation timer function > synchronously in the snoop function like before and change the > addr_trans module to not use netevents... Yes, this is what I proposed above. It does all sound quite complicated. Some notes: - you don't need an skb just to keep a void*. Create your own structure for this. - better use a timer than a workqueue - you are calling netevents from atomic context on new kernels anyway. So maybe destructor with module ref counting is better. Dunno. -- MST From swise at opengridcomputing.com Thu Feb 1 15:50:27 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Feb 2007 17:50:27 -0600 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <20070201233318.GO17617@mellanox.co.il> References: <1170372217.16637.87.camel@stevo-desktop> <20070201233318.GO17617@mellanox.co.il> Message-ID: <1170373827.16637.92.camel@stevo-desktop> On Fri, 2007-02-02 at 01:33 +0200, Michael S. Tsirkin wrote: > > Quoting Steve Wise : > > Subject: Re: [PATCH 00/12] ofed_1_2 - Neighbour update support > > > > On Fri, 2007-02-02 at 00:48 +0200, Michael S. Tsirkin wrote: > > > > > I can think of some more complicated approaches that might work better > > > > > for iwarp. Off the top of my head, our netevents implementation could > > > > > keep a reference on the skb, start a timer, check the users counter on skb and > > > > > call the notifier chain when it drops to 1. Let's sleep on it. > > > > > > > > > Remembering which skbs to check later requires more complication. Here > > is one method to handle this and do what you suggest above. > > > > In the snoop function: > > > > Clone the skb and save the original skb ptr in the new skb->cb area. > > This area is ours to use on a freshly cloned skbuff. Add this new skb > > ptr to a linked list of outstanding netevents to be processed later.
> > Don't free the original skb passed in. This keeps the reference on it > > like you proposed above. Schedule a delayed work handler for a few > > ticks in the future. > > > > In the delayed work handler: > > > > Walk the pending netevents skb list. For each pending skb, get the > > original skb ptr from the cloned skb->cb area, and if the user count is > > now 1 then do the current destructor() logic, remove the skb from the > > pending list, and free both skbs. If the list is not empty reschedule > > the delayed work handler for a few ticks later. > > > > In the module unload function: > > > > cancel any delayed work handling > > walk the pending list and free the skbs and the original snooped skbs. > > > > This solves the destructor issue and the rmmod issue, but is more > > complicated. If you're worried about regressing straight rdma address > > translation, then you can call the address translation timer function > > synchronously in the snoop function like before and change the > > addr_trans module to not use netevents... > > > Yes, this is what I proposed above. It does all sound quite complicated. > Some notes: > - you don't need an skb just too keep a void*. create your own > structure for this. > - better use a timer than a workqueue - you are calling netevents > from atomic context on new kernels anyway. > > So maybe destructor with module ref counting is better. > Donnu. We could use a global refcnt to count the number of pending destructions and use a completion object to block unload until all the destructors fire and the refcnt goes to zero. 
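[Editor's sketch] The "global refcnt plus completion" idea in the message above can be illustrated in userspace. What follows is only a minimal sketch under stated assumptions: pthreads stand in for the kernel's atomic counter and completion object, and every name (pending_get, pending_put, wait_for_pending, demo_unload) is hypothetical and not taken from the OFED tree. It shows the counting and the blocking wait only; it does not address whether destructor code may still be executing after the final put.

```c
/* Userspace illustration only: a mutex and condition variable stand in
 * for the kernel refcount and completion object discussed above. */
#include <assert.h>
#include <pthread.h>

/* All names below are hypothetical; none come from the OFED sources. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;
static int pending;            /* destructors that have not fired yet */

/* Called when a destructor is attached to a snooped skb. */
static void pending_get(void)
{
    pthread_mutex_lock(&lock);
    pending++;
    pthread_mutex_unlock(&lock);
}

/* Called as the last step of a destructor. */
static void pending_put(void)
{
    pthread_mutex_lock(&lock);
    if (--pending == 0)
        pthread_cond_broadcast(&all_done);   /* wake the unload path */
    pthread_mutex_unlock(&lock);
}

/* Called from the unload path: block until every destructor has fired. */
static void wait_for_pending(void)
{
    pthread_mutex_lock(&lock);
    while (pending > 0)
        pthread_cond_wait(&all_done, &lock);
    pthread_mutex_unlock(&lock);
}

/* Stand-in for a destructor that fires on another thread. */
static void *fake_destructor(void *arg)
{
    (void)arg;
    pending_put();
    return NULL;
}

/* Attach n destructors, wait for all of them, and return how many are
 * still pending afterwards. */
int demo_unload(int n)
{
    pthread_t t[16];
    int started[16] = {0};
    int i;

    assert(n >= 0 && n <= 16);
    for (i = 0; i < n; i++) {
        pending_get();
        if (pthread_create(&t[i], NULL, fake_destructor, NULL) == 0)
            started[i] = 1;
        else
            pending_put();     /* thread never started: undo the get */
    }
    wait_for_pending();
    for (i = 0; i < n; i++)
        if (started[i])
            pthread_join(t[i], NULL);
    return pending;
}
```

Calling demo_unload(4) from a small main() exercises the wait path; the return value is the number of destructors still outstanding once the wait has finished.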
From rdreier at cisco.com Thu Feb 1 20:45:11 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 01 Feb 2007 20:45:11 -0800 Subject: [openib-general] ipath and current git woes In-Reply-To: <20070201002202.GA12386@obsidianresearch.com> (Jason Gunthorpe's message of "Wed, 31 Jan 2007 17:22:02 -0700") References: <20070201002202.GA12386@obsidianresearch.com> Message-ID: > After applying that patch the user space consumers load but we got a > kernel oops when we tried to run a test here :< > > Unable to handle kernel NULL pointer dereference at 0000000000000918 RIP: > [] :ib_ipath:ipath_mmap+0x37/0x95 So I had a look at this, and it seems that there are two bugs that lead to this. First of all, libipathverbs gets a response from the kernel that has a 64-bit kernel address in it, and passes that back into a call to mmap(), where it uses that address as the offset. On 32-bit userspace, that chops off the high bits of the address and so the ipath kernel driver can't find the address in its list. So that explains why things don't work. And unfortunately the obvious fix for libipathverbs to use mmap64() instead of mmap() doesn't work, because on Linux, mmap64() is implemented with the mmap2 system call, which just allows the offset to be 12 bits bigger -- so it only gets you to 44 bits, which is not enough to reach a 64-bit kernel address (which is typically something like 0xffffc20000072000). So you probably want to use something like a 32-bit serial number to point at your buffers or something like that. The oops is caused by another more serious problem. Obviously a buggy libipathverbs shouldn't be able to crash the kernel, because even if libipathverbs is fixed then malicious userspace could do the same thing too. It turns out that all the handling of pending_mmaps in the ipath driver is not really careful about userspace screwing it up. When userspace creates a CQ, the CQ buffer is added to the device-wide list of pending mmaps. 
Of course 32-bit userspace never succeeds in mapping that CQ, so it stays on the list (the only way it gets removed is if it is successfully mmapped). But then the destroy CQ operation sees that the mmap is pending, and frees the structure holding the information (without removing it from the list). And of course when that memory gets reused, then the pending mmap list gets corrupted, etc etc. Of course this is ugly to fix with the current data structure -- the list of pending mmaps is singly-linked, which means I have to walk the whole list to delete an entry. It also makes the list walking in ipath_mmap() unnecessarily obfuscated. I think it's much better to just use the standard kernel list_head stuff if you're going to delete things from the middle of the list, rather than implementing your own singly-linked list. Sure it costs an extra pointer in each entry but no one ever has to worry about whether you're deleting things correctly, etc. There's some other silly stuff I noticed too, like:
grep -n mmap_cnt *.[ch] /dev/null
ipath_cq.c:232: ip->mmap_cnt = 0;
ipath_mmap.c:63: ip->mmap_cnt++;
ipath_mmap.c:70: ip->mmap_cnt--;
ipath_qp.c:837: ip->mmap_cnt = 0;
ipath_srq.c:162: ip->mmap_cnt = 0;
ipath_verbs.h:178: unsigned mmap_cnt;
umm -- no one ever looks at mmap_cnt (there's a kref too), so why keep it at all? So Qlogic guys -- please fix this up! - R. From rdreier at cisco.com Thu Feb 1 20:47:10 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 01 Feb 2007 20:47:10 -0800 Subject: [openib-general] IPoIB CM for merge? In-Reply-To: <20070201192924.GE17617@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 1 Feb 2007 21:29:24 +0200") References: <20070201192924.GE17617@mellanox.co.il> Message-ID: > Could you please spend some time reviewing IPoIB CM code? > I am concerned about missing the 2.6.21 merge window. Thanks for the reminder. Can we trade? Have you looked at the cxgb3 iwarp driver? Any comments? - R.
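[Editor's sketch] The offset arithmetic in the "ipath and current git woes" message above (a 64-bit kernel address used as an mmap offset from 32-bit userspace) can be checked with a few lines of C. This is only a sketch under stated assumptions: it models a 32-bit byte offset and a 32-bit page number shifted by 12 bits, as described in that message, and makes no system calls. The function names are hypothetical; the example address 0xffffc20000072000 is the one quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB offset unit, matching the "12 bits bigger" remark above. */
#define PAGE_SHIFT_ASSUMED 12

/* What survives when a 64-bit kernel address is passed through a
 * 32-bit byte offset: the high 32 bits are dropped. */
uint64_t offset_via_mmap32(uint64_t kaddr)
{
    return (uint32_t)kaddr;
}

/* What survives when the offset travels as a 32-bit page number
 * (mmap2-style): at most 32 + 12 = 44 bits of the address. */
uint64_t offset_via_mmap2(uint64_t kaddr)
{
    uint32_t pgoff = (uint32_t)(kaddr >> PAGE_SHIFT_ASSUMED);
    return (uint64_t)pgoff << PAGE_SHIFT_ASSUMED;
}

/* Returns 1 if the driver would see the same value userspace was given. */
int offset_round_trips(uint64_t kaddr, uint64_t (*via)(uint64_t))
{
    return via(kaddr) == kaddr;
}
```

With the quoted address, neither path round-trips, while a small page-aligned token such as 0x1000 survives both. That is consistent with the suggestion in the message to hand userspace something like a 32-bit serial number instead of a kernel address.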
From rdreier at cisco.com Thu Feb 1 20:48:13 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 01 Feb 2007 20:48:13 -0800 Subject: [openib-general] [Fwd: Re: [PATCH 1/10] cxgb3 - main header files] In-Reply-To: <1170363934.16637.58.camel@stevo-desktop> (Steve Wise's message of "Thu, 01 Feb 2007 15:05:34 -0600") References: <1169216896.15842.6.camel@stevo-desktop> <1170363934.16637.58.camel@stevo-desktop> Message-ID: > Have you had a chance to review this? Still on my list. Can we trade? Can you look at the IPoIB connected mode stuff in the ipoib-cm branch in git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git and let me know if you see anything you don't like? - R. From mike.heffner at evergrid.com Thu Feb 1 21:10:09 2007 From: mike.heffner at evergrid.com (Mike Heffner) Date: Fri, 02 Feb 2007 00:10:09 -0500 Subject: [openib-general] Detecting when an RDMA writer process disappears Message-ID: <45C2C7B1.7090204@evergrid.com> Is there any method by which a receiving process that is polling in preregistered memory regions for data from a sender performing RDMA writes, can detect if the sender is killed? Say by a SIGKILL signal? The RC connection is setup using the RDMA CM and there do not appear to be any CM events created on the event channel, nor does there appear to be any async. events created. Occasionally I will get a CQE failure on the QP, depending on where the communication flow is, that I can use to mark the connection failed, but this happens only about 50% of the time. An alternative solution would be periodically sending "keep-alives" and detecting the CQE failure, but I'd be interested to know if there are any other options that don't require sending keep-alives. 
Thanks, Mike -- Mike Heffner EverGrid Software Blacksburg, VA USA Voice: (540) 443-3500 #603 From jgunthorpe at obsidianresearch.com Thu Feb 1 21:25:03 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Thu, 1 Feb 2007 22:25:03 -0700 Subject: [openib-general] ipath and current git woes In-Reply-To: <45C13771.2070406@qlogic.com> References: <20070201002202.GA12386@obsidianresearch.com> <45C13771.2070406@qlogic.com> Message-ID: <20070202052503.GA19654@obsidianresearch.com> On Wed, Jan 31, 2007 at 04:42:25PM -0800, Robert Walsh wrote: > Jason Gunthorpe wrote: > >Has anyone been able to use ipath with the current latest git > >everything? > > We're working on getting this up to date right now. Give us a couple of > days and we'll have some new patches ready. OK. Things are working ok here using the same kernel and a 64 bit OFED 1.1 user space built in a chroot. That makes sense after reading Roland's analysis... Thanks, Jason From eitan at sw053.yok.mtl.com Thu Feb 1 21:40:43 2007 From: eitan at sw053.yok.mtl.com (Eitan Zahavi) Date: Fri, 2 Feb 2007 07:40:43 +0200 Subject: [openib-general] nightly osm_sim report 2007-02-02:normal completion Message-ID: <200702020540.l125ehia022501@sw053.yok.mtl.com>
OSM Simulation Regression Summary
OpenSM rev = Thu_Feb_1_10:25:31_2007 b8cdb7
ibutils rev = Wed_Jan_3_11:42:12_2007 913448
Total=410 Pass=409 Fail=1
Pass:
30 Stability IS1-16.topo
30 Pkey IS1-16.topo
30 OsmTest IS1-16.topo
30 Multicast IS1-16.topo
30 LidMgr IS1-16.topo
29 OsmStress IS1-16.topo
10 Stability IS3-loop.topo
10 Stability IS3-128.topo
10 Pkey IS3-128.topo
10 OsmTest IS3-loop.topo
10 OsmTest IS3-128.topo
10 OsmStress IS3-128.topo
10 Multicast IS3-loop.topo
10 Multicast IS3-128.topo
10 LidMgr IS3-128.topo
10 FatTree part-4-ary-3-tree.topo
10 FatTree merge-roots-reorder-4-ary-2-tree.topo
10 FatTree merge-roots-4-ary-2-tree.topo
10 FatTree merge-root-4-ary-3-tree.topo
10 FatTree merge-root-12-ary-2-tree.topo
10 FatTree merge-2-ary-4-tree.topo
10 FatTree half-4-ary-3-tree.topo
10 FatTree blend-4-ary-2-tree.topo
10 FatTree 4-ary-4-tree.topo
10 FatTree 4-ary-3-tree.topo
10 FatTree 32nodes-3lvl-is1.topo
10 FatTree 2-ary-4-tree.topo
10 FatTree 12-node-spaced.topo
10 FatTree 12-ary-2-tree.topo
Failures:
1 OsmStress IS1-16.topo
From mst at mellanox.co.il Thu Feb 1 22:03:22 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 2 Feb 2007 08:03:22 +0200 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <1170373827.16637.92.camel@stevo-desktop> References: <1170373827.16637.92.camel@stevo-desktop> Message-ID: <20070202060228.GQ17617@mellanox.co.il> > We could use a global refcnt to count the number of pending destructions > and use a completion object to block unload until all the destructors > fire and the refcnt goes to zero. It has the same race as module refcnt. So just use that. -- MST From bugzilla-daemon at lists.openfabrics.org Thu Feb 1 22:16:04 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Thu, 1 Feb 2007 22:16:04 -0800 (PST) Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa In-Reply-To: Message-ID: <20070202061604.ECEC7E607F9@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=334 ------- Comment #11 from dmitry.yulov at intel.com 2007-02-01 22:16 ------- (In reply to comment #10) > What is the output of uname -r ? This is VERY important. Also, can you run `cat /etc/issue` and send the results? > > As you can see in my first message, I already gave my machine configuration: >The machine configuration: >Kernel: Linux 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux >OS: SUSE Linux Enterprise Server 10 (x86_64) >gcc version: gcc (GCC) 4.1.0 (SUSE Linux) Unfortunately my machine didn't have the version of Linux in /etc/issue because it is not right by IT requirements.
I looked at the ofed_scripts/configure file and saw that configure needs the file /etc/issue to choose the right patches. I don't think that is a good idea: configure should first run the command cat /etc/*release* and look for the Linux version there, and only after that check the file /etc/issue (if necessary).

From mst at mellanox.co.il Thu Feb 1 22:56:14 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Feb 2007 08:56:14 +0200
Subject: [openib-general] IPoIB CM for merge?
In-Reply-To:
References:
Message-ID: <20070202065547.GS17617@mellanox.co.il>

> Quoting Roland Dreier :
> Subject: Re: IPoIB CM for merge?
>
> > Could you please spend some time reviewing IPoIB CM code?
> > I am concerned about missing the 2.6.21 merge window.
>
> Thanks for the reminder.
>
> Can we trade? Have you looked at the cxgb3 iwarp driver? Any comments?

I haven't yet, sorry.

OK.
I am not sure I have the last version posted so I am going to go by what
is there in OFED git tree.

And I also only looked under drivers/infiniband/.

So, here are some questions: I looked in the archives and have not seen
these addressed. Maybe these can be answered and then I'll go from there?
Does this sound OK?

Files with names like
./core/cxio_hal.c
./core/cxio_hal.h
normally generate a fair bit of discussion which wasn't present here,
I did not guess everyone was just busy.
For example, why is there both struct iwch_cq and struct t3_cq?

File tcb.h comment says:
/* This file is automatically generated --- do not edit */
This looks like a GPL violation, does it not?

What's the deal with the naming convention?
Is there a reason in cxgb3, some files start with iwch and some with cxio?
How about using cxgb3 prefix all over?
-- 
MST

From philippe.gregoire at cea.fr Fri Feb 2 02:10:16 2007
From: philippe.gregoire at cea.fr (Philippe Gregoire)
Date: Fri, 02 Feb 2007 11:10:16 +0100
Subject: [openib-general] dry-run mode for opensm ?
Message-ID: <45C30E08.1030502@cea.fr>

Hal
Is there any way to run opensm in a dry-run mode
just to make it dump the route tables it will generate ?

We already have an embedded SM and I would like to compare the
current route tables with those that OpenSM would generate.

Thanks
Philippe

From vlad at lists.openfabrics.org Fri Feb 2 02:20:43 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Fri, 2 Feb 2007 02:20:43 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070202-0200 daily build status
Message-ID: <20070202102043.4FA07E607F9@openfabrics.org>

This email was generated automatically, please do not reply

Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.17
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on ppc64 
with linux-2.6.16 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.15 Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From mst at mellanox.co.il Fri Feb 2 03:15:32 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 2 Feb 2007 13:15:32 +0200 Subject: [openib-general] IPoIB CM for merge? 
In-Reply-To: References: Message-ID: <20070202111532.GT17617@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: IPoIB CM for merge? > > > Could you please spend some time reviewing IPoIB CM code? > > I am concerned about missing the 2.6.21 merge window. > > Thanks for the reminder. > > Can we trade? Have you looked at the cxgb3 iwarp driver? Any comments? OK. I am not sure I have the last version posted so I am going to go by what is there in OFED git tree. And I also only looked under drivers/infiniband/. So, here are some questions: I looked in the archives and have not seen these addressed. Maybe these can be answered and then I'll go from there? Does this sound OK? Files with names like ./core/cxio_hal.c ./core/cxio_hal.h normally generate a fair bit of discussion which wasn't present here, I did not guess everyone was just busy. For example, why is there both struct iwch_cq and struct t3_cq? File tcb.h comment says: /* This file is automatically generated --- do not edit */ This looks like a GPL violation, does it not? What's the deal with the naming convention? Is there a reason in cxgb3, some files start with iwch and some with cxio? How about using cxgb3 prefix all over? -- MST From bugzilla-daemon at lists.openfabrics.org Fri Feb 2 03:42:54 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Fri, 2 Feb 2007 03:42:54 -0800 (PST) Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa In-Reply-To: Message-ID: <20070202114254.39BAAE607F9@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=334 ------- Comment #12 from dmitry.yulov at intel.com 2007-02-02 03:42 ------- Created an attachment (id=74) --> (https://bugs.openfabrics.org/attachment.cgi?id=74&action=view) Patch for ofed_scripts/configure I have added a patch file for configure in my case. 

From bugzilla-daemon at lists.openfabrics.org Fri Feb 2 03:56:43 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Fri, 2 Feb 2007 03:56:43 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa
In-Reply-To:
Message-ID: <20070202115643.76DD4E607F9@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334

------- Comment #13 from dmitry.yulov at intel.com 2007-02-02 03:56 -------
How can I apply the patch while the build.sh script runs? As far as I know, when I run build.sh my patched files are always overwritten, because it runs rpm -i openib-1.1.src.rpm. Is there a way to apply my patches, or do I need to wait for a new release?

From halr at voltaire.com Fri Feb 2 06:31:36 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 02 Feb 2007 09:31:36 -0500
Subject: [openib-general] dry-run mode for opensm ?
In-Reply-To: <45C30E08.1030502@cea.fr>
References: <45C30E08.1030502@cea.fr>
Message-ID: <1170426648.15660.351722.camel@hal.voltaire.com>

Hi Philippe,

On Fri, 2007-02-02 at 05:10, Philippe Gregoire wrote:
> Hal
> Is there any way to run opensm in a dry-run mode
> just to make it dump the route tables it will generate ?

Not that I'm aware of.

> We already have an embedded SM and I would like to compare the
> current route tables with those that OpenSM would generate.

There are two options here from what I know:
1. Turn off the embedded SM temporarily and run OpenSM (in one of its
various routing modes)
2. 
Get your topology into a simulator and run OpenSM on it BTW, there are scripts which will work with any SM to dump the routing tables (dump_lfts/mgfts.sh) if that is how you are doing the comparison. -- Hal > Thanks > Philippe From swise at opengridcomputing.com Fri Feb 2 07:18:24 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 02 Feb 2007 09:18:24 -0600 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <20070202060228.GQ17617@mellanox.co.il> References: <1170373827.16637.92.camel@stevo-desktop> <20070202060228.GQ17617@mellanox.co.il> Message-ID: <1170429504.26115.1.camel@stevo-desktop> On Fri, 2007-02-02 at 08:03 +0200, Michael S. Tsirkin wrote: > > We could use a global refcnt to count the number of pending destructions > > and use a completion object to block unload until all the destructors > > fire and the refcnt goes to zero. > > It has the same race as module refcnt. So just use that. > I don't understand the race. Can you explain please? This should be able to be done without a race with a refcnt, a spinlock, a bit saying we're unloading, and a completion object. But maybe I'm confused ;-) From swise at opengridcomputing.com Fri Feb 2 07:28:59 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 02 Feb 2007 09:28:59 -0600 Subject: [openib-general] [Fwd: Re: [PATCH 1/10] cxgb3 - main header files] In-Reply-To: References: <1169216896.15842.6.camel@stevo-desktop> <1170363934.16637.58.camel@stevo-desktop> Message-ID: <1170430139.26115.9.camel@stevo-desktop> On Thu, 2007-02-01 at 20:48 -0800, Roland Dreier wrote: > > Have you had a chance to review this? > > Still on my list. > > Can we trade? Can you look at the IPoIB connected mode stuff in the > ipoib-cm branch in > > git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git > > and let me know if you see anything you don't like? > > - R. Ok. I'll review the IPoIB connected mode code. Steve. 
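
The unload scheme Steve describes in the neighbour update thread above (a refcnt, a lock, a bit saying we're unloading, and a completion object) can be sketched in userspace C roughly as follows. This is only an illustration of the bookkeeping, not code from any posted patch: struct unload_gate and the gate_* functions are hypothetical names, a pthread mutex stands in for the spinlock, and a condition variable stands in for the kernel completion. Whether this closes the race Michael refers to is the open question in that thread; the sketch does not settle it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel pattern; all names invented. */
struct unload_gate {
	pthread_mutex_t lock;   /* plays the role of the spinlock */
	pthread_cond_t done;    /* plays the role of the completion object */
	int pending;            /* refcnt: destructors that have not fired yet */
	bool unloading;         /* the "we're unloading" bit */
};

static void gate_init(struct unload_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->done, NULL);
	g->pending = 0;
	g->unloading = false;
}

/* Take a reference before arming a destructor; refused once unload started. */
static bool gate_get(struct unload_gate *g)
{
	bool ok;

	pthread_mutex_lock(&g->lock);
	ok = !g->unloading;
	if (ok)
		g->pending++;
	pthread_mutex_unlock(&g->lock);
	return ok;
}

/* Called by a destructor when it fires; wakes the unloader on the last put. */
static void gate_put(struct unload_gate *g)
{
	pthread_mutex_lock(&g->lock);
	if (--g->pending == 0 && g->unloading)
		pthread_cond_broadcast(&g->done);
	pthread_mutex_unlock(&g->lock);
}

/* Unload path: set the bit, then block until every destructor has fired. */
static void gate_unload(struct unload_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->unloading = true;
	while (g->pending > 0)
		pthread_cond_wait(&g->done, &g->lock);
	pthread_mutex_unlock(&g->lock);
}
```

In this sketch a caller would pair every successful gate_get() with exactly one gate_put(), and the unload path would call gate_unload() once before tearing anything down. Setting the unloading bit under the same lock as the counter is what keeps a late gate_get() from slipping in after the count has been observed as zero.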
From halr at voltaire.com Fri Feb 2 07:28:06 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 02 Feb 2007 10:28:06 -0500
Subject: [openib-general] components that have not opend the ofed_1_2 branch
In-Reply-To: <45C209EA.1040207@mellanox.co.il>
References: <45C209EA.1040207@mellanox.co.il>
Message-ID: <1170430064.15660.354336.camel@hal.voltaire.com>

On Thu, 2007-02-01 at 10:40, Tziporet Koren wrote:
> The following components have not opened ofed_1_2 branch:
>
> * libibverbs - Roland
> * libmthca - Roland
> * libipathverbs - Bryan
> * tvflash - Roland
> * srptools - Ishai
> * management - Hal
>
>
> Please open the branch today or tomorrow at the latest.

Done; just created the ofed_1_2 branch for management.

-- Hal

> Thanks,
> Tziporet

From swise at opengridcomputing.com Fri Feb 2 07:41:09 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Feb 2007 09:41:09 -0600
Subject: [openib-general] ofa_1_2_kernel 20070202-0200 daily build status
In-Reply-To: <20070202102043.4FA07E607F9@openfabrics.org>
References: <20070202102043.4FA07E607F9@openfabrics.org>
Message-ID: <1170430869.26115.12.camel@stevo-desktop>

On Fri, 2007-02-02 at 02:20 -0800, vlad at lists.openfabrics.org wrote:
> This email was generated automatically, please do not reply

Which distro is 2.6.16.21-0.8-default? I'm sure I didn't do a netevent backport for that.

> Failed:
> Build failed on ia64 with linux-2.6.16.21-0.8-default
> Log:
> /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
> /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
> /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
> make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
> make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
> make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
> make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
> make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
> make: *** [kernel] Error 2
> ----------------------------------------------------------------------------------

From swise at opengridcomputing.com Fri Feb 2 07:54:31 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Feb 2007 09:54:31 -0600
Subject: [openib-general] IPoIB CM for merge?
In-Reply-To: <20070202111532.GT17617@mellanox.co.il>
References: <20070202111532.GT17617@mellanox.co.il>
Message-ID: <1170431671.26115.25.camel@stevo-desktop>

On Fri, 2007-02-02 at 13:15 +0200, Michael S. 
Tsirkin wrote:
> > Quoting Roland Dreier :
> > Subject: Re: IPoIB CM for merge?
> >
> > > Could you please spend some time reviewing IPoIB CM code?
> > > I am concerned about missing the 2.6.21 merge window.
> >
> > Thanks for the reminder.
> >
> > Can we trade? Have you looked at the cxgb3 iwarp driver? Any comments?
>
> OK.
> I am not sure I have the last version posted so I am going to go by what
> is there in OFED git tree.
>
> And I also only looked under drivers/infiniband/.
>
> So, here are some questions: I looked in the archives and have not seen
> these addressed. Maybe these can be answered and then I'll go from there?
> Does this sound OK?
>
> Files with names like
> ./core/cxio_hal.c
> ./core/cxio_hal.h
> normally generate a fair bit of discussion which wasn't present here,
> I did not guess everyone was just busy.
> For example, why is there both struct iwch_cq and struct t3_cq?
>

The cxgb3/core code defines a low level interface to the RDMA bits of the T3 device. This code was originally a separate module (named cxio) that allowed other RDMA middleware layers to sit on top of this core RDMA module. At the time, both RNIC-PI and OFA were being developed. So that is the history of this. As per the first openib review (about a year ago) of this code, I merged this core module into the cxgb3 module. I left the file structure and names as-is because it was low priority IMO.

The t3_cq struct is the low level CQ structure used to manage both a HW accessed CQ and a SW CQ (needed to handle error cases and out of order completions). The iwch_cq struct contains the stuff needed to integrate with the OFA core and uverbs code. It contains a t3_cq inline.

> File tcb.h comment says:
> /* This file is automatically generated --- do not edit */
> This looks like a GPL violation, does it not?
>

I can add the license if that's what you mean.

> What's the deal with the naming convention?
> Is there a reason in cxgb3, some files start with iwch and some with cxio?
> How about using cxgb3 prefix all over?

The cxio_ prefix is used for the low-level functions/types that talk directly with the HW. The iwch_ prefix is used for the provider driver functions that interface with the OFA stack. I'd rather not change the names, especially since this has already gone through several review cycles. I'm hoping we can get this in and improve it with subsequent submissions.

Is that reasonable?

Steve.

From mshefty at ichips.intel.com Fri Feb 2 09:59:05 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 02 Feb 2007 09:59:05 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support
In-Reply-To: <45BF8E17.2010805@ichips.intel.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com>
Message-ID: <45C37BE9.5040105@ichips.intel.com>

> Sean Hefty (3):
> rdma_cm: Increment port number after close to avoid re-use.
> ib_sa: track multicast join/leave requests
> rdma_cm: add multicast communication support

Assuming that you haven't looked at this yet, I updated the ib_sa patch above to shorten the workqueue name, plus added a fourth patch to shorten the workqueue names for ib_addr and rdma_cm. E.g. "ib_mcast_wq" became "ib_mcast".

Let me know if you need any assistance.

- Sean

From swise at opengridcomputing.com Fri Feb 2 11:18:13 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Feb 2007 13:18:13 -0600
Subject: [openib-general] IPoIB connected mode review comments
In-Reply-To:
References: <1169216896.15842.6.camel@stevo-desktop> <1170363934.16637.58.camel@stevo-desktop>
Message-ID: <1170443893.26115.59.camel@stevo-desktop>

On Thu, 2007-02-01 at 20:48 -0800, Roland Dreier wrote:
> > Have you had a chance to review this?
>
> Still on my list.
>
> Can we trade? Can you look at the IPoIB connected mode stuff in the
> ipoib-cm branch in
>
> git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
>
> and let me know if you see anything you don't like?
>
> - R.

Here are my comments. I'm not an ib cm expert though. These are mostly questions: Since IPoIB is using IP addresses already, wouldn't it be simpler to use the rdma cm to setup connections? Could you optimize this design and only signal some of the tx wrs? In ipoib_cm_send() you call ipoib_cm_skb_too_long() if the packet is too large for the interface mtu. And you print a warning. But ipoib_cm_skb_too_long() actually queues the packet for the cm case. For ud it just drops the packet. The skb task for cm then will send a ICMP_DEST_UNREACH for these packets. Why the difference? Also if this packet came from the local stack via a local application, you don't want to send DEST_UNREACH, right? (I'm probably just confused about the purpose of this). In ipoib_cm_tx_completion() you rearm, then drain the cq. I thought there was some reason that it was better to do drain/rearm/drain? Something about if you rearm and there's a cq entry mthca does another immediate interrupt? In ipoib_cm_handle_tx_wc(): When can a tx completion happen with a wr_id that isn't within the ipoib_sendq_size range? This looks like it is really a bug condition that should never happen. I see the same code in the rx completion path too. Also, what's up with the /* FIXME */ comment? You lock the priv->lock inside of the priv->tx_lock. Is this ordering correct and consistent across all the code? ipoib_cm_handle_rx_wc() - what's up with the XXX comment? What's the algorithm to keep enough buffers posted in the SRQ? From akepner at sgi.com Fri Feb 2 13:34:15 2007 From: akepner at sgi.com (akepner at sgi.com) Date: Fri, 2 Feb 2007 13:34:15 -0800 (PST) Subject: [openib-general] [RFC/BUG] libibverbs: DMA vs. CQ race In-Reply-To: References: Message-ID: Thanks for having a look at this. On Mon, 29 Jan 2007, Roland Dreier wrote: > .... > Well, first the changes to the userspace libmthca need to be such that > new libmthca continues to work with old kernels.... Absolutely. > ..... 
> The really strange thing about this is that this Altix > coherent/consistent memory really isn't about the memory itself, but > about the relationship of that memory with DMA elsewhere -- as I > understand the code, doing dma_alloc_coherent() returns normal memory > with a special DMA address that tells the system to flush other DMAs > before doing DMA to the coherent region. Which isn't really what most > people understand coherent memory to be, but it has the magic property > of making most drivers work. > .... I agree that this isn't a very elegant solution, but I don't know of a better one. Assuming that something along the lines of the previous patch is used, we need to address userspace/kernel compatibility. The existing abi versioning doesn't seem to be exactly what we want to use, though, because we want to change a verb's semantics to work around a bug. (Changing the abi_version may be an inevitable result, though.) How about adding "semantic flags" to the mthca_* commands (mthca_create_cq, etc.)? Userspace could read the contents of a new sysfs file which, if found, would indicate the flags that the kernel understands. Then it could pass the flags, if it chooses, to get the kernel to use the desired semantics. Something like: # cat /sys/class/infiniband_verbs/uverbs0/abi_flags 0000000000000001 [64 bits of flags] where: enum abi_flags { COHERENT_USER_CQ = (1<<0), ..... }; Better/different ideas? -- Arthur From pasquale.davide at gmail.com Fri Feb 2 15:17:45 2007 From: pasquale.davide at gmail.com (Davide Pasquale) Date: Sat, 3 Feb 2007 00:17:45 +0100 Subject: [openib-general] OFED 1.1 build issue In-Reply-To: <1169128895.31746.73017.camel@hal.voltaire.com> References: <20070112112201.GB2802@mellanox.co.il> <1169123080.31746.67663.camel@hal.voltaire.com> <1169126162.31746.70598.camel@hal.voltaire.com> <1169128895.31746.73017.camel@hal.voltaire.com> Message-ID: Solved upgrading blade enclosure firmware to version 1.20! Thanks. 
On 18 Jan 2007 09:01:45 -0500, Hal Rosenstock wrote: > > On Thu, 2007-01-18 at 08:52, Davide Pasquale wrote: > > On 18 Jan 2007 08:19:34 -0500, Hal Rosenstock > > wrote: > > On Thu, 2007-01-18 at 08:02, Davide Pasquale wrote: > > > > > > On 18 Jan 2007 07:34:43 -0500, Hal Rosenstock > > > > > wrote: > > > On Thu, 2007-01-18 at 06:19, Davide Pasquale wrote: > > > > Starting opensm I see this error in > > /var/log/osm.log: > > > > > > > > OpenSM Rev:openib-2.0.5 OpenIB svn Exported > > revision > > > > Jan 18 12:11:39 628147 [95AA8160] -> > > osm_vendor_bind: > > > Binding to port > > > > 0x18feffff8c7a8d > > > > Jan 18 12:11:39 629557 [95AA8160] -> > > osm_vendor_bind: > > > Binding to port > > > > 0x18feffff8c7a8d > > > > Jan 18 12:11:39 630605 [41401960] -> SM port is > > down > > > > Jan 18 12:11:39 630693 [41401960] -> > > > __osm_sm_state_mgr_signal_error: > > > > ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in > > state > > > > IB_SMINFO_STATE_DISCOVERING > > > > Jan 18 12:11:49 631170 [41E02960] -> SM port is > > down > > > > Jan 18 12:11:49 631238 [41E02960] -> > > > __osm_sm_state_mgr_signal_error: > > > > ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in > > state > > > > IB_SMINFO_STATE_DISCOVERING > > > > > > > > and the SM port is always down. > > > > > > The error message is benign. > > > > > > Is the SM port plugged into any other IB device ? > > > > > > -- Hal > > > > > > Hi Hal, > > > > > > we are using HP Blade System and each blade has an > > infiniband card > > > onboard. > > > The SM port is plugged in the Infiniband switch internal to > > the blade > > > enclosure. > > > Is this information helpful for you ? > > > > The port being down has nothing to do with SM operation. For > > some > > reason, there is no connectivity or negotiation between the > > blades and > > the switch. > > > > -- Hal > > > > > > > > > > > > > > > > > > > > > > > Thanks! > > What can I look to in order to solve this problem ? 
>
> I don't know the HP blade system so the only thing I can say to try is
> to unseat and reseat all the blades (HCAs and switch(es)) to see if this
> resolves the problem. If it doesn't, I have no clue.
>
> -- Hal
>
> > Regards,
> > Davide.

From sean.hefty at intel.com Fri Feb 2 16:02:23 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 2 Feb 2007 16:02:23 -0800
Subject: [openib-general] [RFC] [PATCH] ib_usa: export multicast and informinfo registration to userspace
Message-ID: <000001c74726$94d0f500$e598070a@amr.corp.intel.com>

Export SA client capabilities for multicast and SA event registration to userspace. Multicast and event registration are tracked on a per port basis, with tracking done by the ib_sa kernel module.

Based on feedback from the list, a new userspace SA module was added, rather than trying to rework the usermad interface. The user to kernel interface is minimal, but was designed to be flexible enough to add additional SA client support if needed. (E.g. local SA cache lookup, SA queries, service registration, etc.)

Signed-off-by: Sean Hefty
---
The following patch is also available from the user_sa branch of my rdma-dev.git tree, and is dependent on the informinfo branch/patch posted earlier to the list. (A couple of small fixes to the informinfo code have been added since the original patches.) A userspace sa library is also available.

The informinfo and userspace support was completed as part of the PathForward project at the request of the US National Laboratories.

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 9edface..b5ffc78 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -18,15 +18,15 @@ config INFINIBAND_USER_MAD
 need libibumad from .
config INFINIBAND_USER_ACCESS - tristate "InfiniBand userspace access (verbs and CM)" + tristate "InfiniBand userspace access (verbs, CM, SA client)" depends on INFINIBAND ---help--- Userspace InfiniBand access support. This enables the - kernel side of userspace verbs and the userspace - communication manager (CM). This allows userspace processes - to set up connections and directly access InfiniBand + kernel side of userspace verbs, the userspace communication + manager (CM), and userspace SA client. This allows userspace + processes to set up connections and directly access InfiniBand hardware for fast-path operations. You will also need - libibverbs, libibcm and a hardware driver library from + libibverbs, libibcm, libibsa, and a hardware driver library from . config INFINIBAND_ADDR_TRANS diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 2e9c4b2..e89cf2e 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -4,7 +4,7 @@ user_access-$(CONFIG_INFINIBAND_ADDR_TRANS) := rdma_ucm.o obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_sa.o \ ib_cm.o iw_cm.o $(infiniband-y) obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o -obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \ +obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o ib_usa.o \ $(user_access-y) ib_core-y := packer.o ud_header.o verbs.o sysfs.o \ @@ -28,5 +28,7 @@ ib_umad-y := user_mad.o ib_ucm-y := ucm.o +ib_usa-y := usa.o + ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_mem.o \ uverbs_marshall.o diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 172a450..771f52a 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -464,6 +464,46 @@ static const struct ib_field notice_table[] = { .size_bits = 128 }, }; +int ib_sa_pack_attr(void *dst, void *src, int attr_id) +{ + switch (attr_id) { + case IB_SA_ATTR_MC_MEMBER_REC: + ib_pack(mcmember_rec_table, 
ARRAY_SIZE(mcmember_rec_table), + src, dst); + break; + case IB_SA_ATTR_INFORM_INFO: + ib_pack(inform_table, ARRAY_SIZE(inform_table), src, dst); + break; + case IB_SA_ATTR_NOTICE: + ib_pack(notice_table, ARRAY_SIZE(notice_table), src, dst); + break; + default: + return -EINVAL; + } + return 0; +} +EXPORT_SYMBOL(ib_sa_pack_attr); + +int ib_sa_unpack_attr(void *dst, void *src, int attr_id) +{ + switch (attr_id) { + case IB_SA_ATTR_MC_MEMBER_REC: + ib_unpack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table), + src, dst); + break; + case IB_SA_ATTR_INFORM_INFO: + ib_unpack(inform_table, ARRAY_SIZE(inform_table), src, dst); + break; + case IB_SA_ATTR_NOTICE: + ib_unpack(notice_table, ARRAY_SIZE(notice_table), src, dst); + break; + default: + return -EINVAL; + } + return 0; +} +EXPORT_SYMBOL(ib_sa_unpack_attr); + static void free_sm_ah(struct kref *kref) { struct ib_sa_sm_ah *sm_ah = container_of(kref, struct ib_sa_sm_ah, ref); diff --git a/drivers/infiniband/core/usa.c b/drivers/infiniband/core/usa.c new file mode 100644 index 0000000..ae05091 --- /dev/null +++ b/drivers/infiniband/core/usa.c @@ -0,0 +1,792 @@ +/* + * Copyright (c) 2006-2007 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include + +#include + +MODULE_AUTHOR("Sean Hefty"); +MODULE_DESCRIPTION("IB userspace SA"); +MODULE_LICENSE("Dual BSD/GPL"); + +static void usa_add_one(struct ib_device *device); +static void usa_remove_one(struct ib_device *device); + +static struct ib_client usa_client = { + .name = "ib_usa", + .add = usa_add_one, + .remove = usa_remove_one +}; + +struct usa_device { + struct list_head list; + struct ib_device *device; + struct completion comp; + atomic_t refcount; + int start_port; + int end_port; +}; + +struct usa_file { + struct mutex file_mutex; + struct file *filp; + struct ib_sa_client sa_client; + struct list_head event_list; + struct list_head id_list; + wait_queue_head_t poll_wait; + int event_id; +}; + +struct usa_id { + struct usa_file *file; + struct usa_device *dev; + struct list_head list; + u64 uid; + int num; + int events_reported; + u16 attr_id; +}; + +struct usa_event { + struct usa_id *id; + struct list_head list; + struct ib_usa_event_resp resp; +}; + +struct usa_multicast { + struct usa_id id; + struct usa_event event; + struct ib_sa_multicast *multicast; +}; + +struct usa_inform_info { + struct usa_id id; + struct ib_inform_info *inform_info; +}; + +static DEFINE_MUTEX(usa_mutex); 
+static LIST_HEAD(dev_list); +static DEFINE_IDR(usa_idr); + +static struct usa_device *get_dev(__be64 guid, __u8 port_num) +{ + struct usa_device *dev; + + mutex_lock(&usa_mutex); + list_for_each_entry(dev, &dev_list, list) { + if (dev->device->node_guid == guid) { + if (port_num < dev->start_port || + port_num > dev->end_port) + break; + atomic_inc(&dev->refcount); + mutex_unlock(&usa_mutex); + return dev; + } + } + mutex_unlock(&usa_mutex); + return NULL; +} + +static void put_dev(struct usa_device *dev) +{ + if (atomic_dec_and_test(&dev->refcount)) + complete(&dev->comp); +} + +static int insert_id(struct usa_id *id) +{ + int ret; + + do { + ret = idr_pre_get(&usa_idr, GFP_KERNEL); + if (!ret) + break; + + mutex_lock(&usa_mutex); + ret = idr_get_new(&usa_idr, id, &id->num); + mutex_unlock(&usa_mutex); + } while (ret == -EAGAIN); + + return ret; +} + +static void remove_id(struct usa_id *id) +{ + mutex_lock(&usa_mutex); + idr_remove(&usa_idr, id->num); + mutex_unlock(&usa_mutex); +} + +static struct usa_id *get_id(int num, struct usa_file *file, u16 attr_id) +{ + struct usa_id *id; + + id = idr_find(&usa_idr, num); + if (!id) + return ERR_PTR(-ENOENT); + + if ((id->file != file) || (id->attr_id != attr_id)) + return ERR_PTR(-EINVAL); + + return id; +} + +static void insert_file_id(struct usa_file *file, struct usa_id *id) +{ + mutex_lock(&file->file_mutex); + list_add_tail(&id->list, &file->id_list); + mutex_unlock(&file->file_mutex); +} + +static void remove_file_id(struct usa_file *file, struct usa_id *id) +{ + mutex_lock(&file->file_mutex); + list_del(&id->list); + mutex_unlock(&file->file_mutex); +} + +static void finish_event(struct usa_event *event) +{ + switch (be16_to_cpu(event->resp.attr_id)) { + case IB_SA_ATTR_MC_MEMBER_REC: + list_del_init(&event->list); + event->id->events_reported++; + break; + default: + list_del(&event->list); + if (event->id) + event->id->events_reported++; + kfree(event); + break; + } +} + +static ssize_t usa_get_event(struct 
usa_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct ib_usa_get_event cmd; + struct usa_event *event; + int ret = 0; + DEFINE_WAIT(wait); + + if (out_len < sizeof(event->resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&file->file_mutex); + while (list_empty(&file->event_list)) { + mutex_unlock(&file->file_mutex); + + if (file->filp->f_flags & O_NONBLOCK) + return -EAGAIN; + + if (wait_event_interruptible(file->poll_wait, + !list_empty(&file->event_list))) + return -ERESTARTSYS; + + mutex_lock(&file->file_mutex); + } + + event = list_entry(file->event_list.next, struct usa_event, list); + + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &event->resp, sizeof(event->resp))) { + ret = -EFAULT; + goto done; + } + + finish_event(event); +done: + mutex_unlock(&file->file_mutex); + return ret; +} + +static void queue_event(struct usa_file *file, struct usa_event *event) +{ + mutex_lock(&file->file_mutex); + list_move_tail(&event->list, &file->event_list); + wake_up_interruptible(&file->poll_wait); + mutex_unlock(&file->file_mutex); +} + +/* + * We can get up to two events for a single multicast member. A second event + * only occurs if there's an error on an existing multicast membership. + * Report only the last event. 
+ */ +static int multicast_handler(int status, struct ib_sa_multicast *multicast) +{ + struct usa_multicast *mcast = multicast->context; + struct usa_file *file = mcast->id.file; + + mcast->event.resp.status = status; + if (!status) { + mcast->event.resp.data_len = IB_SA_ATTR_MC_MEMBER_REC_LEN; + ib_sa_pack_attr(mcast->event.resp.data, &multicast->rec, + IB_SA_ATTR_MC_MEMBER_REC); + } + + queue_event(file, &mcast->event); + return 0; +} + +static int join_mcast(struct usa_file *file, struct ib_usa_request *req, + int out_len) +{ + struct usa_multicast *mcast; + struct ib_sa_mcmember_rec rec; + int ret; + + if (out_len < sizeof(u32)) + return -ENOSPC; + + mcast = kzalloc(sizeof *mcast, GFP_KERNEL); + if (!mcast) + return -ENOMEM; + + mcast->id.dev = get_dev(req->node_guid, req->port_num); + if (!mcast->id.dev) { + ret = -ENODEV; + goto err1; + } + + if (copy_from_user(mcast->event.resp.data, + (void __user *) (unsigned long) req->attr, + IB_SA_ATTR_MC_MEMBER_REC_LEN)) { + ret = -EFAULT; + goto err2; + } + + INIT_LIST_HEAD(&mcast->event.list); + mcast->event.id = &mcast->id; + mcast->event.resp.attr_id = cpu_to_be16(IB_SA_ATTR_MC_MEMBER_REC); + mcast->event.resp.uid = req->uid; + mcast->id.attr_id = IB_SA_ATTR_MC_MEMBER_REC; + mcast->id.uid = req->uid; + + ret = insert_id(&mcast->id); + if (ret) + goto err2; + + mcast->event.resp.id = mcast->id.num; + if (copy_to_user((void __user *) (unsigned long) req->response, + &mcast->id.num, sizeof(u32))) { + ret = -EFAULT; + goto err3; + } + + mcast->id.file = file; + insert_file_id(file, &mcast->id); + + ib_sa_unpack_attr(&rec, mcast->event.resp.data, + IB_SA_ATTR_MC_MEMBER_REC); + mcast->multicast = ib_sa_join_multicast(&file->sa_client, + mcast->id.dev->device, + req->port_num, &rec, + (ib_sa_comp_mask) req->comp_mask, + GFP_KERNEL, multicast_handler, + mcast); + if (IS_ERR(mcast->multicast)) { + ret = PTR_ERR(mcast->multicast); + goto err4; + } + + return 0; + +err4: + remove_file_id(file, &mcast->id); +err3: +
remove_id(&mcast->id); +err2: + put_dev(mcast->id.dev); +err1: + kfree(mcast); + return ret; +} + +static int get_mcast(struct usa_file *file, struct ib_usa_request *req, + int out_len) +{ + struct usa_device *dev; + struct ib_sa_mcmember_rec rec; + u8 mcmember_rec[IB_SA_ATTR_MC_MEMBER_REC_LEN]; + int ret; + + if (out_len < IB_SA_ATTR_MC_MEMBER_REC_LEN) + return -ENOSPC; + + if (req->comp_mask != IB_SA_MCMEMBER_REC_MGID) + return -ENOSYS; + + if (copy_from_user(mcmember_rec, + (void __user *) (unsigned long) req->attr, + IB_SA_ATTR_MC_MEMBER_REC_LEN)) + return -EFAULT; + + dev = get_dev(req->node_guid, req->port_num); + if (!dev) + return -ENODEV; + + ib_sa_unpack_attr(&rec, mcmember_rec, IB_SA_ATTR_MC_MEMBER_REC); + ret = ib_sa_get_mcmember_rec(dev->device, req->port_num, + &rec.mgid, &rec); + if (!ret) { + ib_sa_pack_attr(mcmember_rec, &rec, IB_SA_ATTR_MC_MEMBER_REC); + if (copy_to_user((void __user *) (unsigned long) req->response, + mcmember_rec, IB_SA_ATTR_MC_MEMBER_REC_LEN)) + ret = -EFAULT; + } + + put_dev(dev); + return ret; +} + +static int process_mcast(struct usa_file *file, struct ib_usa_request *req, + int out_len) +{ + /* Only indirect requests are currently supported.
*/ + if (!req->local) + return -ENOSYS; + + switch (req->method) { + case IB_MGMT_METHOD_GET: + return get_mcast(file, req, out_len); + case IB_MGMT_METHOD_SET: + return join_mcast(file, req, out_len); + default: + return -EINVAL; + } +} + +static int notice_handler(int status, struct ib_inform_info *info, + struct ib_sa_notice *notice) +{ + struct usa_inform_info *inform = info->context; + struct usa_file *file = inform->id.file; + struct usa_event *event; + + event = kzalloc(sizeof *event, GFP_KERNEL); + if (!event) + return 0; + + event->resp.uid = inform->id.uid; + event->id = &inform->id; + event->resp.status = status; + INIT_LIST_HEAD(&event->list); + + if (notice) { + event->resp.attr_id = cpu_to_be16(IB_SA_ATTR_NOTICE); + event->resp.data_len = IB_SA_ATTR_NOTICE_LEN; + ib_sa_pack_attr(event->resp.data, notice, IB_SA_ATTR_NOTICE); + } else + event->resp.attr_id = cpu_to_be16(IB_SA_ATTR_INFORM_INFO); + + queue_event(file, event); + return 0; +} + +static int reg_inform(struct usa_file *file, struct ib_usa_request *req, + int out_len) +{ + struct usa_inform_info *inform; + struct ib_sa_inform sa_inform_info; + u8 net_inform_info[IB_SA_ATTR_INFORM_INFO_LEN]; + u16 trap_number; + int ret; + + if (out_len < sizeof(u32)) + return -ENOSPC; + + if (copy_from_user(&net_inform_info, + (void __user *) (unsigned long) req->attr, + IB_SA_ATTR_INFORM_INFO_LEN)) + return -EFAULT; + + inform = kzalloc(sizeof *inform, GFP_KERNEL); + if (!inform) + return -ENOMEM; + + inform->id.dev = get_dev(req->node_guid, req->port_num); + if (!inform->id.dev) { + ret = -ENODEV; + goto err1; + } + + inform->id.attr_id = IB_SA_ATTR_INFORM_INFO; + inform->id.uid = req->uid; + + ret = insert_id(&inform->id); + if (ret) + goto err2; + + if (copy_to_user((void __user *) (unsigned long) req->response, + &inform->id.num, sizeof(u32))) { + ret = -EFAULT; + goto err3; + } + + inform->id.file = file; + insert_file_id(file, &inform->id); + + ib_sa_unpack_attr(&sa_inform_info, &net_inform_info, +
IB_SA_ATTR_INFORM_INFO); + trap_number = be16_to_cpu(sa_inform_info.trap.generic.trap_num); + inform->inform_info = + ib_sa_register_inform_info(&file->sa_client, + inform->id.dev->device, + req->port_num, trap_number, + GFP_KERNEL, notice_handler, + inform); + if (IS_ERR(inform->inform_info)) { + ret = PTR_ERR(inform->inform_info); + goto err4; + } + + return 0; + +err4: + remove_file_id(file, &inform->id); +err3: + remove_id(&inform->id); +err2: + put_dev(inform->id.dev); +err1: + kfree(inform); + return ret; +} + +static int process_inform(struct usa_file *file, struct ib_usa_request *req, + int out_len) +{ + /* Only indirect requests are currently supported. */ + if (!req->local) + return -ENOSYS; + + if (req->method != IB_MGMT_METHOD_SET) + return -EINVAL; + + return reg_inform(file, req, out_len); +} + +static ssize_t usa_request(struct usa_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct ib_usa_request req; + + if (copy_from_user(&req, inbuf, sizeof(req))) + return -EFAULT; + + switch (be16_to_cpu(req.attr_id)) { + case IB_SA_ATTR_MC_MEMBER_REC: + return process_mcast(file, &req, out_len); + case IB_SA_ATTR_INFORM_INFO: + return process_inform(file, &req, out_len); + default: + return -EINVAL; + } +} + +static void *cleanup_mcast(struct usa_id *id) +{ + struct usa_multicast *mcast; + + mcast = container_of(id, struct usa_multicast, id); + ib_sa_free_multicast(mcast->multicast); + + mutex_lock(&id->file->file_mutex); + list_del(&id->list); + list_del(&mcast->event.list); + mutex_unlock(&id->file->file_mutex); + + return mcast; +} + +static void *cleanup_inform(struct usa_id *id) +{ + struct usa_inform_info *inform; + + inform = container_of(id, struct usa_inform_info, id); + ib_sa_unregister_inform_info(inform->inform_info); + + mutex_lock(&id->file->file_mutex); + list_del(&id->list); + /* TODO cleanup events */ + mutex_unlock(&id->file->file_mutex); + + return inform; +} + +static int free_id(struct usa_id *id) +{ + void 
*free_obj; + int events_reported; + + switch (id->attr_id) { + case IB_SA_ATTR_MC_MEMBER_REC: + free_obj = cleanup_mcast(id); + break; + case IB_SA_ATTR_INFORM_INFO: + free_obj = cleanup_inform(id); + break; + default: + free_obj = NULL; + break; + } + + events_reported = id->events_reported; + put_dev(id->dev); + kfree(free_obj); + + return events_reported; +} + +static ssize_t usa_free(struct usa_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct ib_usa_free cmd; + struct ib_usa_free_resp resp; + struct usa_id *id; + int ret = 0; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&usa_mutex); + id = get_id(cmd.id, file, be16_to_cpu(cmd.attr_id)); + if (!IS_ERR(id)) + idr_remove(&usa_idr, id->num); + mutex_unlock(&usa_mutex); + + resp.events_reported = free_id(id); + + if (copy_to_user((void __user *) (unsigned long) cmd.response, + &resp, sizeof resp)) + ret = -EFAULT; + + return ret; +} + +static ssize_t (*usa_cmd_table[])(struct usa_file *file, + const char __user *inbuf, + int in_len, int out_len) = { + [IB_USA_CMD_REQUEST] = usa_request, + [IB_USA_CMD_GET_EVENT] = usa_get_event, + [IB_USA_CMD_FREE] = usa_free, +}; + +static ssize_t usa_write(struct file *filp, const char __user *buf, + size_t len, loff_t *pos) +{ + struct usa_file *file = filp->private_data; + struct ib_usa_cmd_hdr hdr; + ssize_t ret; + + if (len < sizeof(hdr)) + return -EINVAL; + + if (copy_from_user(&hdr, buf, sizeof(hdr))) + return -EFAULT; + + if (hdr.cmd < 0 || hdr.cmd >= ARRAY_SIZE(usa_cmd_table)) + return -EINVAL; + + if (hdr.in + sizeof(hdr) > len) + return -EINVAL; + + ret = usa_cmd_table[hdr.cmd](file, buf + sizeof(hdr), hdr.in, hdr.out); + if (!ret) + ret = len; + + return ret; +} + +static unsigned int usa_poll(struct file *filp, struct poll_table_struct *wait) +{ + struct usa_file *file = filp->private_data; + unsigned int mask = 0; + + poll_wait(filp, &file->poll_wait, 
wait); + + if (!list_empty(&file->event_list)) + mask = POLLIN | POLLRDNORM; + + return mask; +} + +static int usa_open(struct inode *inode, struct file *filp) +{ + struct usa_file *file; + + file = kmalloc(sizeof *file, GFP_KERNEL); + if (!file) + return -ENOMEM; + + ib_sa_register_client(&file->sa_client); + + INIT_LIST_HEAD(&file->event_list); + INIT_LIST_HEAD(&file->id_list); + init_waitqueue_head(&file->poll_wait); + mutex_init(&file->file_mutex); + + filp->private_data = file; + file->filp = filp; + return 0; +} + +static int usa_close(struct inode *inode, struct file *filp) +{ + struct usa_file *file = filp->private_data; + struct usa_id *id; + + while (!list_empty(&file->id_list)) { + id = list_entry(file->id_list.next, struct usa_id, list); + remove_id(id); + free_id(id); + } + ib_sa_unregister_client(&file->sa_client); + + kfree(file); + return 0; +} + +static void usa_add_one(struct ib_device *device) +{ + struct usa_device *dev; + + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + + dev = kmalloc(sizeof *dev, GFP_KERNEL); + if (!dev) + return; + + dev->device = device; + if (device->node_type == RDMA_NODE_IB_SWITCH) + dev->start_port = dev->end_port = 0; + else { + dev->start_port = 1; + dev->end_port = device->phys_port_cnt; + } + + init_completion(&dev->comp); + atomic_set(&dev->refcount, 1); + ib_set_client_data(device, &usa_client, dev); + + mutex_lock(&usa_mutex); + list_add_tail(&dev->list, &dev_list); + mutex_unlock(&usa_mutex); +} + +static void usa_remove_one(struct ib_device *device) +{ + struct usa_device *dev; + + dev = ib_get_client_data(device, &usa_client); + if (!dev) + return; + + mutex_lock(&usa_mutex); + list_del(&dev->list); + mutex_unlock(&usa_mutex); + + /* TODO: force immediate device removal */ + put_dev(dev); + wait_for_completion(&dev->comp); + kfree(dev); +} + +static struct file_operations usa_fops = { + .owner = THIS_MODULE, + .open = usa_open, + .release = usa_close, + .write = usa_write, + 
.poll = usa_poll, +}; + +static struct miscdevice usa_misc = { + .minor = MISC_DYNAMIC_MINOR, + .name = "ib_usa", + .fops = &usa_fops, +}; + +static ssize_t show_abi_version(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + return sprintf(buf, "%d\n", IB_USA_ABI_VERSION); +} +static DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL); + +static int __init usa_init(void) +{ + int ret; + + ret = misc_register(&usa_misc); + if (ret) + return ret; + + ret = device_create_file(usa_misc.this_device, &dev_attr_abi_version); + if (ret) + goto err1; + + ret = ib_register_client(&usa_client); + if (ret) + goto err2; + + return 0; + +err2: + device_remove_file(usa_misc.this_device, &dev_attr_abi_version); +err1: + misc_deregister(&usa_misc); + return ret; +} + +static void __exit usa_cleanup(void) +{ + ib_unregister_client(&usa_client); + device_remove_file(usa_misc.this_device, &dev_attr_abi_version); + misc_deregister(&usa_misc); + idr_destroy(&usa_idr); +} + +module_init(usa_init); +module_exit(usa_cleanup); diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h index a8e5221..f36be98 100644 --- a/include/rdma/ib_sa.h +++ b/include/rdma/ib_sa.h @@ -557,4 +557,7 @@ ib_sa_register_inform_info(struct ib_sa_client *client, */ void ib_sa_unregister_inform_info(struct ib_inform_info *info); +int ib_sa_pack_attr(void *dst, void *src, int attr_id); +int ib_sa_unpack_attr(void *dst, void *src, int attr_id); + #endif /* IB_SA_H */ diff --git a/include/rdma/ib_usa.h b/include/rdma/ib_usa.h new file mode 100644 index 0000000..0180cab --- /dev/null +++ b/include/rdma/ib_usa.h @@ -0,0 +1,97 @@ +/* + * Copyright (c) 2006-2007 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef IB_USA_H +#define IB_USA_H + +#include +#include + +#define IB_USA_ABI_VERSION 1 + +#define IB_USA_EVENT_DATA 256 + +enum { + IB_USA_CMD_REQUEST, + IB_USA_CMD_GET_EVENT, + IB_USA_CMD_FREE +}; + +enum { + IB_SA_ATTR_NOTICE_LEN = 80, + IB_SA_ATTR_INFORM_INFO_LEN = 36, + IB_SA_ATTR_MC_MEMBER_REC_LEN = 52 +}; + +struct ib_usa_cmd_hdr { + __u32 cmd; + __u16 in; + __u16 out; +}; + +struct ib_usa_request { + __u64 response; + __u64 uid; + __u64 node_guid; + __u64 comp_mask; + __u64 attr; + __be16 attr_id; + __u8 method; + __u8 port_num; + __u8 local; +}; + +struct ib_usa_free { + __u64 response; + __u32 id; + __be16 attr_id; +}; + +struct ib_usa_free_resp { + __u32 events_reported; +}; + +struct ib_usa_get_event { + __u64 response; +}; + +struct ib_usa_event_resp { + __u64 uid; + __u32 id; + __u32 status; + __u32 data_len; + __be16 attr_id; + __u16 reserved; + __u8 data[IB_USA_EVENT_DATA]; +}; + +#endif /* IB_USA_H */ From pradeep at us.ibm.com Fri Feb 2 16:31:32 2007 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Fri, 2 Feb 2007 16:31:32 -0800 Subject: [openib-general] IPoIB CM for merge? In-Reply-To: <1170431671.26115.25.camel@stevo-desktop> Message-ID: Hello Michael, Here are a few more observations: 1. For the SRQ case, the skbs and receive buffers are posted during init, even before the rx_qp is created. This causes a problem (at least for non-SRQs) for the ehca. We need to call ipoib_cm_alloc_skb() and ipoib_cm_post_receive() after the rx_qp is in the RTR state. 2. Also found that in ipoib_cm_create_rx_qp() one needs to initialize .cap.max_recv_wr and .cap.max_recv_sge. Otherwise this leads to problems such as RQ overflows, which cause communication failures. Pradeep pradeep at us.ibm.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: From chrisw at sous-sol.org Fri Feb 2 18:35:15 2007 From: chrisw at sous-sol.org (Chris Wright) Date: Fri, 02 Feb 2007 18:35:15 -0800 Subject: [openib-general] [patch 11/59] [stable] [PATCH] IB/mthca: Fix off-by-one in FMR handling on memfree References: <20070203023504.435051000@sous-sol.org> Message-ID: <20070203023916.739906000@sous-sol.org> An embedded and charset-unspecified text was scrubbed... Name: ib-mthca-fix-off-by-one-in-fmr-handling-on-memfree.patch URL: From kazeigan at yahoo.co.jp Fri Feb 2 18:37:39 2007 From: kazeigan at yahoo.co.jp () Date: Sat, 3 Feb 2007 11:37:39 +0900 (JST) Subject: [openib-general] =?ISO-2022-JP?B?g4GBW4OLgqCC6IKqgsaCpIKygrSCooLcgrWCvYH0?= Message-ID: 20070203113738 It has been a while. This is Mizuna. Thank you for your email the other day. I am sorry my reply is so late. About the question on my job in your previous email... I am a full-time housewife. I have been looking after the house since last December, which is why I was busy. Housework is fun, but it does wear me out... (>< With a life like this I never meet anyone, and I often find myself wanting someone to lean on. So, this may sound strange coming so suddenly, but I would like to meet and talk once; would that be a bother? I am 31 and live in Setagaya. I would like to have a meal together and talk a lot ♪ If possible, this weekend around Shinjuku or Shibuya would suit me; how about it? http://chu.punyu.jp/mizuna/ I have been using this site lately, so could you email me from there? I am on mixi too, but this one feels more comfortable, so it is the only site I use (^^; I look forward to your reply. Mizuna From xma at us.ibm.com Fri Feb 2 20:58:37 2007 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 2 Feb 2007 20:58:37 -0800 Subject: [openib-general] Multicast join group failure prevents IPoIB performing Message-ID: When bringing the IPoIB interface up, I hit a default group multicast join failure. (Could this be fixed in the SM setup?) ib0: multicast join failed for xxxx, status -22 Then the interface was UP but not RUNNING, so the nodes couldn't ping each other. I think the right behavior is for the interface to be UP and RUNNING even when some multicast joins fail. I would like to provide a patch if there is no objection. Please advise. Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed...
URL: From eitan at sw053.yok.mtl.com Fri Feb 2 21:28:01 2007 From: eitan at sw053.yok.mtl.com (Eitan Zahavi) Date: Sat, 3 Feb 2007 07:28:01 +0200 Subject: [openib-general] nightly osm_sim report 2007-02-03:normal completion Message-ID: <200702030528.l135S13O000650@sw053.yok.mtl.com> OSM Simulation Regression Summary OpenSM rev = Fri_Feb_2_09:16:30_2007 db386c ibutils rev = Wed_Jan_3_11:42:12_2007 913448 Total=410 Pass=410 Fail=0 Pass: 30 Stability IS1-16.topo 30 Pkey IS1-16.topo 30 OsmTest IS1-16.topo 30 OsmStress IS1-16.topo 30 Multicast IS1-16.topo 30 LidMgr IS1-16.topo 10 Stability IS3-loop.topo 10 Stability IS3-128.topo 10 Pkey IS3-128.topo 10 OsmTest IS3-loop.topo 10 OsmTest IS3-128.topo 10 OsmStress IS3-128.topo 10 Multicast IS3-loop.topo 10 Multicast IS3-128.topo 10 LidMgr IS3-128.topo 10 FatTree part-4-ary-3-tree.topo 10 FatTree merge-roots-reorder-4-ary-2-tree.topo 10 FatTree merge-roots-4-ary-2-tree.topo 10 FatTree merge-root-4-ary-3-tree.topo 10 FatTree merge-root-12-ary-2-tree.topo 10 FatTree merge-2-ary-4-tree.topo 10 FatTree half-4-ary-3-tree.topo 10 FatTree blend-4-ary-2-tree.topo 10 FatTree 4-ary-4-tree.topo 10 FatTree 4-ary-3-tree.topo 10 FatTree 32nodes-3lvl-is1.topo 10 FatTree 2-ary-4-tree.topo 10 FatTree 12-node-spaced.topo 10 FatTree 12-ary-2-tree.topo Failures: From vlad at lists.openfabrics.org Sat Feb 3 02:21:53 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sat, 3 Feb 2007 02:21:53 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070203-0200 daily build status Message-ID: <20070203102154.36F92E607F9@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 
Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.13 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.15 Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ /home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: /home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From halr at voltaire.com Sat Feb 3 06:30:36 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Feb 2007 09:30:36 -0500 Subject: [openib-general] OpenIB management libraries release 1.0.2 Message-ID: <1170513034.4525.15093.camel@hal.voltaire.com> http://www.openfabrics.org/~halr/ md5sum b9b4bdf899f1d0ff15e06915cd846a3a libibcommon-1.0.2.tar.gz 2af3ff7e38a1f49fb7514660a9991c89 libibmad-1.0.2.tar.gz 7d7690abfe9b08c8240fbf0157653b90 libibumad-1.0.2.tar.gz From xma at us.ibm.com Sat Feb 3 08:54:41 2007 From: xma at us.ibm.com (Shirley Ma) Date: Sat, 3 Feb 2007 09:54:41 -0700 Subject: [openib-general] Multicast join group failure prevents IPoIB performing In-Reply-To: Message-ID: According to the IPoIB RFC 4391, section 5, once the IPoIB broadcast group has been joined, the IPoIB link should be UP. Since it is then ready for data transfer, the interface should be able to run for broadcast and unicast without waiting for all multicast joins to succeed. Here is the patch to allow the IPoIB interface to run without waiting for all multicast joins (such as the all-hosts group multicast join) to succeed.
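The difference between the two link-readiness rules described above can be sketched as a small userspace C model. It is only an illustration under stated assumptions: `struct mcast_state` and both function names are hypothetical and do not exist in the IPoIB driver, which tracks this state with flag bits and signals readiness through netif_carrier_on().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an interface's multicast join state.
 * None of these names come from the IPoIB driver. */
struct mcast_state {
	bool broadcast_joined;     /* broadcast group join has completed */
	const bool *groups_joined; /* join status of every other group */
	size_t ngroups;            /* number of entries in groups_joined */
};

/* Rule before the change: carrier comes up only once every join,
 * broadcast and non-broadcast alike, has succeeded. */
static bool carrier_on_all_joined(const struct mcast_state *s)
{
	if (!s->broadcast_joined)
		return false;
	for (size_t i = 0; i < s->ngroups; i++)
		if (!s->groups_joined[i])
			return false;
	return true;
}

/* Rule argued for here: the broadcast group alone decides whether
 * the link is usable; other groups may still be pending or failed. */
static bool carrier_on_broadcast_joined(const struct mcast_state *s)
{
	return s->broadcast_joined;
}
```

With one non-broadcast group still unjoined, the first rule keeps the carrier off while the second already reports the link as usable, which is the behavior this message argues for.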
Here is the patch: diff -urpN ipoib/ipoib_multicast.c ipoib-patch/ipoib_multicast.c --- ipoib/ipoib_multicast.c 2006-11-29 13:57:37.000000000 -0800 +++ ipoib-patch/ipoib_multicast.c 2007-02-03 00:52:23.000000000 -0800 @@ -566,6 +566,7 @@ void ipoib_mcast_join_task(void *dev_ptr if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { ipoib_mcast_join(dev, priv->broadcast, 0); + netif_carrier_on(dev); return; } @@ -599,7 +600,6 @@ void ipoib_mcast_join_task(void *dev_ptr ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); clear_bit(IPOIB_MCAST_RUN, &priv->flags); - netif_carrier_on(dev); } int ipoib_mcast_start_thread(struct net_device *dev) (See attached file: multicast.patch) http://www.rfc-editor.org/rfc/rfc4391.txt 5. Setting Up an IPoIB Link The broadcast-GID, as defined in the previous section, MUST be set up for an IPoIB subnet to be formed. Every IPoIB interface MUST "FullMember" join the IB multicast group defined by the broadcast- GID. This multicast group will henceforth be referred to as the broadcast group. The join operation returns the MTU, the Q_Key, and other parameters associated with the broadcast group. The node then associates the parameters received as a result of the join operation with its IPoIB interface. The broadcast group also serves to provide a link-layer broadcast service for protocols like ARP, net-directed, subnet-directed, and all-subnets-directed broadcasts in IPv4 over IB networks. The join operation is successful only if the Subnet Manager (SM) determines that the joining node can support the MTU registered with the broadcast group [RFC4392] ensuring support for a common link MTU. The SM also ensures that all the nodes joining the broadcast-GID have paths to one another and can therefore send and receive unicast packets. It further ensures that all the nodes do indeed form a multicast tree that allows packets sent from any member to be replicated to every other member. 
Thus, the IPoIB link is formed by the IPoIB nodes joining the broadcast group. There is no physical demarcation of the IPoIB link other than that determined by the broadcast group membership. Shirley Ma From: Shirley Ma/Beaverton/IBM@IBMUS Sent by: openib-general-bounces at openib.org Date: 02/02/07 08:58 PM To: openib-general at openib.org cc: Subject: [openib-general] Multicast join group failure prevents IPoIB performing When bringing IPoIB interface up, I hit default group multicast join failure. (This could be fixed in SM set up?) ib0: multicast join failed for xxxx, status -22 Then the interface was UP but not RUNNING. So the nodes couldn't ping each other. I think the right behavior of the interface should be UP and RUNNING even with some multicast join failure. I would like to provide a patch if there is no problem. Please advise. Thanks Shirley Ma _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pic07588.gif Type: image/gif Size: 1255 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: multicast.patch Type: application/octet-stream Size: 684 bytes Desc: not available URL: From bugzilla-daemon at lists.openfabrics.org Sat Feb 3 23:07:21 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Sat, 3 Feb 2007 23:07:21 -0800 (PST) Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa In-Reply-To: Message-ID: <20070204070721.CAE32E607F9@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=334 ------- Comment #14 from erezz at voltaire.com 2007-02-03 23:07 ------- (In reply to comment #13) > I want to ask someone how I can apply the patch during build.sh run script? > As I know when I run build.sh my old files with patch always update through > run rpm -i openib-1.1.src.rpm. How I can do it (apply my patches) or I need to > wait new releases? > (In reply to comment #11) > (In reply to comment #10) > > What is the output of uname -r ? This is VERY important. Also, can you run > `cat /etc/issue` and send the results? > > > > > As you can see my first message I wrote the my machine configuration: > >The machine configuration: > >Kernel: Linux 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 > x86_64 x86_64 GNU/Linux > >OS: SUSE Linux Enterprise Server 10 (x86_64) > >gcc version: gcc (GCC) 4.1.0 (SUSE Linux) > > Unfortunately my machine didn't have the version of Linux in /etc/issue because > it is not right by IT requirements. Why? OFED 1.1 expects that you don't change this file. This is how SuSE ships it with SLES 10. I saw the ofed_scripts/configure file > and I saw that for right choice of patches configure needed the file > /etc/issue. I think that not good idea because first of all need to run > command: cat /etc/*release* and find the version Linux in this file and after > this check (if necessary) file /etc/issue > I don't understand the problem.
-- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Sat Feb 3 23:14:59 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Sat, 3 Feb 2007 23:14:59 -0800 (PST) Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa In-Reply-To: Message-ID: <20070204071459.64292E607F9@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=334 erezz at voltaire.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |INVALID ------- Comment #15 from erezz at voltaire.com 2007-02-03 23:14 ------- (In reply to comment #13) > I want to ask someone how I can apply the patch during build.sh run script? I don't agree with your patch. It assumes that SLES 10 may be corrupted. OFED should not try to support this. If you want to use this patch for your own purposes, just apply it (manually) before running OFED build scripts. OFED's backport patches mechanism is not suitable for such patches. > As I know when I run build.sh my old files with patch always update throught > run rpm -i openib-1.1.src.rpm. How I can do it (apply my patches) or I need to > wait new releases? > -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From ogerlitz at voltaire.com Sun Feb 4 00:13:40 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 04 Feb 2007 10:13:40 +0200 Subject: [openib-general] Detecting when an RDMA writer process disappears In-Reply-To: <45C2C7B1.7090204@evergrid.com> References: <45C2C7B1.7090204@evergrid.com> Message-ID: <45C595B4.3000700@voltaire.com> Mike Heffner wrote: > Is there any method by which a receiving process that is polling in > preregistered memory regions for data from a sender performing RDMA > writes, can detect if the sender is killed? Say by a SIGKILL signal? The > RC connection is setup using the RDMA CM and there do not appear to be > any CM events created on the event channel If you have a process with connected RDMA CM ID whose associated peer process died you should get DISCONNECTED event. how do you verify that there is no rdma cm event present at the polling side? Or. From ogerlitz at voltaire.com Sun Feb 4 00:32:02 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 04 Feb 2007 10:32:02 +0200 Subject: [openib-general] ip_ib_mc_map? In-Reply-To: <15ddcffd0702011518qf115aaey862ef168784e81ca@mail.gmail.com> References: <1170275331.14294.1.camel@stevo-desktop> <45C1ABD0.5090404@voltaire.com> <1170325052.2716.229.camel@fc6.xsintricity.com> <15ddcffd0702011240l3c427bfcx6fcc7f7968fcf8b9@mail.gmail.com> <1170368361.2716.239.camel@fc6.xsintricity.com> <15ddcffd0702011518qf115aaey862ef168784e81ca@mail.gmail.com> Message-ID: <45C59A02.6080900@voltaire.com> Or Gerlitz wrote: > On 2/2/07, Doug Ledford wrote: >> Yeah, I've got a setup, I just don't have any multicast tests that I >> run. Any test programs you have for multicast in particular would be >> helpful. 
> This is fairly simple to do: have some multicast traffic routed over
> an IPoIB subnet on two nodes, e.g. using
>
> $ route add -net 224.0.0.0 netmask 255.0.0.0 dev ib0
> $ iperf -usB 224.5.5.5 -i 1

OK, verifying that the problem is gone by running a client/server pair is
actually harder, since while the problem persists the data still moves, over
the broadcast group. So basically, the first thing you want to do is set up
routing, then start an iperf server and check whether the network stack has
computed a correct IPoIB multicast hw address and instructed the device to
use it.

> # iperf -usB 224.5.5.5 &

This is on U3: the stack correctly computed the hw addresses for 224.5.5.5
and 224.0.0.1.

> # ip maddr show ib0
> 5: ib0
> link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:05:05:05
> link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
> inet 224.5.5.5
> inet 224.0.0.1

This is on U4: the stack did not compute any hw addresses for 224.5.5.5 and
224.0.0.1. The inet addresses are the output of /proc/net/igmp, which means
the stack is aware that this node has joined these groups, but as we know the
ARPHRD_INFINIBAND case was removed from the code that computes a multicast
link-layer address...

> # ip maddr show ib0
> 8: ib0
> inet 224.5.5.5
> inet 224.0.0.1

So basically, if your U5-staged node shows the same # ip maddr show output
as U3, we have made progress.

Really verifying that this traffic does not go over the broadcast group is a
little harder. You would need a third active IPoIB device (either another
node or a second running IPoIB device on the rx machine, e.g. ib1), run the
iperf multicast test and make sure the rx counters of the third device do
not advance, where on U3 they would advance since all mcast traffic goes on
the broadcast channel.

Please let me know if you need any further clarification on how to test
this, and... thanks for taking care of it!

Or.
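For anyone who wants to check an ip maddr listing by hand: the two link
addresses shown for U3 are consistent with the IPv4 multicast mapping for
IPoIB (RFC 4391), where the low 28 bits of the group address land in the
last four bytes of a 20-byte hardware address. Below is a minimal userspace
sketch of that mapping. It is not the kernel's ip_ib_mc_map() code; the
function names are hypothetical, the group is passed in host byte order for
simplicity, and the P_Key bytes are left at zero only because that is what
the listing shows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define IPOIB_HWADDR_LEN 20 /* 4 bytes flags/QPN + 16 bytes MGID */

/*
 * Fill hw[] with the IPoIB link-layer address for an IPv4 multicast
 * group given in host byte order (e.g. 0xE0050505 for 224.5.5.5).
 * Hypothetical helper, for illustration only.
 */
void ipoib_map_ipv4_mcast(uint32_t group, uint8_t hw[IPOIB_HWADDR_LEN])
{
    assert(hw != NULL);
    memset(hw, 0, IPOIB_HWADDR_LEN);

    hw[1] = 0xff; /* multicast QPN 0xffffff */
    hw[2] = 0xff;
    hw[3] = 0xff;
    hw[4] = 0xff; /* MGID: multicast prefix */
    hw[5] = 0x12; /* flags and link-local scope */
    hw[6] = 0x40; /* IPv4 signature 0x401b */
    hw[7] = 0x1b;
    /* hw[8..9] P_Key and hw[10..15] stay zero in this sketch */
    hw[16] = (group >> 24) & 0x0f; /* only the low 28 bits are used */
    hw[17] = (group >> 16) & 0xff;
    hw[18] = (group >> 8) & 0xff;
    hw[19] = group & 0xff;
}

/* Format hw[] as colon-separated hex, the way "ip maddr show" prints it. */
void ipoib_format_hwaddr(const uint8_t hw[IPOIB_HWADDR_LEN],
                         char out[3 * IPOIB_HWADDR_LEN])
{
    for (int i = 0; i < IPOIB_HWADDR_LEN; i++)
        sprintf(out + 3 * i,
                i + 1 < IPOIB_HWADDR_LEN ? "%02x:" : "%02x", hw[i]);
}
```

Calling ipoib_map_ipv4_mcast(0xE0050505, hw) and formatting the result
should reproduce the first "link" line of the U3 listing, and 0xE0000001
(224.0.0.1) the second.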
From vlad at lists.openfabrics.org Sun Feb 4 02:22:23 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sun, 4 Feb 2007 02:22:23 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070204-0200 daily build status Message-ID: <20070204102223.9F1DDE607F9@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.12 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.19 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.14 Failed: Build failed on ia64 with 
linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ /home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: /home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From mst at mellanox.co.il Sun Feb 4 02:57:57 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 4 Feb 2007 12:57:57 +0200 Subject: [openib-general] IPoIB CM for merge? In-Reply-To: <1170431671.26115.25.camel@stevo-desktop> References: <1170431671.26115.25.camel@stevo-desktop> Message-ID: <20070204105757.GC8630@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [openib-general] IPoIB CM for merge? > > On Fri, 2007-02-02 at 13:15 +0200, Michael S. Tsirkin wrote: > > > Quoting Roland Dreier : > > > Subject: Re: IPoIB CM for merge? > > > > > > > Could you please spend some time reviewing IPoIB CM code? > > > > I am concerned about missing the 2.6.21 merge window. > > > > > > Thanks for the reminder. 
> > > > > > Can we trade? Have you looked at the cxgb3 iwarp driver? Any comments? > > > > OK. > > I am not sure I have the last version posted so I am going to go by what > > is there in OFED git tree. > > > > And I also only looked under drivers/infiniband/. > > > > So, here are some questions: I looked in the archives and have not seen > > these addressed. Maybe these can be answered and then I'll go from there? > > Does this sound OK? > > > > Files with names like > > ./core/cxio_hal.c > > ./core/cxio_hal.h > > normally generate a fair bit of discussion which wasn't present here, > > I did not guess everyone was just busy. > > For example, why is there both struct iwch_cq and struct t3_cq? > > > > The cxgb3/core code defines a low level interface to the RDMA bits of > the T3 device. > > This code was originally a separate module (named cxio) that allowed > other RDMA middleware layers to sit on top of the this core rdma module. > At the time, there was RNIC-PI and OFA being developed. So that is the > history of this. As per the first openib review (about a year ago) of > this code I merged this core module into the cxgb3 module. I left the > file structure and names as-is because it was low priority IMO. > > The t3_cq struct is the low level CQ structure used to manage both a HW > accessed CQ and a SW CQ (needed to handle error cases and out of order > completions). The iwch_cq struct contains the stuff needed to integrate > with the OFA core and uverbs code. It contains a t3_cq inline. So now that there's a common module, there's no technical reason for the two-level structure to exist? I would say you want to at least move the files into a common directory. I think you will also find that for datapath operations such as poll cq, converting completion from hardware to struct t3_cqe, and from that to ib_wc adds an untrivial amount of overhead. 
> > File tcb.h comment says: > > /* This file is automatically generated --- do not edit */ > > This looks like a GPL violation, does it not? > > > > I can add the license if that's what you mean. I mean that this file does not seem to be the source, in the GPL sense. The following comes from COPYING under linux source directory: The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. So I think you must make the actual source available under the terms of GPL. > > What's the deal with the naming convention? > > Is there a reason in cxgb3, some files start with iwch and some with cxio? > > How about using cxgb3 prefix all over? > > The cxio_ prefix is used for the low-level functions/types that talk > directly with the HW. iwch_ is the provider driver functions that > interface with the OFA stack. I'd rather not change the names. > Especially since this has already gone through several review cycles. > I'm hoping we can get this in and improve it with subsequent > submissions. Is that reasonable? -- MST From monis at voltaire.com Sun Feb 4 04:21:02 2007 From: monis at voltaire.com (Moni Shoua) Date: Sun, 04 Feb 2007 14:21:02 +0200 Subject: [openib-general] IB/mthca: question about HCA profile module parameters In-Reply-To: <45C1C3D5.1050301@dev.mellanox.co.il> References: <45C1C3D5.1050301@dev.mellanox.co.il> Message-ID: <45C5CFAE.9000302@voltaire.com> Dotan Barak wrote: > Hi Moni. > > I tried to use the mthca module parameter: for example i tried to change > the number of QPs. > > I got several failures when i used the HCA 25204: > * sometimes i got the following error message (when using big values, > for example 512K QPs): > ib_mthca: 0000:0c: INIT_HCA command failed aborting. 
> ib_mthca: probe of 0000:0c: failed with error -16 > * when i tried to use small amount of QPs (1024) the machine just hanged > and i noticed a kernel oops message on the console > > > Did you verify the HCA profile module parameter feature? > Is there is any known limitation for the values that should be used? > (for example: only values which are power of two) > > > thanks > Dotan > Hi Dotan, I verified the profile feature up to the level of successful modprobe. I am working now to look into your report. thanks From mst at mellanox.co.il Sun Feb 4 04:58:20 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 4 Feb 2007 14:58:20 +0200 Subject: [openib-general] IPoIB connected mode review comments In-Reply-To: <1170443893.26115.59.camel@stevo-desktop> References: <1170443893.26115.59.camel@stevo-desktop> Message-ID: <20070204125820.GA14288@mellanox.co.il> > Quoting Steve Wise : > Subject: IPoIB connected mode review comments > > On Thu, 2007-02-01 at 20:48 -0800, Roland Dreier wrote: > > > Have you had a chance to review this? > > > > Still on my list. > > > > Can we trade? Can you look at the IPoIB connected mode stuff in the > > ipoib-cm branch in > > > > git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git > > > > and let me know if you see anything you don't like? > > > > - R. > > Here are my comments. I'm not an ib cm expert though. These are mostly > questions: Steve, thanks for looking at the code! I hope the following answers your questions. > > Since IPoIB is using IP addresses already, wouldn't it be simpler to use > the rdma cm to setup connections? IPoIB is not using IP addresses. It uses hardware addresses as any network device would. So using rdma cm does not make sense. > Could you optimize this design and only signal some of the tx wrs? This optimization would apply to UD mode too. No one so far came up with a way to do this cleanly. 
> In ipoib_cm_send() you call ipoib_cm_skb_too_long() if the packet is too > large for the interface mtu. And you print a warning. But > ipoib_cm_skb_too_long() actually queues the packet for the cm case. For > ud it just drops the packet. The skb task for cm then will send a > ICMP_DEST_UNREACH for these packets. Why the difference? For UD I just kept the current behaviour - I think this can actually only happen in case of a race when packet was queued before MTU was changed, so the originator was already notified of the MTU change by the stack above us. For CM the local MTU may exceed the size of a buffer that was posted on the remote QP. So we need to send ICMP_DEST_UNREACH to reduce the originator's dest MTU to whatever this QP actually can support. Since this needs the original skb, and must be done from task or bh context, so we queue the skb and handle it in task context. > Also if this > packet came from the local stack via a local application, you don't want > to send DEST_UNREACH, right? (I'm probably just confused about the > purpose of this). Yes, sending DEST_UNREACH does not seem to affect local interface. That's why I call update_pmtu too. It is also good to update the MTU ASAP to reduce the number lot of packets that are dropped - and update_pmtu can be called from atomic context. I do not know how to tell the packet is from local stack and it does not seem to do any harm to handle all packets in a uniform manner. net/ipv4/ip_gre.c and net/ipv4/ipip.c are examples of code that do something similiar. > In ipoib_cm_tx_completion() you rearm, then drain the cq. I thought > there was some reason that it was better to do drain/rearm/drain? > Something about if you rearm and there's a cq entry mthca does another > immediate interrupt? Again, this comment applies to UD mode as well. AFAIK so far this worked best. > In ipoib_cm_handle_tx_wc(): > > When can a tx completion happen with a wr_id that isn't within the > ipoib_sendq_size range? 
This looks like it is really a bug condition > that should never happen. Because of this: post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1)) so wr_id is always within range. Again, this is exactly the same logic as in UD case. > I see the same code in the rx completion path too. It's even simpler there: + for (i = 0; i < ipoib_recvq_size; ++i) { ... + if (ipoib_cm_post_receive(dev, i)) { ... + } + } So i is always within RX size range. > Also, what's up with the /* FIXME */ comment? Since I have QPs which I never post send WRs on, I should be able to set .cap.max_send_wr to 0 and .cap.max_send_sge should not matter. However, low level drivers do not seem to support this at the moment, so I set these to 1 for now - this is also correct but has a small memory cost. > You lock the priv->lock inside of the priv->tx_lock. Is this ordering > correct and consistent across all the code? Yes, that's the nesting rule. > ipoib_cm_handle_rx_wc() - what's up with the XXX comment? We have the same comment in UD code - that's where this comes from. Basically we don't have an easy way to know the correct packet type, and always setting it to PACKET_HOST seems to work. > What's the algorithm to keep enough buffers posted in the SRQ? Same as with UD really - if I can't allocate a new skb I repost the old one and increment the dropped packet counter. -- MST From mst at mellanox.co.il Sun Feb 4 05:06:06 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 4 Feb 2007 15:06:06 +0200 Subject: [openib-general] IPoIB CM for merge? In-Reply-To: References: Message-ID: <20070204130606.GB14288@mellanox.co.il> > Quoting Pradeep Satyanarayana : > Subject: Re: [openib-general] IPoIB CM for merge? > > > Hello Michael, > > Here are a few more observations : Pradeep, I think you are posting in the wrong thread: it seems you are not talking about my code, but rather about the project you mentioned of implementing IPoIB CM without SRQ. 

IPoIB CM currently falls back on UD mode for HCAs that do not support SRQ,
so there should be no problem for the ehca, as the new code won't be
activated. As I said already, I do not see a clean way to address this
limitation, so I would rather have the current IPoIB CM code merged upstream
first, and think about enhancements later.

>
> 1. For the SRQ case, the skbs and receive buffers are posted during init and even before the rx_qp is created. This causes a problem (at least for non SRQs) for the ehca. We need to call the ipoib_cm_alloc_skb() and ipoib_cm_post_receive() after the rx_qp
> is in the RTR state.
>
> 2. Also found that in ipoib_cm_create_rx_qp() one needs to initialize .cap.max_recv_wr and .cap.max_recv_sge. Otherwise this leads to some problems like rq overflows and causing communication failures.

Yes, I think these are some of the things that would need to be done to make
IPoIB CM work without SRQ. It is clearly not something we want to do for the
SRQ case however: for example, posting WRs to SRQ during connection setup
would race against completion events for other connections. And assigning
.cap.max_recv_wr > 0 for a QP not connected to SRQ does not make sense, and
might conceivably confuse low level drivers.

--
MST

From mst at mellanox.co.il Sun Feb 4 05:07:18 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 15:07:18 +0200
Subject: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus
In-Reply-To: <1170430869.26115.12.camel@stevo-desktop>
References: <1170430869.26115.12.camel@stevo-desktop>
Message-ID: <20070204130718.GC14288@mellanox.co.il>

> Quoting Steve Wise :
> Subject: Re: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus
>
> On Fri, 2007-02-02 at 02:20 -0800, vlad at lists.openfabrics.org wrote:
> > This email was generated automatically, please do not reply
>
> Which distro is 2.6.16.21-0.8-default? I'm sure I didn't do a netevent
> backport that.

That's SLES10 actually.

> Failed: > Build failed on ia64 with linux-2.6.16.21-0.8-default > Log: > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ > make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 > make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' > make: *** [kernel] Error 2 -- MST From mst at mellanox.co.il Sun Feb 4 05:14:14 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 4 Feb 2007 15:14:14 +0200 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <1170429504.26115.1.camel@stevo-desktop> References: <1170429504.26115.1.camel@stevo-desktop> Message-ID: <20070204131414.GD14288@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [PATCH 00/12] ofed_1_2 - Neighbour update support > > On Fri, 2007-02-02 at 08:03 +0200, Michael S. Tsirkin wrote: > > > We could use a global refcnt to count the number of pending destructions > > > and use a completion object to block unload until all the destructors > > > fire and the refcnt goes to zero. 
> >
> > It has the same race as module refcnt. So just use that.
> >
>
> I don't understand the race. Can you explain please? This should be
> able to be done without a race with a refcnt, a spinlock, a bit saying
> we're unloading, and a completion object.
>
> But maybe I'm confused ;-)

In short, the rule is that you can't pass a pointer to your function to
another module and then unload your module safely without synchronizing with
that other module.

Simplified example:

destructor
{
	complete(&foo);
A:
	return;
}

module_cleanup:
{
	wait(foo)
	return;
}

Now, assume destructor runs up to point A, then your module unloads, and the
memory its text occupied is overwritten by something else. An attempt to
execute code from point A will now crash.

So completion is not better than just module refcount here.

That said, I think the race is unlikely and just using module refcount
should be sufficient, and it's certainly simple.

--
MST

From mst at mellanox.co.il Sun Feb 4 05:15:00 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 15:15:00 +0200
Subject: [openib-general] [PATCH] RE: regression in ofed 1.2
In-Reply-To: <1170355331.16637.25.camel@stevo-desktop>
References: <1170355331.16637.25.camel@stevo-desktop>
Message-ID: <20070204131500.GE14288@mellanox.co.il>

> Quoting Steve Wise :
> Subject: Re: [PATCH] RE: regression in ofed 1.2
>
> Um, now on rhel4u4 we crash creating the mcast workqueue.
>
> The name is "ib_mcast_wq" which is too long for older kernels.
>
> Did we lose a backport patch?

Not sure what happened here.
Sean, could you rename ib_mcast_wq to ib_mcast please?

--
MST

From mst at mellanox.co.il Sun Feb 4 06:00:19 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 16:00:19 +0200
Subject: [openib-general] [RFC] [PATCH] ib_usa: export multicast and informinfo registration to userspace
In-Reply-To: <000001c74726$94d0f500$e598070a@amr.corp.intel.com>
References: <000001c74726$94d0f500$e598070a@amr.corp.intel.com>
Message-ID: <20070204140019.GC18543@mellanox.co.il>

+static void usa_remove_one(struct ib_device *device)
+{
+	struct usa_device *dev;
+
+	dev = ib_get_client_data(device, &usa_client);
+	if (!dev)
+		return;
+
+	mutex_lock(&usa_mutex);
+	list_del(&dev->list);
+	mutex_unlock(&usa_mutex);
+
+	/* TODO: force immediate device removal */
+	put_dev(dev);
+	wait_for_completion(&dev->comp);
+	kfree(dev);
+}

I think we really need to address this TODO. An application waiting for data
from SA needs to get woken up and get an error code indicating that the
device was removed. This is currently broken in umad, but let's do it
correctly here.

--
MST

From mst at mellanox.co.il Sun Feb 4 06:02:49 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 16:02:49 +0200
Subject: [openib-general] Detecting when an RDMA writer process disappears
In-Reply-To: <45C595B4.3000700@voltaire.com>
References: <45C2C7B1.7090204@evergrid.com> <45C595B4.3000700@voltaire.com>
Message-ID: <20070204140249.GD18543@mellanox.co.il>

> Quoting Or Gerlitz :
> Subject: Re: Detecting when an RDMA writer process disappears
>
> Mike Heffner wrote:
> > Is there any method by which a receiving process that is polling in
> > preregistered memory regions for data from a sender performing RDMA
> > writes, can detect if the sender is killed? Say by a SIGKILL signal? The
> > RC connection is setup using the RDMA CM and there do not appear to be
> > any CM events created on the event channel
>
> If you have a process with connected RDMA CM ID whose associated peer
> process died you should get DISCONNECTED event. how do you verify that
> there is no rdma cm event present at the polling side?

You may or may not get this event in case of packet loss - same as with sockets. Sending keepalives is really the only way if you want to handle all cases such as remote node crash. -- MST From vlad at mellanox.co.il Sun Feb 4 06:34:25 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 04 Feb 2007 16:34:25 +0200 Subject: [openib-general] openib diags installation issue Message-ID: <1170599665.5887.14.camel@vladsk-laptop> Hi Hal, I have the following issue while executing 'make DESTDIR=/var/tmp/OFED install': See the patch below for fixing this issue. /usr/bin/install -c -m 644 './man/ibprintca.8' '/var/tmp/OFED/usr/local/ofed/share/man/man8/ibprintca.8' /usr/bin/install -c -m 644 './man/ibfindnodesusing.8' '/var/tmp/OFED/usr/local/ofed/share/man/man8/ibfindnodesusing.8' make install-data-hook make[3]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/diags' for script in scripts/ibqueryerrors.pl scripts/ibswportwatch.pl scripts/iblinkinfo.pl scripts/ibprintswitch.pl scripts/ibprintca.pl scripts/ibfindnodesusing.pl; do \ binname=`echo $script | sed -e "s/scripts\/\(.*\)/\1/"`; \ cat $script | sed -e "s,use lib \"\(/lib/perl\)\";,use lib \"/usr/local/ofed\1\";," > /usr/local/ofed/bin/$binname; \ chmod 755 /usr/local/ofed/bin/$binname; \ done /bin/bash: line 2: /usr/local/ofed/bin/ibqueryerrors.pl: No such file or directory chmod: cannot access `/usr/local/ofed/bin/ibqueryerrors.pl': No such file or directory /bin/bash: line 2: /usr/local/ofed/bin/ibswportwatch.pl: No such file or directory chmod: cannot access `/usr/local/ofed/bin/ibswportwatch.pl': No such file or directory /bin/bash: line 2: /usr/local/ofed/bin/iblinkinfo.pl: No such file or directory chmod: cannot access `/usr/local/ofed/bin/iblinkinfo.pl': No such file or directory /bin/bash: line 2: /usr/local/ofed/bin/ibprintswitch.pl: No such file or directory chmod: cannot access `/usr/local/ofed/bin/ibprintswitch.pl': No such file or directory /bin/bash: line 2: 
/usr/local/ofed/bin/ibprintca.pl: No such file or directory
chmod: cannot access `/usr/local/ofed/bin/ibprintca.pl': No such file or directory
/bin/bash: line 2: /usr/local/ofed/bin/ibfindnodesusing.pl: No such file or directory
chmod: cannot access `/usr/local/ofed/bin/ibfindnodesusing.pl': No such file or directory
make[3]: *** [install-data-hook] Error 1
make[3]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/diags'
make[2]: *** [install-data-am] Error 2
make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/diags'
make[1]: *** [install-am] Error 2
make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/diags'
make: *** [install_diags] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.37589 (%install)

Patch for fixing the issue above:

diff --git a/diags/Makefile.am b/diags/Makefile.am
index 06b21fc..81ece28 100644
--- a/diags/Makefile.am
+++ b/diags/Makefile.am
@@ -150,9 +150,9 @@ dist-hook: diags.spec
 install-data-hook:
 	for script in $(IB_SW_COUNT_DEPENDANT); do \
 		binname=`echo $$script | sed -e "s/scripts\/\(.*\)/\1/"`; \
-		cat $$script | sed -e "s,use lib \"\(/lib/perl\)\";,use lib \"$(prefix)\1\";," > $(bindir)/$$binname; \
-		chmod 755 $(bindir)/$$binname; \
+		cat $$script | sed -e "s,use lib \"\(/lib/perl\)\";,use lib \"$(prefix)\1\";," > $(DESTDIR)$(bindir)/$$binname; \
+		chmod 755 $(DESTDIR)$(bindir)/$$binname; \
 	done
-	$(top_srcdir)/config/install-sh -m 755 -d $(prefix)/lib/perl
-	$(top_srcdir)/config/install-sh -m 755 scripts/IBswcountlimits.pm $(prefix)/lib/perl
+	$(top_srcdir)/config/install-sh -m 755 -d $(DESTDIR)$(prefix)/lib/perl
+	$(top_srcdir)/config/install-sh -m 755 scripts/IBswcountlimits.pm $(DESTDIR)$(prefix)/lib/perl

From monis at voltaire.com Sun Feb 4 06:57:14 2007
From: monis at voltaire.com (Moni Shoua)
Date: Sun, 04 Feb 2007 16:57:14 +0200
Subject: [openib-general] IB/mthca: question about HCA profile module parameters
In-Reply-To: <45C1C3D5.1050301@dev.mellanox.co.il>
References: <45C1C3D5.1050301@dev.mellanox.co.il>
Message-ID: <45C5F44A.9020802@voltaire.com>

Dotan Barak wrote:
> Hi Moni.
>
> I tried to use the mthca module parameter: for example i tried to change
> the number of QPs.
>
> I got several failures when i used the HCA 25204:
> * sometimes i got the following error message (when using big values,
> for example 512K QPs):
> ib_mthca: 0000:0c: INIT_HCA command failed aborting.
> ib_mthca: probe of 0000:0c: failed with error -16
> * when i tried to use small amount of QPs (1024) the machine just hanged
> and i noticed a kernel oops message on the console
>

OK. So I ran more tests on my setup, which now includes
- Dual x86_64 processor (Intel Xeon)
- 1GB RAM
- 25204 HCA
- fw_ver=1.1.0

For num_qp values in the range 16K to 256K I got no errors. For lower and
higher values I got errors from INIT_HCA and (not always, and just for very
low values) a machine hang.

Do you have the Oops saved somewhere? Can you put it here please?

>
> Did you verify the HCA profile module parameter feature?

As I mentioned earlier, I verified that non-default values can be assigned
and that the HCA works for some selected values. I also noticed that illegal
values cause the driver to write a message to the kernel log. However, I
didn't test the exact behaviour of all possible values for each profile
variable.

> Is there is any known limitation for the values that should be used?
> (for example: only values which are power of two)
>
>

I guess it is clear that there are hardware limitations that don't allow
setting just any value. Unfortunately, even after looking for them in the
PRM, I couldn't figure out what they are. The software limits the value to
be a power of 2 and corrects the users if they try to set a wrong value (to
the nearest power of 2). In that case a warning message is written to the
kernel log.
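To illustrate the kind of correction described above, here is a minimal
sketch of a power-of-2 check and fix-up. It is not the mthca code: the
function names are hypothetical, and it assumes the correction rounds up,
whereas the message only says "nearest power of 2".

```c
#include <limits.h>

/* Return 1 if v is a power of two (non-zero with exactly one bit set). */
int is_pow2(unsigned long v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Round v up to the next power of two. A value that already is a power
 * of two is returned unchanged, 0 is mapped to 1, and 0 is returned if
 * the result would not fit in an unsigned long.
 */
unsigned long round_up_pow2(unsigned long v)
{
    unsigned long p = 1;

    if (v > (ULONG_MAX >> 1) + 1)
        return 0; /* next power of two is not representable */
    while (p < v)
        p <<= 1;
    return p;
}
```

With this sketch, a request for 1000 QPs would be corrected to 1024, while
16384 (16K) would be accepted as is.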
> thanks > Dotan > From mst at mellanox.co.il Sun Feb 4 06:59:58 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 4 Feb 2007 16:59:58 +0200 Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update support In-Reply-To: <1170372217.16637.87.camel@stevo-desktop> References: <1170372217.16637.87.camel@stevo-desktop> Message-ID: <20070204145958.GA20087@mellanox.co.il> > If you're worried about regressing straight rdma address > translation, then you can call the address translation timer function > synchronously in the snoop function like before and change the > addr_trans module to not use netevents... This seems the prudent thing to do. OK, I'll do that. -- MST From swise at opengridcomputing.com Sun Feb 4 07:48:57 2007 From: swise at opengridcomputing.com (Steve WIse) Date: Sun, 04 Feb 2007 09:48:57 -0600 Subject: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 netevent backport] In-Reply-To: <1170360441.16637.41.camel@stevo-desktop> References: <1170360441.16637.41.camel@stevo-desktop> Message-ID: <1170604137.4129.13.camel@linux-q667.site> Vlad/Michael, I'm still tracking this as an outstanding patch. Have you pulled this in yet? Thanks, Steve. On Thu, 2007-02-01 at 14:07 -0600, Steve Wise wrote: > From: Steve Wise > > Add skbuff.h to include list for RHEL4U4 netevent.c file. This makes > it identical to the SLES9SP3 file. 
> > Signed-off-by: Steve Wise > --- > > .../backport/2.6.9_U4/include/src/netevent.c | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > index 1589300..87fb55c 100644 > --- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > @@ -13,6 +13,7 @@ > * Fixes: > */ > > +#include > #include > #include > #include > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Sun Feb 4 07:49:41 2007 From: swise at opengridcomputing.com (Steve WIse) Date: Sun, 04 Feb 2007 09:49:41 -0600 Subject: [openib-general] [PATCH] ofed_1_2 Chelsio ethernet driver updates. In-Reply-To: <1170360543.16637.45.camel@stevo-desktop> References: <1170360543.16637.45.camel@stevo-desktop> Message-ID: <1170604181.4129.15.camel@linux-q667.site> Vlad/Michael, I'm still tracking this as an outstanding patch. Can you pull it in please? Thanks, Steve. On Thu, 2007-02-01 at 14:09 -0600, Steve Wise wrote: > From: Steve Wise > > This patch updates the ofed_1_2 cxgb3 module to the latest queued > for 2.6.21. 
> > Signed-off-by: Steve Wise > --- > > drivers/net/cxgb3/firmware_exports.h | 2 +- > drivers/net/cxgb3/sge.c | 21 +++++++++------------ > drivers/net/cxgb3/t3_cpl.h | 3 --- > 3 files changed, 10 insertions(+), 16 deletions(-) > > diff --git a/drivers/net/cxgb3/firmware_exports.h b/drivers/net/cxgb3/firmware_exports.h > index 4538377..6a835f6 100755 > --- a/drivers/net/cxgb3/firmware_exports.h > +++ b/drivers/net/cxgb3/firmware_exports.h > @@ -129,7 +129,7 @@ #define FW_OFLD_NUM 8 > #define FW_OFLD_SGEEC_START 0 > > /* > - * > + * > */ > #define FW_RI_NUM 1 > #define FW_RI_SGEEC_START 65527 > diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c > index 6b053bf..3f2cf8a 100755 > --- a/drivers/net/cxgb3/sge.c > +++ b/drivers/net/cxgb3/sge.c > @@ -601,17 +601,16 @@ static struct sk_buff *get_packet(struct > if (len <= SGE_RX_COPY_THRES) { > skb = alloc_skb(len, GFP_ATOMIC); > if (likely(skb != NULL)) { > - struct rx_desc *d = &fl->desc[fl->cidx]; > - dma_addr_t mapping = > - (dma_addr_t)((u64) be32_to_cpu(d->addr_hi) << 32 | > - be32_to_cpu(d->addr_lo)); > - > __skb_put(skb, len); > - pci_dma_sync_single_for_cpu(adap->pdev, mapping, len, > - PCI_DMA_FROMDEVICE); > + pci_dma_sync_single_for_cpu(adap->pdev, > + pci_unmap_addr(sd, > + dma_addr), > + len, PCI_DMA_FROMDEVICE); > memcpy(skb->data, sd->skb->data, len); > - pci_dma_sync_single_for_device(adap->pdev, mapping, len, > - PCI_DMA_FROMDEVICE); > + pci_dma_sync_single_for_device(adap->pdev, > + pci_unmap_addr(sd, > + dma_addr), > + len, PCI_DMA_FROMDEVICE); > } else if (!drop_thres) > goto use_orig_buf; > recycle: > @@ -1667,7 +1666,7 @@ #endif > credits = G_RSPD_TXQ0_CR(flags); > if (credits) > qs->txq[TXQ_ETH].processed += credits; > - > + > credits = G_RSPD_TXQ2_CR(flags); > if (credits) > qs->txq[TXQ_CTRL].processed += credits; > @@ -2220,14 +2219,12 @@ static irqreturn_t t3b_intr_napi(int irq > if (likely(map & 1)) { > dev = adap->sge.qs[0].netdev; > > - BUG_ON(napi_is_scheduled(dev)); > if 
(likely(__netif_rx_schedule_prep(dev))) > __netif_rx_schedule(dev); > } > if (map & 2) { > dev = adap->sge.qs[1].netdev; > > - BUG_ON(napi_is_scheduled(dev)); > if (likely(__netif_rx_schedule_prep(dev))) > __netif_rx_schedule(dev); > } > diff --git a/drivers/net/cxgb3/t3_cpl.h b/drivers/net/cxgb3/t3_cpl.h > index 96b2f36..b7a1a31 100755 > --- a/drivers/net/cxgb3/t3_cpl.h > +++ b/drivers/net/cxgb3/t3_cpl.h > @@ -184,9 +184,6 @@ #define V_OPCODE(x) ((x) << S_OPCODE) > #define G_OPCODE(x) (((x) >> S_OPCODE) & 0xFF) > #define G_TID(x) ((x) & 0xFFFFFF) > > -#define S_QNUM 0 > -#define G_QNUM(x) (((x) >> S_QNUM) & 0xFFFF) > - > /* tid is assumed to be 24-bits */ > #define MK_OPCODE_TID(opcode, tid) (V_OPCODE(opcode) | (tid)) > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at mellanox.co.il Sun Feb 4 07:52:44 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 4 Feb 2007 17:52:44 +0200 Subject: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 neteventbackport] In-Reply-To: <1170604137.4129.13.camel@linux-q667.site> References: <1170604137.4129.13.camel@linux-q667.site> Message-ID: <20070204155244.GC20087@mellanox.co.il> No, but it really makes sense. Vlad? Quoting Steve WIse : Subject: Re: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 neteventbackport] Vlad/Michael, I'm still tracking this as an outstanding patch. Have you pulled this in yet? Thanks, Steve. On Thu, 2007-02-01 at 14:07 -0600, Steve Wise wrote: > From: Steve Wise > > Add skbuff.h to include list for RHEL4U4 netevent.c file. This makes > it identical to the SLES9SP3 file. 
> > Signed-off-by: Steve Wise > --- > > .../backport/2.6.9_U4/include/src/netevent.c | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > index 1589300..87fb55c 100644 > --- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > +++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > @@ -13,6 +13,7 @@ > * Fixes: > */ > > +#include > #include > #include > #include > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > -- MST From mst at mellanox.co.il Sun Feb 4 07:54:47 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 4 Feb 2007 17:54:47 +0200 Subject: [openib-general] [PATCH] ofed_1_2 Chelsio ethernet driver updates. In-Reply-To: <1170604181.4129.15.camel@linux-q667.site> References: <1170360543.16637.45.camel@stevo-desktop> <1170604181.4129.15.camel@linux-q667.site> Message-ID: <20070204155447.GD20087@mellanox.co.il> Vlad? Quoting Steve WIse : Subject: Re: [PATCH] ofed_1_2 Chelsio ethernet driver updates. Vlad/Michael, I'm still tracking this as an outstanding patch. Can you pull it in please? Thanks, Steve. On Thu, 2007-02-01 at 14:09 -0600, Steve Wise wrote: > From: Steve Wise > > This patch updates the ofed_1_2 cxgb3 module to the latest queued > for 2.6.21. 
> > Signed-off-by: Steve Wise -- MST From swise at opengridcomputing.com Sun Feb 4 07:57:57 2007 From: swise at opengridcomputing.com (Steve WIse) Date: Sun, 04 Feb 2007 09:57:57 -0600 Subject: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus In-Reply-To: <20070204130718.GC14288@mellanox.co.il> References: <1170430869.26115.12.camel@stevo-desktop> <20070204130718.GC14288@mellanox.co.il> Message-ID: <1170604677.4129.20.camel@linux-q667.site> So its building sles10 ok on all other platforms but ia64? It seems like its not including the netevent.c file. But that backport does exist. On Sun, 2007-02-04 at 15:07 +0200, Michael S. Tsirkin wrote: > > Quoting Steve Wise : > > Subject: Re: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus > > > > On Fri, 2007-02-02 at 02:20 -0800, vlad at lists.openfabrics.org wrote: > > > This email was generated automatically, please do not reply > > > > Which distro is 2.6.16.21-0.8-default? I'm sure I didn't do a netevent > > backport that. > > That's SLES10 actually. 
> > > Failed: > > Build failed on ia64 with linux-2.6.16.21-0.8-default > > Log: > > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ > > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: > > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ > > make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 > > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 > > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 > > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 > > make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' > > make: *** [kernel] Error 2 > > From swise at opengridcomputing.com Sun Feb 4 08:14:33 2007 From: swise at opengridcomputing.com (Steve WIse) Date: Sun, 04 Feb 2007 10:14:33 -0600 Subject: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus In-Reply-To: <20070204130718.GC14288@mellanox.co.il> References: <1170430869.26115.12.camel@stevo-desktop> <20070204130718.GC14288@mellanox.co.il> Message-ID: <1170605673.4129.43.camel@linux-q667.site> Michael, You've setup a cross-compile environment on staging.openfabrics.org, eh? How can I utilize that to resolve this issue? Or is someone else handling it? Steve. On Sun, 2007-02-04 at 15:07 +0200, Michael S. 
Tsirkin wrote: > > Quoting Steve Wise : > > Subject: Re: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus > > > > On Fri, 2007-02-02 at 02:20 -0800, vlad at lists.openfabrics.org wrote: > > > This email was generated automatically, please do not reply > > > > Which distro is 2.6.16.21-0.8-default? I'm sure I didn't do a netevent > > backport that. > > That's SLES10 actually. > > > Failed: > > Build failed on ia64 with linux-2.6.16.21-0.8-default > > Log: > > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ > > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: > > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ > > make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 > > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 > > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 > > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 > > make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' > > make: *** [kernel] Error 2 > > From vlad at mellanox.co.il Sun Feb 4 08:54:30 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 04 Feb 2007 18:54:30 +0200 Subject: [openib-general] [PATCH] ofed_1_2 Chelsio ethernet driver updates. 
In-Reply-To: <1170360543.16637.45.camel@stevo-desktop>
References: <1170360543.16637.45.camel@stevo-desktop>
Message-ID: <1170608070.5887.15.camel@vladsk-laptop>

On Thu, 2007-02-01 at 14:09 -0600, Steve Wise wrote:
> From: Steve Wise
>
> This patch updates the ofed_1_2 cxgb3 module to the latest queued
> for 2.6.21.
>
> Signed-off-by: Steve Wise
> ---
>
>  drivers/net/cxgb3/firmware_exports.h |    2 +-
>  drivers/net/cxgb3/sge.c              |   21 +++++++++------------
>  drivers/net/cxgb3/t3_cpl.h           |    3 ---
>  3 files changed, 10 insertions(+), 16 deletions(-)

Applied.

-- 
Vladimir Sokolovsky
Mellanox Technologies Ltd.

From mst at mellanox.co.il Sun Feb 4 09:58:33 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 19:58:33 +0200
Subject: [openib-general] idea for ofed 1 2 kernel file structure
Message-ID: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>

Hi!

I looked at the current ofed 1.2 kernel tree and there is one thing I dislike:
it is hard to see changes that are specific to OFED, since we have the whole kernel history mixed in.

It would be easy to split the OFED-specific files into a separate directory and have the OFED scripts combine that with the upstream kernel.

All out-of-tree modules we distribute would go there too.

What do others think about this?

From swise at opengridcomputing.com Sun Feb 4 10:19:20 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Sun, 04 Feb 2007 12:19:20 -0600
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
Message-ID: <1170613160.4129.110.camel@linux-q667.site>

On Sun, 2007-02-04 at 19:58 +0200, Michael S. Tsirkin wrote:
> Hi!
> > I looked a current ofed 1.2 kernel tree and there is 1 thing I > dislike: > > It is hard to see changes that are specific to OFED since we have > whole kernel history mixed in. > > > > It would easy to split OFED specific files In separate directory and > have OFED scripts combine that with upstream kernel. > > > > All out of tree modules we distribute would go there too. > > What do others think about this? > > I'm not exactly clear what you mean. Could you expand a little on your idea? From mst at mellanox.co.il Sun Feb 4 10:27:59 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 4 Feb 2007 20:27:59 +0200 Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: <1170613160.4129.110.camel@linux-q667.site> References: <1170613160.4129.110.camel@linux-q667.site> Message-ID: <20070204182759.GA28729@mellanox.co.il> > Quoting Steve WIse : > Subject: Re: idea for ofed 1 2 kernel file structure > > On Sun, 2007-02-04 at 19:58 +0200, Michael S. Tsirkin wrote: > > Hi! > > > > I looked a current ofed 1.2 kernel tree and there is 1 thing I > > dislike: > > > > It is hard to see changes that are specific to OFED since we have > > whole kernel history mixed in. > > > > > > > > It would easy to split OFED specific files In separate directory and > > have OFED scripts combine that with upstream kernel. > > > > > > > > All out of tree modules we distribute would go there too. > > > > What do others think about this? > > > > > > I'm not exactly clear what you mean. Could you expand a little on your > idea? Well, OFED kernel tree is currently kernel.org files + OFED files. We could have OFED files in a separate tree and build script would put them together. 
-- MST From swise at opengridcomputing.com Sun Feb 4 11:43:27 2007 From: swise at opengridcomputing.com (Steve WIse) Date: Sun, 04 Feb 2007 13:43:27 -0600 Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: <20070204182759.GA28729@mellanox.co.il> References: <1170613160.4129.110.camel@linux-q667.site> <20070204182759.GA28729@mellanox.co.il> Message-ID: <1170618207.4129.118.camel@linux-q667.site> > > Well, OFED kernel tree is currently kernel.org files + OFED files. > We could have OFED files in a separate tree and build script > would put them together. > So the ofed_1_2 tree would become just new drivers/ulps that are not in the kernel its based on (2.6.20), kernel_patches/, kernel_addons/, and ofed_scripts/. Right? I think that's a reasonable approach, and it keeps the kernel tree clean and makes it very clear which files are added to any given kernel release that ofed bases on. my 2 cents. Steve. From mst at mellanox.co.il Sun Feb 4 12:58:55 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 4 Feb 2007 22:58:55 +0200 Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: <1170618207.4129.118.camel@linux-q667.site> References: <1170613160.4129.110.camel@linux-q667.site> <20070204182759.GA28729@mellanox.co.il> <1170618207.4129.118.camel@linux-q667.site> Message-ID: <20070204205855.GF29029@mellanox.co.il> > Quoting Steve WIse : > Subject: Re: idea for ofed 1 2 kernel file structure > > > > > > Well, OFED kernel tree is currently kernel.org files + OFED files. > > We could have OFED files in a separate tree and build script > > would put them together. > > > > So the ofed_1_2 tree would become just new drivers/ulps that are not in > the kernel its based on (2.6.20), kernel_patches/, kernel_addons/, and > ofed_scripts/. Right? Yes. > I think that's a reasonable approach, and it keeps the kernel tree clean > and makes it very clear which files are added to any given kernel > release that ofed bases on. 
On the other hand, we are at feature freeze, so this is acceptable only if it can be done with only minor changes in Vlad's build scripts.
So I'll check with him.

-- 
MST

From dotanb at dev.mellanox.co.il Mon Feb 5 01:31:34 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 05 Feb 2007 11:31:34 +0200
Subject: [openib-general] IB/mthca: question about HCA profile module parameters
In-Reply-To: <45C5F44A.9020802@voltaire.com>
References: <45C1C3D5.1050301@dev.mellanox.co.il> <45C5F44A.9020802@voltaire.com>
Message-ID: <45C6F976.3000802@dev.mellanox.co.il>

Hi Moni, and thanks for the quick response.

Moni Shoua wrote:
> OK. So I ran more tests on my setup, which now includes
> - Dual x86_64 processor (Intel Xeon)
> - 1GB RAM
> - 25204 HCA - fw_ver=1.1.0
>
> For num_qp values in the range of 16K to 256K I got no errors.
> For lower and higher values I got errors from INIT_HCA and (not always, and only for very low values) a machine hang.
> Do you have the Oops saved somewhere? Can you put it here please?
>
>
Sorry, but I don't have a dump of the kernel oops. I strongly believe that we saw the same kernel oops, though.
If it is needed, I will try to reproduce it one more time.

>
>> Did you verify the HCA profile module parameter feature?
>>
> As I mentioned earlier, I verified that non-default values can be assigned
> and that the HCA works for some selected values.
> I also noticed that illegal values cause the driver to write a message to the kernel log.
> However, I didn't test the exact behaviour of all possible values for each profile variable.
>
I guess this is something that needs to be done. I will add it to our regression tests in the future.

>> Is there is any known limitation for the values that should be used?
>> (for example: only values which are power of two)
>>
>>
>>
> I guess it is clear that there are hardware limitations that don't allow setting just any value.
> Unfortunately, even after looking for them in the PRM, I couldn't figure out what they are.
> The software limits the value to a power of 2 and corrects users who try to set a wrong value (to the nearest power of 2). In that case a warning message is written to the kernel log.
>
As far as I know, the minimum number of any resource (for example, QPs) is the number of resources that the HCA reports as reserved.

I will open a bug in Bugzilla so that the problems with this feature are tracked.

thanks
Dotan

From vlad at dev.mellanox.co.il Mon Feb 5 01:50:47 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 05 Feb 2007 11:50:47 +0200
Subject: [openib-general] MVAPICH2 SRPM and install file patches
In-Reply-To: <45C14344.9010602@cse.ohio-state.edu>
References: <45C14344.9010602@cse.ohio-state.edu>
Message-ID: <1170669047.6049.4.camel@vladsk-laptop>

On Wed, 2007-01-31 at 20:32 -0500, Shaun Rowland wrote:
> I've placed the MVAPICH2 SRPM on the OFA server in ~rowland/ofed_1_2,
> and it is linked to here:
>
> http://www.openfabrics.org/~rowland/ofed_1_2/
>

Hi Shaun,
Please change mvapich2.spec to avoid using the %build macro.
It removes RPM_BUILD_ROOT on SuSE distros:

Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.9418
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ /bin/rm -rf /var/tmp/OFED
++ dirname /var/tmp/OFED
+ /bin/mkdir -p /var/tmp
+ /bin/mkdir /var/tmp/OFED
+ cd mvapich2-0.9.8
+ export OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed
+ OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed

-- 
Vladimir Sokolovsky
Mellanox Technologies Ltd.
From rdreier at cisco.com Mon Feb 5 02:15:25 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 05 Feb 2007 02:15:25 -0800 Subject: [openib-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This is my first merge for 2.6.21: Hoang-Nam Nguyen (2): IB/ehca: Remove use of do_mmap() IB/ehca: Remove obsolete prototypes Ishai Rabinovitz (1): IB/srp: Don't wait for response when QP is in error state. Jason Gunthorpe (1): IB: Make sure struct ib_user_mad.data is aligned Michael S. Tsirkin (2): IB: Include explicitly in IB: Return qp pointer as part of ib_wc Steve Wise (1): RDMA/addr: Handle ethernet neighbour updates during route resolution drivers/infiniband/core/addr.c | 3 +- drivers/infiniband/core/mad.c | 11 +- drivers/infiniband/core/uverbs_cmd.c | 2 +- drivers/infiniband/hw/amso1100/c2_cq.c | 2 +- drivers/infiniband/hw/ehca/ehca_classes.h | 29 +-- drivers/infiniband/hw/ehca/ehca_cq.c | 65 ++---- drivers/infiniband/hw/ehca/ehca_iverbs.h | 8 - drivers/infiniband/hw/ehca/ehca_main.c | 6 +- drivers/infiniband/hw/ehca/ehca_qp.c | 78 +----- drivers/infiniband/hw/ehca/ehca_reqs.c | 2 +- drivers/infiniband/hw/ehca/ehca_uverbs.c | 395 ++++++++++++----------------- drivers/infiniband/hw/ipath/ipath_qp.c | 2 +- drivers/infiniband/hw/ipath/ipath_rc.c | 8 +- drivers/infiniband/hw/ipath/ipath_ruc.c | 8 +- drivers/infiniband/hw/ipath/ipath_uc.c | 4 +- drivers/infiniband/hw/ipath/ipath_ud.c | 8 +- drivers/infiniband/hw/mthca/mthca_cmd.c | 2 +- drivers/infiniband/hw/mthca/mthca_cq.c | 2 +- drivers/infiniband/ulp/srp/ib_srp.c | 7 + drivers/infiniband/ulp/srp/ib_srp.h | 1 + include/rdma/ib_user_mad.h | 2 +- include/rdma/ib_verbs.h | 3 +- 22 files changed, 243 insertions(+), 405 deletions(-) From vlad at lists.openfabrics.org Mon Feb 5 
02:22:18 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Mon, 5 Feb 2007 02:22:18 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070205-0200 daily build status Message-ID: <20070205102221.765A7E607FE@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.18 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.16 Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: 
/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------

From bugzilla-daemon at lists.openfabrics.org Mon Feb 5 03:17:05 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 5 Feb 2007 03:17:05 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa
In-Reply-To: 
Message-ID: <20070205111705.7372CE607FE@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334

------- Comment #16 from dmitry.yulov at intel.com 2007-02-05 03:17 -------
> I don't agree with your patch. It assumes that SLES 10 may be corrupted. OFED
> should not try to support this. If you want to use this patch for your own
> purposes, just apply it (manually) before running OFED build scripts. OFED's
> backport patches mechanism is not suitable for such patches.

I don't agree with you because my patch do not any changes in system files. It
only search version of SUSE, but if you think that OFED should not try to
support this I think that many Intel people who will install OFED on SLES10
platform will be unhappy. Thanks a lot for you help.

-- Dmitry.

-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From vlad at dev.mellanox.co.il Mon Feb 5 03:44:23 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 05 Feb 2007 13:44:23 +0200
Subject: [openib-general] MVAPICH2 rpmbuild issue
In-Reply-To: <45C14344.9010602@cse.ohio-state.edu>
References: <45C14344.9010602@cse.ohio-state.edu>
Message-ID: <1170675863.6049.11.camel@vladsk-laptop>

Hi Shaun,
Please check the following issue:

Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.84872
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ cd mvapich2-0.9.8
+ export OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed
+ OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed
+ '[' -d /var/tmp/OFED/usr/local/ofed/lib ']'
+ '[' -d /var/tmp/OFED/usr/local/ofed/lib64 ']'
+ export PREFIX=/var/tmp/OFED/usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1
+ PREFIX=/var/tmp/OFED/usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1
+ export CC=gcc CXX=g++ F77=gfortran
+ CC=gcc
+ CXX=g++
+ F77=gfortran
+ export ROMIO=yes
+ ROMIO=yes
+ export SHARED_LIBS=yes
+ SHARED_LIBS=yes
+ ./make.mvapich2.gen2
Could not find the OPEN_IB_HOME/lib64 or OPEN_IB_HOME/lib directory. Exiting.
error: Bad exit status from /var/tmp/rpm-tmp.84872 (%install)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.84872 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich2_gcc' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1' --define 'build_root /var/tmp/OFED' --define 'open_ib_home /usr/local/ofed' --define 'ofed_build_root /var/tmp/OFED' --define 'comp_env CC=gcc CXX=g++ F77=gfortran' --define 'iwarp 0' --define 'romio 1' --define 'shared_libs 1' --define 'auto_req 1' /mswg2/work/vlad/ofed/test/OFED-1.2-alpha1/SRPMS/mvapich2-0.9.8-1.src.rpm"

-- 
Vladimir Sokolovsky
Mellanox Technologies Ltd.
From bugzilla-daemon at lists.openfabrics.org Mon Feb 5 03:52:57 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Mon, 5 Feb 2007 03:52:57 -0800 (PST) Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa In-Reply-To: Message-ID: <20070205115257.F1917E607FE@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=334 erezz at voltaire.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |vlad at mellanox.co.il ------- Comment #17 from erezz at voltaire.com 2007-02-05 03:52 ------- (In reply to comment #16) > > I don't agree with your patch. It assumes that SLES 10 may be corrupted. OFED > > should not try to support this. If you want to use this patch for your own > > purposes, just apply it (manually) before running OFED build scripts. OFED's > > backport patches mechanism is not suitable for such patches. > > I don't agree with you because my patch do not any changes in system files. It > only search version of SUSE, but if you think that OFED should not try to > support this I think that many Intel people who will install OFED on SLES10 > platform will be unhappy. Thanks a lot for you help. > > -- Dmitry. > Note that /etc/issue belongs to a SLES package: rpm thyme:~ # rpm -qf /etc/issue sles-release-10-15.2 Deleting it means that you corrupt your system. One can also delete /etc/SuSE-release and expect that OFED will work. If you decide to delete /etc/issue (or any other file that comes with SLES 10), you'll need to change OFED scripts for your special needs. Anyway, I maintain iSER in OFED. You may want to ask Vlad (vlad at mellanox.co.il) what he thinks about it. He maintains OFED's build scripts. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at lists.openfabrics.org Mon Feb 5 04:02:29 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Mon, 5 Feb 2007 04:02:29 -0800 (PST) Subject: [openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa In-Reply-To: Message-ID: <20070205120229.2A624E607FE@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=334 ------- Comment #18 from dmitry.yulov at intel.com 2007-02-05 04:02 ------- > Note that /etc/issue belongs to a SLES package: > rpm thyme:~ # rpm -qf /etc/issue > sles-release-10-15.2 > Deleting it means that you corrupt your system. One can also delete > /etc/SuSE-release and expect that OFED will work. If you decide to delete > /etc/issue (or any other file that comes with SLES 10), you'll need to change > OFED scripts for your special needs. Anyway, I maintain iSER in OFED. You may > want to ask Vlad (vlad at mellanox.co.il) what he thinks about it. He maintains > OFED's build scripts. Thank you. I do not delete /etc/issue file. I have had it file, but it contain next information: : cat /etc/issue ************************************************ Use of this system by unauthorized persons or in an unauthorized manner is strictly prohibited ************************************************ That is all. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tziporet at mellanox.co.il Mon Feb 5 04:04:29 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 05 Feb 2007 14:04:29 +0200 Subject: [openib-general] QoS in opensm will not be part of OFED 1.2 Message-ID: <45C71D4D.4060503@mellanox.co.il> Hi Hal, I had an AI to check the QoS status with OSM. Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 (I updated the plan on the Wiki) The reasons for this are: 1. Code not ready at code freeze. 2. 
There are technical discussion in the list regarding some implementation details (e.g. XML or text syntax). 3. SPEC is not published by IBTA yet. Hal & Yevgeny - please work on a plan that will enable QoS to be merged on the main trunk once its ready. Tziporet From kliteyn at dev.mellanox.co.il Mon Feb 5 04:37:41 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 05 Feb 2007 14:37:41 +0200 Subject: [openib-general] OSM QoS policy file Message-ID: <45C72515.8090100@dev.mellanox.co.il> Hi Hal. I added an osm/doc/qos-policy.txt file with the description of the QoS policy file, and an example of such a file (with more comments inside). I'm sure you'll have questions and corrections regarding this file, so for now, to make our work easier, I'm not sending it as a patch, but just as text. Please review the file. Thanks -- Yevgeny ============================================================= QoS Policy File =============== The QoS policy file is divided into 4 subsections: - Port Group: a set of CAs, Routers or Switches that share the same settings. A port group might be a partition defined by the partition manager policy in terms of GUIDs. Future implementations might provide support for NodeDescription based definition of port groups. - Fabric Setup: Defines how the SL2VL and VLArb tables should be set up. This policy definition assumes the computation of target behavior should be performed outside of OpenSM. - QoS-Levels Definition: This section defines the possible sets of parameters for QoS that a client might be mapped to. Each set holds: SL and optionally: Max MTU, Max Rate, Packet Lifetime and QoS Class. - Matching Rules: A list of rules that match an incoming PathRecord request to a QoS-Level. The rules are processed in order, and the first rule that matches is applied. Each rule is built out of a set of match expressions which should all match for the rule to apply. 
The matching expressions are defined for the following fields: - SRC and DST to lists of port groups - Service-ID to a list of Service-ID or Service-ID ranges - QoS Class to a list of QoS Class values or ranges Example of the QoS policy file ============================== Storage our SRP storage targets 0x1000000000000001 0x1000000000000002 Virtual Servers node desc and IB port # vs1/HCA-1/P1 vs3/HCA-1/P1 vs3/HCA-2/P1 Partition 1 default settings Part1 Routers all routers ROUTER Part1 * * 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 Storage1 Storage2 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0 Storage Storage 0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1 8:255,9:127,10:63,11:31,12:15,13:7,14:3 10 1 for the lowest priority comm 16 123 16 2 low latency best bandwidth 0 7 3 just an example 0 32 1 1 12 1 low latency by class 7-9 or 11 7-9,11 2 Storage targets connection> Storage 22,4719-5000 3 bla bla Storage Explanation of some fields ========================== Most of the tags meaning is either intuitive or explained by the comments along the file. One section that deserves a special explanation is SL2VL tables definition - . In general, VL is a function of in-port (the port that the packet has entered through), out-port (the port that the packet is supposed to come out from) and the SL. In OpenSM, SL2VL table is defined on every port, where this port is an out-port. Hence, on every port, SL2VL table is defined as function of in-port and SL. n,m This means that of all the ports of the specified port group, define SL2VL tables where to-ports are ports number n and m. Since SL2VL table is defined per out-port, using effectively means defining SL2VL table on ports n and m. In order to specify that SL2VL table should be defined on all the ports, an asterisk (*) can be used. i,j This means that of all the ports of the specified port group that were not filtered out by the value, define SL2VL table only for entries where from-ports are ports number i and j. 
In order to specify that SL2VL table should be defined for all the in-ports, an asterisk (*) can be used. To specify that all the SL2VL tables entries should be defined for all the ports of a certain group, use the following: port_group * * PortGroupName This is a combination of keyword (that can be found in VLArb tables definition) and keyword. PortGroupName means that the ports that we're talking about are all the ports that are connected to ports that belong to PortGroupName. Essentially, PortGroupName means the following: list_of_all_the_ports_that_are_connected_to_group_PortGroupName Example of usage of : A user has a set of 'special' nodes (e.g. storage nodes), and all the traffic to these nodes has to get a specific VL. The solution is to define a port group (e.g. "Storage") that will include all the ports of these nodes, and then to configure SL2VL tables on all the switch ports that are connected to the Storage port group by specifying Storage PortGroupName Similar to , is a combination of and keywords. From rdreier at cisco.com Mon Feb 5 06:20:25 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 05 Feb 2007 06:20:25 -0800 Subject: [openib-general] idea for ofed 1 2 kernel file structure References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> Message-ID: > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike: > It is hard to see changes that are specific to OFED since we have whole > kernel history mixed in. I'm not sure how you have your branches set up, but if you have something like a "linus" branch that tracks the upstream kernel, it's easy to do stuff like "git log linus.." or "git diff linus.. drivers/infiniband" and see the differences that way. Using git that way (which is what it's designed for, after all) seems better than some scripts to munge together two trees. - R. 
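Roland's suggestion can be sketched as a small wrapper. The branch name `linus` comes from his message; the function name `ofed_only_log` is hypothetical and only serves the illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: list commits that are on the current branch but not
# on the upstream-tracking branch, optionally limited to some paths.
# Usage: ofed_only_log <upstream-branch> [path...]
ofed_only_log() {
    upstream=$1
    shift
    git log --oneline "$upstream.." -- "$@"
}

# Example: ofed_only_log linus drivers/infiniband
# The matching diff would be: git diff linus.. -- drivers/infiniband
```

More than one path can be passed, so the same call works when the interesting changes are spread over several directories.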
From halr at voltaire.com Mon Feb 5 06:00:50 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Feb 2007 09:00:50 -0500 Subject: [openib-general] QoS in opensm will not be part of OFED 1.2 In-Reply-To: <45C71D4D.4060503@mellanox.co.il> References: <45C71D4D.4060503@mellanox.co.il> Message-ID: <1170684049.4525.195527.camel@hal.voltaire.com> Hi Tziporet, On Mon, 2007-02-05 at 07:04, Tziporet Koren wrote: > Hi Hal, > > I had an AI to check the QoS status with OSM. > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 > (I updated the plan on the Wiki) > > The reasons for this are: > 1. Code not ready at code freeze. > 2. There are technical discussion in the list regarding some > implementation details (e.g. XML or text syntax). > 3. SPEC is not published by IBTA yet. I think this last reason also applies to the end client QoS changes as well. -- Hal > Hal & Yevgeny - please work on a plan that will enable QoS to be merged > on the main trunk once its ready. > Tziporet > > > From xma at us.ibm.com Mon Feb 5 06:50:55 2007 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 5 Feb 2007 07:50:55 -0700 Subject: [openib-general] [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: Message-ID: Hi, Roland, Please review this patch. According to IPoIB RFC4391 section 5, once the IPoIB broadcast group has been joined, the interface should be ready for data transfer. In the current IPoIB implementation, the interface is UP and RUNNING only when all default multicast joins are successful. We hit a problem where the broadcast join finished successfully but the all-hosts multicast join failed. 
Here is the patch, if possible please give your input asap, we have an urgent customer issue need to be resolved: diff -urpN ipoib/ipoib_multicast.c ipoib-multicast/ipoib_multicast.c --- ipoib/ipoib_multicast.c 2006-11-29 13:57:37.000000000 -0800 +++ ipoib-multicast/ipoib_multicast.c 2007-02-04 22:34:16.000000000 -0800 @@ -402,6 +402,11 @@ static void ipoib_mcast_join_complete(in queue_work(ipoib_workqueue, &priv->mcast_task); mutex_unlock(&mcast_mutex); complete(&mcast->done); + /* + * broadcast join finished, enable carrier + */ + if (mcast == priv->broadcast) + netif_carrier_on(dev); return; } @@ -599,7 +604,6 @@ void ipoib_mcast_join_task(void *dev_ptr ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); clear_bit(IPOIB_MCAST_RUN, &priv->flags); - netif_carrier_on(dev); } int ipoib_mcast_start_thread(struct net_device *dev) (See attached file: ipoib-multicast.patch) Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ipoib-multicast.patch Type: application/octet-stream Size: 777 bytes Desc: not available URL: From michael.arndt at informatik.tu-chemnitz.de Mon Feb 5 07:18:24 2007 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Mon, 5 Feb 2007 16:18:24 +0100 Subject: [openib-general] Unknown SMP Recv Message-ID: <000901c74938$e10b2a30$21606d86@one7> Hi, I have change the driver (smi) a little and have written a tool like a router or a bridge. It receives directed route smp's on one port and sends it to another port. I use 3 nodes (sender on node 1, the router on node 2, normal node on 3) and send a subnGet SMP with [0][1][1] as initial path. And it works fine, but on way back the router also receives a second subnGetResp packet with no data. 
The header is almost the same as the real subnGetResp packet, just the DrSLID,DrDLID, initial path, return path are 0. Are there any ideas where this packet come from? Ack? Thanks Michael From mst at mellanox.co.il Mon Feb 5 07:25:08 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Feb 2007 17:25:08 +0200 Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: References: Message-ID: <20070205152507.GA4246@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [openib-general] idea for ofed 1 2 kernel file structure > > > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike: > > It is hard to see changes that are specific to OFED since we have whole > > kernel history mixed in. > > I'm not sure how you have your branches set up, but if you have > something like a "linus" branch that tracks the upstream kernel, it's > easy to do stuff like "git log linus.." or "git diff linus.. drivers/infiniband" > and see the differences that way. limit to drivers/infiniband is no longer sufficient as we have components under drivers/net etc. Another problem is that history-rewriting tools such as git rebase seem to easily get confused by the complicated linux history. > Using git that way (which is what it's designed for, after all) seems > better than some scripts to munge together two trees. Problem is, OFED kernel code actually consists of 2 parts: upstream kernel developed separately at lkml and out of kernel components, developed separately. OFED does not really track linux all the time: we only update at -RC time. Mixing such 2 projects together does not seem to be what git was designed for. For example, when a patch is applied upstream we need to remove it from fixes. So after I do git pull from upstream I get a broken tree that won't even build. Not good. Another problem I'm trying to address is the confusion around what gets applied as patch and what directly. This way, a bad patch won't even apply. 
-- MST From halr at voltaire.com Mon Feb 5 07:34:15 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Feb 2007 10:34:15 -0500 Subject: [openib-general] Unknown SMP Recv In-Reply-To: <000901c74938$e10b2a30$21606d86@one7> References: <000901c74938$e10b2a30$21606d86@one7> Message-ID: <1170689654.4525.201415.camel@hal.voltaire.com> On Mon, 2007-02-05 at 10:18, Michael Arndt wrote: > Hi, > > I have change the driver (smi) a little and have written a tool like a > router or a bridge. It receives directed route smp's on one port and sends > it to another port. I use 3 nodes (sender on node 1, the router on node 2, > normal node on 3) and send a subnGet SMP with [0][1][1] as initial path. And > it works fine, but on way back the router also receives a second subnGetResp > packet with no data. The header is almost the same as the real subnGetResp > packet, just the DrSLID,DrDLID, initial path, return path are 0. Are there > any ideas where this packet come from? Ack? A router should not allow a SMP to cross a subnet boundary. SMPs are restricted to the local subnet. -- Hal > Thanks Michael > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at mellanox.co.il Mon Feb 5 07:38:26 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Feb 2007 17:38:26 +0200 Subject: [openib-general] QoS in opensm will not be part of OFED 1.2 In-Reply-To: <1170684049.4525.195527.camel@hal.voltaire.com> References: <45C71D4D.4060503@mellanox.co.il> <1170684049.4525.195527.camel@hal.voltaire.com> Message-ID: <20070205153826.GB4246@mellanox.co.il> > > I had an AI to check the QoS status with OSM. > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 > > (I updated the plan on the Wiki) > > > > The reasons for this are: > > 1. 
Code not ready at code freeze. > > 2. There are technical discussion in the list regarding some > > implementation details (e.g. XML or text syntax). > > 3. SPEC is not published by IBTA yet. > > I think this last reason also applies to the end client QoS changes as > well. Yes. But the other 2 don't. -- MST From changquing.tang at hp.com Mon Feb 5 07:48:29 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 5 Feb 2007 15:48:29 -0000 Subject: [openib-general] Immediate data question In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> Message-ID: <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> Roland: If I only want to send/recv 4 bytes with immediate data: On sender side: opcode = IBV_WR_SEND_WITH_IMM; imm_data = my_4_bytes_data; Do I still need to specify sg_list and num_sge ? On receiver side, because the immediate data is inside the completion structure, do I need to post a receive for above message ? If I need to post a receive, do I need to specify sg_list and num_sge for the receive ? I looked the spec but did not find useful information. The reason I ask is that at some point, I can not(or hard) to provide registered memory only for 4 bytes data. Thank you. --CQ > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier > Sent: Monday, February 05, 2007 8:20 AM > To: Michael S. Tsirkin > Cc: openib-general at openib.org > Subject: Re: [openib-general] idea for ofed 1 2 kernel file structure > > > I looked a current ofed 1.2 kernel tree and there is 1 > thing I dislike: > > It is hard to see changes that are specific to OFED since > we have whole > kernel history mixed in. > > I'm not sure how you have your branches set up, but if you > have something like a "linus" branch that tracks the upstream > kernel, it's easy to do stuff like "git log linus.." or "git > diff linus.. 
drivers/infiniband" > and see the differences that way. > > Using git that way (which is what it's designed for, after > all) seems better than some scripts to munge together two trees. > > - R. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > From mst at mellanox.co.il Mon Feb 5 07:49:22 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Feb 2007 17:49:22 +0200 Subject: [openib-general] QoS in opensm will not be part of OFED 1.2 In-Reply-To: <1170690105.4525.201879.camel@hal.voltaire.com> References: <1170690105.4525.201879.camel@hal.voltaire.com> Message-ID: <20070205154922.GC4246@mellanox.co.il> > > > > I had an AI to check the QoS status with OSM. > > > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 > > > > (I updated the plan on the Wiki) > > > > > > > > The reasons for this are: > > > > 1. Code not ready at code freeze. > > > > 2. There are technical discussion in the list regarding some > > > > implementation details (e.g. XML or text syntax). > > > > 3. SPEC is not published by IBTA yet. > > > > > > I think this last reason also applies to the end client QoS changes as > > > well. > > > > Yes. But the other 2 don't. > > Right but I think that precludes it from being included in OFED right > now. Since the code is already included in OFED, moving it out would violate the feature freeze rules, unless there's an actual bug this would fix. 
-- MST From halr at voltaire.com Mon Feb 5 07:41:48 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Feb 2007 10:41:48 -0500 Subject: [openib-general] QoS in opensm will not be part of OFED 1.2 In-Reply-To: <20070205153826.GB4246@mellanox.co.il> References: <45C71D4D.4060503@mellanox.co.il> <1170684049.4525.195527.camel@hal.voltaire.com> <20070205153826.GB4246@mellanox.co.il> Message-ID: <1170690105.4525.201879.camel@hal.voltaire.com> On Mon, 2007-02-05 at 10:38, Michael S. Tsirkin wrote: > > > I had an AI to check the QoS status with OSM. > > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 > > > (I updated the plan on the Wiki) > > > > > > The reasons for this are: > > > 1. Code not ready at code freeze. > > > 2. There are technical discussion in the list regarding some > > > implementation details (e.g. XML or text syntax). > > > 3. SPEC is not published by IBTA yet. > > > > I think this last reason also applies to the end client QoS changes as > > well. > > Yes. But the other 2 don't. Right but I think that precludes it from being included in OFED right now. -- Hal From guyg at Voltaire.COM Mon Feb 5 08:43:14 2007 From: guyg at Voltaire.COM (guyg) Date: Mon, 05 Feb 2007 18:43:14 +0200 Subject: [openib-general] [libmthca] deadlock while trying to destroy QP Message-ID: <45C75EA2.6000905@Voltaire.COM> Hi Roland, I am running a proprietary test over ofed1.1 (userspace). I have one context where I poll my cq and another (signal handler context) where I try to destroy my QP. It looks like mthca_destroy_qp is trying to take a lock that mthca_poll_cq is holding. The deadlock is occurring at the end of the test run where there are no more completions, hence deadlocking and the test never exists. 
Here is a core dump: #0 0x0000003a6ce09172 in pthread_spin_lock () from /lib64/tls/libpthread.so.0 #1 0x0000002a959cf449 in mthca_cq_clean (cq=0x607240, qpn=3277830, srq=0x0) at src/cq.c:554 #2 0x0000002a959d28b9 in mthca_destroy_qp (qp=0x607400) at src/mthca.h:246 #3 0x000000000040117b in client_sig_handler () #4 #5 0x0000003a6ce09165 in pthread_spin_lock () from /lib64/tls/libpthread.so.0 #6 0x0000002a959cec91 in mthca_poll_cq (ibcq=0x607240, ne=1, wc=0x7fbffff590) at src/cq.c:467 #7 0x0000002a9557bf73 in ibv_poll_cq (cq=0x607240, num_entries=1, wc=0x7fbffff590) at /usr/local/ofed/include/infiniband/verbs.h:824 Does destroy_qp needs to be dependent on the CQ? Do you have any suggestions? Thanks, Guy From michael.arndt at informatik.tu-chemnitz.de Mon Feb 5 08:56:58 2007 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Mon, 5 Feb 2007 17:56:58 +0100 Subject: [openib-general] Unknown SMP Recv References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> Message-ID: <001401c74946$a664a2e0$21606d86@one7> Hi, > A router should not allow a SMP to cross a subnet boundary. SMPs are > restricted to the local subnet. I work on a discovering mechanism for switchless InfiniBand Architectures like Rings, Tori or maybe Hyper-Cubes. There is just one single subnet, no switches or routers. Please ignore the background and focus to the problem about the second packet. Maybe you have some ideas even you are not involved in the hole project. That would be nice. 
Thanks Michael From mshefty at ichips.intel.com Mon Feb 5 09:07:34 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 05 Feb 2007 09:07:34 -0800 Subject: [openib-general] [PATCH] RE: regression in ofed 1.2 In-Reply-To: <20070204131500.GE14288@mellanox.co.il> References: <1170355331.16637.25.camel@stevo-desktop> <20070204131500.GE14288@mellanox.co.il> Message-ID: <45C76456.6090804@ichips.intel.com> >>The name is "ib_mcast_wq" which is too long for older kernels. >> >>Did we loose a backport patch? > > > Not sure what happened here. > Sean, could you rename ib_mcast_wq to ib_mcast please? I renamed the workqueue for what I requested to pull upstream, and I added a patch to my pull request to rename a couple of other workqueues. Didn't you already apply a rename patch to the ofed code? - Sean From halr at voltaire.com Mon Feb 5 09:13:12 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Feb 2007 12:13:12 -0500 Subject: [openib-general] Unknown SMP Recv In-Reply-To: <001401c74946$a664a2e0$21606d86@one7> References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> Message-ID: <1170695591.4525.207604.camel@hal.voltaire.com> On Mon, 2007-02-05 at 11:56, Michael Arndt wrote: > Hi, > > > A router should not allow a SMP to cross a subnet boundary. SMPs are > > restricted to the local subnet. > > I work on a discovering mechanism for switchless InfiniBand Architectures > like Rings, Tori or maybe Hyper-Cubes. There is just one single subnet, no > switches or routers. Please ignore the background and focus to the problem > about the second packet. Maybe you have some ideas even you are not involved > in the hole project. That would be nice. Guess you don't mean IB router when you say router in your description. I also have no theories without more information: Is the sender a normal node ? Is normal node mean standard OpenIB without changes ? How was the SMI changed ? On which nodes ? 
Only the intermediate one ? Aside from the initial path being [0][1][1], what are the hop count and hop pointer ? What are DrDLID and DrSLID as well as the LIDs in the LRH of the SMP ? -- Hal > Thanks Michael From swise at opengridcomputing.com Mon Feb 5 09:19:21 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 05 Feb 2007 11:19:21 -0600 Subject: [openib-general] cxgb3.git tree merged to 2.6.20 Message-ID: <1170695961.16661.26.camel@stevo-desktop> All, I've updated my tree git://staging.openfabrics.org/~swise/cxgb3.git to linux-2.6.20. Branches: cxgb3 - my development branch with commits that were used to review the rdma driver (large patch series) + the T3 Ethernet driver. for-roland - branch where roland can pull the latest rdma driver (the same code that is in OFED 1.2) for-ofed_1_2 - branch used to deliver the original ethernet and rdma driver code to the ofed_1_2 tree. It is up to date with the ofed_1_2 tree wrt the drivers. Steve. From suri at baymicrosystems.com Mon Feb 5 09:31:02 2007 From: suri at baymicrosystems.com (Suresh Shelvapille) Date: Mon, 5 Feb 2007 12:31:02 -0500 Subject: [openib-general] patches to 2.6.19.1 kernel for switch Operation In-Reply-To: <1170072757.4555.242192.camel@hal.voltaire.com> References: <000601c7419f$d4470c60$ff0da8c0@amr.corp.intel.com> <1170072757.4555.242192.camel@hal.voltaire.com> Message-ID: <039701c7494b$6bd5d860$1914a8c0@surioffice> Hal: We are upgrading to 2.6.19.1 kernel and I finally ported the changes required for Switch operation from my current kernel (2.6.12) version. I have tested these changes for a switch with different SM(s). But I need the community's help to test the changes on different HCAs to make sure I have not broken anything. Please see if the changes look OK. Thanks, Suri -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smi.c.ptch Type: application/octet-stream Size: 1257 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: agent.c.ptch Type: application/octet-stream Size: 1079 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mad.c.ptch Type: application/octet-stream Size: 3501 bytes Desc: not available URL: From akepner at sgi.com Mon Feb 5 09:33:22 2007 From: akepner at sgi.com (akepner at sgi.com) Date: Mon, 5 Feb 2007 09:33:22 -0800 (PST) Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> Message-ID: On Sun, 4 Feb 2007, Michael S. Tsirkin wrote: > Hi! > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike: > It is hard to see changes that are specific to OFED since we have whole > kernel history mixed in. I agree. > > It would easy to split OFED specific files In separate directory and > have OFED scripts combine that with upstream kernel. > > All out of tree modules we distribute would go there too. > What do others think about this? > I like that idea very much. -- Arthur From or.gerlitz at gmail.com Mon Feb 5 10:16:00 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Mon, 5 Feb 2007 20:16:00 +0200 Subject: [openib-general] Immediate data question In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> Message-ID: <15ddcffd0702051016x4587a6das87c4ef116296662b@mail.gmail.com> On 2/5/07, Tang, Changqing wrote: > On sender side: > opcode = IBV_WR_SEND_WITH_IMM; > imm_data = my_4_bytes_data; > Do I still need to specify sg_list and num_sge ? 
At the sender side I think you can do well with: opcode = IBV_WR_SEND send_flags |= IBV_SEND_INLINE sge.addr = pointer to the 4 bytes sge.length = 4 sge.lkey = don't care since the 4 bytes are --copied-- by the IB library from sge.addr during the execution of ibv_post_send(), the ownership of sge.addr is yours once the call returns. > On receiver side, because the immediate data is inside the completion > structure, do I need to post a receive for above message ? Yes, I don't see how you can get away from posting a receive WR > The reason I ask is that at some point, I can not(or hard) to provide > registered memory only for 4 bytes data. What about the MPI impl. header? Do you have a case where only 4 bytes need to be passed to the other side? Or. From mst at mellanox.co.il Mon Feb 5 10:42:07 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Feb 2007 20:42:07 +0200 Subject: [openib-general] [PATCH] RE: regression in ofed 1.2 In-Reply-To: <45C76456.6090804@ichips.intel.com> References: <45C76456.6090804@ichips.intel.com> Message-ID: <20070205184207.GB15775@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [openib-general] [PATCH] RE: regression in ofed 1.2 > > >>The name is "ib_mcast_wq" which is too long for older kernels. > >> > >>Did we loose a backport patch? > > > > > > Not sure what happened here. > > Sean, could you rename ib_mcast_wq to ib_mcast please? > > I renamed the workqueue for what I requested to pull upstream, and I added a > patch to my pull request to rename a couple of other workqueues. > > Didn't you already apply a rename patch to the ofed code? Yes, but I assumed it's in your branch so I threw it out when I took your latest code. -- MST From mst at mellanox.co.il Mon Feb 5 10:42:46 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 5 Feb 2007 20:42:46 +0200 Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: References: Message-ID: <20070205184246.GC15775@mellanox.co.il> > Quoting akepner at sgi.com : > Subject: Re: [openib-general] idea for ofed 1 2 kernel file structure > > On Sun, 4 Feb 2007, Michael S. Tsirkin wrote: > > > Hi! > > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike: > > It is hard to see changes that are specific to OFED since we have whole > > kernel history mixed in. > > I agree. > > > > > It would easy to split OFED specific files In separate directory and > > have OFED scripts combine that with upstream kernel. > > > > All out of tree modules we distribute would go there too. > > What do others think about this? > > > > I like that idea very much. Could you address Roland's proposal as well? -- Arthur -- MST From mst at mellanox.co.il Mon Feb 5 10:57:09 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Feb 2007 20:57:09 +0200 Subject: [openib-general] Fwd: bug in mthca_qp.c (GEN 2) Message-ID: <20070205185709.GB16598@mellanox.co.il> Roland, what do you think? Looks pretty severe actually. 
----- Forwarded message from Jack Morgenstein ----- Subject: bug in mthca_qp.c (GEN 2) Date: Mon, 5 Feb 2007 12:44:11 +0200 From: Jack Morgenstein static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr, struct mthca_qp_path *path) { memset(ib_ah_attr, 0, sizeof *path); SHOULD BE: memset(ib_ah_attr, 0, sizeof *ib_ah_attr); ----- End forwarded message ----- -- MST From swise at opengridcomputing.com Mon Feb 5 11:43:43 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 05 Feb 2007 13:43:43 -0600 Subject: [openib-general] [PATCH] ofed_1_2 - iw_cxgb3 - Add standard GPL header to tcb.h Message-ID: <1170704623.16661.54.camel@stevo-desktop> Add standard GPL header to tcb.h From: Steve Wise Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/tcb.h | 33 +++++++++++++++++++++++++++++++-- 1 files changed, 31 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/tcb.h b/drivers/infiniband/hw/cxgb3/tcb.h index f287a7c..c702dc1 100644 --- a/drivers/infiniband/hw/cxgb3/tcb.h +++ b/drivers/infiniband/hw/cxgb3/tcb.h @@ -1,5 +1,34 @@ -/* This file is automatically generated --- do not edit */ - +/* + * Copyright (c) 2007 Chelsio, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ #ifndef _TCB_DEFS_H #define _TCB_DEFS_H From mst at mellanox.co.il Mon Feb 5 12:12:23 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Feb 2007 22:12:23 +0200 Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support Message-ID: <20070205201223.GD16598@mellanox.co.il> The following patch adds experimental support for IPoIB connected mode. The idea is to increase performance by increasing the MTU from the maximum of 2K (theoretically 4K) supported by IPoIB on top of UD. With this code, I'm able to get 800MByte/sec or more with netperf without options on a Mellanox 4x back-to-back DDR system. Some notes on code: 1. SRQ is used for scalability to large cluster sizes 2. Only RC connections are used (UC does not support SRQ now) 3. Retry count is set to 0 since spec draft warns against retries 4. Each connection is used for data transfers in only 1 direction, so each connection is either active(TX) or passive (RX). 2 sides that want to communicate create 2 connections. 5. Each active (TX) connection has a separate CQ for send completions - this keeps the code simple without CQ resize and other tricks 6. 
To detect stale passive side connections (where the remote side is down), we keep an LRU list of passive connections (updated once per second per connection) and destroy a connection after it has been unused for several seconds. The LRU rule makes it possible to avoid scanning connections that have recently been active. Signed-off-by: Michael S. Tsirkin --- OK, I have addressed the comment from Pradeep Satyanarayana and added a small cosmetic improvement. This patch is hopefully the final version for review before I request upstream merge in a couple of days. This is the last call for comments before I submit it for upstream inclusion. Please review. Besides the 2 cosmetic changes above this is just a rebase on top of Roland's for-linus branch, so it should be functionally equivalent to what's in -mm now. However, and just for the record, I can't access the lab now and might not be able to do this tomorrow either - so this patch was only compile-tested. This applies on top of Roland's for-linus tree. I still keep the sysfs flag to enable/disable CM - this is safe, but maybe we can go back to only looking at the device MTU, now that multicast works? Changes from PATCHv5: - with debug enabled, show qpn instead of a pointer - this is prettier - rename ipoib_cm_modify_rx_rts to ipoib_cm_modify_rx_qp, since the RX QP actually stays in RTR. Thanks to Pradeep Satyanarayana for pointing this out. - Reduce MTU on connected->datagram mode change Changes from PATCHv4: - Fix TX ring full recovery when TX ring is destroyed (bug 320) Changes from PATCHv3: - Fix TX ring full recovery - Whitespace fix Changes from PATCHv2: - Using path MTU discovery, multicast and UDP traffic to UD mode now work, only a small number of packets is dropped. 
- Use timer to clean up stale RX connections - Make CM use same CQ IPoIB uses for UD (good for mixed UD/CM traffic and for NAPI if we ever enable it) - Tone down warning messages - only some packets are now dropped in CM/UD setup CM support is also still labeled as experimental and is disabled by default, although it's been very stable for me, and the code is complete as far as I'm concerned. Would it be easier to merge it this way? Note that the connected mode support adds very little overhead when not activated at run time, and zero data-path overhead when not activated at compile time. Here's a short description of what the patch does: a. The code is here: git://git.openfabrics.org/~mst/linux-2.6 ipoib-cm-for-roland >git show will show this patch b. How to activate: Server: #modprobe ib_ipoib #echo connected > /sys/class/net/ib0/mode #/sbin/ifconfig ib0 mtu 65520 #./netperf-2.4.2/src/netserver Client: #modprobe ib_ipoib #echo connected > /sys/class/net/ib0/mode #/sbin/ifconfig ib0 mtu 65520 #./netperf-2.4.2/src/netperf -H 11.4.3.68 -f M TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68) port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. MBytes/sec 87380 16384 16384 10.01 891.21 c. TODO list (Optional) Send side S/G support d. Limitations With MTU > 2044, UDP multicast and UDP connections to IPoIB UD mode currently will drop some packets since we sometimes get packets that are too large to send over a UD QP. Typically a single packet will be dropped every several minutes until path MTU discovery kicks in and lowers the path MTU to this destination. 
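For readers checking the MTU numbers above, here is a minimal sketch, assuming 4 KiB pages, of how the value 65520 given to ifconfig relates to the receive buffer layout. The constants IPOIB_ENCAP_LEN, IPOIB_CM_MTU and IPOIB_CM_BUF_SIZE are copied from the ipoib.h hunk of this patch; SKETCH_PAGE_SIZE and the helpers cm_head_size() and cm_rx_sg() are hypothetical names used only for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SIZE depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096u

/* Values copied from the enum added to ipoib.h by this patch. */
#define IPOIB_ENCAP_LEN   4u
#define IPOIB_CM_MTU      (0x10000u - 0x10u)   /* 64 KiB minus 16 bytes of padding */
#define IPOIB_CM_BUF_SIZE (IPOIB_CM_MTU + IPOIB_ENCAP_LEN)

/* Bytes placed in the linear (head) part of the receive skb. */
unsigned cm_head_size(unsigned page_size)
{
	return IPOIB_CM_BUF_SIZE % page_size;
}

/* Scatter/gather entries per receive: one head buffer plus whole pages. */
unsigned cm_rx_sg(unsigned page_size)
{
	return (IPOIB_CM_BUF_SIZE + page_size - 1) / page_size;
}

/* Print the layout and check that head plus pages covers the whole buffer. */
void cm_print_layout(unsigned page_size)
{
	unsigned head = cm_head_size(page_size);
	unsigned sg = cm_rx_sg(page_size);

	assert(head + (sg - 1) * page_size == IPOIB_CM_BUF_SIZE);
	printf("mtu=%u buf=%u head=%u sg=%u\n",
	       IPOIB_CM_MTU, IPOIB_CM_BUF_SIZE, head, sg);
}
```

Calling cm_print_layout(SKETCH_PAGE_SIZE) from a small main() shows the split: the 4-byte IPoIB header plus the 65520-byte MTU gives a 65524-byte buffer, received as one partial head buffer followed by full pages. With a different page size the split changes, which is why the patch derives both values from PAGE_SIZE rather than hard-coding them.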
diff --git a/drivers/infiniband/ulp/ipoib/Kconfig b/drivers/infiniband/ulp/ipoib/Kconfig index c75322d..0ffca11 100644 --- a/drivers/infiniband/ulp/ipoib/Kconfig +++ b/drivers/infiniband/ulp/ipoib/Kconfig @@ -8,6 +8,20 @@ config INFINIBAND_IPOIB See Documentation/infiniband/ipoib.txt for more information +config INFINIBAND_IPOIB_CM + bool "IP-over-InfiniBand Connected Mode support" + depends on INFINIBAND_IPOIB && EXPERIMENTAL + default n + ---help--- + This option enables experimental support for IPoIB connected mode. + After enabling this option, you need to switch to connected mode through + /sys/class/net/ibXXX/mode to actually create connections, and then increase + the interface MTU with e.g. ifconfig ib0 mtu 65520. + + WARNING: Enabling connected mode will trigger some + packet drops for multicast and UD mode traffic from this interface, + unless you limit mtu for these destinations to 2044. + config INFINIBAND_IPOIB_DEBUG bool "IP-over-InfiniBand debugging" if EMBEDDED depends on INFINIBAND_IPOIB diff --git a/drivers/infiniband/ulp/ipoib/Makefile b/drivers/infiniband/ulp/ipoib/Makefile index 8935e74..98ee38e 100644 --- a/drivers/infiniband/ulp/ipoib/Makefile +++ b/drivers/infiniband/ulp/ipoib/Makefile @@ -5,5 +5,6 @@ ib_ipoib-y := ipoib_main.o \ ipoib_multicast.o \ ipoib_verbs.o \ ipoib_vlan.o +ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_CM) += ipoib_cm.o ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_DEBUG) += ipoib_fs.o diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 07deee8..8082d50 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -62,6 +62,10 @@ enum { IPOIB_ENCAP_LEN = 4, + IPOIB_CM_MTU = 0x10000 - 0x10, /* padding to align header to 16 */ + IPOIB_CM_BUF_SIZE = IPOIB_CM_MTU + IPOIB_ENCAP_LEN, + IPOIB_CM_HEAD_SIZE = IPOIB_CM_BUF_SIZE % PAGE_SIZE, + IPOIB_CM_RX_SG = ALIGN(IPOIB_CM_BUF_SIZE, PAGE_SIZE) / PAGE_SIZE, IPOIB_RX_RING_SIZE = 128, IPOIB_TX_RING_SIZE = 64, 
IPOIB_MAX_QUEUE_SIZE = 8192, @@ -81,6 +85,8 @@ enum { IPOIB_MCAST_RUN = 6, IPOIB_STOP_REAPER = 7, IPOIB_MCAST_STARTED = 8, + IPOIB_FLAG_NETIF_STOPPED = 9, + IPOIB_FLAG_ADMIN_CM = 10, IPOIB_MAX_BACKOFF_SECONDS = 16, @@ -90,6 +96,14 @@ enum { IPOIB_MCAST_FLAG_ATTACHED = 3, }; + +#define IPOIB_OP_RECV (1ul << 31) +#ifdef CONFIG_INFINIBAND_IPOIB_CM +#define IPOIB_CM_OP_SRQ (1ul << 30) +#else +#define IPOIB_CM_OP_SRQ (0) +#endif + /* structs */ struct ipoib_header { @@ -113,6 +127,61 @@ struct ipoib_tx_buf { u64 mapping; }; +#ifdef CONFIG_INFINIBAND_IPOIB_CM +struct ib_cm_id; + +struct ipoib_cm_data { + __be32 qpn; /* High byte MUST be ignored on receive */ + __be32 mtu; +}; + +struct ipoib_cm_rx { + struct ib_cm_id *id; + struct ib_qp *qp; + struct list_head list; + struct net_device *dev; + unsigned long jiffies; +}; + +struct ipoib_cm_tx { + struct ib_cm_id *id; + struct ib_cq *cq; + struct ib_qp *qp; + struct list_head list; + struct net_device *dev; + struct ipoib_neigh *neigh; + struct ipoib_path *path; + struct ipoib_tx_buf *tx_ring; + unsigned tx_head; + unsigned tx_tail; + unsigned long flags; + u32 mtu; + struct ib_wc ibwc[IPOIB_NUM_WC]; +}; + +struct ipoib_cm_rx_buf { + struct sk_buff *skb; + u64 mapping[IPOIB_CM_RX_SG]; +}; + +struct ipoib_cm_dev_priv { + struct ib_srq *srq; + struct ipoib_cm_rx_buf *srq_ring; + struct ib_cm_id *id; + struct list_head passive_ids; + struct work_struct start_task; + struct work_struct reap_task; + struct work_struct skb_task; + struct delayed_work stale_task; + struct sk_buff_head skb_queue; + struct list_head start_list; + struct list_head reap_list; + struct ib_wc ibwc[IPOIB_NUM_WC]; + struct ib_sge rx_sge[IPOIB_CM_RX_SG]; + struct ib_recv_wr rx_wr; +}; + +#endif /* * Device private locking: tx_lock protects members used in TX fast * path (and we use LLTX so upper layers don't do extra locking). 
@@ -179,6 +248,10 @@ struct ipoib_dev_priv { struct list_head child_intfs; struct list_head list; +#ifdef CONFIG_INFINIBAND_IPOIB_CM + struct ipoib_cm_dev_priv cm; +#endif + #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG struct list_head fs_list; struct dentry *mcg_dentry; @@ -212,6 +285,9 @@ struct ipoib_path { struct ipoib_neigh { struct ipoib_ah *ah; +#ifdef CONFIG_INFINIBAND_IPOIB_CM + struct ipoib_cm_tx *cm; +#endif union ib_gid dgid; struct sk_buff_head queue; @@ -315,6 +391,146 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey); void ipoib_pkey_poll(struct work_struct *work); int ipoib_pkey_dev_delay_open(struct net_device *dev); +#ifdef CONFIG_INFINIBAND_IPOIB_CM + +#define IPOIB_FLAGS_RC 0x80 +#define IPOIB_FLAGS_UC 0x40 + +/* We don't support UC connections at the moment */ +#define IPOIB_CM_SUPPORTED(ha) (ha[0] & (IPOIB_FLAGS_RC)) + +static inline int ipoib_cm_admin_enabled(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + return IPOIB_CM_SUPPORTED(dev->dev_addr) && + test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); +} + +static inline int ipoib_cm_enabled(struct net_device *dev, struct neighbour *n) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + return IPOIB_CM_SUPPORTED(n->ha) && + test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); +} + +static inline int ipoib_cm_up(struct ipoib_neigh *neigh) + +{ + return test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags); +} + +static inline struct ipoib_cm_tx *ipoib_cm_get(struct ipoib_neigh *neigh) +{ + return neigh->cm; +} + +static inline void ipoib_cm_set(struct ipoib_neigh *neigh, struct ipoib_cm_tx *tx) +{ + neigh->cm = tx; +} + +void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx); +int ipoib_cm_dev_open(struct net_device *dev); +void ipoib_cm_dev_stop(struct net_device *dev); +int ipoib_cm_dev_init(struct net_device *dev); +int ipoib_cm_add_mode_attr(struct net_device *dev); +void ipoib_cm_dev_cleanup(struct net_device *dev); +struct 
ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, + struct ipoib_neigh *neigh); +void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx); +void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, + unsigned int mtu); +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc); +#else + +struct ipoib_cm_tx; + +static inline int ipoib_cm_admin_enabled(struct net_device *dev) +{ + return 0; +} +static inline int ipoib_cm_enabled(struct net_device *dev, struct neighbour *n) + +{ + return 0; +} + +static inline int ipoib_cm_up(struct ipoib_neigh *neigh) + +{ + return 0; +} + +static inline struct ipoib_cm_tx *ipoib_cm_get(struct ipoib_neigh *neigh) +{ + return NULL; +} + +static inline void ipoib_cm_set(struct ipoib_neigh *neigh, struct ipoib_cm_tx *tx) +{ +} + +static inline +void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx) +{ + return; +} + +static inline +int ipoib_cm_dev_open(struct net_device *dev) +{ + return 0; +} + +static inline +void ipoib_cm_dev_stop(struct net_device *dev) +{ + return; +} + +static inline +int ipoib_cm_dev_init(struct net_device *dev) +{ + return -ENOSYS; +} + +static inline +void ipoib_cm_dev_cleanup(struct net_device *dev) +{ + return; +} + +static inline +struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, + struct ipoib_neigh *neigh) +{ + return NULL; +} + +static inline +void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx) +{ + return; +} + +static inline +int ipoib_cm_add_mode_attr(struct net_device *dev) +{ + return 0; +} + +static inline void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, + unsigned int mtu) +{ + dev_kfree_skb_any(skb); +} + +static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) +{ +} + +#endif + #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG void ipoib_create_debug_files(struct net_device *dev); void ipoib_delete_debug_files(struct net_device *dev); @@ -392,4 
+608,6 @@ extern int ipoib_debug_level; #define IPOIB_GID_ARG(gid) IPOIB_GID_RAW_ARG((gid).raw) +#define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff) + #endif /* _IPOIB_H */ diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c new file mode 100644 index 0000000..a618a40 --- /dev/null +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -0,0 +1,1236 @@ +/* + * Copyright (c) 2006 Mellanox Technologies. All rights reserved + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id$ + */ + +#include +#include +#include +#include + +#ifdef CONFIG_IPV6 +#include +#endif + +#ifdef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA +static int data_debug_level; + +module_param_named(cm_data_debug_level, data_debug_level, int, 0644); +MODULE_PARM_DESC(cm_data_debug_level, + "Enable data path debug tracing for connected mode if > 0"); +#endif + +#include "ipoib.h" + +#define IPOIB_CM_IETF_ID 0x1000000000000000ULL + +#define IPOIB_CM_RX_UPDATE_TIME (256 * HZ) +#define IPOIB_CM_RX_TIMEOUT (2 * 256 * HZ) +#define IPOIB_CM_RX_DELAY (3 * 256 * HZ) +#define IPOIB_CM_RX_UPDATE_MASK (0x3) + +struct ipoib_cm_id { + struct ib_cm_id *id; + int flags; + u32 remote_qpn; + u32 remote_mtu; +}; + +static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, + struct ib_cm_event *event); + +static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, + u64 mapping[IPOIB_CM_RX_SG]) +{ + int i; + + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE); + + for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i) + ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); +} + +static int ipoib_cm_post_receive(struct net_device *dev, int id) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_recv_wr *bad_wr; + int i, ret; + + priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ; + + for (i = 0; i < IPOIB_CM_RX_SG; ++i) + priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i]; + + ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr); + if (unlikely(ret)) { + ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret); + ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[id].mapping); + dev_kfree_skb_any(priv->cm.srq_ring[id].skb); + priv->cm.srq_ring[id].skb = NULL; + } + + return ret; +} + +static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, + u64 mapping[IPOIB_CM_RX_SG]) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct sk_buff *skb; + int i; + + skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12); + if 
(unlikely(!skb)) + return -ENOMEM; + + /* + * IPoIB adds a 4 byte header. So we need 12 more bytes to align the + * IP header to a multiple of 16. + */ + skb_reserve(skb, 12); + + mapping[0] = ib_dma_map_single(priv->ca, skb->data, IPOIB_CM_HEAD_SIZE, + DMA_FROM_DEVICE); + if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) { + dev_kfree_skb_any(skb); + return -EIO; + } + + for (i = 0; i < IPOIB_CM_RX_SG - 1; i++) { + struct page *page = alloc_page(GFP_ATOMIC); + + if (!page) + goto partial_error; + skb_fill_page_desc(skb, i, page, 0, PAGE_SIZE); + + mapping[i + 1] = ib_dma_map_page(priv->ca, skb_shinfo(skb)->frags[i].page, + 0, PAGE_SIZE, DMA_FROM_DEVICE); + if (unlikely(ib_dma_mapping_error(priv->ca, mapping[i + 1]))) + goto partial_error; + } + + priv->cm.srq_ring[id].skb = skb; + return 0; + +partial_error: + + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE); + + for (; i > 0; --i) + ib_dma_unmap_single(priv->ca, mapping[i], PAGE_SIZE, DMA_FROM_DEVICE); + + kfree_skb(skb); + return -ENOMEM; +} + +static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev, + struct ipoib_cm_rx *p) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_qp_init_attr attr = { + .send_cq = priv->cq, /* does not matter, we never send anything */ + .recv_cq = priv->cq, + .srq = priv->cm.srq, + .cap.max_send_wr = 1, /* FIXME: 0 Seems not to work */ + .cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */ + .sq_sig_type = IB_SIGNAL_ALL_WR, + .qp_type = IB_QPT_RC, + .qp_context = p, + }; + return ib_create_qp(priv->pd, &attr); +} + +static int ipoib_cm_modify_rx_qp(struct net_device *dev, + struct ib_cm_id *cm_id, struct ib_qp *qp) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; + + qp_attr.qp_state = IB_QPS_INIT; + ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to init QP attr for INIT: %d\n", ret); + return ret; + } + 
ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to modify QP to INIT: %d\n", ret); + return ret; + } + qp_attr.qp_state = IB_QPS_RTR; + ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to init QP attr for RTR: %d\n", ret); + return ret; + } + qp_attr.rq_psn = 0 /* FIXME */; + ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to modify QP to RTR: %d\n", ret); + return ret; + } + return 0; +} + +static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id, + struct ib_qp *qp, struct ib_cm_req_event_param *req) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_data data = {}; + struct ib_cm_rep_param rep = {}; + + data.qpn = cpu_to_be32(priv->qp->qp_num); + data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE); + + rep.private_data = &data; + rep.private_data_len = sizeof data; + rep.flow_control = 0; + rep.rnr_retry_count = req->rnr_retry_count; + rep.target_ack_delay = 20; /* FIXME */ + rep.srq = 1; + rep.qp_num = qp->qp_num; + rep.starting_psn = 0 /* FIXME */; + return ib_send_cm_rep(cm_id, &rep); +} + +static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) +{ + struct net_device *dev = cm_id->context; + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_rx *p; + unsigned long flags; + int ret; + + ipoib_dbg(priv, "REQ arrived\n"); + p = kzalloc(sizeof *p, GFP_KERNEL); + if (!p) + return -ENOMEM; + p->dev = dev; + p->id = cm_id; + p->qp = ipoib_cm_create_rx_qp(dev, p); + if (IS_ERR(p->qp)) { + ret = PTR_ERR(p->qp); + goto err_qp; + } + + ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp); + if (ret) + goto err_modify; + + ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd); + if (ret) { + ipoib_warn(priv, "failed to send REP: %d\n", ret); + goto err_rep; + } + + cm_id->context = p; + p->jiffies = jiffies; + spin_lock_irqsave(&priv->lock, flags); + 
list_add(&p->list, &priv->cm.passive_ids); + spin_unlock_irqrestore(&priv->lock, flags); + queue_delayed_work(ipoib_workqueue, + &priv->cm.stale_task, IPOIB_CM_RX_DELAY); + return 0; + +err_rep: +err_modify: + ib_destroy_qp(p->qp); +err_qp: + kfree(p); + return ret; +} + +static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id, + struct ib_cm_event *event) +{ + struct ipoib_cm_rx *p; + struct ipoib_dev_priv *priv; + unsigned long flags; + int ret; + + switch (event->event) { + case IB_CM_REQ_RECEIVED: + return ipoib_cm_req_handler(cm_id, event); + case IB_CM_DREQ_RECEIVED: + p = cm_id->context; + ib_send_cm_drep(cm_id, NULL, 0); + /* Fall through */ + case IB_CM_REJ_RECEIVED: + p = cm_id->context; + priv = netdev_priv(p->dev); + spin_lock_irqsave(&priv->lock, flags); + if (list_empty(&p->list)) + ret = 0; /* Connection is going away already. */ + else { + list_del_init(&p->list); + ret = -ECONNRESET; + } + spin_unlock_irqrestore(&priv->lock, flags); + if (ret) { + ib_destroy_qp(p->qp); + kfree(p); + return ret; + } + return 0; + default: + return 0; + } +} +/* Adjust length of skb with fragments to match received data */ +static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space, + unsigned int length) +{ + int i, num_frags; + unsigned int size; + + /* put header into skb */ + size = min(length, hdr_space); + skb->tail += size; + skb->len += size; + length -= size; + + num_frags = skb_shinfo(skb)->nr_frags; + for (i = 0; i < num_frags; i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + + if (length == 0) { + /* don't need this page */ + __free_page(frag->page); + --skb_shinfo(skb)->nr_frags; + } else { + size = min(length, (unsigned) PAGE_SIZE); + + frag->size = size; + skb->data_len += size; + skb->truesize += size; + skb->len += size; + length -= size; + } + } +} + +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ; + struct 
sk_buff *skb; + struct ipoib_cm_rx *p; + unsigned long flags; + u64 mapping[IPOIB_CM_RX_SG]; + + ipoib_dbg_data(priv, "cm recv completion: id %d, op %d, status: %d\n", + wr_id, wc->opcode, wc->status); + + if (unlikely(wr_id >= ipoib_recvq_size)) { + ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n", + wr_id, ipoib_recvq_size); + return; + } + + skb = priv->cm.srq_ring[wr_id].skb; + + if (unlikely(wc->status != IB_WC_SUCCESS)) { + ipoib_dbg(priv, "cm recv error " + "(status=%d, wrid=%d vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); + ++priv->stats.rx_dropped; + goto repost; + } + + if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { + p = wc->qp->qp_context; + if (time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { + spin_lock_irqsave(&priv->lock, flags); + p->jiffies = jiffies; + /* Move this entry to list head, but do + * not re-add it if it has been removed. */ + if (!list_empty(&p->list)) + list_move(&p->list, &priv->cm.passive_ids); + spin_unlock_irqrestore(&priv->lock, flags); + queue_delayed_work(ipoib_workqueue, + &priv->cm.stale_task, IPOIB_CM_RX_DELAY); + } + } + + if (unlikely(ipoib_cm_alloc_rx_skb(dev, wr_id, mapping))) { + /* + * If we can't allocate a new RX buffer, dump + * this packet and reuse the old buffer. 
+ */ + ipoib_dbg(priv, "failed to allocate receive buffer %d\n", wr_id); + ++priv->stats.rx_dropped; + goto repost; + } + + ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[wr_id].mapping); + memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, sizeof mapping); + + ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", + wc->byte_len, wc->slid); + + skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len); + + skb->protocol = ((struct ipoib_header *) skb->data)->proto; + skb->mac.raw = skb->data; + skb_pull(skb, IPOIB_ENCAP_LEN); + + dev->last_rx = jiffies; + ++priv->stats.rx_packets; + priv->stats.rx_bytes += skb->len; + + skb->dev = dev; + /* XXX get correct PACKET_ type here */ + skb->pkt_type = PACKET_HOST; + netif_rx_ni(skb); + +repost: + if (unlikely(ipoib_cm_post_receive(dev, wr_id))) + ipoib_warn(priv, "ipoib_cm_post_receive failed " + "for buf %d\n", wr_id); +} + +static inline int post_send(struct ipoib_dev_priv *priv, + struct ipoib_cm_tx *tx, + unsigned int wr_id, + u64 addr, int len) +{ + struct ib_send_wr *bad_wr; + + priv->tx_sge.addr = addr; + priv->tx_sge.length = len; + + priv->tx_wr.wr_id = wr_id; + + return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr); +} + +void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_tx_buf *tx_req; + u64 addr; + + if (unlikely(skb->len > tx->mtu)) { + ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", + skb->len, tx->mtu); + ++priv->stats.tx_dropped; + ++priv->stats.tx_errors; + ipoib_cm_skb_too_long(dev, skb, tx->mtu - INFINIBAND_ALEN); + return; + } + + ipoib_dbg_data(priv, "sending packet: head 0x%x length %d connection 0x%x\n", + tx->tx_head, skb->len, tx->qp->qp_num); + + /* + * We put the skb into the tx_ring _before_ we call post_send() + * because it's entirely possible that the completion handler will + * run before we execute anything after the post_send(). 
That + * means we have to make sure everything is properly recorded and + * our state is consistent before we call post_send(). + */ + tx_req = &tx->tx_ring[tx->tx_head & (ipoib_sendq_size - 1)]; + tx_req->skb = skb; + addr = ib_dma_map_single(priv->ca, skb->data, skb->len, DMA_TO_DEVICE); + if (unlikely(ib_dma_mapping_error(priv->ca, addr))) { + ++priv->stats.tx_errors; + dev_kfree_skb_any(skb); + return; + } + + tx_req->mapping = addr; + + if (unlikely(post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1), + addr, skb->len))) { + ipoib_warn(priv, "post_send failed\n"); + ++priv->stats.tx_errors; + ib_dma_unmap_single(priv->ca, addr, skb->len, DMA_TO_DEVICE); + dev_kfree_skb_any(skb); + } else { + dev->trans_start = jiffies; + ++tx->tx_head; + + if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) { + ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n", + tx->qp->qp_num); + netif_stop_queue(dev); + set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags); + } + } +} + +static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx, + struct ib_wc *wc) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned int wr_id = wc->wr_id; + struct ipoib_tx_buf *tx_req; + unsigned long flags; + + ipoib_dbg_data(priv, "cm send completion: id %d, op %d, status: %d\n", + wr_id, wc->opcode, wc->status); + + if (unlikely(wr_id >= ipoib_sendq_size)) { + ipoib_warn(priv, "cm send completion event with wrid %d (> %d)\n", + wr_id, ipoib_sendq_size); + return; + } + + tx_req = &tx->tx_ring[wr_id]; + + ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, DMA_TO_DEVICE); + + /* FIXME: is this right? Shouldn't we only increment on success? 
*/ + ++priv->stats.tx_packets; + priv->stats.tx_bytes += tx_req->skb->len; + + dev_kfree_skb_any(tx_req->skb); + + spin_lock_irqsave(&priv->tx_lock, flags); + ++tx->tx_tail; + if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) && + tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) { + clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags); + netif_wake_queue(dev); + } + + if (wc->status != IB_WC_SUCCESS && + wc->status != IB_WC_WR_FLUSH_ERR) { + struct ipoib_neigh *neigh; + + ipoib_dbg(priv, "failed cm send event " + "(status=%d, wrid=%d vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); + + spin_lock(&priv->lock); + neigh = tx->neigh; + + if (neigh) { + neigh->cm = NULL; + list_del(&neigh->list); + if (neigh->ah) + ipoib_put_ah(neigh->ah); + ipoib_neigh_free(dev, neigh); + + tx->neigh = NULL; + } + + /* queue would be re-started anyway when TX is destroyed, + * but it makes sense to do it ASAP here. */ + if (test_and_clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) + netif_wake_queue(dev); + + if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { + list_move(&tx->list, &priv->cm.reap_list); + queue_work(ipoib_workqueue, &priv->cm.reap_task); + } + + clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags); + + spin_unlock(&priv->lock); + } + + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +static void ipoib_cm_tx_completion(struct ib_cq *cq, void *tx_ptr) +{ + struct ipoib_cm_tx *tx = tx_ptr; + int n, i; + + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + do { + n = ib_poll_cq(cq, IPOIB_NUM_WC, tx->ibwc); + for (i = 0; i < n; ++i) + ipoib_cm_handle_tx_wc(tx->dev, tx, tx->ibwc + i); + } while (n == IPOIB_NUM_WC); +} + +int ipoib_cm_dev_open(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + int ret; + + if (!IPOIB_CM_SUPPORTED(dev->dev_addr)) + return 0; + + priv->cm.id = ib_create_cm_id(priv->ca, ipoib_cm_rx_handler, dev); + if (IS_ERR(priv->cm.id)) { + printk(KERN_WARNING "%s: failed to create CM ID\n", priv->ca->name); + return 
PTR_ERR(priv->cm.id); + } + + ret = ib_cm_listen(priv->cm.id, cpu_to_be64(IPOIB_CM_IETF_ID | priv->qp->qp_num), + 0, NULL); + if (ret) { + printk(KERN_WARNING "%s: failed to listen on ID 0x%llx\n", priv->ca->name, + IPOIB_CM_IETF_ID | priv->qp->qp_num); + ib_destroy_cm_id(priv->cm.id); + return ret; + } + return 0; +} + +void ipoib_cm_dev_stop(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_rx *p; + unsigned long flags; + + if (!IPOIB_CM_SUPPORTED(dev->dev_addr)) + return; + + ib_destroy_cm_id(priv->cm.id); + spin_lock_irqsave(&priv->lock, flags); + while (!list_empty(&priv->cm.passive_ids)) { + p = list_entry(priv->cm.passive_ids.next, typeof(*p), list); + list_del_init(&p->list); + spin_unlock_irqrestore(&priv->lock, flags); + ib_destroy_cm_id(p->id); + ib_destroy_qp(p->qp); + kfree(p); + spin_lock_irqsave(&priv->lock, flags); + } + spin_unlock_irqrestore(&priv->lock, flags); + + cancel_delayed_work(&priv->cm.stale_task); +} + +static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) +{ + struct ipoib_cm_tx *p = cm_id->context; + struct ipoib_dev_priv *priv = netdev_priv(p->dev); + struct ipoib_cm_data *data = event->private_data; + struct sk_buff_head skqueue; + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; + struct sk_buff *skb; + unsigned long flags; + + p->mtu = be32_to_cpu(data->mtu); + + if (p->mtu < priv->dev->mtu + IPOIB_ENCAP_LEN) { + ipoib_warn(priv, "Rejecting connection: mtu %d < device mtu %d + 4\n", + p->mtu, priv->dev->mtu); + return -EINVAL; + } + + qp_attr.qp_state = IB_QPS_RTR; + ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to init QP attr for RTR: %d\n", ret); + return ret; + } + + qp_attr.rq_psn = 0 /* FIXME */; + ret = ib_modify_qp(p->qp, &qp_attr, qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to modify QP to RTR: %d\n", ret); + return ret; + } + + qp_attr.qp_state = IB_QPS_RTS; + ret = 
ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to init QP attr for RTS: %d\n", ret); + return ret; + } + ret = ib_modify_qp(p->qp, &qp_attr, qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to modify QP to RTS: %d\n", ret); + return ret; + } + + skb_queue_head_init(&skqueue); + + spin_lock_irqsave(&priv->lock, flags); + set_bit(IPOIB_FLAG_OPER_UP, &p->flags); + if (p->neigh) + while ((skb = __skb_dequeue(&p->neigh->queue))) + __skb_queue_tail(&skqueue, skb); + spin_unlock_irqrestore(&priv->lock, flags); + + while ((skb = __skb_dequeue(&skqueue))) { + skb->dev = p->dev; + if (dev_queue_xmit(skb)) + ipoib_warn(priv, "dev_queue_xmit failed " + "to requeue packet\n"); + } + + ret = ib_send_cm_rtu(cm_id, NULL, 0); + if (ret) { + ipoib_warn(priv, "failed to send RTU: %d\n", ret); + return ret; + } + return 0; +} + +static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ib_cq *cq) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_qp_init_attr attr = {}; + attr.recv_cq = priv->cq; + attr.srq = priv->cm.srq; + attr.cap.max_send_wr = ipoib_sendq_size; + attr.cap.max_send_sge = 1; + attr.sq_sig_type = IB_SIGNAL_ALL_WR; + attr.qp_type = IB_QPT_RC; + attr.send_cq = cq; + return ib_create_qp(priv->pd, &attr); +} + +static int ipoib_cm_send_req(struct net_device *dev, + struct ib_cm_id *id, struct ib_qp *qp, + u32 qpn, + struct ib_sa_path_rec *pathrec) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_data data = {}; + struct ib_cm_req_param req = {}; + + data.qpn = cpu_to_be32(priv->qp->qp_num); + data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE); + + req.primary_path = pathrec; + req.alternate_path = NULL; + req.service_id = cpu_to_be64(IPOIB_CM_IETF_ID | qpn); + req.qp_num = qp->qp_num; + req.qp_type = qp->qp_type; + req.private_data = &data; + req.private_data_len = sizeof data; + req.flow_control = 0; + + req.starting_psn = 0; /* FIXME */ + + /* + * Pick some arbitrary defaults 
here; we could make these + * module parameters if anyone cared about setting them. + */ + req.responder_resources = 4; + req.remote_cm_response_timeout = 20; + req.local_cm_response_timeout = 20; + req.retry_count = 0; /* RFC draft warns against retries */ + req.rnr_retry_count = 0; /* RFC draft warns against retries */ + req.max_cm_retries = 15; + req.srq = 15; + return ib_send_cm_req(id, &req); +} + +static int ipoib_cm_modify_tx_init(struct net_device *dev, + struct ib_cm_id *cm_id, struct ib_qp *qp) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; + ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index); + if (ret) { + ipoib_warn(priv, "pkey 0x%x not in cache: %d\n", priv->pkey, ret); + return ret; + } + + qp_attr.qp_state = IB_QPS_INIT; + qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE; + qp_attr.port_num = priv->port; + qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS | IB_QP_PKEY_INDEX | IB_QP_PORT; + + ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to modify tx QP to INIT: %d\n", ret); + return ret; + } + return 0; +} + +static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn, + struct ib_sa_path_rec *pathrec) +{ + struct ipoib_dev_priv *priv = netdev_priv(p->dev); + int ret; + + p->tx_ring = kzalloc(ipoib_sendq_size * sizeof *p->tx_ring, + GFP_KERNEL); + if (!p->tx_ring) { + ipoib_warn(priv, "failed to allocate tx ring\n"); + ret = -ENOMEM; + goto err_tx; + } + + p->cq = ib_create_cq(priv->ca, ipoib_cm_tx_completion, NULL, p, + ipoib_sendq_size + 1); + if (IS_ERR(p->cq)) { + ret = PTR_ERR(p->cq); + ipoib_warn(priv, "failed to allocate tx cq: %d\n", ret); + goto err_cq; + } + + ret = ib_req_notify_cq(p->cq, IB_CQ_NEXT_COMP); + if (ret) { + ipoib_warn(priv, "failed to request completion notification: %d\n", ret); + goto err_req_notify; + } + + p->qp = ipoib_cm_create_tx_qp(p->dev, p->cq); + if (IS_ERR(p->qp)) { + ret = 
PTR_ERR(p->qp); + ipoib_warn(priv, "failed to allocate tx qp: %d\n", ret); + goto err_qp; + } + + p->id = ib_create_cm_id(priv->ca, ipoib_cm_tx_handler, p); + if (IS_ERR(p->id)) { + ret = PTR_ERR(p->id); + ipoib_warn(priv, "failed to create tx cm id: %d\n", ret); + goto err_id; + } + + ret = ipoib_cm_modify_tx_init(p->dev, p->id, p->qp); + if (ret) { + ipoib_warn(priv, "failed to modify tx qp to rtr: %d\n", ret); + goto err_modify; + } + + ret = ipoib_cm_send_req(p->dev, p->id, p->qp, qpn, pathrec); + if (ret) { + ipoib_warn(priv, "failed to send cm req: %d\n", ret); + goto err_send_cm; + } + + ipoib_dbg(priv, "Request connection 0x%x for gid " IPOIB_GID_FMT " qpn 0x%x\n", + p->qp->qp_num, IPOIB_GID_ARG(pathrec->dgid), qpn); + + return 0; + +err_send_cm: +err_modify: + ib_destroy_cm_id(p->id); +err_id: + p->id = NULL; + ib_destroy_qp(p->qp); +err_req_notify: +err_qp: + p->qp = NULL; + ib_destroy_cq(p->cq); +err_cq: + p->cq = NULL; +err_tx: + return ret; +} + +static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) +{ + struct ipoib_dev_priv *priv = netdev_priv(p->dev); + struct ipoib_tx_buf *tx_req; + + ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n", + p->qp ? 
p->qp->qp_num : 0, p->tx_head, p->tx_tail); + + if (p->id) + ib_destroy_cm_id(p->id); + + if (p->qp) + ib_destroy_qp(p->qp); + + if (p->cq) + ib_destroy_cq(p->cq); + + if (test_bit(IPOIB_FLAG_NETIF_STOPPED, &p->flags)) + netif_wake_queue(p->dev); + + if (p->tx_ring) { + while ((int) p->tx_tail - (int) p->tx_head < 0) { + tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)]; + ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, + DMA_TO_DEVICE); + dev_kfree_skb_any(tx_req->skb); + ++p->tx_tail; + } + + kfree(p->tx_ring); + } + + kfree(p); +} + +static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, + struct ib_cm_event *event) +{ + struct ipoib_cm_tx *tx = cm_id->context; + struct ipoib_dev_priv *priv = netdev_priv(tx->dev); + struct net_device *dev = priv->dev; + struct ipoib_neigh *neigh; + unsigned long flags; + int ret; + + switch (event->event) { + case IB_CM_DREQ_RECEIVED: + ipoib_dbg(priv, "DREQ received.\n"); + ib_send_cm_drep(cm_id, NULL, 0); + break; + case IB_CM_REP_RECEIVED: + ipoib_dbg(priv, "REP received.\n"); + ret = ipoib_cm_rep_handler(cm_id, event); + if (ret) + ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, + NULL, 0, NULL, 0); + break; + case IB_CM_REQ_ERROR: + case IB_CM_REJ_RECEIVED: + case IB_CM_TIMEWAIT_EXIT: + ipoib_dbg(priv, "CM error %d.\n", event->event); + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + neigh = tx->neigh; + + if (neigh) { + neigh->cm = NULL; + list_del(&neigh->list); + if (neigh->ah) + ipoib_put_ah(neigh->ah); + ipoib_neigh_free(dev, neigh); + + tx->neigh = NULL; + } + + if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { + list_move(&tx->list, &priv->cm.reap_list); + queue_work(ipoib_workqueue, &priv->cm.reap_task); + } + + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); + break; + default: + break; + } + + return 0; +} + +struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, + struct ipoib_neigh *neigh) +{ 
+ struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_tx *tx; + + tx = kzalloc(sizeof *tx, GFP_ATOMIC); + if (!tx) + return NULL; + + neigh->cm = tx; + tx->neigh = neigh; + tx->path = path; + tx->dev = dev; + list_add(&tx->list, &priv->cm.start_list); + set_bit(IPOIB_FLAG_INITIALIZED, &tx->flags); + queue_work(ipoib_workqueue, &priv->cm.start_task); + return tx; +} + +void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx) +{ + struct ipoib_dev_priv *priv = netdev_priv(tx->dev); + if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { + list_move(&tx->list, &priv->cm.reap_list); + queue_work(ipoib_workqueue, &priv->cm.reap_task); + ipoib_dbg(priv, "Reap connection for gid " IPOIB_GID_FMT "\n", + IPOIB_GID_ARG(tx->neigh->dgid)); + tx->neigh = NULL; + } +} + +static void ipoib_cm_tx_start(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv, + cm.start_task); + struct net_device *dev = priv->dev; + struct ipoib_neigh *neigh; + struct ipoib_cm_tx *p; + unsigned long flags; + int ret; + + struct ib_sa_path_rec pathrec; + u32 qpn; + + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + while (!list_empty(&priv->cm.start_list)) { + p = list_entry(priv->cm.start_list.next, typeof(*p), list); + list_del_init(&p->list); + neigh = p->neigh; + qpn = IPOIB_QPN(neigh->neighbour->ha); + memcpy(&pathrec, &p->path->pathrec, sizeof pathrec); + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); + ret = ipoib_cm_tx_init(p, qpn, &pathrec); + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + if (ret) { + neigh = p->neigh; + if (neigh) { + neigh->cm = NULL; + list_del(&neigh->list); + if (neigh->ah) + ipoib_put_ah(neigh->ah); + ipoib_neigh_free(dev, neigh); + } + list_del(&p->list); + kfree(p); + } + } + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +static void ipoib_cm_tx_reap(struct work_struct *work) +{ + struct ipoib_dev_priv 
*priv = container_of(work, struct ipoib_dev_priv, + cm.reap_task); + struct ipoib_cm_tx *p; + unsigned long flags; + + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + while (!list_empty(&priv->cm.reap_list)) { + p = list_entry(priv->cm.reap_list.next, typeof(*p), list); + list_del(&p->list); + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); + ipoib_cm_tx_destroy(p); + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + } + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +static void ipoib_cm_skb_reap(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv, + cm.skb_task); + struct net_device *dev = priv->dev; + struct sk_buff *skb; + unsigned long flags; + + __be32 mtu = cpu_to_be32(priv->mcast_mtu); + + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + while ((skb = skb_dequeue(&priv->cm.skb_queue))) { + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); + if (skb->protocol == htons(ETH_P_IP)) + icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); +#ifdef CONFIG_IPV6 + else if (skb->protocol == htons(ETH_P_IPV6)) + icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev); +#endif + dev_kfree_skb_any(skb); + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + } + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, + unsigned int mtu) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + int e = skb_queue_empty(&priv->cm.skb_queue); + + if (skb->dst) + skb->dst->ops->update_pmtu(skb->dst, mtu); + + skb_queue_tail(&priv->cm.skb_queue, skb); + if (e) + queue_work(ipoib_workqueue, &priv->cm.skb_task); +} + +static void ipoib_cm_stale_task(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv, + cm.stale_task.work); 
+ struct ipoib_cm_rx *p; + unsigned long flags; + + spin_lock_irqsave(&priv->lock, flags); + while (!list_empty(&priv->cm.passive_ids)) { + /* List is sorted by LRU, start from tail, + * stop when we see a recently used entry */ + p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list); + if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) + break; + list_del_init(&p->list); + spin_unlock_irqrestore(&priv->lock, flags); + ib_destroy_cm_id(p->id); + ib_destroy_qp(p->qp); + kfree(p); + spin_lock_irqsave(&priv->lock, flags); + } + spin_unlock_irqrestore(&priv->lock, flags); +} + + +static ssize_t show_mode(struct class_device *cdev, char *buf) +{ + struct net_device *dev = container_of(cdev, struct net_device, class_dev); + struct ipoib_dev_priv *priv = netdev_priv(dev); + + if (test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags)) + return sprintf(buf, "connected\n"); + else + return sprintf(buf, "datagram\n"); +} + +static ssize_t set_mode(struct class_device *cdev, + const char *buf, size_t count) +{ + struct net_device *dev = container_of(cdev, struct net_device, class_dev); + struct ipoib_dev_priv *priv = netdev_priv(dev); + + /* flush paths if we switch modes so that connections are restarted */ + if (IPOIB_CM_SUPPORTED(dev->dev_addr) && !strcmp(buf, "connected\n")) { + set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); + ipoib_warn(priv, "enabling connected mode " + "will cause multicast packet drops\n"); + ipoib_flush_paths(dev); + return count; + } + + if (!strcmp(buf, "datagram\n")) { + clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); + dev->mtu = min(priv->mcast_mtu, dev->mtu); + ipoib_flush_paths(dev); + return count; + } + + return -EINVAL; +} + +static CLASS_DEVICE_ATTR(mode, S_IWUGO | S_IRUGO, show_mode, set_mode); + +int ipoib_cm_add_mode_attr(struct net_device *dev) +{ + return class_device_create_file(&dev->class_dev, &class_device_attr_mode); +} + +int ipoib_cm_dev_init(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct 
ib_srq_init_attr srq_init_attr = { + .attr = { + .max_wr = ipoib_recvq_size, + .max_sge = IPOIB_CM_RX_SG + } + }; + int ret, i; + + INIT_LIST_HEAD(&priv->cm.passive_ids); + INIT_LIST_HEAD(&priv->cm.reap_list); + INIT_LIST_HEAD(&priv->cm.start_list); + INIT_WORK(&priv->cm.start_task, ipoib_cm_tx_start); + INIT_WORK(&priv->cm.reap_task, ipoib_cm_tx_reap); + INIT_WORK(&priv->cm.skb_task, ipoib_cm_skb_reap); + INIT_DELAYED_WORK(&priv->cm.stale_task, ipoib_cm_stale_task); + + skb_queue_head_init(&priv->cm.skb_queue); + + priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); + if (IS_ERR(priv->cm.srq)) { + ret = PTR_ERR(priv->cm.srq); + priv->cm.srq = NULL; + return ret; + } + + priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring, + GFP_KERNEL); + if (!priv->cm.srq_ring) { + printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n", + priv->ca->name, ipoib_recvq_size); + ipoib_cm_dev_cleanup(dev); + return -ENOMEM; + } + + for (i = 0; i < IPOIB_CM_RX_SG; ++i) + priv->cm.rx_sge[i].lkey = priv->mr->lkey; + + priv->cm.rx_sge[0].length = IPOIB_CM_HEAD_SIZE; + for (i = 1; i < IPOIB_CM_RX_SG; ++i) + priv->cm.rx_sge[i].length = PAGE_SIZE; + priv->cm.rx_wr.next = NULL; + priv->cm.rx_wr.sg_list = priv->cm.rx_sge; + priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG; + + for (i = 0; i < ipoib_recvq_size; ++i) { + if (ipoib_cm_alloc_rx_skb(dev, i, priv->cm.srq_ring[i].mapping)) { + ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); + ipoib_cm_dev_cleanup(dev); + return -ENOMEM; + } + if (ipoib_cm_post_receive(dev, i)) { + ipoib_warn(priv, "ipoib_ib_post_receive failed for buf %d\n", i); + ipoib_cm_dev_cleanup(dev); + return -EIO; + } + } + + priv->dev->dev_addr[0] = IPOIB_FLAGS_RC; + return 0; +} + +void ipoib_cm_dev_cleanup(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + int i, ret; + + if (!priv->cm.srq) + return; + + ipoib_dbg(priv, "Cleanup ipoib connected mode.\n"); + + ret = ib_destroy_srq(priv->cm.srq); + if 
(ret) + ipoib_warn(priv, "ib_destroy_srq failed: %d\n", ret); + + priv->cm.srq = NULL; + if (!priv->cm.srq_ring) + return; + for (i = 0; i < ipoib_recvq_size; ++i) + if (priv->cm.srq_ring[i].skb) { + ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[i].mapping); + dev_kfree_skb_any(priv->cm.srq_ring[i].skb); + priv->cm.srq_ring[i].skb = NULL; + } + kfree(priv->cm.srq_ring); + priv->cm.srq_ring = NULL; +} diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 59d9594..f2aa923 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -50,8 +50,6 @@ MODULE_PARM_DESC(data_debug_level, "Enable data path debug tracing if > 0"); #endif -#define IPOIB_OP_RECV (1ul << 31) - static DEFINE_MUTEX(pkey_mutex); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, @@ -268,10 +266,11 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) spin_lock_irqsave(&priv->tx_lock, flags); ++priv->tx_tail; - if (netif_queue_stopped(dev) && - test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && - priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) + if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) && + priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) { + clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); netif_wake_queue(dev); + } spin_unlock_irqrestore(&priv->tx_lock, flags); if (wc->status != IB_WC_SUCCESS && @@ -283,7 +282,9 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc) { - if (wc->wr_id & IPOIB_OP_RECV) + if (wc->wr_id & IPOIB_CM_OP_SRQ) + ipoib_cm_handle_rx_wc(dev, wc); + else if (wc->wr_id & IPOIB_OP_RECV) ipoib_ib_handle_rx_wc(dev, wc); else ipoib_ib_handle_tx_wc(dev, wc); @@ -327,12 +328,12 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_tx_buf *tx_req; u64 addr; - if (unlikely(skb->len > dev->mtu + INFINIBAND_ALEN)) { + 
if (unlikely(skb->len > priv->mcast_mtu + INFINIBAND_ALEN)) { ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", - skb->len, dev->mtu + INFINIBAND_ALEN); + skb->len, priv->mcast_mtu + INFINIBAND_ALEN); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; - dev_kfree_skb_any(skb); + ipoib_cm_skb_too_long(dev, skb, priv->mcast_mtu); return; } @@ -372,6 +373,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); netif_stop_queue(dev); + set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); } } } @@ -424,6 +426,13 @@ int ipoib_ib_dev_open(struct net_device *dev) return -1; } + ret = ipoib_cm_dev_open(dev); + if (ret) { + ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); + ipoib_ib_dev_stop(dev); + return -1; + } + clear_bit(IPOIB_STOP_REAPER, &priv->flags); queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task, HZ); @@ -509,6 +518,8 @@ int ipoib_ib_dev_stop(struct net_device *dev) clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); + ipoib_cm_dev_stop(dev); + /* * Move our QP to the error state and then reinitialize in * when all work requests have completed or have been flushed. diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 705eb1d..19e82db 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -49,8 +49,6 @@ #include -#define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff) - MODULE_AUTHOR("Roland Dreier"); MODULE_DESCRIPTION("IP-over-InfiniBand net driver"); MODULE_LICENSE("Dual BSD/GPL"); @@ -145,6 +143,8 @@ static int ipoib_stop(struct net_device *dev) netif_stop_queue(dev); + clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); + /* * Now flush workqueue to make sure a scheduled task doesn't * bring our internal state back up. 
@@ -178,8 +178,18 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu) { struct ipoib_dev_priv *priv = netdev_priv(dev); - if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) + /* dev->mtu > 2K ==> connected mode */ + if (ipoib_cm_admin_enabled(dev) && new_mtu <= IPOIB_CM_MTU) { + if (new_mtu > priv->mcast_mtu) + ipoib_warn(priv, "mtu > %d will cause multicast packet drops.\n", + priv->mcast_mtu); + dev->mtu = new_mtu; + return 0; + } + + if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) { return -EINVAL; + } priv->admin_mtu = new_mtu; @@ -414,6 +424,20 @@ static void path_rec_completion(int status, memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, sizeof(union ib_gid)); + if (ipoib_cm_enabled(dev, neigh->neighbour)) { + if (!ipoib_cm_get(neigh)) + ipoib_cm_set(neigh, ipoib_cm_create_tx(dev, + path, + neigh)); + if (!ipoib_cm_get(neigh)) { + list_del(&neigh->list); + if (neigh->ah) + ipoib_put_ah(neigh->ah); + ipoib_neigh_free(dev, neigh); + continue; + } + } + while ((skb = __skb_dequeue(&neigh->queue))) __skb_queue_tail(&skqueue, skb); } @@ -520,7 +544,25 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, sizeof(union ib_gid)); - ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha)); + if (ipoib_cm_enabled(dev, neigh->neighbour)) { + if (!ipoib_cm_get(neigh)) + ipoib_cm_set(neigh, ipoib_cm_create_tx(dev, path, neigh)); + if (!ipoib_cm_get(neigh)) { + list_del(&neigh->list); + if (neigh->ah) + ipoib_put_ah(neigh->ah); + ipoib_neigh_free(dev, neigh); + goto err_drop; + } + if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) + __skb_queue_tail(&neigh->queue, skb); + else { + ipoib_warn(priv, "queue length limit %d. 
Packet drop.\n", + skb_queue_len(&neigh->queue)); + goto err_drop; + } + } else + ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha)); } else { neigh->ah = NULL; @@ -538,6 +580,7 @@ err_list: err_path: ipoib_neigh_free(dev, neigh); +err_drop: ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -640,7 +683,12 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) neigh = *to_ipoib_neigh(skb->dst->neighbour); - if (likely(neigh->ah)) { + if (ipoib_cm_get(neigh)) { + if (ipoib_cm_up(neigh)) { + ipoib_cm_send(dev, skb, ipoib_cm_get(neigh)); + goto out; + } + } else if (neigh->ah) { if (unlikely(memcmp(&neigh->dgid.raw, skb->dst->neighbour->ha + 4, sizeof(union ib_gid)))) { @@ -805,6 +853,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) neigh->neighbour = neighbour; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(&neigh->queue); + ipoib_cm_set(neigh, NULL); return neigh; } @@ -818,6 +867,8 @@ void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh) ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); } + if (ipoib_cm_get(neigh)) + ipoib_cm_destroy_tx(ipoib_cm_get(neigh)); kfree(neigh); } @@ -1081,6 +1132,8 @@ static struct net_device *ipoib_add_port(const char *format, ipoib_create_debug_files(priv->dev); + if (ipoib_cm_add_mode_attr(priv->dev)) + goto sysfs_failed; if (ipoib_add_pkey_attr(priv->dev)) goto sysfs_failed; if (class_device_create_file(&priv->dev->class_dev, diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index b04b72c..fea737f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -597,7 +597,9 @@ void ipoib_mcast_join_task(struct work_struct *work) priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu) - IPOIB_ENCAP_LEN; - dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); + + if (!ipoib_cm_admin_enabled(dev)) + dev->mtu = min(priv->mcast_mtu, 
priv->admin_mtu); ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 7b717c6..3cb551b 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -168,35 +168,41 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) .qp_type = IB_QPT_UD }; + int ret, size; + priv->pd = ib_alloc_pd(priv->ca); if (IS_ERR(priv->pd)) { printk(KERN_WARNING "%s: failed to allocate PD\n", ca->name); return -ENODEV; } - priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, - ipoib_sendq_size + ipoib_recvq_size + 1); + priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(priv->mr)) { + printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name); + goto out_free_pd; + } + + size = ipoib_sendq_size + ipoib_recvq_size + 1; + ret = ipoib_cm_dev_init(dev); + if (!ret) + size += ipoib_recvq_size; + + priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size); if (IS_ERR(priv->cq)) { printk(KERN_WARNING "%s: failed to create CQ\n", ca->name); - goto out_free_pd; + goto out_free_mr; } if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP)) goto out_free_cq; - priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE); - if (IS_ERR(priv->mr)) { - printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name); - goto out_free_cq; - } - init_attr.send_cq = priv->cq; init_attr.recv_cq = priv->cq, priv->qp = ib_create_qp(priv->pd, &init_attr); if (IS_ERR(priv->qp)) { printk(KERN_WARNING "%s: failed to create QP\n", ca->name); - goto out_free_mr; + goto out_free_cq; } priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff; @@ -212,12 +218,12 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) return 0; -out_free_mr: - ib_dereg_mr(priv->mr); - out_free_cq: ib_destroy_cq(priv->cq); +out_free_mr: + ib_dereg_mr(priv->mr); + out_free_pd: 
ib_dealloc_pd(priv->pd); return -ENODEV; @@ -235,12 +241,14 @@ void ipoib_transport_dev_cleanup(struct net_device *dev) clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); } - if (ib_dereg_mr(priv->mr)) - ipoib_warn(priv, "ib_dereg_mr failed\n"); - if (ib_destroy_cq(priv->cq)) ipoib_warn(priv, "ib_cq_destroy failed\n"); + ipoib_cm_dev_cleanup(dev); + + if (ib_dereg_mr(priv->mr)) + ipoib_warn(priv, "ib_dereg_mr failed\n"); + if (ib_dealloc_pd(priv->pd)) ipoib_warn(priv, "ib_dealloc_pd failed\n"); } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index f887780..d9fd82d 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -115,6 +115,8 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) ipoib_create_debug_files(priv->dev); + if (ipoib_cm_add_mode_attr(priv->dev)) + goto sysfs_failed; if (ipoib_add_pkey_attr(priv->dev)) goto sysfs_failed; -- MST From rowland at cse.ohio-state.edu Mon Feb 5 12:12:59 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Mon, 05 Feb 2007 15:12:59 -0500 Subject: [openib-general] MVAPICH2 SRPM and install file patches In-Reply-To: <1170669047.6049.4.camel@vladsk-laptop> References: <45C14344.9010602@cse.ohio-state.edu> <1170669047.6049.4.camel@vladsk-laptop> Message-ID: <45C78FCB.8010807@cse.ohio-state.edu> Vladimir Sokolovsky wrote: > On Wed, 2007-01-31 at 20:32 -0500, Shaun Rowland wrote: >> I've placed the MVAPICH2 SRPM on the OFA server in ~rowland/ofed_1_2, >> and it is linked to here: >> >> http://www.openfabrics.org/~rowland/ofed_1_2/ >> > > Hi Shaun, > Please change mvapich2.spec to avoid using of %build macro. 
> It removes RPM_BUILD_ROOT on SuSE distros: > > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.9418 > + umask 022 > + cd /var/tmp/OFEDRPM/BUILD > + /bin/rm -rf /var/tmp/OFED > ++ dirname /var/tmp/OFED > + /bin/mkdir -p /var/tmp > + /bin/mkdir /var/tmp/OFED > + cd mvapich2-0.9.8 > + export OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed > + OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed > Thank you for pointing out this issue on SuSE. I've made the change and placed a new SRPM in my directory (mvapich2-0.9.8-2.src.rpm) and updated my latest.txt file. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From rowland at cse.ohio-state.edu Mon Feb 5 12:30:46 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Mon, 05 Feb 2007 15:30:46 -0500 Subject: [openib-general] MVAPICH2 rpmbuild issue In-Reply-To: <1170675863.6049.11.camel@vladsk-laptop> References: <45C14344.9010602@cse.ohio-state.edu> <1170675863.6049.11.camel@vladsk-laptop> Message-ID: <45C793F6.8090003@cse.ohio-state.edu> Vladimir Sokolovsky wrote: > Hi Shaun, > Please check the following issue: Hi Vladimir. I can tell from the output what seems to have happened, but I don't know why it happened. When I tested using the install/build scripts you had given us originally, I tested against OFED 1.1 files to understand how the build procedure worked. From that testing, the first thing I found that I had to deal with was the fact that the openib packages were built in /var/tmp/OFED and left there for other packages to be built against. Since the openib files are in a location other than their final destination, I created the %ofed_build_root macro to define this location and in addition, set a %ofed_bootstarp condition in the RPM. From the rpmbuild command below, this appears to be called how I expect. 
However: > Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.84872 > + umask 022 > + cd /var/tmp/OFEDRPM/BUILD > + cd mvapich2-0.9.8 > + export OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed > + OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed > + '[' -d /var/tmp/OFED/usr/local/ofed/lib ']' > + '[' -d /var/tmp/OFED/usr/local/ofed/lib64 ']' In the two lines above, I am setting LD_LIBRARY_PATH so that MVAPICH2 can be built. I do this because, again, the files are not in their final destination directory, but in /var/tmp/OFED/$STACK_PREFIX/lib[64]. Above, I am testing for either the lib or lib64 directory in that path, but neither is being found because there is no associated export of LD_LIBRARY_PATH above. This is also why: > + export PREFIX=/var/tmp/OFED/usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1 > + PREFIX=/var/tmp/OFED/usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1 > + export CC=gcc CXX=g++ F77=gfortran > + CC=gcc > + CXX=g++ > + F77=gfortran > + export ROMIO=yes > + ROMIO=yes > + export SHARED_LIBS=yes > + SHARED_LIBS=yes > + ./make.mvapich2.gen2 > Could not find the OPEN_IB_HOME/lib64 or OPEN_IB_HOME/lib directory. > Exiting. > error: Bad exit status from /var/tmp/rpm-tmp.84872 (%install) our make.mvapich2.gen2 script fails. It actually exits if neither of these directories can be found. It is basically the same check, except in make.mvapich2.gen2 LD_LIBRARY_PATH is not set, and it also exits if the directories are not found. It would be possible to do the LD_LIBRARY_PATH setting in make.mvapich2.gen2 as well, but usually it isn't necessary - so I had added the code to the spec file myself. So my question in this case, given the error output, is what happened to /var/tmp/OFED/usr/local/ofed/lib or /var/tmp/OFED/usr/local/ofed/lib64? The rpmbuild is not finding those directories, but the files should still be there for MVAPICH2 to be built against, yes? Unless the build process has changed, it seems these directories do not exist when I was expecting them to exist. 
They should be there at that location, right? I've not tried the new install/build scripts since you've updated them. I think I need to make an openib SRPM for this or ? I am currently investigating this and will attempt to use the new scripts on my own testing system. I will also check if there are any files I can use instead of making an SRPM if that's even necessary (it seems that it was, so I had not done it yet). -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From swise at opengridcomputing.com Mon Feb 5 13:43:43 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 05 Feb 2007 15:43:43 -0600 Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> Message-ID: <1170711823.16661.78.camel@stevo-desktop> On Mon, 2007-02-05 at 06:20 -0800, Roland Dreier wrote: > > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike: > > It is hard to see changes that are specific to OFED since we have whole > > kernel history mixed in. > > I'm not sure how you have your branches set up, but if you have > something like a "linus" branch that tracks the upstream kernel, it's > easy to do stuff like "git log linus.." or "git diff linus.. drivers/infiniband" > and see the differences that way. > > Using git that way (which is what it's designed for, after all) seems > better than some scripts to munge together two trees. > So git "log linus.." would show commits in the current branch that are not in the linus branch, correct? That would work. Two branches: one with the main kernel git tree, and based on that + the ofed-specific changes. 
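The two-branch layout described above can be tried out in a throwaway repository. This is only a minimal sketch: the branch names "linus" and "ofed", the file contents, and the commit messages are illustrative assumptions, not the layout of the actual OFED kernel tree. It shows that "linus.." is shorthand for "linus..HEAD", i.e. the commits reachable from the current branch but not from linus.

```shell
#!/bin/sh
# Sketch only: a scratch repository with a hypothetical "linus" branch
# (stand-in for upstream) and a hypothetical "ofed" branch based on it.
set -e

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"

git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Stand-in for the upstream kernel history.
git symbolic-ref HEAD refs/heads/linus
mkdir -p drivers/infiniband
echo "upstream code" > drivers/infiniband/core.c
git add drivers/infiniband/core.c
git commit -q -m "upstream: initial import"

# OFED-specific work lives on a branch that starts from "linus".
git checkout -q -b ofed
echo "ofed backport" >> drivers/infiniband/core.c
git commit -q -a -m "ofed: backport for distro kernels"

# Commits on the current branch that are not in linus.
ofed_only=$(git log --format=%s linus..)
echo "$ofed_only"

# The same range as a diff summary, limited to one subtree.
git diff --stat linus.. -- drivers/infiniband
```

Run from the "ofed" branch, the log lists only the OFED-specific commit, and the diff is limited to drivers/infiniband.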
From sweitzen at cisco.com Mon Feb 5 13:44:52 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 5 Feb 2007 13:44:52 -0800 Subject: [openib-general] MVAPICH2 SRPM and install file patches In-Reply-To: <45C14344.9010602@cse.ohio-state.edu> References: <45C14344.9010602@cse.ohio-state.edu> Message-ID: Shaun, Thanks for doing this. I see things like romio and shlibs configurable in the patch, what about other MVAPICH2 features like fault tolerance, multi rail, threads, and MPD? How can I configure them when I use install.sh to compile and install OFED? I also didn't quite understand the ib-vs-iwarp configuration, I thought OFED 1.2 would support both. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Shaun Rowland > Sent: Wednesday, January 31, 2007 5:33 PM > To: vlad at dev.mellanox.co.il > Cc: openfabrics-ewg at openib.org; openib-general at openib.org > Subject: [openib-general] MVAPICH2 SRPM and install file patches > > I've placed the MVAPICH2 SRPM on the OFA server in ~rowland/ofed_1_2, > and it is linked to here: > > http://www.openfabrics.org/~rowland/ofed_1_2/ > > Additionally, I am including a patch in this email that updates the > ofed_1_2_scripts files from the GIT repository we were given to > handle the MVAPICH2 SRPM file. Basically, installing MVAPICH2 > is similar > to the other MPI packages, except that I have added a choice option to > build with iWARP support or not. The default is IB only. If > the user has > selected the librdmacm packages and the mvapich2 package, > this choice is > presented. 
This is also saved in the ofed.conf file using an > MVAPICH2_IMPL variable, and the librdmacm packages are added as > dependencies if the iWARP version of MVAPICH2 is desired and they are > not already in the ofed.conf file, which seems like standard > behavior in > the scripts. The resulting binary RPM uses the name convention > mvapich2_ as normal in either case. There are various ways > this could be implemented, perhaps in a better manner. This is what I > was able to come up with by today. Since the installation > scripts given > were very similar to the original OFED 1.1 scripts, I was able to test > the installation procedure using OFED 1.1 files. Everything worked for > me, including building the mpitests package against the mvapich2 > package. There are some comments about this in what I have > done. I hope > that it is helpful in getting our SRPM integrated into the > installation > scripts. > > Additionally, I put a README file in my ofed_1_2 directory > that contains > information about the macros that can be used with our SRPM file. The > SRPM can be used to install against an existing OFED installation, and > those macros control various aspects of the result. There is > one special > macro I use for when the SRPM is being built along with the > OFED source, > and its use should be clear in the patched build.sh script and > associated comment. > -- > Shaun Rowland rowland at cse.ohio-state.edu > http://www.cse.ohio-state.edu/~rowland/ > From rdreier at cisco.com Mon Feb 5 14:00:51 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 05 Feb 2007 14:00:51 -0800 Subject: [openib-general] Fwd: bug in mthca_qp.c (GEN 2) In-Reply-To: <20070205185709.GB16598@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 5 Feb 2007 20:57:09 +0200") References: <20070205185709.GB16598@mellanox.co.il> Message-ID: > Roland, what do you think? > Looks pretty severe actually. 
> static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr,
>                           struct mthca_qp_path *path)
> {
>         memset(ib_ah_attr, 0, sizeof *path);

It's definitely a bug but I don't think it's very severe -- the only
calls to to_ib_ah_attr are in mthca_query_qp, where the function is
used to fill in fields embedded in a struct ib_qp_attr, and even
though the memset overruns the ib_ah_attr slightly, it only zeros out
fields that are set later in the function anyway. So with current
code at least the bug is harmless.

anyway, I queued the patch below for 2.6.21:

IB/mthca: Use correct structure size in call to memset()

When clearing the ib_ah_attr parameter in to_ib_ah_attr(), use sizeof
*ib_ah_attr instead of sizeof *path. Pointed out by Jack Morgenstein.

Signed-off-by: Roland Dreier
---
 drivers/infiniband/hw/mthca/mthca_qp.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 5f5214c..224c93d 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -399,7 +399,7 @@ static int to_ib_qp_access_flags(int mthca_flags)
 static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr,
 				struct mthca_qp_path *path)
 {
-	memset(ib_ah_attr, 0, sizeof *path);
+	memset(ib_ah_attr, 0, sizeof *ib_ah_attr);
 	ib_ah_attr->port_num = (be32_to_cpu(path->port_pkey) >> 24) & 0x3;
 
 	if (ib_ah_attr->port_num == 0 || ib_ah_attr->port_num > dev->limits.num_ports)
--
1.4.4.1

From rdreier at cisco.com Mon Feb 5 14:09:12 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 05 Feb 2007 14:09:12 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Mon, 5 Feb 2007 15:48:29 -0000")
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
Message-ID:

> If I only want to send/recv 4 bytes with immediate data:

I assume you mean that you only want to send the 4 bytes of immediate
data, and nothing else.

> On sender side:
> opcode = IBV_WR_SEND_WITH_IMM;
> imm_data = my_4_bytes_data;
>
> Do I still need to specify sg_list and num_sge ?

Well, you should be able to specify num_sge = 0. But to be honest I'm
not positive that 0-length sends are allowed; I know that 0-length
RDMA WRITE operations are allowed.

> On receiver side, because the immediate data is inside the completion
> structure, do I need to post a receive for above message ?

Yes, otherwise how would you get the immediate data?

> If I need to post a receive, do I need to specify sg_list and num_sge
> for the receive ?

I believe that a 0-length receive with num_sge = 0 should be fine, at
least to handle an RDMA write with immediate data. But again I'm not
positive.

 - R.

From vlad at mellanox.co.il Mon Feb 5 14:25:51 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 6 Feb 2007 00:25:51 +0200
Subject: [openib-general] OFED-1.2 first release
Message-ID: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>

Hi,

OFED-1.2-20070205-1823.tgz can be downloaded from
http://www.openfabrics.org/builds/ofed-1.2/

The first OFED package includes:
    ofa_kernel-1.2-alpha1.src.rpm
    ofa_user-1.2-alpha1.src.rpm
    mvapich-0.9.9-971.src.rpm
    mvapich2-0.9.8-1.src.rpm
    openmpi-1.2b4ofedr13470-1ofed.src.rpm
    mpitests-2.0-698.src.rpm
    open-iscsi-generic-2.0-742.src.rpm
    ib-bonding-0.9.0-1.src.rpm
    ofed-docs-1.2-0.src.rpm
    ofed-scripts-1.2-0.src.rpm

Known issues:
    srptools - compilation fails
    openib_diags - compilation fails
    ibutils - not included yet

To build OFED RPMs:
    cd OFED-1.2-20070205-1823
    ./build.sh

Created RPMs will be stored under OFED-1.2-20070205-1823/RPMS/ directory.
To install OFED RPMs: cd OFED-1.2-20070205-1823 ./install.sh For a detailed installation guide, see OFED-1.2-xxx/docs/OFED_Installation_Guide.txt -- Vladimir Sokolovsky Mellanox Technologies Ltd. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dledford at redhat.com Mon Feb 5 14:26:10 2007 From: dledford at redhat.com (Doug Ledford) Date: Mon, 05 Feb 2007 17:26:10 -0500 Subject: [openib-general] Web site needs update Message-ID: <1170714371.2716.275.camel@fc6.xsintricity.com> The web site lists the svn repo, which is mostly empty now, and the README says the web site lists the various git repos for accessing the source code, but there are no git repos listed on the web site. Could we please have the authoritative git repos for the different components being worked on listed on the web site for easy reference? -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rowland at cse.ohio-state.edu Mon Feb 5 14:47:11 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Mon, 05 Feb 2007 17:47:11 -0500 Subject: [openib-general] MVAPICH2 SRPM and install file patches In-Reply-To: References: <45C14344.9010602@cse.ohio-state.edu> Message-ID: <45C7B3EF.2030903@cse.ohio-state.edu> Scott Weitzenkamp (sweitzen) wrote: > Shaun, > > Thanks for doing this. > > I see things like romio and shlibs configurable in the patch, what about > other MVAPICH2 features like fault tolerance, multi rail, threads, and > MPD? How can configure them when I use install.sh to compile and > install OFED? Hi Scott. I had thought about this a little when I was testing with the install/build scripts Vlad gave us. 
I would appreciate his input if I get anything wrong here as well.

From the perspective of the user running the install.sh script, the MPI
packages are essentially built one way. You do get to pick the
compiler(s) to use, but as for other options - you would have to edit
the build.sh function associated with the desired package. I created a
hack for the iwarp vs ib configuration for MVAPICH2 because I needed to
distinguish between the two (for reasons I will outline at the end of
this message). Theoretically, you should be able to export the proper
variables from our make.mvapich2.* scripts before running the install.sh
script, and the features would be enabled. For instance, you could do:

    export MULTI_THREAD=yes
    ./install.sh

This is not a good solution for installing OFED, but should work due to
not conflicting with anything else - at least as far as I am aware. I
see that I need to update the make.mvapich2.iwarp script to have the
multithreading option anyway as well, so it would not quite work 100%
right now. As far as each feature you asked about:

* fault tolerance - this is controlled during the build process with
  $ENABLE_CKPT and requires $BLCR_HOME pointing to a BLCR installation.
  This only works for single threaded builds without rdmacm support
  (the ib case only, essentially).

* multi rail - this is controlled by runtime environment variables
  after installation.

* threads - This is controlled by $MULTI_THREAD during the build
  process. As noted above, there's a restriction with fault tolerance.

* MPD - MPD is used by MVAPICH2 as it is based on MPICH2.

There are actually a number of options that could be chosen. I believe
from our side, it will be good for me to go ahead and put these in our
SRPM now. Our SRPM can be used outside of the OFED installation system
of course, and these should really be there. There are even other
"devices", like uDAPL.
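As a rough illustration of the restrictions in the feature list above, a pre-flight check could be run before install.sh. This is only a sketch and not part of the OFED or MVAPICH2 scripts: the function name check_mvapich2_opts is hypothetical, ENABLE_CKPT=yes is an assumed value (only MULTI_THREAD=yes appears in the message), and the "ib"/"iwarp" values for the MVAPICH2_IMPL variable are assumptions as well.

```shell
#!/bin/bash
# Hypothetical pre-flight check; not taken from the OFED scripts.
# ENABLE_CKPT, BLCR_HOME and MULTI_THREAD are the build variables named
# in the message. MVAPICH2_IMPL is the ofed.conf variable; the values
# "ib" and "iwarp" used here are assumed, not confirmed.
check_mvapich2_opts() {
    local impl="${MVAPICH2_IMPL:-ib}"

    # Nothing to verify unless checkpointing (fault tolerance) is requested.
    if [ "${ENABLE_CKPT:-no}" != "yes" ]; then
        return 0
    fi

    # Fault tolerance needs BLCR_HOME pointing at a BLCR installation.
    if [ -z "${BLCR_HOME:-}" ]; then
        echo "ENABLE_CKPT=yes requires BLCR_HOME" >&2
        return 1
    fi

    # It only works for single-threaded builds.
    if [ "${MULTI_THREAD:-no}" = "yes" ]; then
        echo "ENABLE_CKPT=yes cannot be combined with MULTI_THREAD=yes" >&2
        return 1
    fi

    # It only works without rdmacm support (the IB-only case).
    if [ "$impl" != "ib" ]; then
        echo "ENABLE_CKPT=yes requires the IB-only build" >&2
        return 1
    fi

    return 0
}
```

A caller would export the desired variables and then run something like "check_mvapich2_opts && ./install.sh", so that an unsupported combination is reported before the build starts.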
I did the SRPM in the install/build script patches the way I did because
that seemed like a good set of options for how the OFED installation
system works. There's no framework or examples of asking about features
to build in an MPI package. I just quickly tacked on the iwarp question
and made up a new configuration variable for the ofed.conf file, but
it's not necessarily a good way to do it.

One possibility would be to create a shell function that sets various
build options for MPI packages. Variables could be set in this function
using some name convention, in our case perhaps MVAPICH2_OPT_. In such a
function (probably one for each package, that seems to be the
convention), it would be easier to code all the exceptions for features
- if there are any. There are some in our case, as I've mentioned. This
configuration function could be called when the user is choosing to
install MVAPICH2.

This leads to a number of problems. Can the user select different
options for each of the compiler versions of the MPI package? I think
clearly the answer should be "no". Even as implemented now, you cannot
install the iwarp and ib version of MVAPICH2 at the same time during the
install process. You must choose one or the other. Being able to do
either would require one of two changes:

1. Having another level of installer system configuration where I could
   select the devices desired, and options for each device (by device
   here, I mean uDAPL, IB, iWARP).

- or -

2. Make multiple RPM packages to fit into how the installer currently
   interacts with SRPMs, prompts, etc.

I've only had a limited time to investigate this, so what I have done so
far mostly fits with how the OFED install system does things with the
other packages - except for my iwarp vs ib question prompt. I think
there's potential for a lot of complication here.
A configuration function for each package would be one possible way to
contain that, however I'd have to go back and check out how things work
again to see how something like that would fit in.

So, I will add these new feature options to our SRPM because they could
be used outside of the OFED installation system anyway, and we would
like that to be possible and give the ability to set these options.
However, I cannot say what would be best for the OFED installation
system. It might be better to just go with what we have now - more
"mainstream" builds, and let the user do their own build if they want to
highly customize or something. Otherwise, I've given one possible idea
from the perspective of someone who is new to the install system.

Vlad, do you have any opinion here? Do you see where I am coming from as
far as what kind of situation we are talking about with presenting
options for MPI package builds?

> I also didn't quite understand the ib-vs-iwarp configuration, I thought
> OFED 1.2 would support both.

There are 2 reasons our SRPM has to be told whether it is being built
for iWARP or IB:

1. We need to use -DRDMA_CM_RNIC during the build for iWARP (this is
   actually done by invoking our make.mvapich2.iwarp script in the RPM
   build).

2. If the %auto_req macro is set to 0, then simple RPM names for the
   install requirements are used:

   Autoreq: 0
   Requires: libibumad libibverbs            [default]
   Requires: libibumad libibverbs librdmacm  [iWARP]

   This is actually not done, but it is there as a possibility (Autoreq
   is used right now I mean):

Vlad, I was thinking that you might want to change our function in
build.sh to set auto_req to 0 instead of 1. I see that is how MVAPICH is
doing requires, instead of letting Autoreq do it. I think it will work
either way probably, but using --define 'auto_req 0' will probably cut
down on some potential issues. I had set it to 1 because I saw in OFED
1.1 it seemed that this was how things worked.
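To make the two dependency lists above concrete, the sketch below returns the explicit package list for each build when auto_req is 0. It is illustrative only: the function name mvapich2_requires and the "ib"/"iwarp" argument values are hypothetical, and only the package names come from the message.

```shell
#!/bin/bash
# Hypothetical helper; not taken from the MVAPICH2 SRPM or build.sh.
# Prints the explicit install requirements described in the message for
# the case where the auto_req macro is 0. The "ib" and "iwarp" argument
# values are assumed names for the two builds.
mvapich2_requires() {
    case "$1" in
        ib)
            echo "libibumad libibverbs"
            ;;
        iwarp)
            echo "libibumad libibverbs librdmacm"
            ;;
        *)
            echo "unknown build: $1" >&2
            return 1
            ;;
    esac
}
```

For example, "mvapich2_requires iwarp" prints the three-package list, which is the default list plus librdmacm. How the real spec file consumes auto_req is not shown in this thread, so the sketch stops at producing the list.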
-- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From changquing.tang at hp.com Mon Feb 5 14:54:33 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 5 Feb 2007 22:54:33 -0000 Subject: [openib-general] Immediate data question In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com><349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> Message-ID: <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> Thank you. Other than using immediate data to send notification from one end to the other of a QP, is there any other way to do this ? For example, can I modify QP state from RTS to other state on one end, and then the other end gets some notification when I query the QP ? --CQ > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Monday, February 05, 2007 4:09 PM > To: Tang, Changqing > Cc: Michael S. Tsirkin; openib-general at openib.org > Subject: Re: Immediate data question > > > If I only want to send/recv 4 bytes with immediate data: > > I assume you mean that you only want to send the 4 bytes of > immediate data, and nothing else. > > > On sender side: > > opcode = IBV_WR_SEND_WITH_IMM; > > imm_data = my_4_bytes_data; > > > > Do I still need to specify sg_list and num_sge ? > > Well, you should be able to specify num_sge = 0. But to be > honest I'm not positive that 0-length sends are allowed; I > know that 0-length RDMA WRITE operations are allowed. > > > On receiver side, because the immediate data is inside the > completion > structure, do I need to post a receive for > above message ? > > Yes, otherwise how would you get the immediate data? > > > If I need to post a receive, do I need to specify sg_list > and num_sge > for the receive ? > > I believe that a 0-length receive with num_sge = 0 should be > fine, at least to handle an RDMA write with immediate data. > But again I'm not positive. > > - R. 
> From rdreier at cisco.com Mon Feb 5 15:02:32 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 05 Feb 2007 15:02:32 -0800 Subject: [openib-general] Immediate data question In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Mon, 5 Feb 2007 22:54:33 -0000") References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> Message-ID: Changqing> Thank you. Other than using immediate data to send Changqing> notification from one end to the other of a QP, is Changqing> there any other way to do this ? For example, can I Changqing> modify QP state from RTS to other state on one end, and Changqing> then the other end gets some notification when I query Changqing> the QP ? Not that I know of. You would need to do something that triggers something to be sent on the wire, and I don't know of any way to do that other than posting a work request. - R. From swise at opengridcomputing.com Mon Feb 5 15:09:29 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 05 Feb 2007 17:09:29 -0600 Subject: [openib-general] MVAPICH2 SRPM and install file patches In-Reply-To: <45C7B3EF.2030903@cse.ohio-state.edu> References: <45C14344.9010602@cse.ohio-state.edu> <45C7B3EF.2030903@cse.ohio-state.edu> Message-ID: <1170716969.16661.97.camel@stevo-desktop> > > I also didn't quite understand the ib-vs-iwarp configuration, I thought > > OFED 1.2 would support both. > > There are 2 reasons our SRPM has to be told whether it is being built > for iWARP or IB: > > 1. We need to use -DRDMA_CM_RNIC during the build for iWARP (this is > actually done by invoking our make.mvapich2.iwarp script in the RPM build). I believe the iWARP build will work over IB too. The difference, I think, is that the iWARP build uses the RDMA-CM and the IB build uses the IB-CM. 
Shaun, is this correct? If so, I suggest you define these options differently. Perhaps IBCM vs RDMACM? Right now it implies that you cannot run the same mvapich build over both transports. My 2 cents. Steve. From swise at opengridcomputing.com Mon Feb 5 16:19:23 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 05 Feb 2007 18:19:23 -0600 Subject: [openib-general] OFED-1.2 first release In-Reply-To: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> Message-ID: <1170721163.16661.111.camel@stevo-desktop> BTW: The README.txt still talks about OFED-1.1 and the October 2006 release. Steve. On Tue, 2007-02-06 at 00:25 +0200, Vladimir Sokolovsky wrote: > Hi, > > OFED-1.2-20070205-1823.tgz can be downloaded from > > http://www.openfabrics.org/builds/ofed-1.2/ > > From akepner at sgi.com Mon Feb 5 16:33:02 2007 From: akepner at sgi.com (akepner at sgi.com) Date: Mon, 5 Feb 2007 16:33:02 -0800 (PST) Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: <20070205184246.GC15775@mellanox.co.il> References: <20070205184246.GC15775@mellanox.co.il> Message-ID: On Mon, 5 Feb 2007, Michael S. Tsirkin wrote: > .... > Could you address Roland's proposal as well? > Regarding the use of git to track the differences in OFED/kernel.org trees? I had to go (re)learn some git stuff, but now I think that this will work fine. -- Arthur From swise at opengridcomputing.com Mon Feb 5 17:07:03 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 05 Feb 2007 19:07:03 -0600 Subject: [openib-general] OFED-1.2 first release In-Reply-To: <1170721163.16661.111.camel@stevo-desktop> References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> <1170721163.16661.111.camel@stevo-desktop> Message-ID: <1170724023.19728.5.camel@stevo-desktop> I think there might be some dependency problem. I selected libibverbs, libcxgb3, librdmacm, perftest, mvapich2/IWARP and mpitests. 
For some reason it pulled in libibumad as a prereq, but not libibcommon... Also, I think mvapich2/IWARP links with libibumad or libibcommon and it doesn't need to when using librdmacm. [root at r2-iw redhat-release-4AS-5.5]# rpm -U * error: Failed dependencies: libibcommon.so.1()(64bit) is needed by libibumad-1.0.2-0.x86_64 libibcommon.so.1(IBCOMMON_1.0)(64bit) is needed by libibumad-1.0.2-0.x86_64 Suggested resolutions: libibcommon-1.0-1.x86_64.rpm > On Tue, 2007-02-06 at 00:25 +0200, Vladimir Sokolovsky wrote: > > Hi, > > > > OFED-1.2-20070205-1823.tgz can be downloaded from > > > > http://www.openfabrics.org/builds/ofed-1.2/ > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at mellanox.co.il Mon Feb 5 21:13:56 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Feb 2007 07:13:56 +0200 Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: <1170711823.16661.78.camel@stevo-desktop> References: <1170711823.16661.78.camel@stevo-desktop> Message-ID: <20070206051356.GF16598@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [openib-general] idea for ofed 1 2 kernel file structure > > On Mon, 2007-02-05 at 06:20 -0800, Roland Dreier wrote: > > > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike: > > > It is hard to see changes that are specific to OFED since we have whole > > > kernel history mixed in. > > > > I'm not sure how you have your branches set up, but if you have > > something like a "linus" branch that tracks the upstream kernel, it's > > easy to do stuff like "git log linus.." or "git diff linus.. drivers/infiniband" > > and see the differences that way. 
> > > > Using git that way (which is what it's designed for, after all) seems > > better than some scripts to munge together two trees. > > > > So git "log linus.." would show commits in the current branch that are > not in the linus branch, correct? > > That would work. Two branches: one with the main kernel git tree, and > based on that + the ofed-specific changes. Well, that's what we have now. The master branch tracks upstream kernel. -- MST From mst at mellanox.co.il Mon Feb 5 21:16:33 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Feb 2007 07:16:33 +0200 Subject: [openib-general] Web site needs update In-Reply-To: <1170714371.2716.275.camel@fc6.xsintricity.com> References: <1170714371.2716.275.camel@fc6.xsintricity.com> Message-ID: <20070206051633.GG16598@mellanox.co.il> > Quoting Doug Ledford : > Subject: Web site needs update > > The web site lists the svn repo, which is mostly empty now, and the > README says the web site lists the various git repos for accessing the > source code, but there are no git repos listed on the web site. Could > we please have the authoritative git repos for the different components > being worked on listed on the web site for easy reference? I think the thing to do now is to finally move openfabrics.org and openib.org to point to the new server. Then we'll be able to fix this. -- MST From sweitzen at cisco.com Mon Feb 5 21:26:49 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 5 Feb 2007 21:26:49 -0800 Subject: [openib-general] OFED-1.2 first release In-Reply-To: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> Message-ID: Vlad and Tziporet, It might help if you elaborated on what you meant by "first release", you have been saying "code freeze" but really this is "feature freeze", right? 
This announcement is quite a bit different from previous OFED announcements, where you detailed what features were available and what OS were supported. The daily build email mentions compiling against kernels, but I haven't seen what distros were actually tested. Are we starting from scratch on compiling and testing with distros like RHEL4? Do you anticipate we will just go day by day with builds trying to stabilize things initially? In any case, here's what I see when I try to compile with install.sh on RHEL4 U3 x86_64: ... /tmp/OFED-1.2-20070205-1823/build.sh: line 802: kernel-ib: command not found Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' - -define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-docs-1. 2-0.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-docs-1.2-0.noarch.rpm /tmp/ OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' - -define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-scripts -1.2-0.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-scripts-1.2-0.noarch.rpm /t mp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 Running rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ib-bonding-0.9.0-1.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /t mp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. 0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" See log file: /tmp/OFED.10899.log # tail -10 /tmp/OFED.10899.log Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/ib-bonding-0. 
9.0-root Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615 + umask 022 + cd /var/tmp/OFEDRPM/BUILD + rm -rf ib-bonding-0.9.0 + exit 0 /bin/mv: cannot stat `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm ': No such file or directory ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. 0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" Scott ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Vladimir Sokolovsky Sent: Monday, February 05, 2007 2:26 PM To: openfabrics-ewg at openib.org Cc: openib-general at openib.org Subject: [openib-general] OFED-1.2 first release Hi, OFED-1.2-20070205-1823.tgz can be downloaded from http://www.openfabrics.org/builds/ofed-1.2/ The first OFED package includes: ofa_kernel-1.2-alpha1.src.rpm ofa_user-1.2-alpha1.src.rpm mvapich-0.9.9-971.src.rpm mvapich2-0.9.8-1.src.rpm openmpi-1.2b4ofedr13470-1ofed.src.rpm mpitests-2.0-698.src.rpm open-iscsi-generic-2.0-742.src.rpm ib-bonding-0.9.0-1.src.rpm ofed-docs-1.2-0.src.rpm ofed-scripts-1.2-0.src.rpm Known issues: srptools - compilation fails openib_diags - compilation fails ibutils - not included yet To build OFED RPMs: cd OFED-1.2-20070205-1823 ./build.sh Created RPMs will be stored under OFED-1.2-20070205-1823/RPMS/ directory. To install OFED RPMs: cd OFED-1.2-20070205-1823 ./install.sh For a detailed installation guide, see OFED-1.2-xxx/docs/OFED_Installation_Guide.txt -- Vladimir Sokolovsky Mellanox Technologies Ltd. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sweitzen at cisco.com Mon Feb 5 21:43:41 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 5 Feb 2007 21:43:41 -0800 Subject: [openib-general] OFED-1.2 first release In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> Message-ID: Moving on, I set ib_bonding=n in ofed.conf and try install.sh again, and now get this: ... Building MVAPICH RPM. Please wait... Using gcc compiler Running rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_nam e mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/loc al/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-9 71.src.rpm ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRP M' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --defi ne 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPM S/mvapich-0.9.9-971.src.rpm" See log file: /tmp/OFED.6120.log # tail /tmp/OFED.6120.log + LANG=C + export LANG + unset DISPLAY /var/tmp/rpm-tmp.870: line 33: syntax error near unexpected token `)' error: Bad exit status from /var/tmp/rpm-tmp.870 (%install) RPM build errors: Bad exit status from /var/tmp/rpm-tmp.870 (%install) ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRP M' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --defi ne 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPM S/mvapich-0.9.9-971.src.rpm" Scott ________________________________ From: Scott Weitzenkamp (sweitzen) Sent: Monday, February 05, 2007 9:27 PM To: Vladimir Sokolovsky; openfabrics-ewg at openib.org; Tziporet Koren; Scott Weitzenkamp (sweitzen) Cc: 
openib-general at openib.org Subject: RE: [openib-general] OFED-1.2 first release Vlad and Tziporet, It might help if you elaborated on what you meant by "first release", you have been saying "code freeze" but really this is "feature freeze", right? This announcement is quite a bit different from previous OFED announcements, where you detailed what features were available and what OS were supported. The daily build email mentions compiling against kernels, but I haven't seen what distros were actually tested. Are we starting from scratch on compiling and testing with distros like RHEL4? Do you anticipate we will just go day by day with builds trying to stabilize things initially? In any case, here's what I see when I try to compile with install.sh on RHEL4 U3 x86_64: ... /tmp/OFED-1.2-20070205-1823/build.sh: line 802: kernel-ib: command not found Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' - -define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-docs-1. 2-0.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-docs-1.2-0.noarch.rpm /tmp/ OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' - -define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-scripts -1.2-0.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-scripts-1.2-0.noarch.rpm /t mp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 Running rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ib-bonding-0.9.0-1.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /t mp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. 
0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" See log file: /tmp/OFED.10899.log # tail -10 /tmp/OFED.10899.log Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/ib-bonding-0. 9.0-root Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615 + umask 022 + cd /var/tmp/OFEDRPM/BUILD + rm -rf ib-bonding-0.9.0 + exit 0 /bin/mv: cannot stat `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm ': No such file or directory ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. 0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" Scott ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Vladimir Sokolovsky Sent: Monday, February 05, 2007 2:26 PM To: openfabrics-ewg at openib.org Cc: openib-general at openib.org Subject: [openib-general] OFED-1.2 first release Hi, OFED-1.2-20070205-1823.tgz can be downloaded from http://www.openfabrics.org/builds/ofed-1.2/ The first OFED package includes: ofa_kernel-1.2-alpha1.src.rpm ofa_user-1.2-alpha1.src.rpm mvapich-0.9.9-971.src.rpm mvapich2-0.9.8-1.src.rpm openmpi-1.2b4ofedr13470-1ofed.src.rpm mpitests-2.0-698.src.rpm open-iscsi-generic-2.0-742.src.rpm ib-bonding-0.9.0-1.src.rpm ofed-docs-1.2-0.src.rpm ofed-scripts-1.2-0.src.rpm Known issues: srptools - compilation fails openib_diags - compilation fails ibutils - not included yet To build OFED RPMs: cd OFED-1.2-20070205-1823 ./build.sh Created RPMs will be stored under OFED-1.2-20070205-1823/RPMS/ directory. To install OFED RPMs: cd OFED-1.2-20070205-1823 ./install.sh For a detailed installation guide, see OFED-1.2-xxx/docs/OFED_Installation_Guide.txt -- Vladimir Sokolovsky Mellanox Technologies Ltd. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From monil at voltaire.com Mon Feb 5 23:48:00 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 6 Feb 2007 09:48:00 +0200 Subject: [openib-general] OFED-1.2 first release In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> Message-ID: <6a122cc00702052348u5cf38560j689f6072992fd4ad@mail.gmail.com> Vlad, > # tail -10 /tmp/OFED.10899.log > Wrote: > /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm > Wrote: > /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm > Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615 > + umask 022 > + cd /var/tmp/OFEDRPM/BUILD > + rm -rf ib-bonding-0.9.0 > + exit 0 > /bin/mv: cannot stat > `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm I see that there is a small difference in the expected RPM name. Can you fix that in the script or should we change the name of the RPM ? -- Moni > ': No such file or directory > ERROR: Failed executing "/bin/mv -f > /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. > 0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" From dotanb at dev.mellanox.co.il Mon Feb 5 23:56:04 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 06 Feb 2007 09:56:04 +0200 Subject: [openib-general] Immediate data question In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> Message-ID: <45C83494.6@dev.mellanox.co.il> Hi CQ. Tang, Changqing wrote: > Roland: > If I only want to send/recv 4 bytes with immediate data: > > On sender side: > opcode = IBV_WR_SEND_WITH_IMM; > imm_data = my_4_bytes_data; > > Do I still need to specify sg_list and num_sge ? > If the data that is being sent is only the immediate data, so no MR should be registered in this side. 
The SR will look like this: sr.opcode = IBV_WR_SEND_WITH_IMM; sr.imm_data = my_4_bytes_data; sr.num_sge = 0; > On receiver side, because the immediate data is inside the completion > structure, do I need to post a receive for above message ? > If I need to post a receive, do I need to specify sg_list and num_sge > for the receive ? > In the receiver side you must post RR (because SEND opcode consumes a RR). If you are using UD QP, you must add s/g list with 40 bytes (of registered memory). If you are not using UD QP, the s/g list in this side can be empty (num_sge = 0) and the data that was sent will be provided to you in wc.imm_data. > I looked the spec but did not find useful information. > > The reason I ask is that at some point, I can not(or hard) to provide > registered memory only for 4 bytes data. > I think that you can avoid registering those 4 bytes ... Hope this helped you Dotan From sweitzen at cisco.com Tue Feb 6 00:06:45 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 6 Feb 2007 00:06:45 -0800 Subject: [openib-general] OFED-1.2 first release In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> Message-ID: Not getting MPI RPMS for Intel compilers, either. Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mp itests_mvapich2_gcc-2.0-698.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mvapich2_intel-0 .9.8-1.x 86_64.rpm not found Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/op enmpi_gcc-1.2b4ofedr13470-1ofed.x86_64.rpm Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mp itests_openmpi_gcc-2.0-698.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/openmpi_intel-1. 2b4ofedr 13470-1ofed.x86_64.rpm not found ERROR: -.x86_64.rpm not found under /tmp/OFED-1.2-20070205-1823/RPMS/redhat-rele ase-4AS-4.1. Installation finished successfully... 
Scott ________________________________ From: Scott Weitzenkamp (sweitzen) Sent: Monday, February 05, 2007 9:44 PM To: Scott Weitzenkamp (sweitzen); 'Vladimir Sokolovsky'; 'openfabrics-ewg at openib.org'; 'Tziporet Koren' Cc: 'openib-general at openib.org' Subject: RE: [openib-general] OFED-1.2 first release Moving on, I set ib_bonding=n in ofed.conf and try install.sh again, and now get this: ... Building MVAPICH RPM. Please wait... Using gcc compiler Running rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_nam e mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/loc al/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-9 71.src.rpm ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRP M' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --defi ne 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPM S/mvapich-0.9.9-971.src.rpm" See log file: /tmp/OFED.6120.log # tail /tmp/OFED.6120.log + LANG=C + export LANG + unset DISPLAY /var/tmp/rpm-tmp.870: line 33: syntax error near unexpected token `)' error: Bad exit status from /var/tmp/rpm-tmp.870 (%install) RPM build errors: Bad exit status from /var/tmp/rpm-tmp.870 (%install) ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRP M' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --defi ne 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPM S/mvapich-0.9.9-971.src.rpm" Scott ________________________________ From: Scott Weitzenkamp (sweitzen) Sent: Monday, February 05, 2007 9:27 PM To: Vladimir Sokolovsky; openfabrics-ewg at openib.org; Tziporet Koren; Scott Weitzenkamp 
(sweitzen) Cc: openib-general at openib.org Subject: RE: [openib-general] OFED-1.2 first release Vlad and Tziporet, It might help if you elaborated on what you meant by "first release", you have been saying "code freeze" but really this is "feature freeze", right? This announcement is quite a bit different from previous OFED announcements, where you detailed what features were available and what OS were supported. The daily build email mentions compiling against kernels, but I haven't seen what distros were actually tested. Are we starting from scratch on compiling and testing with distros like RHEL4? Do you anticipate we will just go day by day with builds trying to stabilize things initially? In any case, here's what I see when I try to compile with install.sh on RHEL4 U3 x86_64: ... /tmp/OFED-1.2-20070205-1823/build.sh: line 802: kernel-ib: command not found Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' - -define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-docs-1. 2-0.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-docs-1.2-0.noarch.rpm /tmp/ OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' - -define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-scripts -1.2-0.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-scripts-1.2-0.noarch.rpm /t mp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 Running rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ib-bonding-0.9.0-1.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /t mp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. 
0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" See log file: /tmp/OFED.10899.log # tail -10 /tmp/OFED.10899.log Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/ib-bonding-0. 9.0-root Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615 + umask 022 + cd /var/tmp/OFEDRPM/BUILD + rm -rf ib-bonding-0.9.0 + exit 0 /bin/mv: cannot stat `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm ': No such file or directory ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. 0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" Scott ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Vladimir Sokolovsky Sent: Monday, February 05, 2007 2:26 PM To: openfabrics-ewg at openib.org Cc: openib-general at openib.org Subject: [openib-general] OFED-1.2 first release Hi, OFED-1.2-20070205-1823.tgz can be downloaded from http://www.openfabrics.org/builds/ofed-1.2/ The first OFED package includes: ofa_kernel-1.2-alpha1.src.rpm ofa_user-1.2-alpha1.src.rpm mvapich-0.9.9-971.src.rpm mvapich2-0.9.8-1.src.rpm openmpi-1.2b4ofedr13470-1ofed.src.rpm mpitests-2.0-698.src.rpm open-iscsi-generic-2.0-742.src.rpm ib-bonding-0.9.0-1.src.rpm ofed-docs-1.2-0.src.rpm ofed-scripts-1.2-0.src.rpm Known issues: srptools - compilation fails openib_diags - compilation fails ibutils - not included yet To build OFED RPMs: cd OFED-1.2-20070205-1823 ./build.sh Created RPMs will be stored under OFED-1.2-20070205-1823/RPMS/ directory. To install OFED RPMs: cd OFED-1.2-20070205-1823 ./install.sh For a detailed installation guide, see OFED-1.2-xxx/docs/OFED_Installation_Guide.txt -- Vladimir Sokolovsky Mellanox Technologies Ltd. 
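
A note on the ib-bonding failure in the log above: rpmbuild reports writing ib-bonding-0.9.0-1-rh-x86_64.rpm, while the /bin/mv step looks for ib-bonding-0.9.0-1.x86_64.rpm. One way a build script could avoid guessing the file name is to take it from the "Wrote:" lines of the rpmbuild output. What follows is only a sketch of that idea in Python, not how build.sh actually works. The helper names written_rpms and binary_rpms are made up for illustration, and the log excerpt is copied from the message above.

```python
import os
import re

# Excerpt of the rpmbuild output quoted above (paths as they appear in the log).
LOG = """\
Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm
Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm
Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615
"""

# File name that the failing /bin/mv command assumed.
EXPECTED = "/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm"


def written_rpms(log_text):
    """Return the package paths that rpmbuild reported with 'Wrote:'."""
    return re.findall(r"^Wrote:\s+(\S+\.rpm)\s*$", log_text, flags=re.MULTILINE)


def binary_rpms(paths):
    """Drop debuginfo packages (hypothetical policy, for illustration only)."""
    return [p for p in paths if "-debuginfo-" not in os.path.basename(p)]


if __name__ == "__main__":
    built = binary_rpms(written_rpms(LOG))
    print("built:   ", built)
    status = "present" if EXPECTED in built else "missing"
    print("expected:", EXPECTED, "(%s)" % status)
```

With the real log, the list returned by written_rpms could be passed to the move step directly, so a change in the RPM naming scheme would not need a matching change in the script.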
-------------- next part -------------- An HTML attachment was scrubbed... URL: From vlad at mellanox.co.il Tue Feb 6 00:19:45 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 06 Feb 2007 10:19:45 +0200 Subject: [openib-general] openib diags installation issue In-Reply-To: <1170599665.5887.14.camel@vladsk-laptop> References: <1170599665.5887.14.camel@vladsk-laptop> Message-ID: <1170749985.6537.2.camel@vladsk-laptop> Hi Hal, Please merge the following commit to the ofed_1_2 branch of the management.git: commit 6c819523a6a58e2ac4948327f256e49984dce9fb Diags/Makefile.am: Fix for executing 'make DESTDIR=/var/tmp/OFED install' Thanks, -- Vladimir Sokolovsky Mellanox Technologies Ltd. From vlad at lists.openfabrics.org Tue Feb 6 02:22:40 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Tue, 6 Feb 2007 02:22:40 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070206-0200 daily build status Message-ID: <20070206102240.A9687E60807@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on 
powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ /home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: /home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From tziporet at mellanox.co.il Tue Feb 
6 02:35:18 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 06 Feb 2007 12:35:18 +0200 Subject: [openib-general] [openfabrics-ewg] OFED-1.2 first package (was release) In-Reply-To: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> Message-ID: <45C859E6.7020507@mellanox.co.il> Vladimir Sokolovsky wrote: > Hi, > > OFED-1.2-20070205-1823.tgz can be downloaded from > > http://www.openfabrics.org/builds/ofed-1.2/ Just a clarification: This is the first OFED package and it's not the alpha release yet. We published it so everybody can fix the issues we already found and do basic installation testing. Daily builds will be available from tomorrow. The plan is to have the first alpha release on Monday. A detailed release mail will be sent with the release. All - please work closely with Vlad to resolve all issues so we can make this Alpha. Thanks, Tziporet From ogerlitz at voltaire.com Tue Feb 6 02:40:57 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 06 Feb 2007 12:40:57 +0200 Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support In-Reply-To: <45C37BE9.5040105@ichips.intel.com> References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> Message-ID: <45C85B39.4080700@voltaire.com> Sean Hefty wrote: >> Sean Hefty (3): >> rdma_cm: Increment port number after close to avoid re-use. >> ib_sa: track multicast join/leave requests >> rdma_cm: add multicast communication support > > Assuming that you haven't looked at this yet, I updated the ib_sa patch > above to shorten the workqueue name, plus added a fourth patch to > shorten the workqueue names for ib_addr and rdma_cm. E.g. "ib_mcast_wq" > became "ib_mcast". > Let me know if you need any assistance. Roland, Can you comment on the status of the multicast changes merge for 2.6.21? 
We have been working (developing and testing) with a userspace rdma cm based multicast app on top of this code for the last two months and are very satisfied with it. The testing included IPoIB, the user space app and multicast interoperability between them. Or. From halr at voltaire.com Tue Feb 6 04:18:26 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Feb 2007 07:18:26 -0500 Subject: [openib-general] openib diags installation issue In-Reply-To: <1170749985.6537.2.camel@vladsk-laptop> References: <1170599665.5887.14.camel@vladsk-laptop> <1170749985.6537.2.camel@vladsk-laptop> Message-ID: <1170764304.4525.280004.camel@hal.voltaire.com> Hi Vlad, On Tue, 2007-02-06 at 03:19, Vladimir Sokolovsky wrote: > Hi Hal, > Please merge the following commit to the ofed_1_2 branch of the management.git: > > commit 6c819523a6a58e2ac4948327f256e49984dce9fb > Diags/Makefile.am: Fix for executing 'make DESTDIR=/var/tmp/OFED install' > > Thanks, Applied. Thanks. -- Hal From tziporet at mellanox.co.il Tue Feb 6 04:51:36 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 6 Feb 2007 14:51:36 +0200 Subject: [openib-general] OFED-1.2 first release Message-ID: <6C2C79E72C305246B504CBA17B5500C9A0DD5C@mtlexch01.mtl.com> I know - I just took the docs from OFED 1.1. I will work on the docs after we have a working package. Tziporet -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Steve Wise Sent: Tuesday, February 06, 2007 2:19 AM To: Vladimir Sokolovsky Cc: openib-general Subject: Re: [openib-general] OFED-1.2 first release BTW: The README.txt still talks about OFED-1.1 and the October 2006 release. From mst at mellanox.co.il Tue Feb 6 05:26:39 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 6 Feb 2007 15:26:39 +0200 Subject: [openib-general] Backport and fix patches for ipath driver In-Reply-To: <45C7AB72.4040400@pathscale.com> References: <45C7AB72.4040400@pathscale.com> Message-ID: <20070206132639.GA6937@mellanox.co.il> > Quoting Bryan O'Sullivan : > Subject: Backport and fix patches for ipath driver > > Hi, Vlad and Tziporet - > > Here's a round of fix and backport patches for the ipath driver, for > dropping into the OFED 1.2 tree. The way in which they're organised > should, I hope, be clear. Looks good, fixes look much cleaner than what we had for OFED 1.1. I think fixes can be applied already. However, I'm not sure the backports are ready to be applied as is yet. Just taking a look at random: ./backport/2.6.18/ipath-50-mad-kmem_cache-2.6.19.patch BACKPORT - kmem_cache_t disappeared after 2.6.19 diff -r a290ff6e9ae7 drivers/infiniband/core/mad.c --- a/drivers/infiniband/core/mad.c Wed Jan 31 14:47:02 2007 -0800 +++ b/drivers/infiniband/core/mad.c Wed Jan 31 14:48:00 2007 -0800 @@ -46,7 +46,7 @@ MODULE_AUTHOR("Hal Rosenstock"); MODULE_AUTHOR("Hal Rosenstock"); MODULE_AUTHOR("Sean Hefty"); -static struct kmem_cache *ib_mad_cache; +static kmem_cache_t *ib_mad_cache; static struct list_head ib_mad_port_list; static u32 ib_mad_client_id = 0; This changes a core file, and does not seem to be related to ipath at all. What problem does this solve? I note that mad.c already seems to build fine on 2.6.18 for us - this is part of daily build. 
Another example that looks strange: BACKPORT - workqueues changed in 2.6.20 diff -r 8b94fcef1edd drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Thu Feb 01 08:54:29 2007 -0800 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Thu Feb 01 08:57:19 2007 -0800 @@ -241,7 +241,7 @@ static struct ipath_devdata *ipath_alloc dd->pcidev = pdev; pci_set_drvdata(pdev, dd); - INIT_DELAYED_WORK(&dd->link_work, check_link_status); + INIT_WORK(&dd->link_work, check_link_status); list_add(&dd->ipath_list, &ipath_dev_list); INIT_DELAYED_WORK is implemented in kernel_addons, so this should not be necessary. @@ -725,6 +725,7 @@ static void __devexit ipath_remove_one(s */ ipath_shutdown_device(dd); +#undef cancel_delayed_work cancel_delayed_work(&dd->link_work); flush_scheduled_work(); This undef looks quite ugly. What does it do? Please go over the backport patches and check whether they are really necessary. I think you will mostly discover that the kernel_addons mechanism makes the backport patches unnecessary. If not, you should try adding things under kernel_addons as first choice so that everyone benefits. -- MST From swise at opengridcomputing.com Tue Feb 6 05:53:03 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 06 Feb 2007 07:53:03 -0600 Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: <20070206051356.GF16598@mellanox.co.il> References: <1170711823.16661.78.camel@stevo-desktop> <20070206051356.GF16598@mellanox.co.il> Message-ID: <1170769983.19662.0.camel@stevo-desktop> On Tue, 2007-02-06 at 07:13 +0200, Michael S. Tsirkin wrote: > > Quoting Steve Wise : > > Subject: Re: [openib-general] idea for ofed 1 2 kernel file structure > > > > On Mon, 2007-02-05 at 06:20 -0800, Roland Dreier wrote: > > > > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike: > > > > It is hard to see changes that are specific to OFED since we have whole > > > > kernel history mixed in. 
> > > > > > I'm not sure how you have your branches set up, but if you have > > > something like a "linus" branch that tracks the upstream kernel, it's > > > easy to do stuff like "git log linus.." or "git diff linus.. drivers/infiniband" > > > and see the differences that way. > > > > > > Using git that way (which is what it's designed for, after all) seems > > > better than some scripts to munge together two trees. > > > > > > > So git "log linus.." would show commits in the current branch that are > > not in the linus branch, correct? > > > > That would work. Two branches: one with the main kernel git tree, and > > based on that + the ofed-specific changes. > > Well, that's what we have now. > The master branch tracks upstream kernel. > I didn't realize git "log master.." would show only the ofed-specific commits... From mst at mellanox.co.il Tue Feb 6 05:58:30 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Feb 2007 15:58:30 +0200 Subject: [openib-general] QoS in opensm will not be part of OFED 1.2 In-Reply-To: <20070205154922.GC4246@mellanox.co.il> References: <1170690105.4525.201879.camel@hal.voltaire.com> <20070205154922.GC4246@mellanox.co.il> Message-ID: <20070206135830.GA7750@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: Re: QoS in opensm will not be part of OFED 1.2 > > > > > > I had an AI to check the QoS status with OSM. > > > > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 > > > > > (I updated the plan on the Wiki) > > > > > > > > > > The reasons for this are: > > > > > 1. Code not ready at code freeze. > > > > > 2. There are technical discussion in the list regarding some > > > > > implementation details (e.g. XML or text syntax). > > > > > 3. SPEC is not published by IBTA yet. > > > > > > > > I think this last reason also applies to the end client QoS changes as > > > > well. > > > > > > Yes. But the other 2 don't. 
> > > > Right but I think that precludes it from being included in OFED right > > now. > > Since the code is already included in OFED, moving it out would violate the feature > freeze rules, unless there's an actual bug this would fix. OTOH, you are right in that without SM support we can't claim to have this feature at all. So, to avoid controversy, I have just removed the QoS patches from IB core and pushed the code out. -- MST From halr at voltaire.com Tue Feb 6 06:21:52 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Feb 2007 09:21:52 -0500 Subject: [openib-general] [RFC] [PATCH] ib_usa: export multicast and informinfo registration to userspace In-Reply-To: <000001c74726$94d0f500$e598070a@amr.corp.intel.com> References: <000001c74726$94d0f500$e598070a@amr.corp.intel.com> Message-ID: <1170771710.4525.287718.camel@hal.voltaire.com> On Fri, 2007-02-02 at 19:02, Sean Hefty wrote: > Export SA client capabilities for multicast and SA event registration > to userspace. Multicast and event registration are tracked on a per > port basis, with tracking done by the ib_sa kernel module. > > Based on feedback from the list, a new userspace SA module was added, > rather than trying to rework the usermad interface. The user to kernel > interface is minimal, but was designed to be flexible enough to add > additional SA client support if needed. (E.g. local SA cache lookup, > SA queries, service registration, etc.) > > Signed-off-by: Sean Hefty > --- > The following patch is also available from the user_sa branch of my > rdma-dev.git tree, and is dependent on the informinfo branch/patch > posted earlier to the list. (A couple of small fixes to the informinfo > code have been added since the original patches.) 
A userspace sa library > is also available. > > The informinfo and userspace support was completed as part of the > PathForward project at the request of the US National Laboratories. > [snip...] > diff --git a/drivers/infiniband/core/usa.c b/drivers/infiniband/core/usa.c > new file mode 100644 > index 0000000..ae05091 > --- /dev/null > +++ b/drivers/infiniband/core/usa.c > @@ -0,0 +1,792 @@ [snip...] > +static int process_mcast(struct usa_file *file, struct ib_usa_request *req, > + int out_len) > +{ > + /* Only indirect requests are currently supported. */ > + if (!req->local) > + return -ENOSYS; > + > + switch (req->method) { > + case IB_MGMT_METHOD_GET: > + return get_mcast(file, req, out_len); > + case IB_MGMT_METHOD_SET: > + return join_mcast(file, req, out_len); > + default: > + return -EINVAL; Should leaving a multicast group also be supported ? -- Hal From halr at voltaire.com Tue Feb 6 06:34:25 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Feb 2007 09:34:25 -0500 Subject: [openib-general] QoS in opensm will not be part of OFED 1.2 In-Reply-To: <20070206135830.GA7750@mellanox.co.il> References: <1170690105.4525.201879.camel@hal.voltaire.com> <20070205154922.GC4246@mellanox.co.il> <20070206135830.GA7750@mellanox.co.il> Message-ID: <1170772464.4525.288496.camel@hal.voltaire.com> On Tue, 2007-02-06 at 08:58, Michael S. Tsirkin wrote: > > Quoting Michael S. Tsirkin : > > Subject: Re: QoS in opensm will not be part of OFED 1.2 > > > > > > > > I had an AI to check the QoS status with OSM. > > > > > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 > > > > > > (I updated the plan on the Wiki) > > > > > > > > > > > > The reasons for this are: > > > > > > 1. Code not ready at code freeze. > > > > > > 2. There are technical discussion in the list regarding some > > > > > > implementation details (e.g. XML or text syntax). > > > > > > 3. SPEC is not published by IBTA yet. 
> > > > > > > > > > I think this last reason also applies to the end client QoS changes as > > > > > well. > > > > > > > > Yes. But the other 2 don't. > > > > > > Right but I think that precludes it from being included in OFED right > > > now. > > > > Since the code is already included in OFED, moving it out would violate the feature > > freeze rules, unless there's an actual bug this would fix. > > OTOH, you are right in that without SM support we can't claim to have this > feature at all. So, to avoid controversy, I have just removed the QoS patches > from IB core and pushed the code out. I think that the mthca patch to encode SL in sched_queue field to improve hardware QoS guarantees for connected QPs is useful as this can be exercised by IPoIB-CM. If so, should/can this be included ? -- Hal From mst at mellanox.co.il Tue Feb 6 06:41:03 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Feb 2007 16:41:03 +0200 Subject: [openib-general] QoS in opensm will not be part of OFED 1.2 In-Reply-To: <1170772464.4525.288496.camel@hal.voltaire.com> References: <1170772464.4525.288496.camel@hal.voltaire.com> Message-ID: <20070206144103.GB9534@mellanox.co.il> > > > Quoting Michael S. Tsirkin : > > > Subject: Re: QoS in opensm will not be part of OFED 1.2 > > > > > > > > > > I had an AI to check the QoS status with OSM. > > > > > > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 > > > > > > > (I updated the plan on the Wiki) > > > > > > > > > > > > > > The reasons for this are: > > > > > > > 1. Code not ready at code freeze. > > > > > > > 2. There are technical discussion in the list regarding some > > > > > > > implementation details (e.g. XML or text syntax). > > > > > > > 3. SPEC is not published by IBTA yet. > > > > > > > > > > > > I think this last reason also applies to the end client QoS changes as > > > > > > well. > > > > > > > > > > Yes. But the other 2 don't. 
> > > > > > > > Right but I think that precludes it from being included in OFED right > > > > now. > > > > > > Since the code is already included in OFED, moving it out would violate the feature > > > freeze rules, unless there's an actual bug this would fix. > > > > OTOH, you are right in that without SM support we can't claim to have this > > feature at all. So, to avoid controversy, I have just removed the QoS patches > > from IB core and pushed the code out. > > I think that the mthca patch to encode SL in sched_queue field to > improve hardware QoS guarantees for connected QPs is useful as this can > be exercised by IPoIB-CM. If so, should/can this be included ? OK. Note this is still untested, and off by default. -- MST From vlad at dev.mellanox.co.il Tue Feb 6 06:59:19 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 06 Feb 2007 16:59:19 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.2 release - to be reviewed in the meeting today In-Reply-To: <6a122cc00702010817j52958d85n1d141316e29a7ebf@mail.gmail.com> References: <45BDFF11.9080901@mellanox.co.il> <45BFF296.8000908@cse.ohio-state.edu> <45C08E47.2040506@mellanox.co.il> <6a122cc00702010817j52958d85n1d141316e29a7ebf@mail.gmail.com> Message-ID: <1170773959.6537.17.camel@vladsk-laptop> On Thu, 2007-02-01 at 18:17 +0200, Moni Levy wrote: > Tziporet, > On 1/31/07, Tziporet Koren wrote: > > Shaun Rowland wrote: > > > > > > Hi. I am not exactly sure where the ofed_1_2 directory for MPI SRPMs is > > > supposed to go. I assume from previous meetings this is just a > > > filesystem directory. Should it be a directory in my home directory on > > > staging.openfabrics.org, in ~/public_html, or is there something else I > > > need to do to put this into place? I think from the previous MPI > > > specific meeting, this was supposed to be done in a web directory. Since > > > I am unclear, I wanted to ask here. > > > > Please place your SRPM under your home directory at ofed_1_2 directory. 
> > Then you can make this directory accessible to the web in this way: > > 1. mkdir public_html > > 2. chmod 755 public_html > > > > Now you can put any stuff under public_html (also symbolic links) and it > > will be available via web > > www.openfabrics.org/~/ > > I have put the ib-bonding SRPM in ~monis/ofed_1_2 > > --Moni Hi Moni, Please move ~monis/ofed_1_2 to ~monis/public_html/ofed_1_2 Thanks, -- Vladimir Sokolovsky Mellanox Technologies Ltd. From mst at mellanox.co.il Tue Feb 6 07:03:21 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Feb 2007 17:03:21 +0200 Subject: [openib-general] idea for ofed 1 2 kernel file structure In-Reply-To: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> Message-ID: <20070206150321.GA21776@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: idea for ofed 1 2 kernel file structure > > Hi! > > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike: > > It is hard to see changes that are specific to OFED since we have whole kernel > history mixed in. > > > > It would easy to split OFED specific files In separate directory and have OFED > scripts combine that with upstream kernel. > > > > All out of tree modules we distribute would go there too. > > What do others think about this? OK, I didn't quite get whether the majority likes this or not, so I created such a repository, extracted the ofed specific history and imported it there. Take a look here: git://git.openfabrics.org/~mst/newofed.git Build scripts will have to be adjusted to add necessary kernel components that we use. Another nice thing about this layout, is that users (if they so wish) will be able to use just linux kernel source tarball instead of full linux kernel git. OFED maintainers, you are the primary users of the OFED git. Please comment which layout is better for you. 
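
For comparison with the single-tree layout discussed earlier in this thread: there, the OFED-specific history is what "git log master.." selects, that is, the commits reachable from the current branch but not from master. The toy model below shows that range selection in Python so the two layouts can be weighed against each other. The commit names and the small graph are assumptions made up for illustration, and the code does not call git at all.

```python
# Toy commit graph: each commit maps to the list of its parents.
# All commit names are hypothetical.
PARENTS = {
    "k1": [],
    "k2": ["k1"],
    "k3": ["k2"],               # tip of "master" (upstream kernel)
    "ofed1": ["k2"],            # OFED-specific commit on top of upstream
    "merge": ["ofed1", "k3"],   # merge of newer upstream into the OFED branch
    "ofed2": ["merge"],         # tip of the OFED branch
}


def reachable(tip):
    """All commits reachable from tip by following parent links (tip included)."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def rev_range(exclude, include):
    """Model of the revision range 'exclude..include'."""
    return reachable(include) - reachable(exclude)


if __name__ == "__main__":
    # Similar in spirit to running "git log master.." on the OFED branch.
    print(sorted(rev_range("k3", "ofed2")))
```

Note that the merge commit is part of the range as well, so the upstream history stays out of the listing while the merges that pulled it in still show up.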
-- MST From swise at opengridcomputing.com Tue Feb 6 07:15:57 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 06 Feb 2007 09:15:57 -0600 Subject: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 neteventbackport] In-Reply-To: <20070204155244.GC20087@mellanox.co.il> References: <1170604137.4129.13.camel@linux-q667.site> <20070204155244.GC20087@mellanox.co.il> Message-ID: <1170774957.19662.13.camel@stevo-desktop> Hey guys, This still hasn't been pulled in yet. Its trivial and its up to you if it goes in, but lemme know so I can remove it from my list of pending patches. Thanks, Steve. On Sun, 2007-02-04 at 17:52 +0200, Michael S. Tsirkin wrote: > No, but it really makes sense. Vlad? > > Quoting Steve WIse : > Subject: Re: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 neteventbackport] > > Vlad/Michael, > > I'm still tracking this as an outstanding patch. Have you pulled this > in yet? > > Thanks, > > Steve. > > > On Thu, 2007-02-01 at 14:07 -0600, Steve Wise wrote: > > From: Steve Wise > > > > Add skbuff.h to include list for RHEL4U4 netevent.c file. This makes > > it identical to the SLES9SP3 file. 
> > > > Signed-off-by: Steve Wise > > --- > > > > .../backport/2.6.9_U4/include/src/netevent.c | 1 + > > 1 files changed, 1 insertions(+), 0 deletions(-) > > > > diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > > index 1589300..87fb55c 100644 > > --- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > > +++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c > > @@ -13,6 +13,7 @@ > > * Fixes: > > */ > > > > +#include > > #include > > #include > > #include > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From swise at opengridcomputing.com Tue Feb 6 07:38:41 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 06 Feb 2007 09:38:41 -0600 Subject: [openib-general] OFED-1.2 first release - provider library install problem In-Reply-To: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> Message-ID: <1170776321.19662.28.camel@stevo-desktop> Vlad, After installing the test alpha1 build rpms on rhel4u4 with a kernel.org 2.6.20 kernel, it appears that the provider library config files didn't get installed for libcxgb3: [root at r1-iw ~]# rping -s -a 0.0.0.0 -p 9999 libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'. libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0 libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'. 
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for uverbs0
Segmentation fault
[root at r1-iw ~]# ls /usr/local/ofed/etc
ls: /usr/local/ofed/etc: No such file or directory
[root at r1-iw ~]#

I'm running with the cxgb3 driver, so I guess libcxgb3 didn't install
itself correctly? This works when doing 'make install' from the
userspace tarballs. Is there some rpm magic missing? I'm not sure how
to debug this as I'm rpm-challenged (but willing to learn :).

It appears libcxgb3 installed its v2 libs correctly:

[root at r1-iw ~]# ls /usr/local/ofed/lib64
libcxgb3.a           libdat.a          libibcommon.so        libibumad.a         libibverbs.so.1.0.0
libcxgb3-rdmav2.so   libdat.so         libibcommon.so.1      libibumad.so        librdmacm.so
libcxgb3.so          libdat.so.1       libibcommon.so.1.0.0  libibumad.so.1      librdmacm.so.0.9.0
libdaplcma.a         libdat.so.1.0.2   libibmad.a            libibumad.so.1.0.0
libdaplcma.so        libibcm.so        libibmad.so           libibverbs.a
libdaplcma.so.1      libibcm.so.0.9.0  libibmad.so.1         libibverbs.so
libdaplcma.so.1.0.2  libibcommon.a     libibmad.so.1.2.0     libibverbs.so.1
[root at r1-iw ~]#

But /usr/local/ofed/etc/libibverbs.d didn't get created, and the
cxgb3.driver file didn't get installed.

Steve.

From monis at voltaire.com Tue Feb 6 07:47:57 2007
From: monis at voltaire.com (Moni Shoua)
Date: Tue, 06 Feb 2007 17:47:57 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
Message-ID: <45C8A32D.2000504@voltaire.com>

Michael, Roland,

I'd appreciate it if you would take a look at this and give your comments.
The patch here refers to this thread about adding bonding support for IPoIB
interfaces and is necessary for it to work properly.
http://openib.org/pipermail/openib-general/2007-January/031934.html

The patch here is for the upstream kernel, while there is a version of the
patch for OFED as well (for kernels up to 2.6.16)
http://openib.org/pipermail/openib-general/2007-January/031935.html

thanks

- MoniS

------------------------------------------------------------------------------
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device, it changes this skb to point to the
slave device and calls the slave hard_start_xmit function.

Combining these two flows, there is a hole if some code at ipoib
(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
is an ipoib device so for example netdev_priv(n->dev) would be of type struct
ipoib_dev_priv.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one.
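
[Editorial illustration, not part of the posted patch.] The hole described
above can be sketched in user space. This is a minimal mock under stated
assumptions, not kernel code: the struct layouts are reduced to the fields
that matter, the priv pointer stands in for netdev_priv(), and the helper
names (mock_neigh_init, priv_via_neighbour, priv_via_ipoib_neigh,
bonding_lookup_demo) as well as the device names ib0 and bond0 are
hypothetical. It only shows why a lookup through n->dev lands on the bond
master's private data, while a dev pointer recorded in the ipoib_neigh lands
on the slave's.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types for illustration only; the kernel structures differ. */
struct net_device {
	const char *name;
	void *priv;                     /* stands in for netdev_priv(dev) */
};

struct ipoib_dev_priv { int lock_taken; };  /* hypothetical contents */

struct neighbour {
	struct net_device *dev;         /* owner chosen by the net stack */
};

struct ipoib_neigh {
	struct neighbour *neighbour;
	struct net_device *dev;         /* field added by the patch */
};

/* Follows the patched ipoib_neigh_alloc(): remember the tx device. */
static void mock_neigh_init(struct ipoib_neigh *neigh,
			    struct neighbour *n, struct net_device *dev)
{
	neigh->neighbour = n;
	neigh->dev = dev;
}

/* Old lookup: trusts n->dev, which is the bond master under bonding. */
static void *priv_via_neighbour(const struct neighbour *n)
{
	return n->dev->priv;
}

/* Patched lookup: uses the device recorded in the ipoib_neigh. */
static void *priv_via_ipoib_neigh(const struct ipoib_neigh *neigh)
{
	return neigh->dev->priv;
}

/* Returns 1 when only the patched lookup finds the ipoib private data. */
static int bonding_lookup_demo(void)
{
	struct ipoib_dev_priv ib_priv = { 0 };
	int bond_priv = 0;              /* unrelated master-private data */
	struct net_device ib0 = { "ib0", &ib_priv };
	struct net_device bond0 = { "bond0", &bond_priv };
	struct neighbour n = { &bond0 };    /* created on behalf of the master */
	struct ipoib_neigh neigh;

	mock_neigh_init(&neigh, &n, &ib0);  /* skb->dev was switched to ib0 */

	return priv_via_neighbour(&n) != (void *)&ib_priv &&
	       priv_via_ipoib_neigh(&neigh) == (void *)&ib_priv;
}
```

Calling bonding_lookup_demo() from a test harness is enough to exercise the
sketch; in the real driver the wrong lookup would not be a harmless pointer
mismatch but a cast of foreign private data to struct ipoib_dev_priv.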
Signed-off-by: Moni Shoua Signed-off-by: Or Gerlitz --- ipoib.h | 4 +++- ipoib_main.c | 23 +++++++++++++---------- ipoib_multicast.c | 2 +- 3 files changed, 17 insertions(+), 12 deletions(-) Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-01-22 12:11:25.000000000 +0200 +++ infiniband/drivers/infiniband/ulp/ipoib/ipoib.h 2007-01-22 12:18:06.101698456 +0200 @@ -216,6 +216,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_head list; }; @@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-01-22 12:11:33.000000000 +0200 +++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-01-22 12:34:57.599156580 +0200 @@ -490,7 +490,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb->dst->neighbour); + neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (!neigh) { ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -769,32 +769,34 @@ static void ipoib_set_mcast_list(struct static void ipoib_neigh_destructor(struct neighbour *n) { struct ipoib_neigh *neigh; - struct ipoib_dev_priv *priv = netdev_priv(n->dev); + struct ipoib_dev_priv *priv; unsigned long flags; struct ipoib_ah *ah = NULL; - ipoib_dbg(priv, - "neigh_destructor for %06x " IPOIB_GID_FMT "\n", - IPOIB_QPN(n->ha), - 
IPOIB_GID_RAW_ARG(n->ha + 4)); - - spin_lock_irqsave(&priv->lock, flags); neigh = *to_ipoib_neigh(n); if (neigh) { + priv = netdev_priv(neigh->dev); + ipoib_dbg(priv, + "neigh_destructor for %06x " IPOIB_GID_FMT "\n", + IPOIB_QPN(n->ha), + IPOIB_GID_RAW_ARG(n->ha + 4)); + + spin_lock_irqsave(&priv->lock, flags); if (neigh->ah) ah = neigh->ah; list_del(&neigh->list); ipoib_neigh_free(n->dev, neigh); + spin_unlock_irqrestore(&priv->lock, flags); } - spin_unlock_irqrestore(&priv->lock, flags); if (ah) ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) { struct ipoib_neigh *neigh; @@ -803,6 +805,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh->neighbour = neighbour; + neigh->dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(&neigh->queue); Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-01-22 12:11:25.000000000 +0200 +++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-01-22 12:18:06.151689482 +0200 @@ -774,7 +774,7 @@ out: if (skb->dst && skb->dst->neighbour && !*to_ipoib_neigh(skb->dst->neighbour)) { - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (neigh) { kref_get(&mcast->ah->ref); From halr at voltaire.com Tue Feb 6 07:52:50 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Feb 2007 10:52:50 -0500 Subject: [openib-general] [PATCH 1/2] OpenSM: Add a printable node description to osm_node_t Message-ID: <1170777169.4525.293473.camel@hal.voltaire.com> OpenSM: Add a printable node description to osm_node_t Also, convert memcpy's to use this rather than temporary one Signed-off-by: Ira K. 
Weiny Signed-off-by: Hal Rosenstock diff --git a/osm/include/opensm/osm_node.h b/osm/include/opensm/osm_node.h index 8417f10..6f95d5d 100644 --- a/osm/include/opensm/osm_node.h +++ b/osm/include/opensm/osm_node.h @@ -107,6 +107,7 @@ typedef struct _osm_node ib_node_desc_t node_desc; uint32_t discovery_count; uint32_t physp_tbl_size; + char print_desc[IB_NODE_DESCRIPTION_SIZE+1]; osm_physp_t physp_table[1]; } osm_node_t; /* @@ -135,6 +136,9 @@ typedef struct _osm_node * than the number of ports in the node, since port numbers * start with 1 for some bizzare reason. * +* print_desc +* A printable version of the node description. +* * phsyp_table * Array of physical port objects belonging to this node. * Index is contiguous by local port number. diff --git a/osm/opensm/osm_drop_mgr.c b/osm/opensm/osm_drop_mgr.c index 6c5939e..0d08ff6 100644 --- a/osm/opensm/osm_drop_mgr.c +++ b/osm/opensm/osm_drop_mgr.c @@ -367,19 +367,12 @@ __osm_drop_mgr_remove_port( if (osm_log_is_active( p_mgr->p_log, OSM_LOG_INFO )) { - char desc[IB_NODE_DESCRIPTION_SIZE + 1]; - - if (p_node) - { - memcpy(desc, p_node->node_desc.description, IB_NODE_DESCRIPTION_SIZE); - desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; - } osm_log( p_mgr->p_log, OSM_LOG_INFO, "__osm_drop_mgr_remove_port: " "Removed port with GUID:0x%016" PRIx64 " LID range [0x%X,0x%X] of node:%s\n", cl_ntoh64( port_gid.unicast.interface_id ), - min_lid_ho, max_lid_ho, p_node ? desc : "UNKNOWN" ); + min_lid_ho, max_lid_ho, p_node ? 
p_node->print_desc : "UNKNOWN" ); } Exit: diff --git a/osm/opensm/osm_node_desc_rcv.c b/osm/opensm/osm_node_desc_rcv.c index 13c5a93..fc96c12 100644 --- a/osm/opensm/osm_node_desc_rcv.c +++ b/osm/opensm/osm_node_desc_rcv.c @@ -69,23 +69,23 @@ __osm_nd_rcv_process_nd( IN osm_node_t* const p_node, IN const ib_node_desc_t* const p_nd ) { - char desc[IB_NODE_DESCRIPTION_SIZE + 1]; OSM_LOG_ENTER( p_rcv->p_log, __osm_nd_rcv_process_nd ); + memcpy( &p_node->node_desc.description, p_nd, sizeof(*p_nd) ); + + /* also set up a printable version */ + memcpy( &p_node->print_desc, p_nd, sizeof(*p_nd) ); + p_node->print_desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; + if( osm_log_is_active( p_rcv->p_log, OSM_LOG_VERBOSE ) ) { - memcpy( desc, p_nd, sizeof(*p_nd) ); - /* Guarantee null termination before printing. */ - desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; - osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, "__osm_nd_rcv_process_nd: " "Node 0x%" PRIx64 "\n\t\t\t\tDescription = %s\n", - cl_ntoh64( osm_node_get_node_guid( p_node )), desc ); + cl_ntoh64( osm_node_get_node_guid( p_node )), + p_node->print_desc); } - memcpy( &p_node->node_desc.description, p_nd, sizeof(*p_nd) ); - OSM_LOG_EXIT( p_rcv->p_log ); } diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index 16297c9..2905857 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -1076,7 +1076,6 @@ __osm_topology_file_create( const osm_node_t *p_node; char *file_name; FILE *rc; - char desc[IB_NODE_DESCRIPTION_SIZE + 1]; OSM_LOG_ENTER( p_mgr->p_log, __osm_topology_file_create ); @@ -1139,10 +1138,6 @@ __osm_topology_file_create( p_default_physp = p_physp; } - memcpy(desc, p_node->node_desc.description, - IB_NODE_DESCRIPTION_SIZE); - desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; - fprintf( rc, "{ %s%s Ports:%02X" " SystemGUID:%016" PRIx64 " NodeGUID:%016" PRIx64 @@ -1165,7 +1160,7 @@ __osm_topology_file_create( ( &p_node->node_info ) ), cl_ntoh16( p_node->node_info.device_id ), cl_ntoh32( p_node->node_info.revision 
), - desc, + p_node->print_desc, cl_ntoh16( p_default_physp->port_info.base_lid ), cPort ); @@ -1180,10 +1175,6 @@ __osm_topology_file_create( p_default_physp = p_rphysp; } - memcpy(desc, p_nbnode->node_desc.description, - IB_NODE_DESCRIPTION_SIZE); - desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; - fprintf( rc, "{ %s%s Ports:%02X" " SystemGUID:%016" PRIx64 " NodeGUID:%016" PRIx64 @@ -1206,7 +1197,7 @@ __osm_topology_file_create( ( &p_nbnode->node_info ) ), cl_ntoh32( p_nbnode->node_info.device_id ), cl_ntoh32( p_nbnode->node_info.revision ), - desc, + p_nbnode->print_desc, cl_ntoh16( p_default_physp->port_info.base_lid ), p_rphysp->port_num ); @@ -1662,7 +1653,6 @@ __osm_state_mgr_report_new_ports( ib_net64_t port_guid; uint16_t min_lid_ho; uint16_t max_lid_ho; - char desc[IB_NODE_DESCRIPTION_SIZE + 1]; OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_report_new_ports ); @@ -1704,19 +1694,13 @@ __osm_state_mgr_report_new_ports( ib_get_err_str( status ) ); } osm_port_get_lid_range_ho( p_port, &min_lid_ho, &max_lid_ho ); - if (p_port->p_node) - { - memcpy(desc, p_port->p_node->node_desc.description, - IB_NODE_DESCRIPTION_SIZE); - desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; - } osm_log( p_mgr->p_log, OSM_LOG_INFO, "__osm_state_mgr_report_new_ports: " "Discovered new port with GUID:0x%016" PRIx64 " LID range [0x%X,0x%X] of node:%s\n", cl_ntoh64( port_gid.unicast.interface_id ), min_lid_ho, max_lid_ho, - p_port->p_node ? desc : "UNKNOWN" ); + p_port->p_node ? 
p_port->p_node->print_desc : "UNKNOWN" ); p_port = ( osm_port_t diff --git a/osm/opensm/osm_ucast_ftree.c b/osm/opensm/osm_ucast_ftree.c index cb40ab6..21aa4a8 100644 --- a/osm/opensm/osm_ucast_ftree.c +++ b/osm/opensm/osm_ucast_ftree.c @@ -1251,7 +1251,6 @@ __osm_ftree_fabric_dump_hca_ordering( uint32_t i; uint32_t j; - char desc[IB_NODE_DESCRIPTION_SIZE + 1]; char path[1024]; FILE * p_hca_ordering_file; char * filename = "osm-ftree-ca-order.dump"; @@ -1278,11 +1277,10 @@ __osm_ftree_fabric_dump_hca_ordering( { p_group = p_sw->down_port_groups[j]; p_hca = p_group->remote_hca_or_sw.remote_hca; - memcpy(desc,p_hca->p_osm_node->node_desc.description,IB_NODE_DESCRIPTION_SIZE); - desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; fprintf(p_hca_ordering_file,"0x%x\t%s\n", - cl_ntoh16(p_group->remote_base_lid), desc); + cl_ntoh16(p_group->remote_base_lid), + p_hca->p_osm_node->print_desc); } /* now print dummy HCAs */ diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index ded3880..3564ba7 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -361,14 +361,12 @@ ucast_mgr_dump_lfts(cl_map_item_t *p_map unsigned max_port = osm_switch_get_num_ports(p_sw); uint16_t lid; uint8_t port; - char desc[IB_NODE_DESCRIPTION_SIZE + 1]; - memcpy(desc, p_node->node_desc.description, IB_NODE_DESCRIPTION_SIZE); - desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; fprintf(file, "Unicast lids [0x0-0x%x] of switch Lid %u guid 0x%016" PRIx64 " (\'%s\'):\n", max_lid, osm_node_get_base_lid(p_node, 0), - cl_ntoh64(osm_node_get_node_guid(p_node)), desc); + cl_ntoh64(osm_node_get_node_guid(p_node)), + p_node->print_desc); for (lid = 0; lid <= max_lid; lid++) { osm_port_t *p_port; port = osm_switch_get_port_by_lid(p_sw, lid); @@ -381,12 +379,10 @@ ucast_mgr_dump_lfts(cl_map_item_t *p_map p_port = cl_ptr_vector_get(&p_mgr->p_subn->port_lid_tbl, lid); if (p_port) { p_node = osm_port_get_parent_node(p_port); - memcpy(desc, p_node->node_desc.description, - IB_NODE_DESCRIPTION_SIZE); - 
desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; fprintf(file, "%s portguid 0x016%" PRIx64 ": \'%s\'", ib_get_node_type_str(osm_node_get_type(p_node)), - cl_ntoh64(osm_port_get_guid(p_port)), desc); + cl_ntoh64(osm_port_get_guid(p_port)), + p_node->print_desc); } else fprintf(file, "unknown node and type"); From halr at voltaire.com Tue Feb 6 07:53:05 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Feb 2007 10:53:05 -0500 Subject: [openib-general] [PATCH 2/2] OpenSM/osm_sa_mcmember_record.c: Add NodeDescription to mcast group join error messages Message-ID: <1170777169.4525.293474.camel@hal.voltaire.com> OpenSM/osm_sa_mcmember_record.c: Add NodeDescription to mcast group join error messages Signed-off-by: Ira K. Weiny Signed-off-by: Hal Rosenstock diff --git a/osm/opensm/osm_sa_mcmember_record.c b/osm/opensm/osm_sa_mcmember_record.c index 2c55198..62d00ac 100644 --- a/osm/opensm/osm_sa_mcmember_record.c +++ b/osm/opensm/osm_sa_mcmember_record.c @@ -1610,9 +1610,11 @@ __osm_mcmr_rcv_join_mgrp( "__osm_mcmr_rcv_join_mgrp: ERR 1B10: " "Provided Join State != FullMember - required for create, " "MGID: 0x%016" PRIx64 " : " - "0x%016" PRIx64 "\n", + "0x%016" PRIx64 " from port 0x%016" PRIx64 " (%s)\n", cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.prefix ), - cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.interface_id ) ); + cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.interface_id ), + cl_ntoh64( portguid ), + p_port->p_node->print_desc); sa_status = IB_SA_MAD_STATUS_REQ_INVALID; osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); goto Exit; @@ -1649,14 +1651,15 @@ __osm_mcmr_rcv_join_mgrp( "component mask = 0x%016" PRIx64 ", " "expected comp mask = 0x%016" PRIx64 ", " "MGID: 0x%016" PRIx64 " : " - "0x%016" PRIx64 " from port 0x%016" PRIx64 "\n", + "0x%016" PRIx64 " from port 0x%016" PRIx64 " (%s)\n", ib_get_sa_method_str(p_sa_mad->method), p_recvd_mcmember_rec->scope_state, cl_ntoh64(p_sa_mad->comp_mask), CL_NTOH64(REQUIRED_MC_CREATE_COMP_MASK), cl_ntoh64( 
p_recvd_mcmember_rec->mgid.unicast.prefix ), cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.interface_id ), - cl_ntoh64( portguid ) ); + cl_ntoh64( portguid ), + p_port->p_node->print_desc); sa_status = IB_SA_MAD_STATUS_INSUF_COMPS; osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); @@ -1713,9 +1716,10 @@ __osm_mcmr_rcv_join_mgrp( osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__osm_mcmr_rcv_join_mgrp: ERR 1B12: " "__validate_more_comp_fields, __validate_port_caps, " - "or JoinState = 0 failed from port 0x%016" PRIx64 ", " + "or JoinState = 0 failed from port 0x%016" PRIx64 " (%s), " "sending IB_SA_MAD_STATUS_REQ_INVALID\n", - cl_ntoh64( portguid ) ); + cl_ntoh64( portguid ), + p_port->p_node->print_desc); sa_status = IB_SA_MAD_STATUS_REQ_INVALID; osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); @@ -1742,8 +1746,10 @@ __osm_mcmr_rcv_join_mgrp( osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__osm_mcmr_rcv_join_mgrp: ERR 1B13: " - "__validate_modify failed, " - "sending IB_SA_MAD_STATUS_REQ_INVALID\n" ); + "__validate_modify failed from port 0x%016" PRIx64 " (%s), " + "sending IB_SA_MAD_STATUS_REQ_INVALID\n", + cl_ntoh64( portguid ), + p_port->p_node->print_desc); sa_status = IB_SA_MAD_STATUS_REQ_INVALID; osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); @@ -1794,8 +1800,10 @@ __osm_mcmr_rcv_join_mgrp( { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__osm_mcmr_rcv_join_mgrp: ERR 1B14: " - "osm_sm_mcgrp_join failed, " - "sending IB_SA_MAD_STATUS_NO_RESOURCES\n" ); + "osm_sm_mcgrp_join failed from port 0x%016" PRIx64 " (%s), " + "sending IB_SA_MAD_STATUS_NO_RESOURCES\n", + cl_ntoh64( portguid ), + p_port->p_node->print_desc); CL_PLOCK_EXCL_ACQUIRE(p_rcv->p_lock); From mst at mellanox.co.il Tue Feb 6 08:02:59 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin)
Date: Tue, 6 Feb 2007 18:02:59 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
In-Reply-To: <45C8A32D.2000504@voltaire.com>
References: <45C8A32D.2000504@voltaire.com>
Message-ID: <20070206160259.GC21776@mellanox.co.il>

> ------------------------------------------------------------------------------
> IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
> whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
> created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
> call.
>
> When using the bonding driver, neighbours are created by the net stack on behalf
> of the bonding (master) device. On the tx flow the bonding code gets an skb such
> that skb->dev points to the master device, it changes this skb to point to the
> slave device and calls the slave hard_start_xmit function.
>
> Combining these two flows, there is a hole if some code at ipoib
> (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
> is an ipoib device so for example netdev_priv(n->dev) would be of type struct
> ipoib_dev_priv.

Could you please elaborate how ipoib_neigh_destructor comes to be called at all?
At what point does ipoib_neigh_setup_dev get called?

> To fix it, this patch adds a dev field to struct ipoib_neigh which is used
> instead of the struct neighbour dev one.

What I am concerned with is - if the master is not an IPoIB device,
what guarantee do we have that to_ipoib_neigh will return 0
and not part of an actual hardware address?

Without bonding, the reason is that dev points to an ipoib device,
so we know the hw address is 20 bytes.
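
[Editorial illustration, not part of the original message.] The layout
assumption behind this concern can be sketched in user space. The patch
context for ipoib.h shows only the tail of to_ipoib_neigh(); the sketch below
assumes it computes ALIGN(offsetof(struct neighbour, ha) + INFINIBAND_ALEN,
sizeof(void *)), i.e. that the ipoib_neigh pointer is kept in the unused tail
of the hardware-address array. The mock_neighbour layout, the 32-byte array
size, and the helper names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define INFINIBAND_ALEN 20      /* IPoIB hardware address length */
#define MOCK_MAX_ADDR_LEN 32    /* assumed size of the ha[] array */

/* Round x up to a multiple of a (a must be a power of two). */
#define MOCK_ALIGN(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Simplified stand-in for struct neighbour; only ha[] matters here. */
struct mock_neighbour {
	void *dev;
	unsigned char ha[MOCK_MAX_ADDR_LEN];
};

/* Byte offset, counted from the start of ha[], of the pointer slot. */
static size_t ipoib_neigh_slot_in_ha(void)
{
	size_t ha_off = offsetof(struct mock_neighbour, ha);
	size_t slot = MOCK_ALIGN(ha_off + INFINIBAND_ALEN, sizeof(void *));

	return slot - ha_off;
}

/* Nonzero if an address of addr_len bytes would overlap the slot. */
static int addr_overlaps_slot(size_t addr_len)
{
	return addr_len > ipoib_neigh_slot_in_ha();
}
```

Under these assumptions a 20-byte IPoIB address never reaches the slot, so
the slot can be read as a pointer; a device with a longer hardware address
would store address bytes there, which is the case the question above is
probing.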
-- MST From swise at opengridcomputing.com Tue Feb 6 08:06:27 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 06 Feb 2007 10:06:27 -0600 Subject: [openib-general] build.sh not building libmthca Message-ID: <1170777987.19662.31.camel@stevo-desktop> Another build problem with the alpha test package: If I run build.sh and _only_ select libmthca, it claims it builds it ok, but doesn't produce the .rpm file... Steve. From darwish.07 at gmail.com Tue Feb 6 08:07:25 2007 From: darwish.07 at gmail.com (Ahmed S. Darwish) Date: Tue, 6 Feb 2007 18:07:25 +0200 Subject: [openib-general] [PATCH 2.6.20] infinband: Use ARRAY_SIZE macro when appropriate In-Reply-To: <20070206160204.GA8991@Ahmed> Message-ID: <20070206160725.GJ8991@Ahmed> Hi all, A patch to use ARRAY_SIZE macro already defined in kernel.h Signed-off-by: Ahmed S. Darwish --- diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 63d2a39..7fabb42 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -93,7 +94,7 @@ static int ib_device_check_mandatory(struct ib_device *device) }; int i; - for (i = 0; i < sizeof mandatory_table / sizeof mandatory_table[0]; ++i) { + for (i = 0; i < ARRAY_SIZE(mandatory_table); ++i) { if (!*(void **) ((void *) device + mandatory_table[i].offset)) { printk(KERN_WARNING "Device %s is missing mandatory function %s\n", device->name, mandatory_table[i].name); -- Ahmed S. Darwish http://darwish-07.blogspot.com From monis at voltaire.com Tue Feb 6 08:24:10 2007 From: monis at voltaire.com (Moni Shoua) Date: Tue, 06 Feb 2007 18:24:10 +0200 Subject: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour In-Reply-To: <20070206160259.GC21776@mellanox.co.il> References: <45C8A32D.2000504@voltaire.com> <20070206160259.GC21776@mellanox.co.il> Message-ID: <45C8ABAA.10500@voltaire.com> Michael S. 
Tsirkin wrote:
>>------------------------------------------------------------------------------
>>IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
>>whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
>>created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
>>call.
>>
>>When using the bonding driver, neighbours are created by the net stack on behalf
>>of the bonding (master) device. On the tx flow the bonding code gets an skb such
>>that skb->dev points to the master device, it changes this skb to point to the
>>slave device and calls the slave hard_start_xmit function.
>>
>>Combining these two flows, there is a hole if some code at ipoib
>>(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
>>is an ipoib device so for example netdev_priv(n->dev) would be of type struct
>>ipoib_dev_priv.
>
>
> Could you please elaborate how ipoib_neigh_destructor comes to be called at all?
> At what point does ipoib_neigh_setup_dev get called?
>
>

The bond device uses its slave's neigh_setup function.
Please look at line 19 below from the bonding code.

static void bond_setup_by_slave(struct net_device *bond_dev,
11 +                             struct net_device *slave_dev)
12 +{
13 +     bond_dev->hard_header = slave_dev->hard_header;
14 +     bond_dev->rebuild_header = slave_dev->rebuild_header;
15 +     bond_dev->hard_header_cache = slave_dev->hard_header_cache;
16 +     bond_dev->header_cache_update = slave_dev->header_cache_update;
17 +     bond_dev->hard_header_parse = slave_dev->hard_header_parse;
18 +
19 +     bond_dev->neigh_setup = slave_dev->neigh_setup;
20 +
21 +     bond_dev->type = slave_dev->type;
22 +     bond_dev->hard_header_len = slave_dev->hard_header_len;
23 +     bond_dev->addr_len = slave_dev->addr_len;
24 +
25 +     memcpy(bond_dev->broadcast, slave_dev->broadcast,
26 +            slave_dev->addr_len);
27 +}

>>To fix it, this patch adds a dev field to struct ipoib_neigh which is used
>>instead of the struct neighbour dev one.
>
>
> What I am concerned with is - if the master is not an IPoIB device,
> what guarantee do we have that to_ipoib_neigh will return 0
> and not part of an actual hardware address?
>
> Without bonding, the reason is that dev points to an ipoib device,
> so we know hw address is 20 bytes.
>

I guess you meant "if the slave is not an IPoIB device"...
The bond device doesn't allow devices of different types to be grouped
together as its slaves. Furthermore, bond_setup_by_slave is called only for
non-Ethernet devices (we are considering changing the logic to "called only
for IPoIB devices", just for safety).

From swise at opengridcomputing.com Tue Feb 6 08:41:42 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 10:41:42 -0600
Subject: [openib-general] build.sh not building libmthca
In-Reply-To: <1170777987.19662.31.camel@stevo-desktop>
References: <1170777987.19662.31.camel@stevo-desktop>
Message-ID: <1170780102.19662.45.camel@stevo-desktop>

Do you want me to use bugzilla to track these issues?

On Tue, 2007-02-06 at 10:06 -0600, Steve Wise wrote:
> Another build problem with the alpha test package:
>
> If I run build.sh and _only_ select libmthca, it claims it builds it ok,
> but doesn't produce the .rpm file...
>
> Steve.

From xma at us.ibm.com Tue Feb 6 08:41:10 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 6 Feb 2007 08:41:10 -0800
Subject: [openib-general] [PATCH] enable IPoIB only if broadcast join finish
In-Reply-To:
Message-ID:

Roland,

Could you please review this patch when you have time? I am looking
forward to seeing your comments to address a customer issue.

Appreciate your help.

Thanks
Shirley Ma

From vlad at mellanox.co.il Tue Feb 6 08:48:36 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 06 Feb 2007 18:48:36 +0200
Subject: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 netevent backport]
In-Reply-To: <1170604137.4129.13.camel@linux-q667.site>
References: <1170360441.16637.41.camel@stevo-desktop> <1170604137.4129.13.camel@linux-q667.site>
Message-ID: <1170780516.6537.28.camel@vladsk-laptop>

On Sun, 2007-02-04 at 09:48 -0600, Steve WIse wrote:
> Vlad/Michael,
>
> I'm still tracking this as an outstanding patch. Have you pulled this
> in yet?
>
> Thanks,
>
> Steve.

Applied.

--
Vladimir Sokolovsky
Mellanox Technologies Ltd.

From vlad at mellanox.co.il Tue Feb 6 08:49:18 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 06 Feb 2007 18:49:18 +0200
Subject: [openib-general] [PATCH] ofed_1_2 - iw_cxgb3 - Add standard GPL header to tcb.h
In-Reply-To: <1170704623.16661.54.camel@stevo-desktop>
References: <1170704623.16661.54.camel@stevo-desktop>
Message-ID: <1170780558.6537.30.camel@vladsk-laptop>

On Mon, 2007-02-05 at 13:43 -0600, Steve Wise wrote:
> Add standard GPL header to tcb.h
>
> From: Steve Wise
>
> Signed-off-by: Steve Wise
> ---

Applied.

--
Vladimir Sokolovsky
Mellanox Technologies Ltd.

From swise at opengridcomputing.com Tue Feb 6 08:50:46 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 10:50:46 -0600
Subject: [openib-general] OFED-1.2 first release - provider library install problem
In-Reply-To: <1170776321.19662.28.camel@stevo-desktop>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> <1170776321.19662.28.camel@stevo-desktop>
Message-ID: <1170780646.19662.48.camel@stevo-desktop>

FYI: The libmthca rpm has the same issue...

Steve.
On Tue, 2007-02-06 at 09:38 -0600, Steve Wise wrote:
> Vlad,
>
> After installing the test alpha1 build rpms on rhel4u4 with a
> kernel.org 2.6.20 kernel, it appears that the provider library config
> files didn't get installed for libcxgb3:
>
> [root at r1-iw ~]# rping -s -a 0.0.0.0 -p 9999
> libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> libibverbs: Warning: no userspace device-specific driver found for uverbs0
> Segmentation fault
> [root at r1-iw ~]# ls /usr/local/ofed/etc
> ls: /usr/local/ofed/etc: No such file or directory
> [root at r1-iw ~]#
>
> I'm running with the cxgb3 driver, so I guess libcxgb3 didn't install
> itself correctly? This works when doing 'make install' from the
> userspace tarballs. Is there some rpm magic missing? I'm not sure how
> to debug this as I'm rpm-challenged (but willing to learn :).
>
> It appears libcxgb3 installed its v2 libs correctly:
>
> [root at r1-iw ~]# ls /usr/local/ofed/lib64
> libcxgb3.a           libdat.a          libibcommon.so        libibumad.a         libibverbs.so.1.0.0
> libcxgb3-rdmav2.so   libdat.so         libibcommon.so.1      libibumad.so        librdmacm.so
> libcxgb3.so          libdat.so.1       libibcommon.so.1.0.0  libibumad.so.1      librdmacm.so.0.9.0
> libdaplcma.a         libdat.so.1.0.2   libibmad.a            libibumad.so.1.0.0
> libdaplcma.so        libibcm.so        libibmad.so           libibverbs.a
> libdaplcma.so.1      libibcm.so.0.9.0  libibmad.so.1         libibverbs.so
> libdaplcma.so.1.0.2  libibcommon.a     libibmad.so.1.2.0     libibverbs.so.1
> [root at r1-iw ~]#
>
> But /usr/local/ofed/etc/libibverbs.d didn't get created and the cxgb3.driver file installed.
>
>
>
> Steve.

From jsquyres at cisco.com Tue Feb 6 09:05:36 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 6 Feb 2007 12:05:36 -0500
Subject: [openib-general] build.sh not building libmthca
In-Reply-To: <1170780102.19662.45.camel@stevo-desktop>
References: <1170777987.19662.31.camel@stevo-desktop> <1170780102.19662.45.camel@stevo-desktop>
Message-ID:

Yes, please file all bugs in bugzilla.

Thanks!

On Feb 6, 2007, at 11:41 AM, Steve Wise wrote:

> Do you want me to use bugzilla to track these issues?
>
>
> On Tue, 2007-02-06 at 10:06 -0600, Steve Wise wrote:
>> Another build problem with the alpha test package:
>>
>> If I run build.sh and _only_ select libmthca, it claims it builds
>> it ok,
>> but doesn't produce the .rpm file...
>>
>> Steve.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems

From swise at opengridcomputing.com Tue Feb 6 09:09:07 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 11:09:07 -0600
Subject: [openib-general] OFED-1.2 first release - provider library install problem
In-Reply-To: <1170780646.19662.48.camel@stevo-desktop>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> <1170776321.19662.28.camel@stevo-desktop> <1170780646.19662.48.camel@stevo-desktop>
Message-ID: <1170781747.19662.54.camel@stevo-desktop>

bug 339 opened.

On Tue, 2007-02-06 at 10:50 -0600, Steve Wise wrote:
> provider library install problem

From swise at opengridcomputing.com Tue Feb 6 09:09:27 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 11:09:27 -0600
Subject: [openib-general] build.sh not building libmthca
In-Reply-To:
References: <1170777987.19662.31.camel@stevo-desktop> <1170780102.19662.45.camel@stevo-desktop>
Message-ID: <1170781767.19662.56.camel@stevo-desktop>

bug 338 opened.

On Tue, 2007-02-06 at 12:05 -0500, Jeff Squyres wrote:
> Yes, please file all bugs in bugzilla.
>
> Thanks!
>
>
> On Feb 6, 2007, at 11:41 AM, Steve Wise wrote:
>
> > Do you want me to use bugzilla to track these issues?
> >
> >
> > On Tue, 2007-02-06 at 10:06 -0600, Steve Wise wrote:
> >> Another build problem with the alpha test package:
> >>
> >> If I run build.sh and _only_ select libmthca, it claims it builds
> >> it ok,
> >> but doesn't produce the .rpm file...
> >>
> >> Steve.

From mshefty at ichips.intel.com Tue Feb 6 09:08:52 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 06 Feb 2007 09:08:52 -0800
Subject: [openib-general] [RFC] [PATCH] ib_usa: export multicast and informinfo registration to userspace
In-Reply-To: <1170771710.4525.287718.camel@hal.voltaire.com>
References: <000001c74726$94d0f500$e598070a@amr.corp.intel.com> <1170771710.4525.287718.camel@hal.voltaire.com>
Message-ID: <45C8B624.9020602@ichips.intel.com>

>>+static int process_mcast(struct usa_file *file, struct ib_usa_request *req,
>>+			 int out_len)
>>+{
>>+	/* Only indirect requests are currently supported. */
>>+	if (!req->local)
>>+		return -ENOSYS;
>>+
>>+	switch (req->method) {
>>+	case IB_MGMT_METHOD_GET:
>>+		return get_mcast(file, req, out_len);
>>+	case IB_MGMT_METHOD_SET:
>>+		return join_mcast(file, req, out_len);
>>+	default:
>>+		return -EINVAL;
>
>
> Should leaving a multicast group also be supported ?

It is - just in a different way. Once a user has joined a multicast group or
registered for notices, they leave/unregister by issuing a 'free' request. The
majority of the code ends up being the same.
- Sean

From mst at mellanox.co.il Tue Feb 6 09:14:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 19:14:24 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
In-Reply-To: <45C8ABAA.10500@voltaire.com>
References: <45C8ABAA.10500@voltaire.com>
Message-ID: <20070206171424.GB24372@mellanox.co.il>

> >>------------------------------------------------------------------------------
> >>IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
> >>whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
> >>created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
> >>call.
> >>
> >>When using the bonding driver, neighbours are created by the net stack on behalf
> >>of the bonding (master) device. On the tx flow the bonding code gets an skb such
> >>that skb->dev points to the master device, it changes this skb to point to the
> >>slave device and calls the slave hard_start_xmit function.
> >>
> >>Combining these two flows, there is a hole if some code at ipoib
> >>(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
> >>is an ipoib device so for example netdev_priv(n->dev) would be of type struct
> >>ipoib_dev_priv.
> >
> > Could you please elaborate how ipoib_neigh_destructor comes to be called at all?
> > At what point does ipoib_neigh_setup_dev get called?
>
> The bond device uses its slave's neigh_setup function.
> Please look at line 19 below from the bonding code.
>
> static void bond_setup_by_slave(struct net_device *bond_dev,
> 11 +                                struct net_device *slave_dev)
> 12 +{
> 13 +        bond_dev->hard_header = slave_dev->hard_header;
> 14 +        bond_dev->rebuild_header = slave_dev->rebuild_header;
> 15 +        bond_dev->hard_header_cache = slave_dev->hard_header_cache;
> 16 +        bond_dev->header_cache_update = slave_dev->header_cache_update;
> 17 +        bond_dev->hard_header_parse = slave_dev->hard_header_parse;
> 18 +
> 19 +        bond_dev->neigh_setup = slave_dev->neigh_setup;
> 20 +
> 21 +        bond_dev->type = slave_dev->type;
> 22 +        bond_dev->hard_header_len = slave_dev->hard_header_len;
> 23 +        bond_dev->addr_len = slave_dev->addr_len;
> 24 +
> 25 +        memcpy(bond_dev->broadcast, slave_dev->broadcast,
> 26 +               slave_dev->addr_len);
> 27 +}

Another concern: assume that one device goes away (e.g. hotplug). It seems that neighbours whose dev field points to another device will not be destroyed. Correct?
Therefore in your design, it seems that to_ipoib_neigh()->dev will get us a pointer to a device that has been removed already.

> >>To fix it, this patch adds a dev field to struct ipoib_neigh which is used
> >>instead of the struct neighbour dev one.
> >
> > What I am concerned with is - if the master is not an IPoIB device,
> > what guarantee do we have that to_ipoib_neigh will return 0
> > and not part of an actual hardware address?
> >
> > Without bonding, the reason is that dev points to an ipoib device,
> > so we know hw address is 20 bytes.
>
> I guess you meant "if the slave is not an IPoIB device"...

Yes.

> The bond device doesn't allow devices of different types to be grouped
> together as its slaves.

I see.

> Furthermore, bond_setup_by_slave is called only for non
> Ethernet devices (we are considering changing the logic to "called only for
> IPoIB devices" just for safety).

Why is this necessary, BTW?
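
The dev-field change under discussion can be illustrated with a minimal userspace model. It is a sketch under assumptions, not the IPoIB code: the structures are heavily simplified and the model_* function names are invented here. The point it shows is that with bonding the neighbour's dev is the bond master, so the IPoIB-side entry records the transmitting device itself instead of trusting n->dev.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the kernel structures carry many more fields. */
struct net_device {
	const char *name;
	int is_ipoib;            /* 1 if the private data is IPoIB-specific */
};

struct neighbour {
	struct net_device *dev;  /* with bonding this is the bond master */
};

struct ipoib_neigh {
	struct neighbour *neighbour;
	struct net_device *dev;  /* the IPoIB device that actually transmits */
};

/* Record the transmitting device at setup time instead of trusting n->dev. */
static void model_neigh_init(struct ipoib_neigh *in, struct neighbour *n,
			     struct net_device *xmit_dev)
{
	in->neighbour = n;
	in->dev = xmit_dev;
}

/* Teardown consults the cached device, which is known to be an IPoIB one. */
static struct net_device *model_neigh_owner(const struct ipoib_neigh *in)
{
	assert(in != NULL && in->dev != NULL);
	return in->dev;
}
```

The model does not address the lifetime concern raised above: the cached pointer is only meaningful while that device still exists, so a real implementation would also need a rule for what happens when the slave is removed.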
-- 
MST

From vlad at dev.mellanox.co.il Tue Feb 6 09:25:00 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 06 Feb 2007 19:25:00 +0200
Subject: [openib-general] build.sh not building libmthca
In-Reply-To: <1170780102.19662.45.camel@stevo-desktop>
References: <1170777987.19662.31.camel@stevo-desktop> <1170780102.19662.45.camel@stevo-desktop>
Message-ID: <1170782700.6537.32.camel@vladsk-laptop>

On Tue, 2007-02-06 at 10:41 -0600, Steve Wise wrote:
> Do you want me to use bugzilla to track these issues?
>

Yes, please.

-- 
Vladimir Sokolovsky
Mellanox Technologies Ltd.

From swise at opengridcomputing.com Tue Feb 6 09:28:26 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 11:28:26 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.
Message-ID: <1170782906.19662.61.camel@stevo-desktop>

I propose the following fix for supporting iWARP on SLES9SP3.

This fixes bug 325.

Sean, can you please review this?

Steve.

-----------

SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.

Acquire the cma_dev based on the ib device of the incoming connect request.

This overcomes a sles9sp3 bug where ip_dev_find(local_ipaddr) always
returns the loopback net_device pointer instead of the actual local
interface pointer. Note: this workaround leaves the rdma_dev_addr in
the new connection request rdma_cm_id incomplete. But ULPs don't really
use this, so we'll have to live with it for SLES9SP3.

Signed-off-by: Steve Wise
---

 .../iwcm_ip_dev_find_workaround.patch | 91 +++++++++++++++++++++++
 1 files changed, 91 insertions(+), 0 deletions(-)

diff --git a/kernel_patches/backport/2.6.5_sles9_sp3/iwcm_ip_dev_find_workaround.patch b/kernel_patches/backport/2.6.5_sles9_sp3/iwcm_ip_dev_find_workaround.patch
new file mode 100644
index 0000000..a9d5bfe
--- /dev/null
+++ b/kernel_patches/backport/2.6.5_sles9_sp3/iwcm_ip_dev_find_workaround.patch
@@ -0,0 +1,91 @@
+SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.
+
+From: Steve Wise
+
+Acquire the cma_dev based on the ib device of the incoming
+connect request.
+
+This overcomes a sles9sp3 bug where ip_dev_find(local_ipaddr) always
+returns the loopback net_device pointer instead of the actual local
+interface pointer. Note: this workaround leaves the rdma_dev_addr in
+the new connection request rdma_cm_id incomplete. But ULPs don't really
+use this, so we'll have to live with it for SLES9SP3.
+
+Signed-off-by: Steve Wise
+---
+
+ drivers/infiniband/core/cma.c | 33 +++++++++++++++------------------
+ 1 files changed, 15 insertions(+), 18 deletions(-)
+
+diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
+index 9e0ab04..c89b611 100644
+--- a/drivers/infiniband/core/cma.c
++++ b/drivers/infiniband/core/cma.c
+@@ -1128,13 +1128,25 @@ static int cma_iw_handler(struct iw_cm_i
+ 	return ret;
+ }
+ 
++static int iw_cma_acquire_dev(struct iw_cm_id *cm_id, struct rdma_id_private *id_priv)
++{
++	struct cma_device *cma_dev;
++
++	list_for_each_entry(cma_dev, &dev_list, list) {
++		if (cma_dev->device == cm_id->device) {
++			cma_attach_to_dev(id_priv, cma_dev);
++			return 0;
++		}
++	}
++	return -ENODEV;
++}
++
+ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
+ 			       struct iw_cm_event *iw_event)
+ {
+ 	struct rdma_cm_id *new_cm_id;
+ 	struct rdma_id_private *listen_id, *conn_id;
+ 	struct sockaddr_in *sin;
+-	struct net_device *dev = NULL;
+ 	struct rdma_cm_event event;
+ 	int ret;
+ 
+@@ -1157,22 +1169,8 @@ static int iw_conn_req_handler(struct iw
+ 	atomic_inc(&conn_id->dev_remove);
+ 	conn_id->state = CMA_CONNECT;
+ 
+-	dev = ip_dev_find(iw_event->local_addr.sin_addr.s_addr);
+-	if (!dev) {
+-		ret = -EADDRNOTAVAIL;
+-		cma_release_remove(conn_id);
+-		rdma_destroy_id(new_cm_id);
+-		goto out;
+-	}
+-	ret = rdma_copy_addr(&conn_id->id.route.addr.dev_addr, dev, NULL);
+-	if (ret) {
+-		cma_release_remove(conn_id);
+-		rdma_destroy_id(new_cm_id);
+-		goto out;
+-	}
+-
+ 	mutex_lock(&lock);
+-	ret = cma_acquire_dev(conn_id);
++	ret = iw_cma_acquire_dev(cm_id, conn_id);
+ 	mutex_unlock(&lock);
+ 	if (ret) {
+ 		cma_release_remove(conn_id);
+@@ -1184,6 +1182,7 @@ static int iw_conn_req_handler(struct iw
+ 	cm_id->context = conn_id;
+ 	cm_id->cm_handler = cma_iw_handler;
+ 
++	new_cm_id->route.addr.dev_addr.dev_type = RDMA_NODE_RNIC;
+ 	sin = (struct sockaddr_in *) &new_cm_id->route.addr.src_addr;
+ 	*sin = iw_event->local_addr;
+ 	sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr;
+@@ -1203,8 +1202,6 @@ static int iw_conn_req_handler(struct iw
+ 	}
+ 
+ out:
+-	if (dev)
+-		dev_put(dev);
+ 	cma_release_remove(listen_id);
+ 	return ret;
+ }

From sweitzen at cisco.com Tue Feb 6 09:33:42 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 6 Feb 2007 09:33:42 -0800
Subject: [openib-general] OFED-1.2 first release
In-Reply-To:
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
Message-ID:

sdpnetstat is getting added to the dapl-devel RPM.
# rpm -qlip dapl-devel-1.2.0-0.x86_64.rpm Name : dapl-devel Relocations: (not relocatable) Version : 1.2.0 Vendor: OpenFabrics Release : 0 Build Date: Mon 05 Feb 2007 09:48:50 PM PST Install Date: (not installed) Build Host: svbu-qa1850-1.cisco.com Group : System Environment/Libraries Source RPM: ofa_user-1.2-alpha1.src .rpm Size : 692598 License: GPL/BSD Signature : (none) URL : http://www.openfabrics.org/ Summary : Development files for the libdat and libdapl libraries Description : Static libraries and header files for the libdat and libdapl library. /usr/local/ofed/bin/sdpnetstat /usr/local/ofed/include/dat/dat.h /usr/local/ofed/include/dat/dat_error.h /usr/local/ofed/include/dat/dat_platform_specific.h /usr/local/ofed/include/dat/dat_redirection.h /usr/local/ofed/include/dat/dat_registry.h /usr/local/ofed/include/dat/dat_vendor_specific.h /usr/local/ofed/include/dat/udat.h /usr/local/ofed/include/dat/udat_config.h /usr/local/ofed/include/dat/udat_redirection.h /usr/local/ofed/include/dat/udat_vendor_specific.h /usr/local/ofed/lib64/libdaplcma.a /usr/local/ofed/lib64/libdaplcma.so /usr/local/ofed/lib64/libdat.a /usr/local/ofed/lib64/libdat.so ________________________________ From: Scott Weitzenkamp (sweitzen) Sent: Tuesday, February 06, 2007 12:07 AM To: Scott Weitzenkamp (sweitzen); 'Vladimir Sokolovsky'; 'openfabrics-ewg at openib.org'; 'Tziporet Koren' Cc: 'openib-general at openib.org' Subject: RE: [openib-general] OFED-1.2 first release Not getting MPI RPMS for Intel compilers, either. 
Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mp itests_mvapich2_gcc-2.0-698.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mvapich2_intel-0 .9.8-1.x 86_64.rpm not found Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/op enmpi_gcc-1.2b4ofedr13470-1ofed.x86_64.rpm Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mp itests_openmpi_gcc-2.0-698.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/openmpi_intel-1. 2b4ofedr 13470-1ofed.x86_64.rpm not found ERROR: -.x86_64.rpm not found under /tmp/OFED-1.2-20070205-1823/RPMS/redhat-rele ase-4AS-4.1. Installation finished successfully... Scott ________________________________ From: Scott Weitzenkamp (sweitzen) Sent: Monday, February 05, 2007 9:44 PM To: Scott Weitzenkamp (sweitzen); 'Vladimir Sokolovsky'; 'openfabrics-ewg at openib.org'; 'Tziporet Koren' Cc: 'openib-general at openib.org' Subject: RE: [openib-general] OFED-1.2 first release Moving on, I set ib_bonding=n in ofed.conf and try install.sh again, and now get this: ... Building MVAPICH RPM. Please wait... 
Using gcc compiler Running rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_nam e mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/loc al/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-9 71.src.rpm ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRP M' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --defi ne 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPM S/mvapich-0.9.9-971.src.rpm" See log file: /tmp/OFED.6120.log # tail /tmp/OFED.6120.log + LANG=C + export LANG + unset DISPLAY /var/tmp/rpm-tmp.870: line 33: syntax error near unexpected token `)' error: Bad exit status from /var/tmp/rpm-tmp.870 (%install) RPM build errors: Bad exit status from /var/tmp/rpm-tmp.870 (%install) ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRP M' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --defi ne 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPM S/mvapich-0.9.9-971.src.rpm" Scott ________________________________ From: Scott Weitzenkamp (sweitzen) Sent: Monday, February 05, 2007 9:27 PM To: Vladimir Sokolovsky; openfabrics-ewg at openib.org; Tziporet Koren; Scott Weitzenkamp (sweitzen) Cc: openib-general at openib.org Subject: RE: [openib-general] OFED-1.2 first release Vlad and Tziporet, It might help if you elaborated on what you meant by "first release", you have been saying "code freeze" but really this is "feature freeze", right? This announcement is quite a bit different from previous OFED announcements, where you detailed what features were available and what OS were supported. 
The daily build email mentions compiling against kernels, but I haven't seen what distros were actually tested. Are we starting from scratch on compiling and testing with distros like RHEL4? Do you anticipate we will just go day by day with builds trying to stabilize things initially? In any case, here's what I see when I try to compile with install.sh on RHEL4 U3 x86_64: ... /tmp/OFED-1.2-20070205-1823/build.sh: line 802: kernel-ib: command not found Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' - -define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-docs-1. 2-0.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-docs-1.2-0.noarch.rpm /tmp/ OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' - -define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-scripts -1.2-0.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-scripts-1.2-0.noarch.rpm /t mp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 Running rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ib-bonding-0.9.0-1.src.rpm Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /t mp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. 0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" See log file: /tmp/OFED.10899.log # tail -10 /tmp/OFED.10899.log Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/ib-bonding-0. 
9.0-root Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615 + umask 022 + cd /var/tmp/OFEDRPM/BUILD + rm -rf ib-bonding-0.9.0 + exit 0 /bin/mv: cannot stat `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm ': No such file or directory ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. 0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" Scott ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Vladimir Sokolovsky Sent: Monday, February 05, 2007 2:26 PM To: openfabrics-ewg at openib.org Cc: openib-general at openib.org Subject: [openib-general] OFED-1.2 first release Hi, OFED-1.2-20070205-1823.tgz can be downloaded from http://www.openfabrics.org/builds/ofed-1.2/ The first OFED package includes: ofa_kernel-1.2-alpha1.src.rpm ofa_user-1.2-alpha1.src.rpm mvapich-0.9.9-971.src.rpm mvapich2-0.9.8-1.src.rpm openmpi-1.2b4ofedr13470-1ofed.src.rpm mpitests-2.0-698.src.rpm open-iscsi-generic-2.0-742.src.rpm ib-bonding-0.9.0-1.src.rpm ofed-docs-1.2-0.src.rpm ofed-scripts-1.2-0.src.rpm Known issues: srptools - compilation fails openib_diags - compilation fails ibutils - not included yet To build OFED RPMs: cd OFED-1.2-20070205-1823 ./build.sh Created RPMs will be stored under OFED-1.2-20070205-1823/RPMS/ directory. To install OFED RPMs: cd OFED-1.2-20070205-1823 ./install.sh For a detailed installation guide, see OFED-1.2-xxx/docs/OFED_Installation_Guide.txt -- Vladimir Sokolovsky Mellanox Technologies Ltd. -------------- next part -------------- An HTML attachment was scrubbed... 
From mshefty at ichips.intel.com Tue Feb 6 09:37:13 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 06 Feb 2007 09:37:13 -0800
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170782906.19662.61.camel@stevo-desktop>
References: <1170782906.19662.61.camel@stevo-desktop>
Message-ID: <45C8BCC9.4070003@ichips.intel.com>

Steve Wise wrote:
> I propose the following fix for supporting iWARP on SLES9SP3.
>
> This fixes bug 325.
>
> Sean, can you please review this?

The changes seem fine with me.

Does this bug affect the ib_addr module as well? (addr_resolve_local and rdma_translate_ip)

- Sean

From swise at opengridcomputing.com Tue Feb 6 10:02:04 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 12:02:04 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.
In-Reply-To: <45C8BCC9.4070003@ichips.intel.com>
References: <1170782906.19662.61.camel@stevo-desktop> <45C8BCC9.4070003@ichips.intel.com>
Message-ID: <1170784924.19662.79.camel@stevo-desktop>

On Tue, 2007-02-06 at 09:37 -0800, Sean Hefty wrote:
> Steve Wise wrote:
> > I propose the following fix for supporting iWARP on SLES9SP3.
> >
> > This fixes bug 325.
> >
> > Sean, can you please review this?
>
> The changes seem fine with me.
>
> Does this bug affect the ib_addr module as well? (addr_resolve_local and
> rdma_translate_ip)
>

Actually, yes it does. Here's one case (that I just tested :):

If you rdma_bind() to an explicit local address, it will fail.

Foo!

I guess I'll need to address the uses of ip_dev_find() in addr.c as well before we commit this.

What really bothers me is I cannot find the kernel code in the 2.6.5-7.244 kernel that is doing this (returning loopback for all local devices). ip_dev_find() does a FIB lookup to find this. I dug around the fib code but so far haven't found the culprit.

I welcome any help from anyone out there interested in the rdma-cm working on sles9sp3.

I would think if SDP does an rdma_bind() then SDP will also see this bug when run on sles9sp3.

(Are SUSE folks listening?)

Any thoughts?

Steve.

From swise at opengridcomputing.com Tue Feb 6 10:09:17 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 12:09:17 -0600
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <1170724023.19728.5.camel@stevo-desktop>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com> <1170721163.16661.111.camel@stevo-desktop> <1170724023.19728.5.camel@stevo-desktop>
Message-ID: <1170785357.19662.83.camel@stevo-desktop>

opened bug 340.

On Mon, 2007-02-05 at 19:07 -0600, Steve Wise wrote:
> I think there might be some dependency problem. I selected libibverbs,
> libcxgb3, librdmacm, perftest, mvapich2/IWARP and mpitests. For some
> reason it pulled in libibumad as a prereq, but not libibcommon...
>
> Also, I think mvapich2/IWARP links with libibumad or libibcommon and it
> doesn't need to when using librdmacm.
>
> [root at r2-iw redhat-release-4AS-5.5]# rpm -U *
> error: Failed dependencies:
>         libibcommon.so.1()(64bit) is needed by libibumad-1.0.2-0.x86_64
>         libibcommon.so.1(IBCOMMON_1.0)(64bit) is needed by libibumad-1.0.2-0.x86_64
>     Suggested resolutions:
>         libibcommon-1.0-1.x86_64.rpm
>
> On Tue, 2007-02-06 at 00:25 +0200, Vladimir Sokolovsky wrote:
> > > Hi,
> > >
> > > OFED-1.2-20070205-1823.tgz can be downloaded from
> > >
> > > http://www.openfabrics.org/builds/ofed-1.2/

From mshefty at ichips.intel.com Tue Feb 6 10:35:40 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 06 Feb 2007 10:35:40 -0800
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170784924.19662.79.camel@stevo-desktop>
References: <1170782906.19662.61.camel@stevo-desktop> <45C8BCC9.4070003@ichips.intel.com> <1170784924.19662.79.camel@stevo-desktop>
Message-ID: <45C8CA7C.3080705@ichips.intel.com>

> Actually, yes it does. Here's one case (that I just tested :):
>
> If you rdma_bind() to an explicit local address, it will fail.
>
> Foo!
>
> I guess I'll need to address the uses of ip_dev_find() in addr.c as well
> before we commit this.

Can we just backport our own version of ip_dev_find()? We had this once before in svn when they removed it from being exported from the kernel.
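
To make the idea of a private address-to-device lookup concrete, here is a minimal userspace model. It is a sketch, not kernel code and not the earlier svn backport: it assumes a hypothetical table of devices with their configured IPv4 addresses, and every name in it is invented for illustration. Per the thread, the real ip_dev_find() goes through a FIB lookup instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ADDRS 4

/* Hypothetical stand-in for a network device and its configured addresses. */
struct model_dev {
	const char *name;
	uint32_t addrs[MAX_ADDRS];  /* IPv4 addresses, host byte order */
	size_t naddrs;
};

/*
 * Return the device that has 'addr' configured, or NULL if none does.
 * Matching is done per device, so a local address resolves to the
 * interface it is configured on rather than to loopback.
 */
static const struct model_dev *model_dev_find(const struct model_dev *devs,
					      size_t ndevs, uint32_t addr)
{
	size_t i, j;

	assert(devs != NULL || ndevs == 0);

	for (i = 0; i < ndevs; i++)
		for (j = 0; j < devs[i].naddrs; j++)
			if (devs[i].addrs[j] == addr)
				return &devs[i];
	return NULL;
}
```

A direct scan like this trades the routing-table lookup for a walk over per-device address lists, which is the behaviour the workaround needs: the owning interface for a local address, or no match at all.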
- Sean

From sweitzen at cisco.com Tue Feb 6 10:54:34 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 6 Feb 2007 10:54:34 -0800
Subject: [openib-general] OFED-1.2 first release
In-Reply-To:
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
Message-ID:

libibverbs is not working. I have opened bugs 342-346 for the issues I've found so far:

# ibv_devices
libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
    device              node GUID
    ------          ----------------

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

From rdreier at cisco.com Tue Feb 6 10:57:55 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 10:57:55 -0800
Subject: [openib-general] [PATCH 2.6.20] infinband: Use ARRAY_SIZE macro when appropriate
In-Reply-To: <20070206160725.GJ8991@Ahmed> (Ahmed S. Darwish's message of "Tue, 6 Feb 2007 18:07:25 +0200")
References: <20070206160725.GJ8991@Ahmed>
Message-ID:

Thanks, queued in my tree for 2.6.21

From swise at opengridcomputing.com Tue Feb 6 11:07:38 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 13:07:38 -0600
Subject: [openib-general] OFED-1.2 first release
In-Reply-To:
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
Message-ID: <1170788858.19662.85.camel@stevo-desktop>

I already opened one for the libibverbs.d problem. 339

On Tue, 2007-02-06 at 10:54 -0800, Scott Weitzenkamp (sweitzen) wrote:
> libibverbs is not working. I have opened bugs 342-346 for the issues
> I've found so far:
>
> # ibv_devices
> libibverbs: Warning: couldn't open config directory
> '/usr/local/ofed/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for
> /sys/class/infiniband_verbs/uverbs0
>     device              node GUID
>     ------          ----------------
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
> ______________________________________________________________
> From: Scott Weitzenkamp (sweitzen)
> Sent: Tuesday, February 06, 2007 9:34 AM
> To: Scott Weitzenkamp (sweitzen); 'Vladimir Sokolovsky';
> 'openfabrics-ewg at openib.org'; 'Tziporet Koren'
> Cc: 'openib-general at openib.org'
> Subject: RE: [openib-general] OFED-1.2 first release
>
> sdpnetstat is getting added to the dapl-devel RPM.
> > # rpm -qlip dapl-devel-1.2.0-0.x86_64.rpm > Name : dapl-devel Relocations: (not > relocatable) > Version : 1.2.0 Vendor: > OpenFabrics > Release : 0 Build Date: Mon 05 > Feb 2007 09:48:50 > PM PST > Install Date: (not installed) Build Host: > svbu-qa1850-1.cisco.com > Group : System Environment/Libraries Source RPM: > ofa_user-1.2-alpha1.src > .rpm > Size : 692598 License: > GPL/BSD > Signature : (none) > URL : http://www.openfabrics.org/ > Summary : Development files for the libdat and libdapl > libraries > Description : > Static libraries and header files for the libdat and libdapl > library. > /usr/local/ofed/bin/sdpnetstat > /usr/local/ofed/include/dat/dat.h > /usr/local/ofed/include/dat/dat_error.h > /usr/local/ofed/include/dat/dat_platform_specific.h > /usr/local/ofed/include/dat/dat_redirection.h > /usr/local/ofed/include/dat/dat_registry.h > /usr/local/ofed/include/dat/dat_vendor_specific.h > /usr/local/ofed/include/dat/udat.h > /usr/local/ofed/include/dat/udat_config.h > /usr/local/ofed/include/dat/udat_redirection.h > /usr/local/ofed/include/dat/udat_vendor_specific.h > /usr/local/ofed/lib64/libdaplcma.a > /usr/local/ofed/lib64/libdaplcma.so > /usr/local/ofed/lib64/libdat.a > /usr/local/ofed/lib64/libdat.so > > > > ______________________________________________________ > From: Scott Weitzenkamp (sweitzen) > Sent: Tuesday, February 06, 2007 12:07 AM > To: Scott Weitzenkamp (sweitzen); 'Vladimir > Sokolovsky'; 'openfabrics-ewg at openib.org'; 'Tziporet > Koren' > Cc: 'openib-general at openib.org' > Subject: RE: [openib-general] OFED-1.2 first release > > > > Not getting MPI RPMS for Intel compilers, either. 
> > Running /bin/rpm > -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mp > itests_mvapich2_gcc-2.0-698.x86_64.rpm > /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mvapich2_intel-0.9.8-1.x > 86_64.rpm not found > Running /bin/rpm > -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/op > enmpi_gcc-1.2b4ofedr13470-1ofed.x86_64.rpm > Running /bin/rpm > -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mp > itests_openmpi_gcc-2.0-698.x86_64.rpm > /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/openmpi_intel-1.2b4ofedr > 13470-1ofed.x86_64.rpm not found > ERROR: -.x86_64.rpm not found > under /tmp/OFED-1.2-20070205-1823/RPMS/redhat-rele > ase-4AS-4.1. > Installation finished successfully... > > Scott > > > ______________________________________________ > From: Scott Weitzenkamp (sweitzen) > Sent: Monday, February 05, 2007 9:44 PM > To: Scott Weitzenkamp (sweitzen); 'Vladimir > Sokolovsky'; 'openfabrics-ewg at openib.org'; > 'Tziporet Koren' > Cc: 'openib-general at openib.org' > Subject: RE: [openib-general] OFED-1.2 first > release > > > > Moving on, I set ib_bonding=n in ofed.conf and > try install.sh again, and now get this: > > ... > Building MVAPICH RPM. Please wait... 
> > Using gcc compiler > Running rpmbuild -v --rebuild --define > '_topdir /var/tmp/OFEDRPM' --define '_nam > e mvapich_gcc' --define 'ofed 1' --define > 'compiler gcc' --define 'openib_prefix > /usr/local/ofed' --define > 'build_root /var/tmp/OFED' --define > '_prefix /usr/loc > al/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-9 > 71.src.rpm > > ERROR: Failed executing "rpmbuild -v --rebuild > --define '_topdir /var/tmp/OFEDRP > M' --define '_name mvapich_gcc' --define 'ofed > 1' --define 'compiler gcc' --defi > ne 'openib_prefix /usr/local/ofed' --define > 'build_root /var/tmp/OFED' --define > '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPM > S/mvapich-0.9.9-971.src.rpm" > > See log file: /tmp/OFED.6120.log > > # tail /tmp/OFED.6120.log > + LANG=C > + export LANG > + unset DISPLAY > /var/tmp/rpm-tmp.870: line 33: syntax error > near unexpected token `)' > error: Bad exit status > from /var/tmp/rpm-tmp.870 (%install) > > > RPM build errors: > Bad exit status from /var/tmp/rpm-tmp.870 > (%install) > ERROR: Failed executing "rpmbuild -v --rebuild > --define '_topdir /var/tmp/OFEDRP > M' --define '_name mvapich_gcc' --define 'ofed > 1' --define 'compiler gcc' --defi > ne 'openib_prefix /usr/local/ofed' --define > 'build_root /var/tmp/OFED' --define > '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPM > S/mvapich-0.9.9-971.src.rpm" > > Scott > > > ______________________________________ > From: Scott Weitzenkamp (sweitzen) > Sent: Monday, February 05, 2007 9:27 > PM > To: Vladimir Sokolovsky; > openfabrics-ewg at openib.org; Tziporet > Koren; Scott Weitzenkamp (sweitzen) > Cc: openib-general at openib.org > Subject: RE: [openib-general] OFED-1.2 > first release > > > > Vlad and Tziporet, > > It might help if you elaborated on > what you meant by "first release", you > have been saying "code freeze" but > really this is "feature freeze", > right? 
This announcement is quite a > bit different from previous OFED > announcements, where you detailed what > features were available and what OS > were supported. The daily build email > mentions compiling against kernels, > but I haven't seen what distros were > actually tested. Are we starting from > scratch on compiling and testing with > distros like RHEL4? Do you anticipate > we will just go day by day with builds > trying to stabilize things initially? > > In any case, here's what I see when I > try to compile with install.sh on > RHEL4 U3 x86_64: > > ... > /tmp/OFED-1.2-20070205-1823/build.sh: > line 802: kernel-ib: command not found > Running rpmbuild --rebuild > --target=noarch --define > '_topdir /var/tmp/OFEDRPM' - > -define > '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-docs-1. > 2-0.src.rpm > Running /bin/mv > -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-docs-1.2-0.noarch.rpm /tmp/ > OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 > Running rpmbuild --rebuild > --target=noarch --define > '_topdir /var/tmp/OFEDRPM' - > -define > '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-scripts > -1.2-0.src.rpm > Running /bin/mv > -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-scripts-1.2-0.noarch.rpm /t > mp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 > Running rpmbuild --rebuild --define > '_topdir /var/tmp/OFEDRPM' --define > '_prefix > /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ib-bonding-0.9.0-1.src.rpm > Running /bin/mv > -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /t > mp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 > > ERROR: Failed executing "/bin/mv > -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. > 0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" > > See log file: /tmp/OFED.10899.log > > # tail -10 /tmp/OFED.10899.log > Checking for unpackaged > file(s): /usr/lib/rpm/check-files /var/tmp/ib-bonding-0. 
> 9.0-root > Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm > Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm > Executing(--clean): /bin/sh > -e /var/tmp/rpm-tmp.98615 > + umask 022 > + cd /var/tmp/OFEDRPM/BUILD > + rm -rf ib-bonding-0.9.0 > + exit 0 > /bin/mv: cannot stat > `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm > ': No such file or directory > ERROR: Failed executing "/bin/mv > -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9. > 0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1" > > > > > Scott > > > ______________________________ > From: > openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Vladimir Sokolovsky > Sent: Monday, February 05, > 2007 2:26 PM > To: openfabrics-ewg at openib.org > Cc: openib-general at openib.org > Subject: [openib-general] > OFED-1.2 first release > > > > Hi, > > OFED-1.2-20070205-1823.tgz can be downloaded from > > http://www.openfabrics.org/builds/ofed-1.2/ > > > > > > The first OFED package includes: > > > > ofa_kernel-1.2-alpha1.src.rpm > > ofa_user-1.2-alpha1.src.rpm > > mvapich-0.9.9-971.src.rpm > > mvapich2-0.9.8-1.src.rpm > > openmpi-1.2b4ofedr13470-1ofed.src.rpm > > mpitests-2.0-698.src.rpm > > open-iscsi-generic-2.0-742.src.rpm > > ib-bonding-0.9.0-1.src.rpm > > ofed-docs-1.2-0.src.rpm > > ofed-scripts-1.2-0.src.rpm > > > > Known issues: > > srptools - compilation fails > > openib_diags - compilation fails > > ibutils - not included yet > > > > To build OFED RPMs: > > cd OFED-1.2-20070205-1823 > > ./build.sh > > > > Created RPMs will be stored under OFED-1.2-20070205-1823/RPMS/ > > directory. > > > > To install OFED RPMs: > > cd OFED-1.2-20070205-1823 > > ./install.sh > > > > For a detailed installation guide, see > > OFED-1.2-xxx/docs/OFED_Installation_Guide.txt > > > > -- > > Vladimir Sokolovsky > > Mellanox Technologies Ltd. 
> > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mst at mellanox.co.il Tue Feb 6 11:22:31 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Feb 2007 21:22:31 +0200 Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug. In-Reply-To: <45C8CA7C.3080705@ichips.intel.com> References: <1170782906.19662.61.camel@stevo-desktop> <45C8BCC9.4070003@ichips.intel.com> <1170784924.19662.79.camel@stevo-desktop> <45C8CA7C.3080705@ichips.intel.com> Message-ID: <20070206192231.GG24372@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug. > > > Actually, yes it does. Here's one case (that I just tested :): > > > > If you rdma_bind() to an explicit address local address, it will fail. > > > > Foo! > > > > I guess I'll need to address the uses of ip_dev_find() in addr.c as well > > before we commit this. > > Can we just backport our own version of ip_dev_find()? We had this once before > in svn when they removed it from being exported from the kernel. Yes, this is in kernel_addons for 2.6.19 or something like that. Just copy from there, much cleaner than the patch. -- MST From Tim.Snider at lsi.com Tue Feb 6 11:15:32 2007 From: Tim.Snider at lsi.com (Snider, Tim) Date: Tue, 6 Feb 2007 12:15:32 -0700 Subject: [openib-general] Run srp and ipoib on same port simultaneously? Message-ID: <18A61515E49B764AB09447A336E51F5693EC25@NAMAIL2.ad.lsil.com> Is there anything that prevents 2 ULPs - srp and ipoib - from running simultaneously on the same port in OFED 1.1.1? If so what about different ports on the same hca? 
Timothy Snider
Storage Architect
Strategic Planning, Technology and Architecture
LSI Logic Corporation
3718 North Rock Road
Wichita, KS 67226
(316) 636-8736
tim.snider at lsi.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bugzilla-daemon at lists.openfabrics.org Tue Feb 6 11:26:23 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Tue, 6 Feb 2007 11:26:23 -0800 (PST)
Subject: [openib-general] [Bug 347] New: rdma cm backport to EL4 - U3 broken
Message-ID:

https://bugs.openfabrics.org/show_bug.cgi?id=347

           Summary: rdma cm backport to EL4 - U3 broken
           Product: OpenFabrics Linux
           Version: 1.2
          Platform: X86-64
        OS/Version: RHEL 4
            Status: NEW
          Severity: blocker
          Priority: P1
         Component: IB Core
        AssignedTo: bugzilla at openib.org
        ReportedBy: robert.j.woodruff at intel.com

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4

I was able to fix this by applying the following backport patch when running on EL4-U3

diff -Naurp linux-2.6.9/drivers/infiniband/core/ucma.c linux-2.6.9-openib-drivers-git013007-fixups/drivers/infiniband/core/ucma.c
--- linux-2.6.9/drivers/infiniband/core/ucma.c	2007-01-30 13:13:54.000000000 -0800
+++ linux-2.6.9-openib-drivers-git013007-fixups/drivers/infiniband/core/ucma.c	2007-01-30 13:35:56.000000000 -0800
@@ -1045,13 +1045,13 @@ static struct miscdevice ucma_misc = {
 	.fops = &ucma_fops,
 };
 
-static ssize_t show_abi_version(struct device *dev,
-				struct device_attribute *attr,
-				char *buf)
+static struct class *ucma_class;
+static ssize_t show_abi_version(struct class *class_dev, char *buf)
 {
-	return sprintf(buf, "%d\n", RDMA_USER_CM_ABI_VERSION);
+        return sprintf(buf, "%d\n", RDMA_USER_CM_ABI_VERSION);
 }
-static DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
+static CLASS_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
+
 
 static int __init ucma_init(void)
 {
@@ -1061,22 +1061,28 @@ static int __init ucma_init(void)
 	if (ret)
 		return ret;
 
-	ret = device_create_file(ucma_misc.this_device, &dev_attr_abi_version);
-	if (ret) {
-		printk(KERN_ERR "rdma_ucm: couldn't create abi_version attr\n");
-		goto err;
-	}
-	return 0;
+        ucma_class = class_create(THIS_MODULE, "infiniband_ucma");
+        if (IS_ERR(ucma_class)) {
+                printk(KERN_ERR "rdma_ucm: couldn't create class infiniband_ucma\n");
+                goto err;
+        }
+
+        ret = class_create_file(ucma_class, &class_attr_abi_version);
+        if (ret) {
+                printk(KERN_ERR "user_verbs: couldn't create abi_version attribute\n");
+                goto err;
+        }
+
+        return 0;
 err:
-	misc_deregister(&ucma_misc);
-	return ret;
+        misc_deregister(&ucma_misc);
+        return ret;
 }
+
 
 static void __exit ucma_cleanup(void)
 {
-	device_remove_file(ucma_misc.this_device, &dev_attr_abi_version);
 	misc_deregister(&ucma_misc);
-	idr_destroy(&ctx_idr);
 }
 
 module_init(ucma_init);

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From rdreier at cisco.com Tue Feb 6 11:33:02 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 11:33:02 -0800
Subject: [openib-general] Run srp and ipoib on same port simultaneously?
In-Reply-To: <18A61515E49B764AB09447A336E51F5693EC25@NAMAIL2.ad.lsil.com> (Tim Snider's message of "Tue, 6 Feb 2007 12:15:32 -0700")
References: <18A61515E49B764AB09447A336E51F5693EC25@NAMAIL2.ad.lsil.com>
Message-ID:

> Is there anything that prevents 2 ULPs - srp and ipoib - from running
> simultaneously on the same port in OFED 1.1.1?

No, there isn't. Are you seeing problems trying it?

- R.

From swise at opengridcomputing.com Tue Feb 6 11:42:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 13:42:03 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.
In-Reply-To: <20070206192231.GG24372@mellanox.co.il> References: <1170782906.19662.61.camel@stevo-desktop> <45C8BCC9.4070003@ichips.intel.com> <1170784924.19662.79.camel@stevo-desktop> <45C8CA7C.3080705@ichips.intel.com> <20070206192231.GG24372@mellanox.co.il> Message-ID: <1170790923.19662.95.camel@stevo-desktop> On Tue, 2007-02-06 at 21:22 +0200, Michael S. Tsirkin wrote: > > Quoting Sean Hefty : > > Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug. > > > > > Actually, yes it does. Here's one case (that I just tested :): > > > > > > If you rdma_bind() to an explicit address local address, it will fail. > > > > > > Foo! > > > > > > I guess I'll need to address the uses of ip_dev_find() in addr.c as well > > > before we commit this. > > > > Can we just backport our own version of ip_dev_find()? We had this once before > > in svn when they removed it from being exported from the kernel. > > Yes, this is in kernel_addons for 2.6.19 or something like that. > Just copy from there, much cleaner than the patch. > I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find for sles9sp3. So maybe this function is causing the error. Stay tuned. Steve. From rdreier at cisco.com Tue Feb 6 11:54:13 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Feb 2007 11:54:13 -0800 Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support In-Reply-To: <45C85B39.4080700@voltaire.com> (Or Gerlitz's message of "Tue, 06 Feb 2007 12:40:57 +0200") References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> Message-ID: > Can you comment on the multicast changes merge for 2.6.21 status? Where are the final patches that you want to merge? - R. 
From Tim.Snider at lsi.com Tue Feb 6 11:48:34 2007 From: Tim.Snider at lsi.com (Snider, Tim) Date: Tue, 6 Feb 2007 12:48:34 -0700 Subject: [openib-general] Run srp and ipoib on same port simultaneously? Message-ID: <18A61515E49B764AB09447A336E51F5693EC31@NAMAIL2.ad.lsil.com> No specific problems using the 2 just questioning, I've been looking at other stuff recently. I'm trying a single server to: 1. Connect Lustre servers using ipoib and 2. recognize the IB storage using srp. all the ibv_xx_ping_pong routines work between servers. Pings using ipoib IP addresses also work. Lustre says ipoib is down, & srp doesn't see luns as it did yesterday. Trying to rule out pilot errors / configuration problems. thanks -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Tuesday, February 06, 2007 1:33 PM To: Snider, Tim Cc: openib-general at openib.org Subject: Re: [openib-general] Run srp and ipoib on same port simultaneously? > Is there anything that prevents 2 ULPs - srp and ipoib - from running > simultaneously on the same port in OFED 1.1.1? No, there isn't. Are you seeing problems trying it? - R. From rdreier at cisco.com Tue Feb 6 11:59:25 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Feb 2007 11:59:25 -0800 Subject: [openib-general] [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: (Shirley Ma's message of "Mon, 5 Feb 2007 07:50:55 -0700") References: Message-ID: > Here is the patch, if possible please give your input asap, we have an > urgent customer issue need to be resolved: I guess this is OK, but what is the urgent issue it fixes? - R. 
From rdreier at cisco.com Tue Feb 6 11:58:38 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 11:58:38 -0800
Subject: [openib-general] [libmthca] deadlock while trying to destroy QP
In-Reply-To: <45C75EA2.6000905@Voltaire.COM> (guyg@voltaire.com's message of "Mon, 05 Feb 2007 18:43:14 +0200")
References: <45C75EA2.6000905@Voltaire.COM>
Message-ID:

> #0 0x0000003a6ce09172 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
> #1 0x0000002a959cf449 in mthca_cq_clean (cq=0x607240, qpn=3277830, srq=0x0) at src/cq.c:554
> #2 0x0000002a959d28b9 in mthca_destroy_qp (qp=0x607400) at src/mthca.h:246
> #3 0x000000000040117b in client_sig_handler ()
> #4 <signal handler called>
> #5 0x0000003a6ce09165 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
> #6 0x0000002a959cec91 in mthca_poll_cq (ibcq=0x607240, ne=1, wc=0x7fbffff590) at src/cq.c:467
> #7 0x0000002a9557bf73 in ibv_poll_cq (cq=0x607240, num_entries=1, wc=0x7fbffff590) at /usr/local/ofed/include/infiniband/verbs.h:824

I guess my first reaction is "don't do that." Trying to do something as complex as destroying a QP from a signal handler seems very fragile to me, and I wouldn't consider ibv_destroy_qp() safe to call from a signal handler. Can you just have your signal handler set a flag instead, and check the flag from the normal flow of your program?

> Does destroy_qp needs to be dependent on the CQ?

Yes, it needs to lock the CQ to get rid of stale completions for the QP being destroyed.

- R.

From sean.hefty at intel.com Tue Feb 6 12:00:22 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 6 Feb 2007 12:00:22 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support
In-Reply-To:
Message-ID: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>

> > Can you comment on the multicast changes merge for 2.6.21 status?
>
> Where are the final patches that you want to merge?

Try the for-roland branch at git.openfabrics.org/~shefty/scm/rdma-dev.git.
If this doesn't work, or you hit any snags, let me know, and I'll try to correct any issues so that simple pulls work in the future. Note that my tree is still at rc6. There should be 4 patches.

- Sean

From rdreier at cisco.com Tue Feb 6 12:02:56 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 12:02:56 -0800
Subject: [openib-general] [libmthca] deadlock while trying to destroy QP
In-Reply-To: (Roland Dreier's message of "Tue, 06 Feb 2007 11:58:38 -0800")
References: <45C75EA2.6000905@Voltaire.COM>
Message-ID:

> I guess my first reaction is "don't do that."

eg look at http://www.gnu.org/software/libc/manual/html_node/Nonreentrancy.html

From xma at us.ibm.com Tue Feb 6 12:16:59 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 6 Feb 2007 12:16:59 -0800
Subject: [openib-general] [PATCH] enable IPoIB only if broadcast join finish
In-Reply-To:
Message-ID:

Thanks Roland,

I will apply the patch to the customer's cluster.

The problem I found: when failover brings the new IPoIB interface up in the existing fabric, with the limited number of multicast join groups in our configuration, the interface joins the broadcast group successfully, but the all-hosts multicast group join fails. The ib interface is then UP but not RUNNING, and it doesn't work at all.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From: Roland Dreier
To: Shirley Ma/Beaverton/IBM at IBMUS
Date: 02/06/2007 11:59 AM
Cc: "Michael S. Tsirkin", openib-general at openib.org
Subject: Re: [PATCH] enable IPoIB only if broadcast join finish

> Here is the patch, if possible please give your input asap, we have an
> urgent customer issue need to be resolved:

I guess this is OK, but what is the urgent issue it fixes?

- R.

From swise at opengridcomputing.com Tue Feb 6 12:24:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 14:24:43 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170790923.19662.95.camel@stevo-desktop>
References: <1170782906.19662.61.camel@stevo-desktop> <45C8BCC9.4070003@ichips.intel.com> <1170784924.19662.79.camel@stevo-desktop> <45C8CA7C.3080705@ichips.intel.com> <20070206192231.GG24372@mellanox.co.il> <1170790923.19662.95.camel@stevo-desktop>
Message-ID: <1170793483.19662.112.camel@stevo-desktop>

> > >
> > > Can we just backport our own version of ip_dev_find()? We had this once before
> > > in svn when they removed it from being exported from the kernel.
> >
> > Yes, this is in kernel_addons for 2.6.19 or something like that.
> > Just copy from there, much cleaner than the patch.
> >
>
> I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find
> for sles9sp3. So maybe this function is causing the error. Stay tuned.

xxx_ip_dev_find() is returning the wrong interface (sometimes). I added printks to xxx_ip_dev_find(). Then I ran rping -s -a and it failed because xxx_ip_dev_find() returned loopback instead of my eth device.
Here is the function with printks:

static inline struct net_device *xxx_ip_dev_find(u32 addr)
{
	struct net_device *dev;
	u32 ip;

	read_lock(&dev_base_lock);
	printk("%s looking for dev with addr %x\n", __FUNCTION__, addr);
	for (dev = dev_base; dev; dev = dev->next) {
		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
		printk("%s dev %p name %s ipaddr %x\n", __FUNCTION__,
		       dev, dev->name, ip);
		if (ip == addr) {
			dev_hold(dev);
			break;
		}
	}
	read_unlock(&dev_base_lock);

	return dev;
}

Here is the printk log showing loopback being returned:

xxx_ip_dev_find looking for dev with addr 8846a8c0
xxx_ip_dev_find dev ffffffff804000e0 name lo ipaddr 8846a8c0

The address bound to eth3 is 192.168.70.136 (0xc0a84688). For some reason, this line:

	ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);

Returns the 192.168.70.136 address for device->name == "lo".

Riddle me that!

Also, sometimes it works ok because the loopback interface gets some other ip address that is assigned to the local system as opposed to my rdma address. For example, I booted up the sles9sp3 system with a rebuilt kernel and no ofed modules installed. The system gets 10.10.0.136 via DHCP for its "public" interface. I then built the ofed modules and installed them. I then loaded them and configured my rnic interface with 192.168.70.136. I ran rping and bound to the local ipaddr and it worked. The log showed that inet_select_addr() returned 10.10.0.136 for loopback and thus xxx_ip_dev_find() continued walking the list and found the correct ethernet interface. I then rebooted and ran the test again and it failed. So somehow module load order affects this, I think.

grrrr.

Steve.

From mst at mellanox.co.il Tue Feb 6 12:32:54 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 22:32:54 +0200
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170793483.19662.112.camel@stevo-desktop> References: <1170793483.19662.112.camel@stevo-desktop> Message-ID: <20070206203253.GL24372@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug. > > > > > > > > > Can we just backport our own version of ip_dev_find()? We had this once before > > > > in svn when they removed it from being exported from the kernel. > > > > > > Yes, this is in kernel_addons for 2.6.19 or something like that. > > > Just copy from there, much cleaner than the patch. > > > > > > > I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find > > for sles9sp3. So maybe this function is causing the error. Stay tuned. > > xxx_ip_dev_find() is returning the wrong interface (sometimes). I added > printks to xxx_ip_dev_find(). Then I ran rping -s -a > and it failed because xxx_ip_dev_find() returned loopback instead of my > eth device. > > Here is the function with printks: > > static inline struct net_device *xxx_ip_dev_find(u32 addr) > { > struct net_device *dev; > u32 ip; > > read_lock(&dev_base_lock); > printk("%s looking for dev with addr %x\n", __FUNCTION__, addr); > for (dev = dev_base; dev; dev = dev->next) { > ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > printk("%s dev %p name %s ipaddr %x\n", __FUNCTION__, > dev, dev->name, ip); > if (ip == addr) { > dev_hold(dev); > break; > } > } > read_unlock(&dev_base_lock); > > return dev; > } > > > Here is the printk log showing loopback being returned: > > xxx_ip_dev_find looking for dev with addr 8846a8c0 > xxx_ip_dev_find dev ffffffff804000e0 name lo ipaddr 8846a8c0 > > The address bound to eth3 is 192.168.70.136 (0xc0a84688). For some > reason, this line: > > ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > > Returns the 192.168.70.136 address for device->name == "lo". > > Riddle me that! 
> > Also, sometimes it works ok because the loopback interface gets some > other ip address that is assigned to the local system as opposed to my > rdma address. For example, I booted up the sles9sp3 system with a > rebuilt kernel and no ofed modules installed. The system gets > 10.10.0.136 via DHCP for its "public" interface. I then built the ofed > modules and installed them. I then loaded them and configured my rnic > interface with 192.168.70.136. I ran rping and bound to the local > ipaddr and it worked. The log showed that inet_select_addr() returned > 10.10.0.136 for loopback and thus xxx_ip_dev_find() continued walking > the list and found the correct ethernet interface. I then rebooted and > ran the test again and it failed. So somehow module load order affects > this, I think. > > grrrr. Try copying inet_select_addr source in from some upstream kernel, look at that. -- MST From michael.arndt at informatik.tu-chemnitz.de Tue Feb 6 12:58:29 2007 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Tue, 6 Feb 2007 21:58:29 +0100 Subject: [openib-general] Unknown SMP Recv References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> Message-ID: <000601c74a31$8e279480$21606d86@one7> Hi, > Guess you don't mean IB router when you say router in your description. yes > Is the sender a normal node ? Is normal node mean standard OpenIB > without changes ? How was the SMI changed ? On which nodes ? Only the > intermediate one ? Yes, the sender is a normal node without any changes. Yes, the SMI is only on intermediate ones changed. > Aside from the initial path being [0][1][1], what are the hop count and > hop pointer ? What are DrDLID and DrSLID as well as the LIDs in the LRH > of the SMP ? 
node 1 -> node 2 -> node 3 (router on node 2)

The original packet has the initial path [0][1][1], return path [0][2][2], hop count and hop pointer are 2 (SubnGetResp), and the DrDLID and DrSLID are permissive. The LIDs in the LRH are both 0.

Thanks
Michael

From michael.arndt at informatik.tu-chemnitz.de Tue Feb 6 13:14:17 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Tue, 6 Feb 2007 22:14:17 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com>
Message-ID: <002001c74a33$c2ec1db0$21606d86@one7>

Sorry,

there was a little mistake.

The original packet has the initial path [0][1][1], return path [0][2][2], hop count and hop pointer are 2 (SubnGetResp), and the DrDLID and DrSLID are permissive.

The packet I am asking about has the initial path [0][0][0], return path [0][0][0], hop count and hop pointer are 2 (SubnGetResp), and the DrDLID and DrSLID are 0. The LIDs in the LRH are also 0. The rest of the SMP header is the same as in the original header.

Michael Arndt

From swise at opengridcomputing.com Tue Feb 6 13:17:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 15:17:43 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.
In-Reply-To: <20070206203253.GL24372@mellanox.co.il>
References: <1170793483.19662.112.camel@stevo-desktop> <20070206203253.GL24372@mellanox.co.il>
Message-ID: <1170796663.19662.117.camel@stevo-desktop>

> Try copying inet_select_addr source in from some upstream kernel,
> look at that.
>

It appears that xxx_ip_dev_find() should be calling inet_select_addr with RT_SCOPE_HOST and not RT_SCOPE_LINK. Everything works fine for me if I change xxx_ip_dev_find() to use RT_SCOPE_HOST.

From the header file linux/rtnetlink.h.
Note the comment on HOST vs LINK:

/* rtm_scope

   Really it is not scope, but sort of distance to the destination.
   NOWHERE are reserved for not existing destinations, HOST is our
   local addresses, LINK are destinations, located on directly attached
   link and UNIVERSE is everywhere in the Universe.

   Intermediate values are also possible f.e. interior routes
   could be assigned a value between UNIVERSE and LINK.
*/

enum rt_scope_t
{
	RT_SCOPE_UNIVERSE=0,
/* User defined values */
	RT_SCOPE_SITE=200,
	RT_SCOPE_LINK=253,
	RT_SCOPE_HOST=254,
	RT_SCOPE_NOWHERE=255
};

From swise at opengridcomputing.com Tue Feb 6 13:34:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 15:34:03 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170796663.19662.117.camel@stevo-desktop>
References: <1170793483.19662.112.camel@stevo-desktop> <20070206203253.GL24372@mellanox.co.il> <1170796663.19662.117.camel@stevo-desktop>
Message-ID: <1170797643.19662.120.camel@stevo-desktop>

How shall I fix this? I think the correct scope is RT_SCOPE_HOST. Anyone know why RT_SCOPE_LINK was chosen?

On Tue, 2007-02-06 at 15:17 -0600, Steve Wise wrote:
> > Try copying inet_select_addr source in from some upstream kernel,
> > look at that.
> >
>
> It appears that xxx_ip_dev_find() should be calling inet_select_addr
> with RT_SCOPE_HOST and not RT_SCOPE_LINK. Everything works fine for me
> if I change xxx_ip_dev_find() to use RT_SCOPE_HOST.
>
>
> From the header file linux/rtnetlink.h. Note the comment on HOST vs
> LINK:
>
>
> /* rtm_scope
>
> Really it is not scope, but sort of distance to the destination.
> NOWHERE are reserved for not existing destinations, HOST is our
> local addresses, LINK are destinations, located on directly attached
> link and UNIVERSE is everywhere in the Universe.
>
> Intermediate values are also possible f.e. interior routes
> could be assigned a value between UNIVERSE and LINK.
> */
>
> enum rt_scope_t
> {
> RT_SCOPE_UNIVERSE=0,
> /* User defined values */
> RT_SCOPE_SITE=200,
> RT_SCOPE_LINK=253,
> RT_SCOPE_HOST=254,
> RT_SCOPE_NOWHERE=255
> };
>

From mst at mellanox.co.il Tue Feb 6 13:36:56 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 23:36:56 +0200
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170797643.19662.120.camel@stevo-desktop>
References: <1170793483.19662.112.camel@stevo-desktop> <20070206203253.GL24372@mellanox.co.il> <1170796663.19662.117.camel@stevo-desktop> <1170797643.19662.120.camel@stevo-desktop>
Message-ID: <20070206213656.GN24372@mellanox.co.il>

> How shall I fix this?

Patch?

--
MST

From halr at voltaire.com Tue Feb 6 13:46:12 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Feb 2007 16:46:12 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <002001c74a33$c2ec1db0$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7>
Message-ID: <1170798366.4525.314959.camel@hal.voltaire.com>

On Tue, 2007-02-06 at 16:14, Michael Arndt wrote:
> Sorry,
>
> there was a little mistake.
>
> The orginal packet has the initial path [0][1][1], return path [0][2][2],
> hop count and hop pointer are 2 (SubnGetResp), the Dr_DLID and DrSLID are
> permissive.

Is this the response ? If so, what's the status ? What is the attribute ID ?

> The packet I asking for

Is this the outgoing packet ?
> has the initial path [0][0][0], return path > [0][0][0], > hop count and hop pointer are 2 (SubnGetResp), Should this be SubnGet rather than SubnGetResp ? -- Hal > the Dr_DLID and DrSLID are 0. > And the LIDs in LRH are 0. The rest of the smp header is the same as it is > in the original header. > Micheal Arndt > > From swise at opengridcomputing.com Tue Feb 6 14:02:00 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 06 Feb 2007 16:02:00 -0600 Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug. In-Reply-To: <20070206213656.GN24372@mellanox.co.il> References: <1170793483.19662.112.camel@stevo-desktop> <20070206203253.GL24372@mellanox.co.il> <1170796663.19662.117.camel@stevo-desktop> <1170797643.19662.120.camel@stevo-desktop> <20070206213656.GN24372@mellanox.co.il> Message-ID: <1170799320.19662.124.camel@stevo-desktop> On Tue, 2007-02-06 at 23:36 +0200, Michael S. Tsirkin wrote: > > How shall I fix this? > > Patch? > Riiight. I'm afraid if I use HOST instead of LINK that I'll break some strange SDP loopback feature or some such thing. And I'm not in a position to test that. But I can post a patch. Shall I just change sles9sp3 since we don't see (yet) any problems with the other distros? From mst at mellanox.co.il Tue Feb 6 14:12:32 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Feb 2007 00:12:32 +0200 Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug. In-Reply-To: <1170799320.19662.124.camel@stevo-desktop> References: <1170799320.19662.124.camel@stevo-desktop> Message-ID: <20070206221232.GO24372@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug. > > On Tue, 2007-02-06 at 23:36 +0200, Michael S. Tsirkin wrote: > > > How shall I fix this? > > > > Patch? > > > > Riiight. 
I'm afraid if I use HOST instead of LINK that I'll break some > strange SDP loopback feature or some such thing. And I'm not in a > position to test that. > > But I can post a patch. Shall I just change sles9sp3 since we don't see > (yet) any problems with the other distros? If you post one that updates all kernels it will be easier to test. -- MST From michael.arndt at informatik.tu-chemnitz.de Tue Feb 6 14:14:13 2007 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Tue, 6 Feb 2007 23:14:13 +0100 Subject: [openib-general] Unknown SMP Recv References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170798366.4525.314959.camel@hal.voltaire.com> Message-ID: <000401c74a3c$2204c6a0$21606d86@one7> > Is this the response ? If so, what's the status ? What is the attribute > ID ? Yes, it's a response. The attribute is NodeInfo or PortInfo or whatever; the attribute ID didn't change from the original packet (first receive). The status is 0 and the D-Bit is set, because it is a response. >> The packet I asking for > > Is this the outgoing packet ? No, it is a receive. I send one SubnGet and receive two SubnGetResp (one is OK and one is as I described). >Should this be SubnGet rather than SubnGetResp ? I don't know. The scenario is (node1, sender) -> (node2, router, which receives the two SubnGetResp) -> (node3, responder). The effect appears only on the way back. Thanks Michael From swise at opengridcomputing.com Tue Feb 6 14:15:43 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 06 Feb 2007 16:15:43 -0600 Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug.
In-Reply-To: <20070206221232.GO24372@mellanox.co.il> References: <1170799320.19662.124.camel@stevo-desktop> <20070206221232.GO24372@mellanox.co.il> Message-ID: <1170800143.19662.125.camel@stevo-desktop> On Wed, 2007-02-07 at 00:12 +0200, Michael S. Tsirkin wrote: > > Quoting Steve Wise : > > Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug. > > > > On Tue, 2007-02-06 at 23:36 +0200, Michael S. Tsirkin wrote: > > > > How shall I fix this? > > > > > > Patch? > > > > > > > Riiight. I'm afraid if I use HOST instead of LINK that I'll break some > > strange SDP loopback feature or some such thing. And I'm not in a > > position to test that. > > > > But I can post a patch. Shall I just change sles9sp3 since we don't see > > (yet) any problems with the other distros? > > If you post one that updates all kernels it will be easier to test. > I'm ok with this. Stay tuned. Steve. From rdreier at cisco.com Tue Feb 6 14:32:40 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Feb 2007 14:32:40 -0800 Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support In-Reply-To: <20070205201223.GD16598@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 5 Feb 2007 22:12:23 +0200") References: <20070205201223.GD16598@mellanox.co.il> Message-ID: Looks pretty good, but one thing worries me: Overall looks great, I'll merge it up. A few quick questions: > +#ifdef CONFIG_IPV6 I think this really needs to be #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) but I'm not clear on what happens if IPoIB is built-in and IPv6 is built as a module, since then icmpv6_send() isn't available until the ipv6 module is loaded. It seems ip_gre.c has the same problem, so I'll ask on netdev about this. Also a few other minor things: > +#ifdef CONFIG_INFINIBAND_IPOIB_CM > +struct ib_cm_id; this #ifdef in ipoib.h is just guarding declarations; we might as well declare everything even if it's not used. 
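The point about the IPv6 guard above can be illustrated outside the kernel. What follows is only a sketch, not IPoIB code: it simulates a configuration with IPv6 built as a module by defining CONFIG_IPV6_MODULE by hand (in a real build that macro comes from the kernel configuration, where a tristate option set to m defines CONFIG_IPV6_MODULE and leaves CONFIG_IPV6 undefined). The function names are hypothetical.

```c
#include <assert.h>

/* Assumption for this sketch: pretend the kernel was configured with
 * CONFIG_IPV6=m. A real build gets this macro from the generated
 * configuration header, never from a hand-written #define. */
#define CONFIG_IPV6_MODULE 1

/* Guard as posted in the patch: only sees the built-in case. */
static int guard_builtin_only(void)
{
#ifdef CONFIG_IPV6
    return 1;
#else
    return 0;
#endif
}

/* Guard as suggested in the review: built-in or modular. */
static int guard_builtin_or_module(void)
{
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
    return 1;
#else
    return 0;
#endif
}

/* With IPv6 modular, the plain #ifdef silently drops the IPv6 path. */
static void check_guards(void)
{
    assert(guard_builtin_only() == 0);
    assert(guard_builtin_or_module() == 1);
}
```

This only shows which branch the preprocessor keeps. It does not address the open question raised above, namely what happens when the caller is built-in and icmpv6_send() lives in a module that is not yet loaded.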
> + rep.starting_psn = 0 /* FIXME */; any reason not to just do: rep.starting_psn = random32() & 0xffffff; ? > + req.srq = 15; This just should be 1, right? - R. From swise at opengridcomputing.com Tue Feb 6 15:39:13 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 06 Feb 2007 17:39:13 -0600 Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug. In-Reply-To: <20070206221232.GO24372@mellanox.co.il> References: <1170799320.19662.124.camel@stevo-desktop> <20070206221232.GO24372@mellanox.co.il> Message-ID: <1170805153.19662.155.camel@stevo-desktop> Here it is (only tested with rping over iWARP on sles9sp3): ---------------- xxx_ip_dev_find() must use scope HOST. From: Steve Wise Function xxx_ip_dev_find(RT_SCOPE_LINK) returns the wrong interface on some kernels. The correct scope is RT_SCOPE_HOST. Signed-off-by: Steve Wise --- .../backport/2.6.11/include/linux/inetdevice.h | 2 +- .../backport/2.6.11_FC4/include/linux/inetdevice.h | 2 +- .../backport/2.6.12/include/linux/inetdevice.h | 2 +- .../backport/2.6.13/include/linux/inetdevice.h | 2 +- .../2.6.13_suse10_0_u/include/linux/inetdevice.h | 2 +- .../backport/2.6.14/include/linux/inetdevice.h | 2 +- .../backport/2.6.15/include/linux/inetdevice.h | 2 +- .../2.6.15_ubuntu606/include/linux/inetdevice.h | 2 +- .../backport/2.6.16/include/linux/inetdevice.h | 2 +- .../backport/2.6.17/include/linux/inetdevice.h | 2 +- .../2.6.5_sles9_sp3/include/linux/inetdevice.h | 2 +- .../backport/2.6.9_U2/include/linux/inetdevice.h | 2 +- .../backport/2.6.9_U3/include/linux/inetdevice.h | 2 +- .../backport/2.6.9_U4/include/linux/inetdevice.h | 2 +- 14 files changed, 14 insertions(+), 14 deletions(-) diff --git a/kernel_addons/backport/2.6.11/include/linux/inetdevice.h b/kernel_addons/backport/2.6.11/include/linux/inetdevice.h index 7244487..2d3c50f 100644 --- a/kernel_addons/backport/2.6.11/include/linux/inetdevice.h +++ 
b/kernel_addons/backport/2.6.11/include/linux/inetdevice.h @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h b/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h index 7244487..2d3c50f 100644 --- a/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.12/include/linux/inetdevice.h b/kernel_addons/backport/2.6.12/include/linux/inetdevice.h index 7244487..2d3c50f 100644 --- a/kernel_addons/backport/2.6.12/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.12/include/linux/inetdevice.h @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.13/include/linux/inetdevice.h b/kernel_addons/backport/2.6.13/include/linux/inetdevice.h index 7a32313..fd0aa36 100644 --- a/kernel_addons/backport/2.6.13/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.13/include/linux/inetdevice.h @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; 
diff --git a/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h b/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h index 7a32313..fd0aa36 100644 --- a/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.14/include/linux/inetdevice.h b/kernel_addons/backport/2.6.14/include/linux/inetdevice.h index 7a32313..fd0aa36 100644 --- a/kernel_addons/backport/2.6.14/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.14/include/linux/inetdevice.h @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.15/include/linux/inetdevice.h b/kernel_addons/backport/2.6.15/include/linux/inetdevice.h index 7a32313..fd0aa36 100644 --- a/kernel_addons/backport/2.6.15/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.15/include/linux/inetdevice.h @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h b/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h index 7a32313..fd0aa36 100644 --- a/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h +++ 
b/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.16/include/linux/inetdevice.h b/kernel_addons/backport/2.6.16/include/linux/inetdevice.h index 7a32313..fd0aa36 100644 --- a/kernel_addons/backport/2.6.16/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.16/include/linux/inetdevice.h @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.17/include/linux/inetdevice.h b/kernel_addons/backport/2.6.17/include/linux/inetdevice.h index 7a32313..fd0aa36 100644 --- a/kernel_addons/backport/2.6.17/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.17/include/linux/inetdevice.h @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h b/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h index 7244487..2d3c50f 100644 --- a/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == 
addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h index 7244487..2d3c50f 100644 --- a/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h index 7244487..2d3c50f 100644 --- a/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; diff --git a/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h index 7244487..2d3c50f 100644 --- a/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h +++ b/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); if (ip == addr) { dev_hold(dev); break; From halr at voltaire.com Tue Feb 6 16:19:51 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Feb 2007 19:19:51 -0500 Subject: [openib-general] Unknown SMP Recv In-Reply-To: <002001c74a33$c2ec1db0$21606d86@one7> References: <000901c74938$e10b2a30$21606d86@one7> 
<1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> Message-ID: <1170807564.4525.324195.camel@hal.voltaire.com> On Tue, 2007-02-06 at 16:14, Michael Arndt wrote: > Sorry, > > there was a little mistake. I think I understand what you are saying now. The below are the 2 responses you get. > The orginal packet has the initial path [0][1][1], return path [0][2][2], > hop count and hop pointer are 2 (SubnGetResp), the Dr_DLID and DrSLID are > permissive. This sounds like the good response and appears to traverse your 3 nodes. > The packet I asking for has the initial path [0][0][0], return path > [0][0][0], > hop count and hop pointer are 2 (SubnGetResp), the Dr_DLID and DrSLID are 0. > And the LIDs in LRH are 0. The rest of the smp header is the same as it is > in the original header. This is the bogus extra response. Since your sender node is unmodified, it is unlikely an issue there. It seems like the intermediate node might be responding and forwarding the packet on although it should only do one of those two things. You did mention the SMI on the intermediate node was modified, right ? Also, note that the SMI is not validated and has some known issues for switches (e.g. intermediate hops). -- Hal > Micheal Arndt > > From krkumar2 at in.ibm.com Tue Feb 6 22:56:50 2007 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Wed, 07 Feb 2007 12:26:50 +0530 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() Message-ID: <20070207065650.24166.6979.sendpatchset@localhost.localdomain> (I had submitted this once earlier but got no response) cm_conn_req_handler() : 1. Calling destroy_cm_id leaks 3 work 'free' list entries. 2. cm_id is freed up wrongly and not cm_id_priv (though the effect is the same since cm_id is the first element of cm_id_priv, but still a bug if the top level cm_id changes). 3. 
Reject message has to be sent on failure. Tested this without the fix and found the client hangs, waited for about 20 mins and then did Ctrl-C but the process is unkillable. 4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle) doesn't achieve anything, since checking for IWCM_F_CALLBACK_DESTROY in the parent's flag (in cm_work_handler) means that this will never be true. All 4 above cases were tested by injecting random error in iw_conn_req_handler() and running rdma_bw/krping, they were confirmed. I added the BUG_ON() to confirm the earlier check for id_priv->refcount==0 should always be true (and could be removed). Patch against 2.6.20 Signed-off-by: Krishna Kumar --- diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c --- org/drivers/infiniband/core/iwcm.c 2007-01-24 10:25:26.000000000 +0530 +++ new/drivers/infiniband/core/iwcm.c 2007-01-24 10:25:31.000000000 +0530 @@ -647,10 +647,9 @@ static void cm_conn_req_handler(struct i /* Call the client CM handler */ ret = cm_id->cm_handler(cm_id, iw_event); if (ret) { - set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); - destroy_cm_id(cm_id); - if (atomic_read(&cm_id_priv->refcount)==0) - kfree(cm_id); + BUG_ON(atomic_read(&cm_id_priv->refcount) != 1); + iw_cm_reject(cm_id, NULL, 0); + iw_destroy_cm_id(cm_id); } out: From mst at mellanox.co.il Tue Feb 6 23:41:39 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Feb 2007 09:41:39 +0200 Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug. In-Reply-To: <1170805153.19662.155.camel@stevo-desktop> References: <1170799320.19662.124.camel@stevo-desktop> <20070206221232.GO24372@mellanox.co.il> <1170805153.19662.155.camel@stevo-desktop> Message-ID: <20070207074139.GA20290@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug. 
> > Here it is (only tested with rping over iWARP on sles9sp3): > > ---------------- > > > xxx_ip_dev_find() must use scope HOST. > > From: Steve Wise > > Function xxx_ip_dev_find(RT_SCOPE_LINK) returns the wrong interface on > some kernels. The correct scope is RT_SCOPE_HOST. > > Signed-off-by: Steve Wise OK. I don't have access to the lab at the moment, but hope to test this by next week. -- MST From mst at mellanox.co.il Tue Feb 6 23:53:39 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Feb 2007 09:53:39 +0200 Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support In-Reply-To: References: Message-ID: <20070207075339.GB20290@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCHv6 RFC] IPoIB CM Experimental support > > Looks pretty good, but one thing worries me: > > Overall looks great, I'll merge it up. Great, thanks! Just to clarify: do you intend to fix up the comments below or do you prefer for me to do it and repost? If the later, it's easy for me, but I won't have access to the lab today so an updated patch won't be tested till tomorrow. > A few quick questions: > +#ifdef CONFIG_IPV6 > > I think this really needs to be > > #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) > > but I'm not clear on what happens if IPoIB is built-in and IPv6 is > built as a module, since then icmpv6_send() isn't available until the > ipv6 module is loaded. It seems ip_gre.c has the same problem, so > I'll ask on netdev about this. I see this just got answered. > Also a few other minor things: > > > +#ifdef CONFIG_INFINIBAND_IPOIB_CM > > +struct ib_cm_id; > > this #ifdef in ipoib.h is just guarding declarations; we might as well > declare everything even if it's not used. Yes. I wasn't sure which way you'd prefer it. > > + rep.starting_psn = 0 /* FIXME */; > > any reason not to just do: > > rep.starting_psn = random32() & 0xffffff; > > ? 
Well, randomness is a resource after all, and since we don't have the additional security provided by PSNs in IPoIB UD, it seemed we do not need it for IPoIB CM either. So maybe the right thing is just to remove the FIXME comment. > > + req.srq = 15; > > This just should be 1, right? Of course. It's a 1-bit field. -- MST From guyg at voltaire.com Wed Feb 7 01:37:27 2007 From: guyg at voltaire.com (Guy German) Date: Wed, 07 Feb 2007 11:37:27 +0200 Subject: [openib-general] [libmthca] deadlock while trying to destroy QP In-Reply-To: References: <45C75EA2.6000905@Voltaire.COM> Message-ID: <45C99DD7.9030304@voltaire.com> Roland Dreier wrote: > I guess my first reaction is "don't do that." Trying to do something > as complex as destroying a QP from a signal handler seems very fragile > to me, and I wouldn't consider ibv_destroy_qp() safe to call from a > signal handler. Fair enough. Thanks, Guy From vlad at lists.openfabrics.org Wed Feb 7 02:22:19 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Wed, 7 Feb 2007 02:22:19 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070207-0200 daily build status Message-ID: <20070207102219.8CA72E60804@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.18 Passed on ia64 with 
linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ /home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: /home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory 
`/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From grossmann at hlrs.de Wed Feb 7 03:03:45 2007 From: grossmann at hlrs.de (Thomas Großmann) Date: Wed, 7 Feb 2007 12:03:45 +0100 Subject: [openib-general] Problem with SRP with 512 byte sector size with > 2 TB LUNs Message-ID: <200702071203.45309.grossmann@hlrs.de> Hello, We have a disk-array connected over a Mellanox MT25204 IB card. We have configured LUNs with a size of over 2 TB with 512 byte sector size and are using OpenIB 1.1 and SUSE SLES 10 x86_64. I get the following output in /var/log/messages when adding a LUN: Feb 2 09:59:57 data1 kernel: Vendor: DDN Model: S2A 9550 Rev: 3.03 Feb 2 09:59:57 data1 kernel: Type: Direct-Access ANSI SCSI revision: 06 Feb 2 09:59:57 data1 kernel: sdc : very big device. try to use READ CAPACITY(16). Feb 2 09:59:57 data1 kernel: sdc : READ CAPACITY(16) failed. Feb 2 09:59:57 data1 kernel: sdc : status=0, message=00, host=5, driver=00 Feb 2 09:59:57 data1 kernel: sdc : use 0xffffffff as device size Feb 2 09:59:57 data1 kernel: SCSI device sdc: 4294967296 512-byte hdwr sectors (2199023 MB) Feb 2 09:59:57 data1 kernel: sdc: Write Protect is off Feb 2 09:59:57 data1 kernel: sdc: Mode Sense: 97 00 10 08 Feb 2 09:59:57 data1 kernel: SCSI device sdc: drive cache: write back w/ FUA Feb 2 09:59:57 data1 kernel: sdc : very big device. try to use READ CAPACITY(16). Feb 2 09:59:57 data1 kernel: sdc : READ CAPACITY(16) failed.
Feb 2 09:59:57 data1 kernel: sdc : status=0, message=00, host=5, driver=00 Feb 2 09:59:57 data1 kernel: sdc : use 0xffffffff as device size Feb 2 09:59:57 data1 kernel: SCSI device sdc: 4294967296 512-byte hdwr sectors (2199023 MB) Feb 2 09:59:57 data1 kernel: sdc: Write Protect is off Feb 2 09:59:57 data1 kernel: sdc: Mode Sense: 97 00 10 08 Feb 2 09:59:57 data1 kernel: SCSI device sdc: drive cache: write back w/ FUA Feb 2 09:59:57 data1 kernel: sdc: unknown partition table Feb 2 09:59:57 data1 kernel: sd 8:0:0:0: Attached scsi disk sdc Feb 2 09:59:57 data1 kernel: sd 8:0:0:0: Attached scsi generic sg2 type 0 I found in the Changelog of kernel 2.6.20 the following instruction: target_host->max_cmd_len = sizeof ((struct srp_cmd *) (void *) 0L)->cdb; (added to the function srp_create_target to achieve READ CAPACITY(16) ) and added it to the ib_srp module of OpenIB 1.1. The output was then: Feb 5 17:53:07 data1 kernel: Vendor: DDN Model: S2A 9550 Rev: 3.03 Feb 5 17:53:07 data1 kernel: Type: Direct-Access ANSI SCSI revision: 06 Feb 5 17:53:07 data1 kernel: sdc : very big device. try to use READ CAPACITY(16). Feb 5 17:53:07 data1 kernel: sdc : sector size 0 reported, assuming 512. Feb 5 17:53:07 data1 kernel: SCSI device sdc: 1 512-byte hdwr sectors (0 MB) Feb 5 17:53:07 data1 kernel: sdc: Write Protect is off Feb 5 17:53:07 data1 kernel: sdc: Mode Sense: 97 00 10 08 Feb 5 17:53:07 data1 kernel: SCSI device sdc: drive cache: write back w/ FUA Feb 5 17:53:07 data1 kernel: sdc : very big device. try to use READ CAPACITY(16). Feb 5 17:53:07 data1 kernel: sdc : sector size 0 reported, assuming 512. 
Feb 5 17:53:07 data1 kernel: SCSI device sdc: 1 512-byte hdwr sectors (0 MB) Feb 5 17:53:07 data1 kernel: sdc: Write Protect is off Feb 5 17:53:07 data1 kernel: sdc: Mode Sense: 97 00 10 08 Feb 5 17:53:07 data1 kernel: SCSI device sdc: drive cache: write back w/ FUA Feb 5 17:53:07 data1 kernel: sdc: unknown partition table Feb 5 17:53:07 data1 kernel: sd 9:0:0:0: Attached scsi disk sdc Feb 5 17:53:07 data1 kernel: sd 9:0:0:0: Attached scsi generic sg2 type 0 The same output was shown when trying to add a LUN using kernel 2.6.20. Is it possible to add LUNs with > 2 TB and 512 byte sectors ? Why does the READ CAPACITY(16) command fail ? Kind regards, Thomas -- Thomas Großmann High Performance Computing Center Stuttgart (HLRS) Allmandring 30 70550 Stuttgart, Germany E-Mail: grossmann at hlrs.de Phone: ++49-711-685-65529 Fax: ++49-711-685-65832 From mst at mellanox.co.il Wed Feb 7 04:10:22 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Feb 2007 14:10:22 +0200 Subject: [openib-general] resolving sending mails from OFA new server In-Reply-To: <3D84A59A1AD3584DA02AEAD240E8863F039CFAB2@ES22SNLNT.srn.sandia.gov> References: <3D84A59A1AD3584DA02AEAD240E8863F039CFAB2@ES22SNLNT.srn.sandia.gov> Message-ID: <20070207121022.GA1102@mellanox.co.il> > Michael, > > I put something together at bugmail at lists.openfabrics.org. I did not > get a chance to try it out, so let me know if it's working out for you. > Keywords used in the e-mail format come from the bugmail_help.html > included w/ Bugzilla (it is posted at > http://www.openfabrics.org/docs/bugmail_help.html). > > Michael I just tried both and it worked flawlessly. Thanks very much! Guys, you should try the email gateway, it is amazing, especially for adding text to bugs: just put [Bug XXX] in the mail subject.
Michael, one small request: could the messages that bugzilla generates have From field as bugmail at lists.openfabrics.org and not bugzilla-daemon at openib.org as today? This way I can add text to a bug just by replying to it. Thanks, MST -- MST From mst at mellanox.co.il Wed Feb 7 04:35:34 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Feb 2007 14:35:34 +0200 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: <20070207065650.24166.6979.sendpatchset@localhost.localdomain> References: <20070207065650.24166.6979.sendpatchset@localhost.localdomain> Message-ID: <20070207123534.GD716@mellanox.co.il> - set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); - destroy_cm_id(cm_id); - if (atomic_read(&cm_id_priv->refcount)==0) - kfree(cm_id); + BUG_ON(atomic_read(&cm_id_priv->refcount) != 1); + iw_cm_reject(cm_id, NULL, 0); + iw_destroy_cm_id(cm_id); And BTW, lots of lines with atomic_read()==0 in them have broken whitespace in iwcm.c. Does anyone care enough to fix them? -- MST From halr at voltaire.com Wed Feb 7 05:49:17 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Feb 2007 08:49:17 -0500 Subject: [openib-general] patches to 2.6.19.1 kernel for switch Operation In-Reply-To: <039701c7494b$6bd5d860$1914a8c0@surioffice> References: <000601c7419f$d4470c60$ff0da8c0@amr.corp.intel.com> <1170072757.4555.242192.camel@hal.voltaire.com> <039701c7494b$6bd5d860$1914a8c0@surioffice> Message-ID: <1170856154.4525.372809.camel@hal.voltaire.com> Suri, On Mon, 2007-02-05 at 12:31, Suresh Shelvapille wrote: > Hal: > > We are upgrading to 2.6.19.1 kernel Glad to hear this. > and I finally ported the changes > required for Switch operation from my current kernel (2.6.12) version. > > I have tested these changes for a switch with different SM(s). But I need > the community's help to test the changes on different HCAs to make sure I > have not broken anything. > > Please see if the changes look OK. 
Have you tested these changes on end nodes (HCAs) ? If so, what tests have you performed ? It would be easier to comment if your changes were included inline rather than as attachments. Also, you should attach your S-O-B line. Thanks. -- Hal > Thanks, > Suri From swise at opengridcomputing.com Wed Feb 7 06:24:32 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 07 Feb 2007 08:24:32 -0600 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: <20070207065650.24166.6979.sendpatchset@localhost.localdomain> References: <20070207065650.24166.6979.sendpatchset@localhost.localdomain> Message-ID: <1170858272.14381.1.camel@stevo-desktop> This looks good for 2.6.21 IMO. Acked-by: Steve Wise On Wed, 2007-02-07 at 12:26 +0530, Krishna Kumar wrote: > (I had submitted this once earlier but got no response) > > cm_conn_req_handler() : > 1. Calling destroy_cm_id leaks 3 work 'free' list entries. > 2. cm_id is freed up wrongly and not cm_id_priv (though the > effect is the same since cm_id is the first element of > cm_id_priv, but still a bug if the top level cm_id changes). > 3. Reject message has to be sent on failure. Tested this > without the fix and found the client hangs, waited for about > 20 mins and then did Ctrl-C but the process is unkillable. > 4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle) > doesn't achieve anything, since checking for > IWCM_F_CALLBACK_DESTROY in the parent's flag (in > cm_work_handler) means that this will never be true. > > All 4 above cases were tested by injecting random error in > iw_conn_req_handler() and running rdma_bw/krping, they were > confirmed. I added the BUG_ON() to confirm the earlier check > for id_priv->refcount==0 should always be true (and could be > removed). 
> > Patch against 2.6.20 > > Signed-off-by: Krishna Kumar > --- > diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c > --- org/drivers/infiniband/core/iwcm.c 2007-01-24 10:25:26.000000000 +0530 > +++ new/drivers/infiniband/core/iwcm.c 2007-01-24 10:25:31.000000000 +0530 > @@ -647,10 +647,9 @@ static void cm_conn_req_handler(struct i > /* Call the client CM handler */ > ret = cm_id->cm_handler(cm_id, iw_event); > if (ret) { > - set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); > - destroy_cm_id(cm_id); > - if (atomic_read(&cm_id_priv->refcount)==0) > - kfree(cm_id); > + BUG_ON(atomic_read(&cm_id_priv->refcount) != 1); > + iw_cm_reject(cm_id, NULL, 0); > + iw_destroy_cm_id(cm_id); > } > > out: > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From michael.arndt at informatik.tu-chemnitz.de Wed Feb 7 06:38:37 2007 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Wed, 7 Feb 2007 15:38:37 +0100 Subject: [openib-general] Unknown SMP Recv References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> Message-ID: <000801c74ac5$a70c6a90$21606d86@one7> Hi, > This sounds like the good response and appears to traverse your 3 nodes. Yes, that's right > This is the bogus extra response. Since your sender node is unmodified, > it is unlikely an issue there. It seems like the intermediate node might > be responding and forwarding the packet on although it should only do > one of those two things. You did mention the SMI on the intermediate > node was modified, right ? 
Also, note that the SMI is not validated and
> has some known issues for switches (e.g. intermediate hops).

The sender and the responder are unmodified (node1, node3). I have debugged the whole SMI, ib_mad_recv_done_handler and handle_outgoing_dr_smp functions and did not find the bogus extra response. From what I debugged, the responder sends one packet, which is correct, and the intermediate node does not receive a bogus extra packet. So the extra packet did not pass through the SMI, that's for sure.

I use libibumad to implement the forwarding mechanism and also use the select function to catch any receive I should handle. Maybe something is wrong there.

Thanks
Michael Arndt

From tom at opengridcomputing.com Wed Feb 7 07:01:23 2007
From: tom at opengridcomputing.com (Tom Tucker)
Date: Wed, 07 Feb 2007 09:01:23 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()
In-Reply-To: <1170858272.14381.1.camel@stevo-desktop>
References: <20070207065650.24166.6979.sendpatchset@localhost.localdomain> <1170858272.14381.1.camel@stevo-desktop>
Message-ID: <1170860483.11491.21.camel@trinity.ogc.int>

On Wed, 2007-02-07 at 08:24 -0600, Steve Wise wrote:
> This looks good for 2.6.21 IMO.
>
> Acked-by: Steve Wise
>
>
> On Wed, 2007-02-07 at 12:26 +0530, Krishna Kumar wrote:
> > (I had submitted this once earlier but got no response)
> >
> > cm_conn_req_handler() :
> > 1. Calling destroy_cm_id leaks 3 work 'free' list entries.

When dealloc_work_entries was added to the iw_destroy_cm_id function, it ALSO needed to be added everywhere destroy_cm_id was called. So you need to call dealloc_work_entries everywhere you call destroy_cm_id, or this leak remains all over the place, e.g. in cm_work_handler.

> > 2. cm_id is freed up wrongly and not cm_id_priv (though the
> > effect is the same since cm_id is the first element of
> > cm_id_priv, but still a bug if the top level cm_id changes).
> > 3. Reject message has to be sent on failure.
Tested this > > without the fix and found the client hangs, waited for about > > 20 mins and then did Ctrl-C but the process is unkillable. This should be added to the switch statement in destroy_cm_id (not here) so that it doesn't need to be added everywhere the cm_id is destroyed when it's in a state that requires a reject. > > 4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle) > > doesn't achieve anything, since checking for > > IWCM_F_CALLBACK_DESTROY in the parent's flag (in > > cm_work_handler) means that this will never be true. destroy_cm_id exists to allow cm_id to be destroyed without waiting. If you're changing it to iw_destroy_cm_id, that may be fine, but all the setbit/getbit stuff is a side show. You must be certain that iw_destroy_cm_id can't wait. If it does, you'll shut down the entire IWCM. > > > > All 4 above cases were tested by injecting random error in > > iw_conn_req_handler() and running rdma_bw/krping, they were > > confirmed. I added the BUG_ON() to confirm the earlier check > > for id_priv->refcount==0 should always be true (and could be > > removed). 
> > > > Patch against 2.6.20 > > > > Signed-off-by: Krishna Kumar > > --- > > diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c > > --- org/drivers/infiniband/core/iwcm.c 2007-01-24 10:25:26.000000000 +0530 > > +++ new/drivers/infiniband/core/iwcm.c 2007-01-24 10:25:31.000000000 +0530 > > @@ -647,10 +647,9 @@ static void cm_conn_req_handler(struct i > > /* Call the client CM handler */ > > ret = cm_id->cm_handler(cm_id, iw_event); > > if (ret) { > > - set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); > > - destroy_cm_id(cm_id); > > - if (atomic_read(&cm_id_priv->refcount)==0) > > - kfree(cm_id); > > + BUG_ON(atomic_read(&cm_id_priv->refcount) != 1); > > + iw_cm_reject(cm_id, NULL, 0); > > + iw_destroy_cm_id(cm_id); > > } > > > > out: > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From yosefe at voltaire.com Wed Feb 7 07:20:17 2007 From: yosefe at voltaire.com (Yosef Etigin) Date: Wed, 07 Feb 2007 17:20:17 +0200 Subject: [openib-general] issues with compilation of ofed 1.2 Message-ID: <45C9EE31.2040602@voltaire.com> ****************************************************************** 1. 
When compiling without ibutils I get the following error:

RPM build errors:
user vladsk does not exist - using root
group vladsk does not exist - using root
user vladsk does not exist - using root
group vladsk does not exist - using root
File not found by glob: /var/tmp/OFED/usr/local/ofed/man/man1/ibv_*
File not found by glob: /var/tmp/OFED/usr/local/ofed/man/man8/opensm*
File not found by glob: /var/tmp/OFED/usr/local/ofed/man/man8/osmtest*
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed ' --define 'build_root /var/tmp/OFED ' --define 'configure_options --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-sdpnetstat --with-mstflint --with-perftest --mandir=/usr/local/ofed /man' --define 'configure_options32 %{nil}' --define 'build_32bit 0' /tmp/regtest/OFED-1.2-20070205-1823/SRPMS/ofa_user-1.2-alpha1.src.rpm"

******************************************************************
2. After adding ibutils, compilation passes on RH4 (U4 and U3).
However, when executing an application that uses libibverbs, I get this error:

libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
No IB devices found

Workaround: copy libibverbs.d from an installation of ofed 1.2 from the daily build packages to /usr/local/ofed/etc/

******************************************************************
3. Uninstall script does not always successfully remove the libcxgb3 package

******************************************************************
4. When compiling on SLES10 I get this error:

MTHOME directory /var/tmp/OFED/usr/local/ofed does not exist. Exiting.
error: Bad exit status from /var/tmp/rpm-tmp.37387 (%build) RPM build errors: user rowland does not exist - using root group mvapich does not exist - using root user rowland does not exist - using root group mvapich does not exist - using root Bad exit status from /var/tmp/rpm-tmp.37387 (%build) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich2_gcc' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1' --define 'build_root /var/tmp/OFED' --define 'open_ib_home /usr/local/ofed' --define 'ofed_build_root /var/tmp/OFED' --define 'comp_env CC=gcc CXX=g++ F77=gfortran' --define 'iwarp 0' --define 'romio 1' --define 'shared_libs 1' --define 'auto_req 1' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich2-0.9.8-1.src.rpm" ****************************************************************** 5. When compiling on SLES10 SP1 I get this error: In file included from /usr/src/linux-2.6.16.37-0.9/include/linux/inetdevice.h:7, from /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/core/addr.c:32: /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h:7: error: redefinition of ‘netif_tx_lock’ /usr/src/linux-2.6.16.37-0.9/include/linux/netdevice.h:927: error: previous definition of ‘netif_tx_lock’ was here /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h: In function ‘netif_tx_lock’: /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h:8: error: ‘struct net_device’ has no member named ‘xmit_lock’ /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h: At top level: /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h:13: error: redefinition of ‘netif_tx_unlock’ /usr/src/linux-2.6.16.37-0.9/include/linux/netdevice.h:947: error: previous definition of ‘netif_tx_unlock’ was here 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h: In function ‘netif_tx_unlock’:
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h:15: error: ‘struct net_device’ has no member named ‘xmit_lock’
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/core/addr.c: At top level:
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[6]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/core/addr.o] Error 1

******************************************************************
6. On [PPC64/SLES10] I get this compilation error:

make[2]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/librdmacm'
if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -m64 -g -O2 -MT cma.lo -MD -MP -MF ".deps/cma.Tpo" -c -o cma.lo `test -f 'src/cma.c' || echo './'`src/cma.c; \
then mv -f ".deps/cma.Tpo" ".deps/cma.Plo"; else rm -f ".deps/cma.Tpo"; exit 1; fi
mkdir .libs
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -m64 -g -O2 -MT cma.lo -MD -MP -MF .deps/cma.Tpo -c src/cma.c -fPIC -DPIC -o .libs/cma.o
/bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -m64 -g -O2 -L../libibverbs/src -libverbs -lsysfs -L.
-o src/librdmacm.la -rpath /usr/local/ofed/lib64 -avoid-version -Wl,--version-script=./src/librdmacm.map cma.lo
mkdir src/.libs
gcc -shared .libs/cma.o -Wl,--rpath -Wl,/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibverbs/src/.libs /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibverbs/src/.libs/libibverbs.so /usr/lib/libsysfs.so -L/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/librdmacm -m64 -Wl,--version-script=./src/librdmacm.map -Wl,-soname -Wl,librdmacm.so -o src/.libs/librdmacm.so
/usr/lib/libsysfs.so: could not read symbols: File in wrong format
collect2: ld returned 1 exit status
make[2]: *** [src/librdmacm.la] Error 1
make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/librdmacm'
make[1]: *** [all] Error 2

************************
7. On RHAS5 beta 2, the setup requires the sysfsutils-devel RPM, which is not included in this distro.

--
Yosef Etigin
Alex Tabachnik

From swise at opengridcomputing.com Wed Feb 7 07:30:18 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 07 Feb 2007 09:30:18 -0600
Subject: [openib-general] dapltest?
Message-ID: <1170862218.14381.4.camel@stevo-desktop>

Hey Arlin,

Shouldn't dapl/test be shipped with OFED? It appears not to be...

Steve.

From monis at voltaire.com Wed Feb 7 07:35:58 2007
From: monis at voltaire.com (Moni Shoua)
Date: Wed, 07 Feb 2007 17:35:58 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
In-Reply-To: <20070206171424.GB24372@mellanox.co.il>
References: <45C8ABAA.10500@voltaire.com> <20070206171424.GB24372@mellanox.co.il>
Message-ID: <45C9F1DE.8090409@voltaire.com>

> Another concern: assume that one device goes away (e.g. hotplug).
> It seems that neighbours whose dev field point to another device, will not be destroyed.
> Correct?

I agree.

>
> Therefore in your design, it seems that to_ipoib_neigh()->dev
> will get us a pointer to device that has been removed already.
>

I agree that this is a problem.
I think it would be best to prevent an IPoIB device from disappearing, or ib_ipoib from being unloaded, as long as the IPoIB device is a slave. Unfortunately, I don't see how this can be done just by fixing something in bonding or IPoIB.
However, any slave knows it has a master (dev->master).
What do you think about a solution where IPoIB first tries to clean up the neighbours that belong to its master before deleting the IPoIB device?

>> Furthermore, bond_setup_by_slave is called only for non
>> Ethernet devices (we consider to change the logic to "called only for
>> IPoIB devices just for safety).
>
> Why is this necessary, BTW?
>

If we don't do that, we get a memory leak, because the neigh destructor will never be called for non-IPoIB devices although they carry ipoib_neigh with them.

From vlad at dev.mellanox.co.il Wed Feb 7 08:42:02 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 07 Feb 2007 18:42:02 +0200
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
Message-ID: <1170866522.6223.8.camel@vladsk-laptop>

Hi Jeff,
Please remove %build macro from the RPM spec file.
On SuSE distros it removes RPM_BUILD_ROOT.

Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ /bin/rm -rf /var/tmp/OFED
++ dirname /var/tmp/OFED
+ /bin/mkdir -p /var/tmp
+ /bin/mkdir /var/tmp/OFED
+ cd openmpi-1.2b4ofedr13470
+ fortify_source=1
+ test '' '!=' ''
...

--
Vladimir Sokolovsky
Mellanox Technologies Ltd.

From jsquyres at cisco.com Wed Feb 7 08:52:24 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 7 Feb 2007 11:52:24 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <1170866522.6223.8.camel@vladsk-laptop>
References: <1170866522.6223.8.camel@vladsk-laptop>
Message-ID: <7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com>

The "%build" directive is not just a macro, it's also a section qualifier indicating the beginning of the build section.
From http://fedora.redhat.com/docs/drafts/rpm-guide-en/ch08s02.html#id2966770 "The build section starts with a %build statement." Is there something else that I should replace it with that will also start the build section? On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote: > Hi Jeff, > Please remove %build macro from the RPM spec file. > On SuSE distros it removes RPM_BUILD_ROOT. > > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343 > + umask 022 > + cd /var/tmp/OFEDRPM/BUILD > + /bin/rm -rf /var/tmp/OFED > ++ dirname /var/tmp/OFED > + /bin/mkdir -p /var/tmp > + /bin/mkdir /var/tmp/OFED > + cd openmpi-1.2b4ofedr13470 > + fortify_source=1 > + test '' '!=' '' > ... > > -- > Vladimir Sokolovsky > Mellanox Technologies Ltd. -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From vlad at dev.mellanox.co.il Wed Feb 7 09:00:20 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 07 Feb 2007 19:00:20 +0200 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: <7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com> References: <1170866522.6223.8.camel@vladsk-laptop> <7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com> Message-ID: <1170867620.6223.11.camel@vladsk-laptop> I propose to replace %build by %install. Otherwise %build removes /var/tmp/OFED (on SuSE) which includes all installed libraries. Regards, Vladimir On Wed, 2007-02-07 at 11:52 -0500, Jeff Squyres wrote: > The "%build" directive is not just a macro, it's also a section > qualifier indicating the beginning of the build section. From > > http://fedora.redhat.com/docs/drafts/rpm-guide-en/ch08s02.html#id2966770 > > "The build section starts with a %build statement." > > Is there something else that I should replace it with that will also > start the build section? > > > > On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote: > > > Hi Jeff, > > Please remove %build macro from the RPM spec file. > > On SuSE distros it removes RPM_BUILD_ROOT. 
> > > > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343 > > + umask 022 > > + cd /var/tmp/OFEDRPM/BUILD > > + /bin/rm -rf /var/tmp/OFED > > ++ dirname /var/tmp/OFED > > + /bin/mkdir -p /var/tmp > > + /bin/mkdir /var/tmp/OFED > > + cd openmpi-1.2b4ofedr13470 > > + fortify_source=1 > > + test '' '!=' '' > > ... > > > > -- > > Vladimir Sokolovsky > > Mellanox Technologies Ltd. > > From rdreier at cisco.com Wed Feb 7 09:58:14 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Feb 2007 09:58:14 -0800 Subject: [openib-general] Problem with SRP with 512 byte sector size with > 2 TB LUNs References: <200702071203.45309.grossmann@hlrs.de> Message-ID: > Is it possible to add LUNs with > 2 TB and 512 byte sectors ? > Why does the READ CAPACITY(16) comand fail ? It seems that the DDN target is not reporting good information -- I don't see anything obviously wrong in what the kernel is doing (now that SRP sends a READ CAPACITY command). Do you know if the same type of config works over fibre channel? - R. From jsquyres at cisco.com Wed Feb 7 09:58:41 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 7 Feb 2007 12:58:41 -0500 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: <1170867620.6223.11.camel@vladsk-laptop> References: <1170866522.6223.8.camel@vladsk-laptop> <7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com> <1170867620.6223.11.camel@vladsk-laptop> Message-ID: <212D8756-09B1-4637-ADCA-1CD8A535403A@cisco.com> My $0.02: This is another in a growing list of issues reflecting the whole "build everything in DESTDIR" is a problematic approach. I have distinct %build and %install sections in the Open MPI specfile -- they're really intended for two different things. Specifically: I wouldn't call the SuSE %build behavior a bug -- it reflects how they want RPM designers to write RPMs. It appears that we're trying to circumvent their intended approach. Shouldn't that be a warning flag? 
:-)

I've heard offhand comments that there were problems with trying to use chroot for building OFED. The two that I'm aware of are:

1. need to be root to make a chroot. My thought: who cares?

2. takes up lots of extra disk space. My thought: does it matter? Do we know of anyone who has small-disk servers who are building OFED? (and/or: can you hard-link files to make a chroot environment? I don't know)

Are there other issues?

More specifically, which is going to be simpler: a) fixing the growing list of problems with the DESTDIR approach or b) switching to a chroot environment? A simple search for "chroot" on freshmeat, for example, turns up a number of projects that can be used to help automate the creation of chroot environments.

Again -- this is all my $0.02. Comments?

On Feb 7, 2007, at 12:00 PM, Vladimir Sokolovsky wrote:

> I propose to replace %build by %install.
> Otherwise %build removes /var/tmp/OFED (on SuSE) which includes all
> installed libraries.
>
> Regards,
> Vladimir
>
> On Wed, 2007-02-07 at 11:52 -0500, Jeff Squyres wrote:
>> The "%build" directive is not just a macro, it's also a section
>> qualifier indicating the beginning of the build section. From
>>
>> http://fedora.redhat.com/docs/drafts/rpm-guide-en/
>> ch08s02.html#id2966770
>>
>> "The build section starts with a %build statement."
>>
>> Is there something else that I should replace it with that will also
>> start the build section?
>>
>>
>>
>> On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote:
>>
>>> Hi Jeff,
>>> Please remove %build macro from the RPM spec file.
>>> On SuSE distros it removes RPM_BUILD_ROOT.
>>>
>>> Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343
>>> + umask 022
>>> + cd /var/tmp/OFEDRPM/BUILD
>>> + /bin/rm -rf /var/tmp/OFED
>>> ++ dirname /var/tmp/OFED
>>> + /bin/mkdir -p /var/tmp
>>> + /bin/mkdir /var/tmp/OFED
>>> + cd openmpi-1.2b4ofedr13470
>>> + fortify_source=1
>>> + test '' '!=' ''
>>> ...
>>> >>> -- >>> Vladimir Sokolovsky >>> Mellanox Technologies Ltd. >> >> -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Wed Feb 7 10:24:26 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Feb 2007 20:24:26 +0200 Subject: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour In-Reply-To: <45C9F1DE.8090409@voltaire.com> References: <45C9F1DE.8090409@voltaire.com> Message-ID: <20070207182426.GB9131@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour > > > > Another concern: assume that one device goes away (e.g. hotplug). > > It seems that neighbours whose dev field point to another device, will not be destroyed. > > Correct? > > I agree. > > > Therefore in your design, it seems that to_ipoib_neigh()->dev > > will get us a pointer to device that has been removed already. > > > I agree that this is a problem. I think we can solve this if we track all ipoib neighbours, like we do for old kernels, and then flush ipoib neighbours on any hotplug event. Roland, does this sound too awful? > It think it would be best to prevent an IPoIB device > from disappearing or from ib_ipoib from being unloaded as long as IPoIB > device is a slave. Unfortunately, I don't see how this can be done just > by fixing something in bonding or IPoIB. So hotplug is blocked potentially forever? This does not sound good. > However, any slave knows he has a master (dev->master). > What do you think about a solution where IPoIB first tries to clean up the > neighbours that belong to it's master before deleting the IPoIB device? How? > >> Furthermore, bond_setup_by_slave is called only for non > >> Ethernet devices (we consider to change the logic to "called only for > >> IPoIB devices just for safety). > > > > Why is this necessary, BTW? 
> > > If we don't do that, we get a memory leak because the neigh destructor will > never be called for non IPoIB devices although they carry ipoib_neigh > with them. How can this happen? If it does, I think we are back to where we started: to_ipoib_neigh is broken for non-IPoIB device. I thought you said only devices of the same type can be paired? -- MST From rdreier at cisco.com Wed Feb 7 10:39:48 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Feb 2007 10:39:48 -0800 Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support In-Reply-To: <20070207075339.GB20290@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 7 Feb 2007 09:53:39 +0200") References: <20070207075339.GB20290@mellanox.co.il> Message-ID: > Well, randomness is a resource after all, and since we don't have the additional > security provided by PSNs in IPoIB UD, it seemed we do not need it for > IPoIB CM either. So maybe the right thing is just to remove the FIXME comment. random32() doesn't use up any entropy. Random PSNs help avoid problems with stale connections, so I think we should do it. I noticed some funny code in ipoib_cm_skb_reap(): __be32 mtu = cpu_to_be32(priv->mcast_mtu); // htonl(__be32)?? icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); // no htonl() here -- is this correct? icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev); what is the right thing? - R. From mshefty at ichips.intel.com Wed Feb 7 10:55:00 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Feb 2007 10:55:00 -0800 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <20070126180840.GD12386@obsidianresearch.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> Message-ID: <45CA2084.7090503@ichips.intel.com> > Oops, I'll fix these style things and send a new patch. Jason, what's the status of this patch? 
(I ask because I'm starting to look at router support in the stack.)

- Sean

From mst at mellanox.co.il Wed Feb 7 10:57:45 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 20:57:45 +0200
Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support
In-Reply-To:
References: <20070207075339.GB20290@mellanox.co.il>
Message-ID: <20070207185745.GD9131@mellanox.co.il>

> Quoting Roland Dreier :
> Subject: Re: [PATCHv6 RFC] IPoIB CM Experimental support
>
> > Well, randomness is a resource after all, and since we don't have the additional
> > security provided by PSNs in IPoIB UD, it seemed we do not need it for
> > IPoIB CM either. So maybe the right thing is just to remove the FIXME comment.
>
> random32() doesn't use up any entropy. Random PSNs help avoid problems
> with stale connections, so I think we should do it.

Well, stale connections don't pose any real problems for IPoIB CM - worst case a connection is torn down and recreated. But I don't have a strong opinion anyway - that's why I put the FIXME there. So I'm OK with random32, too.

> I noticed some funny code in ipoib_cm_skb_reap():
>
> __be32 mtu = cpu_to_be32(priv->mcast_mtu);
>
> // htonl(__be32)??
> icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
> // no htonl() here -- is this correct?
> icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);
>
> what is the right thing?

Both are right I think.
These two functions seem to accept parameters in different formats:

include/net/icmp.h:extern void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info);

include/linux/icmpv6.h:extern void icmpv6_send(struct sk_buff *skb,
include/linux/icmpv6.h- int type, int code,
include/linux/icmpv6.h- __u32 info,
include/linux/icmpv6.h- struct net_device *dev);

BTW, I just looked at ip_gre.c and it has the same code.
-- MST From ardavis at ichips.intel.com Wed Feb 7 11:03:20 2007 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 07 Feb 2007 11:03:20 -0800 Subject: [openib-general] dapltest? In-Reply-To: <1170862218.14381.4.camel@stevo-desktop> References: <1170862218.14381.4.camel@stevo-desktop> Message-ID: <45CA2278.3090309@ichips.intel.com> Steve Wise wrote: >Hey Arlin, > >Shouldn't dapl/test be shipped with OFED? It appears not to be... > > Yes, I will try to get to this by next week at the latest. Can you add a bugzilla report to track against? -arlin From rdreier at cisco.com Wed Feb 7 11:03:46 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Feb 2007 11:03:46 -0800 Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support In-Reply-To: <20070207185745.GD9131@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 7 Feb 2007 20:57:45 +0200") References: <20070207075339.GB20290@mellanox.co.il> <20070207185745.GD9131@mellanox.co.il> Message-ID: > > I noticed some funny code in ipoib_cm_skb_reap(): > > > > __be32 mtu = cpu_to_be32(priv->mcast_mtu); > > > > // htonl(__be32)?? > > icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); > > // no htonl() here -- is this correct? > > icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev); > > > > what is the right thing? > > Both are right I think. You're right -- the mistake is making mtu __be32 and preswapping it. I'll fix it up in my tree. > These two functions seem to accept parameters in different format: > > include/net/icmp.h:extern void icmp_send(struct sk_buff *skb_in, int type, int > code, __be32 info); > > > include/linux/icmpv6.h:extern void icmpv6_send(struct sk_buff *skb, > include/linux/icmpv6.h- int type, int code, > include/linux/icmpv6.h- __u32 info, > include/linux/icmpv6.h- struct net_device *dev); > > BTW, I just looked at ip_gre.c and it has the same code. no, it leaves mtu as an int rather than swapping it. - R. 
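
[Editor's illustration, not part of the original thread.] The byte-order point in the exchange above can be sketched in userspace C. This is only a sketch under stated assumptions: the helper names icmp_info_from_mtu, icmpv6_info_from_mtu and double_swapped are hypothetical, and plain uint32_t with htonl() stands in for the kernel's __be32/__u32 types and cpu_to_be32(). It assumes, as the prototypes quoted in the thread indicate, that icmp_send() expects its info argument in network order while icmpv6_send() expects host order.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>

/* Hypothetical helper: value for an argument that must be in network
 * byte order, as the quoted icmp_send() prototype (__be32 info) suggests.
 * The host-order MTU is swapped exactly once. */
uint32_t icmp_info_from_mtu(uint32_t mtu_host)
{
    return htonl(mtu_host);
}

/* Hypothetical helper: value for an argument taken in host byte order,
 * as the quoted icmpv6_send() prototype (__u32 info) suggests.
 * No conversion is applied. */
uint32_t icmpv6_info_from_mtu(uint32_t mtu_host)
{
    return mtu_host;
}

/* The pattern questioned in the thread: the MTU is preswapped into
 * network order and then passed through htonl() a second time.
 * The two swaps cancel, so the result is the host-order value again. */
uint32_t double_swapped(uint32_t mtu_host)
{
    uint32_t mtu_be = htonl(mtu_host);  /* stands in for cpu_to_be32() */
    return htonl(mtu_be);
}
```

Keeping the MTU in host order and converting only at the call that needs network order avoids both problems: the double swap hands a host-order value to the network-order consumer, and the preswapped variable hands a network-order value to the host-order consumer. On a big-endian machine all three helpers return the same value, which is why this kind of mistake tends to show up only on little-endian hosts.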
From sweitzen at cisco.com Wed Feb 7 11:07:06 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 7 Feb 2007 11:07:06 -0800 Subject: [openib-general] dapltest? In-Reply-To: <45CA2278.3090309@ichips.intel.com> References: <1170862218.14381.4.camel@stevo-desktop> <45CA2278.3090309@ichips.intel.com> Message-ID: I opened bug 350, I would like dapltest (and any other useful dapl test programs) too. Scott > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Arlin Davis > Sent: Wednesday, February 07, 2007 11:03 AM > To: Steve Wise > Cc: openib-general; Arlin Davis > Subject: Re: [openib-general] dapltest? > > Steve Wise wrote: > > >Hey Arlin, > > > >Shouldn't dapl/test be shipped with OFED? It appears not to be... > > > > > > Yes, I will try to get to this by next week at the latest. > Can you add > a bugzilla report to track against? > > -arlin > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Wed Feb 7 11:13:58 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Feb 2007 11:13:58 -0800 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <20070207191154.GC11411@obsidianresearch.com> (Jason Gunthorpe's message of "Wed, 7 Feb 2007 12:11:54 -0700") References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> Message-ID: > I was going to resend it after Roland's earlier patch to clean up the > ib_init_ah_from_path was accepted.. 
Sorry, I started having second thoughts about the part about changing it to return void (it seems more sensible to check it the other places it's called). But I'll look at that again soon. - R. From jgunthorpe at obsidianresearch.com Wed Feb 7 11:11:54 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 7 Feb 2007 12:11:54 -0700 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <45CA2084.7090503@ichips.intel.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> Message-ID: <20070207191154.GC11411@obsidianresearch.com> On Wed, Feb 07, 2007 at 10:55:00AM -0800, Sean Hefty wrote: > >Oops, I'll fix these style things and send a new patch. > > Jason, what's the status of this patch? (I ask because I'm starting to > look at router support in the stack.) I was going to resend it after Roland's earlier patch to clean up the ib_init_ah_from_path was accepted.. I didn't get too far on getting CMA to work. Beyond the bad HopLimit feild I was seeing Hal pointed out a number of problems in IBA that would prevent it from working as is :< Jason From changquing.tang at hp.com Wed Feb 7 11:38:07 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Wed, 7 Feb 2007 19:38:07 -0000 Subject: [openib-general] Immediate data question In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com><349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net><349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> Roland: This is a followup question. 
If one process uses IBV_WR_SEND_WITH_IMM and IBV_SEND_INLINE to send 8 bytes, but the receiver process does not post the corresponding receive to the QP, instead, this receiver process and other processes are doing heavy RDMA_WRITE/READ traffic each other. Does this pending SEND_WITH_IMM message affect the performance of the receiver process ? Is this message buffered in the receiver's HCA, or the sender retry and get RNR ack until receiver posts a receive ? Thanks. --CQ > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Monday, February 05, 2007 5:03 PM > To: Tang, Changqing > Cc: Michael S. Tsirkin; openib-general at openib.org > Subject: Re: Immediate data question > > Changqing> Thank you. Other than using immediate data to send > Changqing> notification from one end to the other of a QP, is > Changqing> there any other way to do this ? For example, can I > Changqing> modify QP state from RTS to other state on one end, and > Changqing> then the other end gets some notification when I query > Changqing> the QP ? > > Not that I know of. You would need to do something that > triggers something to be sent on the wire, and I don't know > of any way to do that other than posting a work request. > > - R. > From mst at mellanox.co.il Wed Feb 7 11:49:49 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Feb 2007 21:49:49 +0200 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: <212D8756-09B1-4637-ADCA-1CD8A535403A@cisco.com> References: <1170866522.6223.8.camel@vladsk-laptop> <7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com> <1170867620.6223.11.camel@vladsk-laptop> <212D8756-09B1-4637-ADCA-1CD8A535403A@cisco.com> Message-ID: <20070207194949.GB12140@mellanox.co.il> > Quoting Jeff Squyres : > Subject: Re: Open MPI rpmbuild fails in OFED-1.2 > > My $0.02: This is another in a growing list of issues reflecting the > whole "build everything in DESTDIR" is a problematic approach. 
I don't know much about RPM, and I am not exactly sure why are our source RPMs so complicated. However, with the plan configure/make we are able to build all openfabrics components within build directory, without any chroot tricks. So let's not give up yet, IMO it is very nice to be able to build in standard environment, without being root. Note that what is biting us here is mostly the large number of modules: simple single-module packages don't have this problem - and this is really a design decision we took. -- MST From mst at mellanox.co.il Wed Feb 7 11:55:19 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Feb 2007 21:55:19 +0200 Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support In-Reply-To: References: <20070207075339.GB20290@mellanox.co.il> <20070207185745.GD9131@mellanox.co.il> Message-ID: <20070207195519.GC12140@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCHv6 RFC] IPoIB CM Experimental support > > > > I noticed some funny code in ipoib_cm_skb_reap(): > > > > > > __be32 mtu = cpu_to_be32(priv->mcast_mtu); > > > > > > // htonl(__be32)?? > > > icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); > > > // no htonl() here -- is this correct? > > > icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev); > > > > > > what is the right thing? > > > > Both are right I think. > > You're right -- the mistake is making mtu __be32 and preswapping it. > I'll fix it up in my tree. Let me know when you push it out, I'll start testing it. > > These two functions seem to accept parameters in different format: > > > > include/net/icmp.h:extern void icmp_send(struct sk_buff *skb_in, int type, int > > code, __be32 info); > > > > > > include/linux/icmpv6.h:extern void icmpv6_send(struct sk_buff *skb, > > include/linux/icmpv6.h- int type, int code, > > include/linux/icmpv6.h- __u32 info, > > include/linux/icmpv6.h- struct net_device *dev); > > > > BTW, I just looked at ip_gre.c and it has the same code. 
> > no, it leaves mtu as an int rather than swapping it. You are right of course. sparse would have found it. -- MST From swise at opengridcomputing.com Wed Feb 7 12:02:23 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 07 Feb 2007 14:02:23 -0600 Subject: [openib-general] dapl broken for iWARP Message-ID: <1170878543.30334.52.camel@stevo-desktop> Arlin, The OFED dapl code is assuming the responder_resources and initiator_depth passed up on a connection request event are from the remote peer. This doesn't happen for iWARP. In the current iWARP specifications, its up to the application to exchange this information somehow. So these are defaulting to 0 on the server side of any dapl connection over iWARP. This is a fairly recent change, I think. We need to come up with some way to deal with this for OFED 1.2 IMO. Steve. From mshefty at ichips.intel.com Wed Feb 7 12:24:08 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Feb 2007 12:24:08 -0800 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <20070207191154.GC11411@obsidianresearch.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> Message-ID: <45CA3568.1000508@ichips.intel.com> > I didn't get too far on getting CMA to work. Beyond the bad HopLimit > feild I was seeing Hal pointed out a number of problems in IBA that > would prevent it from working as is :< I've started thinking about what it would take to get the rdma cm to work across a router. I think the rdma cm may need to treat IPv6 addresses as a GID for this to work across subnets, versus trying to map an ipoib IP address to a GID based on ARP. - Sean From mst at mellanox.co.il Wed Feb 7 11:59:14 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 7 Feb 2007 21:59:14 +0200 Subject: [openib-general] RFC ofed 1 2 kernel file structure In-Reply-To: <20070206150321.GA21776@mellanox.co.il> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <20070206150321.GA21776@mellanox.co.il> Message-ID: <20070207195914.GD12140@mellanox.co.il> Repost. Could everyone please look at git://git.openfabrics.org/~mst/newofed.git and tell me whether this looks acceptable? Thanks, MST Quoting r. Michael S. Tsirkin : Subject: Re: idea for ofed 1 2 kernel file structure > Quoting Michael S. Tsirkin : > It would easy to split OFED specific files In separate directory and have OFED > > All out of tree modules we distribute would go there too. > > What do others think about this? OK, I didn't quite get whether the majority likes this or not, so I created such a repository, extracted the ofed specific history and imported it there. Take a look here: git://git.openfabrics.org/~mst/newofed.git Build scripts will have to be adjusted to add necessary kernel components that we use. Another nice thing about this layout, is that users (if they so wish) will be able to use just linux kernel source tarball instead of full linux kernel git. OFED maintainers, you are the primary users of the OFED git. Please comment which layout is better for you. 
-- MST -- MST From mshefty at ichips.intel.com Wed Feb 7 13:07:48 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Feb 2007 13:07:48 -0800 Subject: [openib-general] RFC ofed 1 2 kernel file structure In-Reply-To: <20070207195914.GD12140@mellanox.co.il> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <20070206150321.GA21776@mellanox.co.il> <20070207195914.GD12140@mellanox.co.il> Message-ID: <45CA3FA4.9050900@ichips.intel.com> Michael S. Tsirkin wrote: > Repost. Could everyone please look at > git://git.openfabrics.org/~mst/newofed.git > and tell me whether this looks acceptable? I don't see anything listed for this off of the web site, and cloning it produces an empty tree. - Sean From HNGUYEN at de.ibm.com Wed Feb 7 12:56:17 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Wed, 7 Feb 2007 21:56:17 +0100 Subject: [openib-general] RFC ofed 1 2 kernel file structure In-Reply-To: Message-ID: > I could clone it: Should be "I could not clone it" From HNGUYEN at de.ibm.com Wed Feb 7 12:55:19 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Wed, 7 Feb 2007 21:55:19 +0100 Subject: [openib-general] RFC ofed 1 2 kernel file structure In-Reply-To: <20070207195914.GD12140@mellanox.co.il> Message-ID: Hi Michael, > Repost. Could everyone please look at > git://git.openfabrics.org/~mst/newofed.git > and tell me whether this looks acceptable? I could clone it: $git clone git://git.openfabrics.org/~mst/newofed.git fatal: Unable to look up git.openfabrics.org (Temporary failure in name resolution) fetch-pack from 'git://git.openfabrics.org/~mst/newofed.git' failed. 
$git clone git://git.openfabrics.org/~mst/newofed.git fatal: Unable to look up git.openfabrics.org (Temporary failure in name resolution) fetch-pack from 'git://git.openfabrics.org/~mst/newofed.git' failed. I tried to use web git pointing to http://www.openfabrics.org/git/?p=~mst/newofed.git;a=tree and got this: 403 Forbidden - Reading tree failed Is there something else I need to pay attention of? Thanks Nam From mst at mellanox.co.il Wed Feb 7 13:18:00 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Feb 2007 23:18:00 +0200 Subject: [openib-general] RFC ofed 1 2 kernel file structure In-Reply-To: References: Message-ID: <20070207211800.GI12140@mellanox.co.il> > Quoting r. Hoang-Nam Nguyen : > Subject: Re: [openib-general] RFC ofed 1 2 kernel file structure > > Hi Michael, > > Repost. Could everyone please look at > > git://git.openfabrics.org/~mst/newofed.git > > and tell me whether this looks acceptable? > I could clone it: > $git clone git://git.openfabrics.org/~mst/newofed.git > fatal: Unable to look up git.openfabrics.org (Temporary failure in name > resolution) > fetch-pack from 'git://git.openfabrics.org/~mst/newofed.git' failed. > $git clone git://git.openfabrics.org/~mst/newofed.git > fatal: Unable to look up git.openfabrics.org (Temporary failure in name > resolution) > fetch-pack from 'git://git.openfabrics.org/~mst/newofed.git' failed. > > I tried to use web git pointing to > http://www.openfabrics.org/git/?p=~mst/newofed.git;a=tree > and got this: > 403 Forbidden - Reading tree failed > > Is there something else I need to pay attention of? Pls try again. -- MST From mst at mellanox.co.il Wed Feb 7 13:18:23 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 7 Feb 2007 23:18:23 +0200 Subject: [openib-general] RFC ofed 1 2 kernel file structure In-Reply-To: <45CA3FA4.9050900@ichips.intel.com> References: <45CA3FA4.9050900@ichips.intel.com> Message-ID: <20070207211823.GJ12140@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [openib-general] RFC ofed 1 2 kernel file structure > > Michael S. Tsirkin wrote: > > Repost. Could everyone please look at > > git://git.openfabrics.org/~mst/newofed.git > > and tell me whether this looks acceptable? > > I don't see anything listed for this off of the web site, and cloning it > produces an empty tree. Pls try again now. -- MST From rdreier at cisco.com Wed Feb 7 13:26:31 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Feb 2007 13:26:31 -0800 Subject: [openib-general] Immediate data question In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Wed, 7 Feb 2007 19:38:07 -0000") References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> Message-ID: Changqing> Does this pending SEND_WITH_IMM message affect the Changqing> performance of the receiver process ? Is this message Changqing> buffered in the receiver's HCA, or the sender retry and Changqing> get RNR ack until receiver posts a receive ? If no receive is pending, then the responder sends an RNR NAK and the sender will wait for the RNR timeout and retry, etc. - R. 
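A rough model of the behaviour described in the reply above: the responder does not buffer the send, it answers with an RNR NAK while no receive is posted, and the sender retries. Everything here is a hypothetical sketch rather than libibverbs code. The one borrowed convention is that an rnr_retry value of 7 means "retry indefinitely", as documented for ibv_modify_qp. Timing and any performance effect on the receiver are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

enum send_result { SEND_OK, SEND_RNR_RETRY_EXCEEDED };

/* Hypothetical responder: counts posted receives and RNR NAKs sent.
 * post_after_naks models an application that posts one receive only
 * after that many NAKs have gone out (0 means it never posts). */
struct model_responder {
    int posted_receives;
    int post_after_naks;
    int naks_sent;
};

/* Returns true if the incoming send consumed a posted receive,
 * false if the responder had to answer with an RNR NAK. */
static bool responder_accept(struct model_responder *r)
{
    if (r->posted_receives > 0) {
        r->posted_receives--;
        return true;
    }
    r->naks_sent++;
    if (r->naks_sent == r->post_after_naks)
        r->posted_receives++;
    return false;
}

/* Sender side: one initial attempt plus up to rnr_retry retries.
 * rnr_retry == 7 retries without limit, so with a responder that never
 * posts a receive this loop would not terminate. */
static enum send_result model_send(struct model_responder *r, int rnr_retry)
{
    int retries_left = rnr_retry;

    for (;;) {
        if (responder_accept(r))
            return SEND_OK;
        /* A real sender would wait for the RNR timer here. */
        if (rnr_retry != 7) {
            if (retries_left == 0)
                return SEND_RNR_RETRY_EXCEEDED;
            retries_left--;
        }
    }
}
```

With a responder that posts its receive after three NAKs, rnr_retry = 2 gives up after three attempts, while rnr_retry = 3 succeeds on the fourth.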
From jgunthorpe at obsidianresearch.com Wed Feb 7 13:31:08 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 7 Feb 2007 14:31:08 -0700 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <45CA3568.1000508@ichips.intel.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> Message-ID: <20070207213108.GD11411@obsidianresearch.com> On Wed, Feb 07, 2007 at 12:24:08PM -0800, Sean Hefty wrote: > >I didn't get too far on getting CMA to work. Beyond the bad HopLimit > >feild I was seeing Hal pointed out a number of problems in IBA that > >would prevent it from working as is :< > > I've started thinking about what it would take to get the rdma cm to work > across a router. I think the rdma cm may need to treat IPv6 addresses as a > GID for this to work across subnets, versus trying to map an ipoib IP > address to a GID based on ARP. I don't think that is the main problem - though clearly the way things are now (for better or worse) rdma cm requires the IPoIB subnet to span all of the IB subnets.. The main problem with the protocol is in the LID selection for routed paths on the passive side. It can't rely on the active side to identify the lids if a router is involved. One feature I've thought has been underused in IBA is the raw IPv6 packet feature. It would be nice to have a linux netdev interface to be able to do IPv6 traffic using GID addressing. That would seem to me to be the natural way to bolt native GID addressing into rdma cm.. 
Jason From rdreier at cisco.com Wed Feb 7 13:35:25 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Feb 2007 13:35:25 -0800 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <45CA3568.1000508@ichips.intel.com> (Sean Hefty's message of "Wed, 07 Feb 2007 12:24:08 -0800") References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> Message-ID: > I've started thinking about what it would take to get the rdma cm to > work across a router. I think the rdma cm may need to treat IPv6 > addresses as a GID for this to work across subnets, versus trying to > map an ipoib IP address to a GID based on ARP. Hmm, why is that? Shouldn't IPoIB work through a router, and correctly get the GID of the final destination via ARP just fine? If the RDMA CM treats IPv6 addresses as GIDs, then this breaks things on a normal subnet with IPoIB interfaces configured with IPv6 addresses. - R. From swise at opengridcomputing.com Wed Feb 7 13:57:40 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 07 Feb 2007 15:57:40 -0600 Subject: [openib-general] dapl broken for iWARP In-Reply-To: <1170878543.30334.52.camel@stevo-desktop> References: <1170878543.30334.52.camel@stevo-desktop> Message-ID: <1170885460.31481.0.camel@stevo-desktop> On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: > Arlin, > > The OFED dapl code is assuming the responder_resources and > initiator_depth passed up on a connection request event are from the > remote peer. This doesn't happen for iWARP. In the current iWARP > specifications, its up to the application to exchange this information > somehow. So these are defaulting to 0 on the server side of any dapl > connection over iWARP. > > This is a fairly recent change, I think. 
We need to come up with some > way to deal with this for OFED 1.2 IMO. > The IWCM could set these to the device max values for instance. Steve. From jgunthorpe at obsidianresearch.com Wed Feb 7 14:03:04 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 7 Feb 2007 15:03:04 -0700 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> Message-ID: <20070207220304.GE11411@obsidianresearch.com> On Wed, Feb 07, 2007 at 01:35:25PM -0800, Roland Dreier wrote: > Hmm, why is that? Shouldn't IPoIB work through a router, and > correctly get the GID of the final destination via ARP just fine? Basically, if IB routers are used, and the IPoIB feature of *not* spanning a subnet is used (for scalability?) then you need an alternate way to specify addresses to rdma cm. I agree that special casing some IPv6 addresses is a bad idea. It needs to be integrated correctly with NET and the routing table/etc Jason From mst at mellanox.co.il Wed Feb 7 14:14:49 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Feb 2007 00:14:49 +0200 Subject: [openib-general] RFC ofed 1 2 kernel file structure In-Reply-To: <20070207195914.GD12140@mellanox.co.il> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <20070206150321.GA21776@mellanox.co.il> <20070207195914.GD12140@mellanox.co.il> Message-ID: <20070207221449.GL12140@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: RFC ofed 1 2 kernel file structure > > Repost. Could everyone please look at > git://git.openfabrics.org/~mst/newofed.git > and tell me whether this looks acceptable? All, pls try now. 
-- MST From bos at pathscale.com Wed Feb 7 14:18:56 2007 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 07 Feb 2007 14:18:56 -0800 Subject: [openib-general] RFC ofed 1 2 kernel file structure In-Reply-To: <20070207221449.GL12140@mellanox.co.il> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <20070206150321.GA21776@mellanox.co.il> <20070207195914.GD12140@mellanox.co.il> <20070207221449.GL12140@mellanox.co.il> Message-ID: <45CA5050.2070105@pathscale.com> Michael S. Tsirkin wrote: > All, pls try now. This is similar in layout to the sort of tree we've used internally all along, so it's fine by me. One small problem: I don't like the combination of lower and upper case names of makefile and Makefile in the top-level directory. Also, it's no longer obvious to me to tell what kernel version the sources are pulled from. I used to be able to check the top-level Makefile or git history, but I no longer know what to look at. References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com><349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net><349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net><349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> > Changqing> Does this pending SEND_WITH_IMM message > affect the > Changqing> performance of the receiver process ? Is this message > Changqing> buffered in the receiver's HCA, or the sender retry and > Changqing> get RNR ack until receiver posts a receive ? > > If no receive is pending, then the responder sends an RNR NAK > and the sender will wait for the RNR timeout and retry, etc. What I mean is that, is there any performance penalty for receiver's overall performance if RNR happens continuously on one of the QP ? --CQ > > - R. 
> From mshefty at ichips.intel.com Wed Feb 7 14:31:10 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Feb 2007 14:31:10 -0800 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <20070207220304.GE11411@obsidianresearch.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207220304.GE11411@obsidianresearch.com> Message-ID: <45CA532E.3050605@ichips.intel.com> > Basically, if IB routers are used, and the IPoIB feature of *not* > spanning a subnet is used (for scalabililty?) then you need an > alternate way to specify addresses to rdma cm. This was the case I was thinking of. Without global IB name service resolution, how do you get the GID of the remote system? > I agree that special casing some IPv6 addresses is a bad idea. It > needs to be integrated correctly with NET and the routing table/etc I haven't given this more than a few minutes of thought, but I was thinking more along the lines of a port having an assigned GID that's the same as an assigned IPv6 address. (Is there some reason this wouldn't work?) IP name service resolution would map the name to the IPv6 address. The mapping from the IPv6 address to a GID would then be straightforward, as opposed to using a mapping using ARP. If name service resolution gives me an IPv6 address that's off of the local subnet, but the ARP response gives me an address that's on the local subnet, then I think we can assume that ARP was unsuccessful is resolving the address to the remote GID. (I.e. the GID should be for a router.) If this is true, then we need some other way to acquire the DGID. 
- Sean From pw at osc.edu Wed Feb 7 14:31:46 2007 From: pw at osc.edu (Pete Wyckoff) Date: Wed, 7 Feb 2007 17:31:46 -0500 Subject: [openib-general] sharing qp between user and kernel Message-ID: <20070207223146.GA28637@osc.edu> We're writing a kernel module that is an IB verbs consumer. The plan was to connect up the QP in userspace and do some preliminary communication, then hand the QP to the kernel and let it use the QP directly to do some more communication. This works fine on ammasso, but fails on mthca. In particular, this code in mthca_alloc_wqe_buf(): /* * If this is a userspace QP, we don't actually have to * allocate anything. All we need is to calculate the WQE * sizes and the send_wqe_offset, so we're done now. */ if (pd->ibpd.uobject) return 0; prevents the allocation of space for WQEs required by kernel-initiated posts. Just commenting out this section led to failures elsewhere (local prot error on a userspace cq poll for a receive). Before I dig into this anymore, do you expect this to work? Are there fundamental problems with QP sharing between user and kernel? It would sure be nice not to have to stick the connection management aspects into the kernel. -- Pete From swise at opengridcomputing.com Wed Feb 7 14:40:48 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 07 Feb 2007 16:40:48 -0600 Subject: [openib-general] sharing qp between user and kernel In-Reply-To: <20070207223146.GA28637@osc.edu> References: <20070207223146.GA28637@osc.edu> Message-ID: <1170888048.31481.3.camel@stevo-desktop> On Wed, 2007-02-07 at 17:31 -0500, Pete Wyckoff wrote: > is an IB verbs consumer. The > plan was to connect up the QP in userspace and do some preliminary > communication, then hand the QP to the kernel and let it use the QP > directly to do some more communication. This works fine on ammasso, > but fails on mthca. I think the only reason it works on ammasso is because ammasso doesn't do any kernel bypass. 
For devices that _do_ kernel bypass, I'm not sure it will work. It will _not_ work for the Chelsio iWARP device as it's implemented today. Once the decision is made to do kernel bypass, the kernel loses track of the state of the resources shared by HW and library. Steve. From mshefty at ichips.intel.com Wed Feb 7 14:40:51 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Feb 2007 14:40:51 -0800 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <20070207213108.GD11411@obsidianresearch.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> Message-ID: <45CA5573.80802@ichips.intel.com> > I don't think that is the main problem - though clearly the way things > are now (for better or worse) rdma cm requires the IPoIB subnet to > span all of the IB subnets.. The main problem with the protocol is in > the LID selection for routed paths on the passive side. It can't rely > on the active side to identify the lids if a router is involved. Are you referring to the SLID in the CM REQ? If so, I've been looking at this issue as well. I simply cannot think of any way to come up with this LID, and my current solution is to punt this problem over to the passive side, which could use the SLID of the router that the CM REQ is received from. If not, well, then I just rambled more than usual. 
- Sean From jgunthorpe at obsidianresearch.com Wed Feb 7 14:49:28 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 7 Feb 2007 15:49:28 -0700 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <45CA5573.80802@ichips.intel.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> Message-ID: <20070207224928.GF11411@obsidianresearch.com> On Wed, Feb 07, 2007 at 02:40:51PM -0800, Sean Hefty wrote: > Are you referring to the SLID in the CM REQ? If so, I've been looking at > this issue as well. I simply cannot think of any way to come up with this > LID, and my current solution is to punt this problem over to the passive > side, which could use the SLID of the router that the CM REQ is received > from. If not, well, then I just rambled more than usual. Yes, this is the problem. The active side clearly cannot learn what the SLID of the passive side's router should be. We don't want to have the routers snoop and alter CM GMPs. The passive side cannot use information from the LRH to get the router LID since the LRH may not be reversible. The only option seems to be to have the passive side do a path record query on a SGID in the CM REQ... This is a spec problem unfortunately. 
Jason From ardavis at ichips.intel.com Wed Feb 7 15:05:38 2007 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 07 Feb 2007 15:05:38 -0800 Subject: [openib-general] dapl broken for iWARP In-Reply-To: <1170885460.31481.0.camel@stevo-desktop> References: <1170878543.30334.52.camel@stevo-desktop> <1170885460.31481.0.camel@stevo-desktop> Message-ID: <45CA5B42.6090503@ichips.intel.com> Steve Wise wrote: >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: > > >>Arlin, >> >>The OFED dapl code is assuming the responder_resources and >>initiator_depth passed up on a connection request event are from the >>remote peer. This doesn't happen for iWARP. In the current iWARP >>specifications, its up to the application to exchange this information >>somehow. So these are defaulting to 0 on the server side of any dapl >>connection over iWARP. >> >>This is a fairly recent change, I think. We need to come up with some >>way to deal with this for OFED 1.2 IMO. >> >> Yes, this was changed recently to sync up with the rdma_cm changes that exposed the values. >> >> > >The IWCM could set these to the device max values for instance. > > That would work fine as long as you know the remote settings will be equal or better. The provider just sets the min of local device max values and the remote values provided with the request. 
-arlin From mshefty at ichips.intel.com Wed Feb 7 15:09:17 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Feb 2007 15:09:17 -0800 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <20070207224928.GF11411@obsidianresearch.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> Message-ID: <45CA5C1D.4060009@ichips.intel.com> > We don't want to have the routers snoop and alter CM GMPs. agreed > The passive side cannot use information from the LRH to get the router > LID since the LRH may not be reversible. argh... I was interpreting symmetric paths at the network layer (SGID to DGID) and applying it at the link layer as well. (See the last couple of sentences on page 222 of the spec.) > The only option seems to be to have the passive side do a path record > query on a SGID in the CM REQ... I've thought of that as well, and this is what Yaron mentioned in his OFA DevCon slides as well. I'd just like to avoid adding even more complexity to the ib_cm state management if at all possible. > This is a spec problem unfortunately. aye... 
- Sean From swise at opengridcomputing.com Wed Feb 7 15:12:09 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 07 Feb 2007 17:12:09 -0600 Subject: [openib-general] dapl broken for iWARP In-Reply-To: <45CA5B42.6090503@ichips.intel.com> References: <1170878543.30334.52.camel@stevo-desktop> <1170885460.31481.0.camel@stevo-desktop> <45CA5B42.6090503@ichips.intel.com> Message-ID: <1170889929.31481.11.camel@stevo-desktop> On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote: > Steve Wise wrote: > > >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: > > > > > >>Arlin, > >> > >>The OFED dapl code is assuming the responder_resources and > >>initiator_depth passed up on a connection request event are from the > >>remote peer. This doesn't happen for iWARP. In the current iWARP > >>specifications, its up to the application to exchange this information > >>somehow. So these are defaulting to 0 on the server side of any dapl > >>connection over iWARP. > >> > >>This is a fairly recent change, I think. We need to come up with some > >>way to deal with this for OFED 1.2 IMO. > >> > >> > Yes, this was changed recently to sync up with the rdma_cm changes that > exposed the values. > > >> > >> > > > >The IWCM could set these to the device max values for instance. > > > > > That would work fine as long as you know the remote settings will be > equal or better. The provider just sets the min of local device max > values and the remote values provided with the request. > I know Krishna Kumar is working on a solution for exchanging this info in private data so the IWCM can "do the right thing". Stay tuned for a patch series to review for this. But this functionality is definitely post OFED-1.2. So for the OFED-1.2, I will set these to the device max in the IWCM. Assuming the other side is OFED 1.2 DAPL, then it will work fine. Steve. 
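The two behaviours discussed in this thread can be sketched in C roughly as follows. This is an illustration only: the struct and function names (rd_limits, clamp_to_local, iwarp_fallback) are hypothetical and are not taken from the dapl, rdma_cm or IWCM sources, and the field-by-field min is an assumption about what "the min of local device max values and the remote values" means in practice.

```c
#include <stdint.h>

/* Hypothetical container for the two connection parameters under discussion. */
struct rd_limits {
	uint8_t responder_resources;
	uint8_t initiator_depth;
};

static uint8_t min_u8(uint8_t a, uint8_t b)
{
	return a < b ? a : b;
}

/*
 * Case where the peer's values arrive with the connection request:
 * never exceed the local device maximum (assumed field-by-field min).
 */
static struct rd_limits clamp_to_local(struct rd_limits local_max,
				       struct rd_limits remote)
{
	struct rd_limits out;

	out.responder_resources = min_u8(local_max.responder_resources,
					 remote.responder_resources);
	out.initiator_depth = min_u8(local_max.initiator_depth,
				     remote.initiator_depth);
	return out;
}

/*
 * iWARP stopgap described above: nothing is exchanged on the wire,
 * so report the local device maximum instead of 0.
 */
static struct rd_limits iwarp_fallback(struct rd_limits local_max)
{
	return local_max;
}
```

With remote values of 0, as reported on the server side today, clamp_to_local() yields 0 for both fields, which is the breakage described at the start of the thread; iwarp_fallback() avoids that only as long as the peer really can match the local device maximum.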
From jgunthorpe at obsidianresearch.com Wed Feb 7 15:33:57 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 7 Feb 2007 16:33:57 -0700 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <45CA532E.3050605@ichips.intel.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207220304.GE11411@obsidianresearch.com> <45CA532E.3050605@ichips.intel.com> Message-ID: <20070207233357.GG11411@obsidianresearch.com> On Wed, Feb 07, 2007 at 02:31:10PM -0800, Sean Hefty wrote: > >I agree that special casing some IPv6 addresses is a bad idea. It > >needs to be integrated correctly with NET and the routing table/etc > I haven't given this more than a few minutes of thought, but I was thinking > more along the lines of a port having an assigned GID that's the same as an > assigned IPv6 address. (Is there some reason this wouldn't work?) IP name > service resolution would map the name to the IPv6 address. The mapping > from the IPv6 address to a GID would then be straightforward, as opposed to > using a mapping using ARP. Right, I also like the idea of using DNS as a global GID name service. > If name service resolution gives me an IPv6 address that's off of the local > subnet, but the ARP response gives me an address that's on the local > subnet, then I think we can assume that ARP was unsuccessful is resolving > the address to the remote GID. (I.e. the GID should be for a router.) If > this is true, then we need some other way to acquire the DGID. This is where I think you have problems... Why would you ARP for an off-subnet address? Why would the router answer? You push the address through the route table and ARP the router address that results. All of that is why I think another netdevice is a tidy solution. 
ping6/tcp/etc using this device would generate packets that follow the same path as RDMA connections would. No special rules about broadcast groups are required. The route table is used to instruct the kernel what IPv6 prefixes are IB GIDs and which are not by associating the output of the route with the ib0 device. The admins can use any means to set that up. Something that looks like:

$ ip addr
1: ib0: mtu 2048 qdisc pfifo_fast qlen 1000
    link/ib [my GID..]
    inet6 fe80::c2/64 scope link dynamic     <<-- My LL GID
    inet6 2000::c2/64 scope global dynamic   <<-- My GID
Both are maintained by the kernel.

$ ip -6 route
fe80::/64 dev ib0
2000::/64 dev ib0 src 2000::c2
2001::/64 dev ib0 src 2000::c2
    <<-- Tells the kernel that 2001::/64 is a GID and to use path records to do lookups at the SM
2002::/64 via fe80::a0 ib0 src 2000::c2
    <<--- 2002::/64 is a GID but don't query the SM and direct things to IB router fe80::a0

$ ping6 -I ib0 2001::b1
    ^--- Generate packet structured as: LRH,GRH,ICMP6,PING_DATA
         Set the GRH.SGID to 2000::c2, DGID to 2001::b1 as per the route table
         Do a SM Path Record query for 2001::b1 and use that to set the LRH

$ ping6 -I ib0 2002::b1
    ^--- Generate packet structured as: LRH,GRH,ICMP6,PING_DATA
         Set the GRH.SGID to 2000::c2, DGID to 2002::b1 as per the route table
         Do a SM Path Record query for fe80::a0 and use that to set the LRH

$ traceroute6 -I ib0 2001::b1
    ^--- Same as the ping, except the IB router can capture the packet when the hop limit runs out and produce an ICMP error.

Note: In all three cases the LRH.LNH would be set to 1 (non-IBA raw IPv6). RDMA CM would use the usual value of 3.

This also provides at least a mechanism, if not a full solution, to the MTU problem. Linux already allows route entries to specify an MTU and with closer integration of the raw IPv6 stuff it becomes possible for routers to send ICMP6 errors as raw IPv6 packets and for Linux to capture them and update the route.
The ICMP6 errors are crucial to having path MTU type functions converge quickly. RDMA CM would use the same rules for addressing CM packets. A further refinement would be to layer the entire path record query mechanism in the kernel over this so that the admin has local control over the IB routing table (if desired). A 2nd refinement would be to use the ND cache of such an ib0 device as a local path record query cache (again lets the admin see what is going on and override/discard SA queries using the usual 'ip neigh' command). There might even be good potential for sa replication using the already existing userspace arpd stuff. Overall I would just view something like this as further integrating the IB stack with the existing rich services provided by NET rather than trying to duplicate a small portion of them with seperate interfaces. [For instance with something like this netlink could be used instead of the sysfs probing for many cases] But yes, it is a bit outside what the current framework envisions.. Jason From rdreier at cisco.com Wed Feb 7 15:41:43 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Feb 2007 15:41:43 -0800 Subject: [openib-general] Immediate data question In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Wed, 7 Feb 2007 22:28:48 -0000") References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> Message-ID: Changqing> What I mean is that, is there any performance penalty Changqing> for receiver's overall performance if RNR happens Changqing> continuously on one of the QP ? Not for the receiver, but the sender will be severely slowed down by having to wait for the RNR timeouts. 
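As a rough back-of-the-envelope illustration of why the sender suffers, here is a small C sketch. It is a toy model, not anything from the verbs API: the function name and all three parameters are hypothetical, and it simply assumes that a given fraction of sends each wait one RNR timer interval before being retried.

```c
#include <assert.h>

/*
 * Toy model (assumption): every send costs send_us microseconds, and a
 * fraction rnr_fraction of sends additionally waits rnr_wait_us
 * microseconds for an RNR timeout before the retry succeeds.
 * Returns the resulting sender message rate per second.
 */
static double sender_msgs_per_sec(double send_us, double rnr_wait_us,
				  double rnr_fraction)
{
	assert(send_us > 0.0);
	assert(rnr_wait_us >= 0.0);
	assert(rnr_fraction >= 0.0 && rnr_fraction <= 1.0);

	return 1e6 / (send_us + rnr_fraction * rnr_wait_us);
}
```

Under these made-up numbers, a sender that needs 10 us per message and hits a 10 ms RNR wait on half of its sends drops from 100000 messages per second to roughly 200, while nothing in the model costs the receiver anything.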
From rdreier at cisco.com Wed Feb 7 15:43:40 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Feb 2007 15:43:40 -0800 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <20070207220304.GE11411@obsidianresearch.com> (Jason Gunthorpe's message of "Wed, 7 Feb 2007 15:03:04 -0700") References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207220304.GE11411@obsidianresearch.com> Message-ID: Jason> Basically, if IB routers are used, and the IPoIB feature of Jason> *not* spanning a subnet is used (for scalabililty?) then Jason> you need an alternate way to specify addresses to rdma cm. You mean if the IB router is also an IP router for IPoIB? Then I think there are some serious semantic problems to solve for the RDMA CM -- because you are using an IP address to define a destination, but since that address is on the other side of an IP router, there's no way to know it even belongs to an IB port. - R. From rdreier at cisco.com Wed Feb 7 15:50:25 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Feb 2007 15:50:25 -0800 Subject: [openib-general] sharing qp between user and kernel In-Reply-To: <20070207223146.GA28637@osc.edu> (Pete Wyckoff's message of "Wed, 7 Feb 2007 17:31:46 -0500") References: <20070207223146.GA28637@osc.edu> Message-ID: Pete> Before I dig into this anymore, do you expect this to work? Pete> Are there fundamental problems with QP sharing between user Pete> and kernel? It would sure be nice not to have to stick the Pete> connection management aspects into the kernel. No, I wouldn't expect this to work. At first glance at least, yes, there are fundamental problems. 
Sharing a QP between user and kernelspace, where userspace is doing full kernel bypass (as eg mthca does -- there are NO system calls when doing post work request, poll CQ and request CQ notification operations), seems like a huge problem. I don't see any way that the kernel can keep a consistent view of the QP state unless userspace has to call into the kernel for every operation, which would kill performance. - R. From halr at voltaire.com Wed Feb 7 16:19:56 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Feb 2007 19:19:56 -0500 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <45CA3568.1000508@ichips.intel.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> Message-ID: <1170893995.31538.23321.camel@hal.voltaire.com> On Wed, 2007-02-07 at 15:24, Sean Hefty wrote: > > I didn't get too far on getting CMA to work. Beyond the bad HopLimit > > feild I was seeing Hal pointed out a number of problems in IBA that > > would prevent it from working as is :< > > I've started thinking about what it would take to get the rdma cm to work across > a router. I think the rdma cm may need to treat IPv6 addresses as a GID for > this to work across subnets, versus trying to map an ipoib IP address to a GID > based on ARP. An IB GID is IPv6 like but not an IPv6 address so I don't think this is a good idea and don't see how you get around mapping IP addresses to GIDs in an IB routed network given the way things are spec'd. I think that the RDMA CM assumes a single IPoIB subnet. Does it work when the destination is on another subnet ? I think there are some unaddressed gateway issues here to make that work and these may have been punted (during spec time). Arkady might be a good person to comment on this. 
-- Hal > - Sean > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From halr at voltaire.com Wed Feb 7 16:23:47 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Feb 2007 19:23:47 -0500 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <20070207213108.GD11411@obsidianresearch.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> Message-ID: <1170894226.31538.23544.camel@hal.voltaire.com> On Wed, 2007-02-07 at 16:31, Jason Gunthorpe wrote: > On Wed, Feb 07, 2007 at 12:24:08PM -0800, Sean Hefty wrote: > > >I didn't get too far on getting CMA to work. Beyond the bad HopLimit > > >feild I was seeing Hal pointed out a number of problems in IBA that > > >would prevent it from working as is :< > > > > I've started thinking about what it would take to get the rdma cm to work > > across a router. I think the rdma cm may need to treat IPv6 addresses as a > > GID for this to work across subnets, versus trying to map an ipoib IP > > address to a GID based on ARP. > > I don't think that is the main problem - though clearly the way things > are now (for better or worse) rdma cm requires the IPoIB subnet to > span all of the IB subnets.. The main problem with the protocol is in > the LID selection for routed paths on the passive side. It can't rely > on the active side to identify the lids if a router is involved. > > One feature I've thought has been underused in IBA is the raw IPv6 > packet feature. 
I thought raw support (including IPv6 header) although still in the spec was largely deprecated as the CRC protection was deemed too weak. -- Hal > It would be nice to have a linux netdev interface to > be able to do IPv6 traffic using GID addressing. That would seem to me > to be the natural way to bolt native GID addressing into rdma > cm.. > > Jason From halr at voltaire.com Wed Feb 7 16:27:41 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Feb 2007 19:27:41 -0500 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <20070207224928.GF11411@obsidianresearch.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> Message-ID: <1170894459.31538.23768.camel@hal.voltaire.com> On Wed, 2007-02-07 at 17:49, Jason Gunthorpe wrote: > On Wed, Feb 07, 2007 at 02:40:51PM -0800, Sean Hefty wrote: > > Are you referring to the SLID in the CM REQ? If so, I've been looking at > > this issue as well. I simply cannot think of any way to come up with this > > LID, and my current solution is to punt this problem over to the passive > > side, which could use the SLID of the router that the CM REQ is received > > from. If not, well, then I just rambled more than usual. > > Yes, this is the problem. > > The active side clearly cannot learn what the SLID of the passive > side's router should be. > > We don't want to have the routers snoop and alter CM GMPs. 
> > The passive side cannot use information from the LRH to get the router > LID since the LRH may not be reversible. > > The only option seems to be to have the passive side do a path record > query on a SGID in the CM REQ... > > This is a spec problem unfortunately. Yes and I would expect that this would be changed. -- Hal > > Jason From jgunthorpe at obsidianresearch.com Wed Feb 7 17:30:55 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 7 Feb 2007 18:30:55 -0700 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <1170894226.31538.23544.camel@hal.voltaire.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <1170894226.31538.23544.camel@hal.voltaire.com> Message-ID: <20070208013055.GH11411@obsidianresearch.com> On Wed, Feb 07, 2007 at 07:23:47PM -0500, Hal Rosenstock wrote: > > One feature I've thought has been underused in IBA is the raw IPv6 > > packet feature. > > I thought raw support (including IPv6 header) although still in the spec > was largely deprecated as the CRC protection was deemed too weak. I would envision using the raw support primarily for ICMP6, i.e. diagnostics (ping/traceroute) and router messages (Packet Too Big, ICMP Redirect, etc.), not to offset IPoIB as a high performance solution. In this role the reduced MTU that you get because of CRC-16's limited protection shouldn't be a big problem. 
Jason From sean.hefty at intel.com Wed Feb 7 19:23:47 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 7 Feb 2007 19:23:47 -0800 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets In-Reply-To: <20070207233357.GG11411@obsidianresearch.com> Message-ID: <000001c74b30$8bca8fd0$3dd4180a@amr.corp.intel.com> >> If name service resolution gives me an IPv6 address that's off of the local >> subnet, but the ARP response gives me an address that's on the local >> subnet, then I think we can assume that ARP was unsuccessful is resolving >> the address to the remote GID. (I.e. the GID should be for a router.) If >> this is true, then we need some other way to acquire the DGID. > >This is where I think you have problems... Why would you ARP for an >off-subnet address? Why would the router answer? You push the address >through the route table and ARP the router address that results. I'm confusing myself. I was considering different IB subnets, and trying to determine whether they shared the same IP subnet. The GIDs may have different subnet prefixes, but the IP addresses may not, and I'm not sure how to relate this back to using DNS. >All of that is why I think another netdevice is a tidy >solution. ping6/tcp/etc using this device would generate packets that >follow the same path as RMDA connections would. No special rules about >broadcast groups are required. The route table is used to instruct the >kernel what IPv6 prefixes are IB GIDs and which are not by associating >the output of the route with the ib0 device. The admins can use any >means to set that up. Something that looks like: At first glance, this seems like a decent approach to explore. >But yes, it is a bit outside what the current framework envisions.. I'm fine with that. My short-term objective is to enable basic router support within the host stack, and I think I have an idea of what that takes. 
I'd just also like to have an idea of how an application could transfer data between routed IB subnets, including providing a way for the application to locate a given remote node. - Sean From mst at mellanox.co.il Wed Feb 7 20:37:27 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Feb 2007 06:37:27 +0200 Subject: [openib-general] RFC ofed 1 2 kernel file structure In-Reply-To: <45CA5050.2070105@pathscale.com> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <20070206150321.GA21776@mellanox.co.il> <20070207195914.GD12140@mellanox.co.il> <20070207221449.GL12140@mellanox.co.il> <45CA5050.2070105@pathscale.com> Message-ID: <20070208043727.GP12140@mellanox.co.il> > Quoting Bryan O'Sullivan : > Subject: Re: RFC ofed 1 2 kernel file structure > > Michael S. Tsirkin wrote: > > > All, pls try now. > > This is similar in layout to the sort of tree we've used internally all > along, so it's fine by me. One small problem: I don't like the > combination of lower and upper case names of makefile and Makefile in > the top-level directory. ofed_1_2 has the same. > Also, it's no longer obvious to me to tell what kernel version the > sources are pulled from. I used to be able to check the top-level > Makefile or git history, but I no longer know what to look at. This will be part of BUILD_ID. -- MST From mst at mellanox.co.il Wed Feb 7 22:40:55 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Feb 2007 08:40:55 +0200 Subject: [openib-general] more comments on cxgb3 Message-ID: <20070208064055.GR12140@mellanox.co.il> OK, so I looked at cxgb3 some more. To summarise my previous comments, I think the cxio hal layer needs to go to make the code readable - if I understand correctly it is there for historical reasons only. I started looking at userspace/kernel interaction, and then went over to other code under cxgb3 (but not core/). - Consider a user that does e.g. create QP, but never calls mmap. 
Is there some code that will clean out the unclaimed mmap object? I couldn't find it, and iwch_dealloc_ucontext does not seem to do anything with it.

- Passing physical address to userspace and back looks suspicious. Especially this:
	uresp.physaddr = virt_to_phys(chp->cq.queue);
  Could you elaborate on the design here? What are these phy addresses and how come userspace needs to know the phy address? You are not doing DMA by this address, by any chance?

- It seems that by passing in huge resource sizes, userspace will be able to drink up unlimited amounts of kernel memory. mthca handles this by using the mlock rlimit, should something be done here as well?

A couple of comments on PDBG macro.

- I'd like to suggest following the practice of prefixing macro names with module name (same goes for functions like get_mhp really) - unless they are local to file.

- You are using __FUNCTION__ a lot - it might be best just to kill it: messages are unique so you'll be able to locate the msg source anyway, you save some kernel text, and logs will be shorter. In any case I think __func__ is the recommended gcc way to get the name currently.

- comment near pr_debug definition in include/linux/kernel.h says:
	/* If you are writing a driver, please use dev_dbg instead */
  so it might be a good idea for PDBG to follow this rule.

- log messages do not look very informative to me. I also think there are a bit too many of them currently. For example, I do not think it is a good idea to print the kernel pointers out. For example, in code like the following:
	mhp = get_mhp(rhp, (sg_list[i].lkey) >> 8);
	if (!mhp) {
		PDBG("%s %d\n", __FUNCTION__, __LINE__);
		return -EIO;
	}
  might be better to say "MR key XXX does not exist. Exiting.". -EIO also looks like a strange error code to return here, does it not? Maybe something like EINVAL would be more appropriate?

- I wonder about the names like get_mhp - what does "mhp" mean?
	static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid)
	{
		return idr_find(&rhp->mmidr, mmid);
	}
  Looks like it looks up an mr. Is that right? Maybe the name should be changed to convey this meaning.

- In the following code, what does "missing pdid check" mean?
	/*
	 * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now.
	 */
  This sounds interesting. Does this mean the code does not validate the PD currently? I have the same question for:
	/* TBD: check perms */
  in iwch_bind_mw.
  BTW, does TBD stand for "To Be Done" here? google says:
  >Definitions of TBD on the Web:
  * To Be Determined, Defined, Decided. www.csr.com/ptot.htm
  * to be determined www.liberalsagainstterrorism.com/wiki/index.php/Counterinsurgency_Operations/Glossary
  * Treasury Board (Secretariat) www.psc-cfp.gc.ca/centres/definitions_and_notes_e.htm
  * The three letter abbreviation TBD may be/mean, depending on context: * an acronym for "To Be Determined" ("...at a later point in time.", typically) * the Douglas Devastator, a US Navy torpedo bomber of World War II en.wikipedia.org/wiki/TBD
  What is to be determined here? Do you mean TODO really?

- iwch_sgl2pbl_map is used in several places, and seems a bit too big to be inline

Well, it's time to go do my day job now :)

Hope this helps,

-- MST

From monil at voltaire.com Wed Feb 7 23:02:31 2007
From: monil at voltaire.com (Moni Levy)
Date: Thu, 8 Feb 2007 09:02:31 +0200
Subject: [openib-general] issues with compilation of ofed 1.2
In-Reply-To: <45C9EE31.2040602@voltaire.com>
References: <45C9EE31.2040602@voltaire.com>
Message-ID: <6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com>

Doug,

On 2/7/07, Yosef Etigin wrote:
> 7. On RHAS5 beta 2, the setup requires sysfsutils-devel RPM which is not included in this distro.

Can you please help us with that?
-- Moni > > -- > Yosef Etigin > Alex Tabachnik > From vlad at lists.openfabrics.org Thu Feb 8 02:24:24 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Thu, 8 Feb 2007 02:24:24 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070208-0200 daily build status Message-ID: <20070208102424.660EAE60808@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.15 Passed on ia64 with 
linux-2.6.16 Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ /home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: /home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From amip at mellanox.co.il Thu Feb 8 04:33:29 2007 From: amip at mellanox.co.il (Ami Perlmutter) Date: Thu, 8 Feb 2007 14:33:29 +0200 Subject: [openib-general] bug in netpipe Message-ID: <6C2C79E72C305246B504CBA17B5500C9C41CE2@mtlexch01.mtl.com> Hi I've been running netpipe over Infiniband's SDP and uncovered a race when using the "-r" option. The problem is when both sides close their sockets, the listening socket is closed last, which allows a faster client to try to connect to it before it closes. When this happens, next time the client tries to use the socket it gets a connection reset error.
From jsquyres at cisco.com Thu Feb 8 05:35:13 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 8 Feb 2007 08:35:13 -0500 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: <20070207194949.GB12140@mellanox.co.il> References: <1170866522.6223.8.camel@vladsk-laptop> <7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com> <1170867620.6223.11.camel@vladsk-laptop> <212D8756-09B1-4637-ADCA-1CD8A535403A@cisco.com> <20070207194949.GB12140@mellanox.co.il> Message-ID: <1A992437-34C5-4F97-8963-1C99876E0A50@cisco.com> On Feb 7, 2007, at 2:49 PM, Michael S. Tsirkin wrote: >> My $0.02: This is another in a growing list of issues reflecting the >> whole "build everything in DESTDIR" is a problematic approach. > > I don't know much about RPM, and I am not exactly sure why are > our source RPMs so complicated. It's a combination of two things: 1) similar to what you said below, we have lots of software packages that are all dependent upon each other --> this leads to a conglomeration of rpath's and shared library dependencies that are incorrect 2) we're trying to *use* the software when it is installed in the DESTDIR --> this means that you have to put special-case in the software so that they look for support files in both the DESTDIR *and* the final installation directory One way to think about what we're doing is making a disk image and then snapshotting RPMs from that disk image. That's a natural candidate for chroot. > However, with the plan configure/make we are able to > build all openfabrics components within build directory, > without any chroot tricks. > > So let's not give up yet, IMO it is very nice to be able to build in > standard environment, without being root. Yes, it's nice from the user perspective. But it's fairly annoying from the developer point of view because you have to add all these special cases. 
> Note that what is biting us here is mostly the large number of > modules: > simple single-module packages don't have this problem - and this > is really a design decision we took. Understood. I guess I'm asking whether all these special cases required to support the DESTDIR approach are a) worth it, b) going to take more time to get right (which end up being somewhat fragile) than to use a chroot/image-based approach. Again, just my $0.02, and I might get shouted down. But I thought I'd at least ask... :-) -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Thu Feb 8 05:43:05 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Feb 2007 15:43:05 +0200 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: <1A992437-34C5-4F97-8963-1C99876E0A50@cisco.com> References: <1A992437-34C5-4F97-8963-1C99876E0A50@cisco.com> Message-ID: <20070208134305.GC20183@mellanox.co.il> > Quoting Jeff Squyres : > Subject: Re: Open MPI rpmbuild fails in OFED-1.2 > > On Feb 7, 2007, at 2:49 PM, Michael S. Tsirkin wrote: > > >> My $0.02: This is another in a growing list of issues reflecting the > >> whole "build everything in DESTDIR" is a problematic approach. > > > > I don't know much about RPM, and I am not exactly sure why are > > our source RPMs so complicated. > > It's a combination of two things: > > 1) similar to what you said below, we have lots of software packages > that are all dependent upon each other > --> this leads to a conglomeration of rpath's and shared library > dependencies that are incorrect > > 2) we're trying to *use* the software when it is installed in the > DESTDIR > --> this means that you have to put special-case in the software so > that they look for support files in both the DESTDIR *and* the final > installation directory How do you mean, use? Hmm. I guess my question is - this works fine when I run OFED's configure script, why is SRPM so much more difficult? 
-- MST From jsquyres at cisco.com Thu Feb 8 06:14:59 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 8 Feb 2007 09:14:59 -0500 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: <20070208134305.GC20183@mellanox.co.il> References: <1A992437-34C5-4F97-8963-1C99876E0A50@cisco.com> <20070208134305.GC20183@mellanox.co.il> Message-ID: <4FE6D655-AF12-4EF5-A058-4CFC9597C12B@cisco.com> On Feb 8, 2007, at 8:43 AM, Michael S. Tsirkin wrote: >> 2) we're trying to *use* the software when it is installed in the >> DESTDIR >> --> this means that you have to put special-case in the software so >> that they look for support files in both the DESTDIR *and* the final >> installation directory > > How do you mean, use? The easiest example is that MPI wrapper compilers are used to compile the MPI test suites (mpicc, etc.). This means that OMPI's libraries and support files (e.g., help files, wrapper compiler data files) need to be found, even though they're not in their final installation locations. > Hmm. I guess my question is - this works fine when I run OFED's > configure script, why is SRPM so much more difficult? It's not the single SRPM that is the problem. We've had an OMPI SRPM that works fine for a long, long time. A single DESTDIR build is no problem, especially for an Autoconf/Automake/Libtool-based project like Open MPI. The problems are: - libibverbs and other support libraries are in the DESTDIR when OMPI is built (but eventually will move). So OMPI has to rpath *BOTH* locations for libibverbs (i.e., the DESTDIR and the final installdir), one of which will be a lie. God help you if you're trying to build OFED on a machine where a previous version of OFED is installed -- i.e., where libibverbs exists in *BOTH* the DESTDIR and the final prefix! 
(this specific problem actually caused me to waste a few hours while developing the new OMPI build stuff in build.sh last week) - I haven't looked closely at the OFED 1.2 build scripts yet, but we ran into problems during the development of OFED 1.1 where dependent OFA libraries needed to link to each other. In OFED, those links were simply removed because of the whole DESTDIR/installdir duality. This actually caused problems in some scenarios. IIRC, the one I remember is that the link between libmthca and libibverbs was effectively removed by removing AC_CHECK_LIB from libmthca's configure.ac (recall that mthca uses some of the public symbols in libibverbs) because AC_CHECK_LIB was looking in the installdir. That may not be 100% right -- I don't recall all the details. - we *use* OMPI in the DESTDIR (and MVAPICH), as described above. I had to add a patch to the upcoming OMPI v1.2 community release to first examine the environment and look for a specific variable to re-root all of the compiled-in directories (it's too late in the OMPI v1.2 release process to put this patch in the official v1.2 release). What a pain. :-\ So the OMPI path issue is resolvable (at the cost of adding a whole pile of code to OMPI), but the rpath issue is not. Once you link an app, its rpaths are fixed and you can't change them based on an environment variable. Hence, the only solution is to rpath *both* directories, but even that has problems and ambiguities (as described above). In fairness, we could tell the user to set LD_LIBRARY_PATH, but no one seems to want to do that (users always screw it up, and it becomes problematic for rsh/ssh-based scenarios). All this plus the fact that we're clearly going outside of what the SuSE RPM developers intend just indicates to me that this doesn't seem Right...
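To illustrate the "re-root the compiled-in directories from the environment" idea described above, here is a minimal sketch in C. It is not the actual Open MPI patch: the variable name EXAMPLE_DESTDIR and the helper reroot_path() are assumptions made up for this example. It only shows the path side of the problem; as noted above, nothing comparable can be done for rpaths once an application is linked.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, chosen for this sketch only. */
#define EXAMPLE_REROOT_ENV "EXAMPLE_DESTDIR"

/*
 * Write the effective location of a compiled-in path into out.
 * If the environment variable is set and non-empty, its value is
 * prepended; otherwise the compiled-in path is used unchanged.
 * Returns 0 on success, -1 if out is too small.
 */
static int reroot_path(const char *compiled_in, char *out, size_t outlen)
{
    const char *root = getenv(EXAMPLE_REROOT_ENV);
    int n;

    if (root == NULL || root[0] == '\0')
        root = "";

    n = snprintf(out, outlen, "%s%s", root, compiled_in);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With EXAMPLE_DESTDIR unset, a lookup of /usr/lib stays /usr/lib; with EXAMPLE_DESTDIR=/tmp/stage it becomes /tmp/stage/usr/lib. Every place that opens a support file would have to go through such a helper, which is the "whole pile of code" cost mentioned above.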
-- Jeff Squyres Server Virtualization Business Unit Cisco Systems From halr at voltaire.com Thu Feb 8 06:19:43 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Feb 2007 09:19:43 -0500 Subject: [openib-general] [PATCH] OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch Message-ID: <1170944383.31538.74632.camel@hal.voltaire.com> OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch This change resolves an issue with strange SL assignment when two HCAs communicate with other and are on the same switch. Since LASH is switch to switch routing, the get_lash_sl function was casting 9999 (the value assigned to the variable NONE) to be a uint8_t when asked for an SL assignment in this case. This change resolves this issue. Signed-off-by: Thomas Sødring Signed-off-by: Hal Rosenstock diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c index 5dfe068..e5f751c 100644 --- a/osm/opensm/osm_ucast_lash.c +++ b/osm/opensm/osm_ucast_lash.c @@ -1468,6 +1468,7 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_ osm_port_t *p_src_port, osm_port_t *p_dst_port) { unsigned dst_id; + unsigned src_id; osm_switch_t *p_sw; if (p_osm->routing_engine.ucast_build_fwd_tables != lash_process) @@ -1482,6 +1483,10 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_ if (!p_sw || !p_sw->priv) return OSM_DEFAULT_SL; + src_id = get_lash_id(p_sw); + if (src_id == dst_id) + return OSM_DEFAULT_SL; + return (uint8_t)((switch_t *)p_sw->priv)->routing_table[dst_id].lane; } From ogerlitz at voltaire.com Thu Feb 8 06:35:35 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 08 Feb 2007 16:35:35 +0200 Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support In-Reply-To: <45C85B39.4080700@voltaire.com> References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> Message-ID: <45CB3537.8060508@voltaire.com> Or 
Gerlitz wrote: > Sean Hefty wrote: >>> Sean Hefty (3): >>> rdma_cm: Increment port number after close to avoid re-use. >>> ib_sa: track multicast join/leave requests >>> rdma_cm: add multicast communication support >> Assuming that you haven't looked at this yet, I updated the ib_sa patch >> above to shorten the workqueue name, plus added a fourth patch to >> shorten the workqueue names for ib_addr and rdma_cm. E.g. "ib_mcast_wq" >> became "ib_mcast". > Roland, > We have been working (developing and testing) with a userspace rdma cm based > multicast app over this code for the last two months and are very > satisfied with it. The testing included IPoIB, the user space app and > multicast interoperability between them. Roland, Can you comment on the status of this merge request? thanks, Or. From swise at opengridcomputing.com Thu Feb 8 06:57:19 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 08 Feb 2007 08:57:19 -0600 Subject: [openib-general] more comments on cxgb3 In-Reply-To: <20070208064055.GR12140@mellanox.co.il> References: <20070208064055.GR12140@mellanox.co.il> Message-ID: <1170946639.3049.10.camel@stevo-desktop> On Thu, 2007-02-08 at 08:40 +0200, Michael S. Tsirkin wrote: > OK, so I looked at cxgb3 some more. Thanks! > To summarise my previous comments, I think the cxio hal layer needs to go to > make the code readable - if I understand correctly it is there for historical > reasons only. > I can do this but it's low on the list of todos. > I started looking at userspace/kernel interaction, and then > went over to other code under cxgb3 (but not core/). > > - Consider a user that does e.g. create QP, but never calls mmap. > Is there some code that will clean out the unclaimed mmap object? > I couldn't find it, and iwch_dealloc_ucontext does not seem to > do anything with it. > This is a bug. I've got a fix for it too. > - Passing physical address to userspace and back looks suspicious.
> Especially this: > uresp.physaddr = virt_to_phys(chp->cq.queue); > Could you elaborate on the design here? What are these phy addresses > and how come userspace needs to know the phy address? > You are not doing DMA by this address, by any chance? > No, it's not used for DMA by the HW. The physaddr is passed up to the user, and the user then mmaps() using that as the offset. I took this code from the ipath driver. It has been pointed out to me that this is broken for 32-bit userspace on a 64-bit kernel because mmap2() cannot pass down 64 bits. So I need to rework this code. > - It seems that by passing in huge resource sizes, userspace will be able to > drink up unlimited amounts of kernel memory. > mthca handles this by using the mlock rlimit, should something be done here > as well? Hmm. That's a good point. I'll put this on the todo as well. So mthca adds to the process's rlimit value as things are allocated out of kernel memory (or maybe even anything that gets pinned)? > > A couple of comments on PDBG macro. > - I'd like to suggest following the practice of prefixing macro names with module name > (same goes for functions like get_mhp really) - unless they are local to file. > > - You are using __FUNCTION__ a lot - it might be better just to kill it, > messages are unique so you'll be able to locate the msg source anyway, > save some kernel text and logs will be shorter. In any case I think > __func__ is the recommended gcc way to get the name currently. > > - comment near pr_debug definition in include/linux/kernel.h says: > /* If you are writing a driver, please use dev_dbg instead */ > so it might be a good idea for PDBG to follow this rule. > > - log messages do not look very informative to me. > I also think there are a bit too many of them currently. > For example, I do not think it is a good idea to print > the kernel pointers out.
> > For example, in code like the following: > mhp = get_mhp(rhp, (sg_list[i].lkey) >> 8); > if (!mhp) { > PDBG("%s %d\n", __FUNCTION__, __LINE__); > return -EIO; > } > > might be better to say > "MR key XXX does not exist. Exiting.". > -EIO also looks like a strange error code to return here, does it not? > Maybe something like EINVAL would be more appropriate? > I'll take a todo to clean up the debug stuff. > - I wonder about the names like get_mhp - what does "mhp" mean? > static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid) > { > return idr_find(&rhp->mmidr, mmid); > } > Memory Handle Pointer. > Looks like it looks up an mr. Is that right? Maybe the name shouldbe changed > to convey this meaning. > > - In the following code, what does "missing pdid check" mean? > /* > * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now. > */ > This sounds interesting. > Does this mean the code does not validate the PD currently? > I need firmware support for this. It will be done asap and I can remove this code entirely. > I have the same question for: > /* TBD: check perms */ > in iwch_bind_mw. > > BTW, does TBD stand for "To Be Done" here? Yes. > Do you mean TODO really? > > - iwch_sgl2pbl_map is used in several places, and seems a bit too big to be inline > > Well, it's time to go do my day job now :) > > Hope this helps, > Thanks again Michael! Steve. From monis at voltaire.com Thu Feb 8 07:00:07 2007 From: monis at voltaire.com (Moni Shoua) Date: Thu, 08 Feb 2007 17:00:07 +0200 Subject: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour In-Reply-To: <20070207182426.GB9131@mellanox.co.il> References: <45C9F1DE.8090409@voltaire.com> <20070207182426.GB9131@mellanox.co.il> Message-ID: <45CB3AF7.5000909@voltaire.com> Michael S. 
Tsirkin wrote: >> Quoting Moni Shoua : >> Subject: Re: [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour >> >> >>> Another concern: assume that one device goes away (e.g. hotplug). >>> It seems that neighbours whose dev field points to another device will not be destroyed. >>> Correct? >> I agree. >> >>> Therefore in your design, it seems that to_ipoib_neigh()->dev >>> will get us a pointer to device that has been removed already. >>> >> I agree that this is a problem. > > I think we can solve this if we track all ipoib neighbours, like we do for old kernels, > and then flush ipoib neighbours on any hotplug event. > Roland, does this sound too awful? > >> I think it would be best to prevent an IPoIB device >> from disappearing or ib_ipoib from being unloaded as long as the IPoIB >> device is a slave. Unfortunately, I don't see how this can be done just >> by fixing something in bonding or IPoIB. > > So hotplug is blocked potentially forever? > This does not sound good. OK, so I'm dropping this thought. > >> However, any slave knows he has a master (dev->master). >> What do you think about a solution where IPoIB first tries to clean up the >> neighbours that belong to its master before deleting the IPoIB device? > > How? Let me know what you think about that. I hope this makes sense. In IPoIB, before calling unregister_netdev: for each kernel neighbour n, if n->dev == ib_dev->master, delete n. Michael, as I see it we have to deal with 2 cases. 1. IPoIB device is deleted (unregister_netdev) - the IPoIB netdev is not in the kernel's address space. We have to make sure that no one holds a pointer to it after it is deleted. 2. ib_ipoib module is unloaded (modprobe -r) - the ipoib_neigh_destructor is not in the kernel's address space. We have to make sure no one calls it after the module is unloaded. I think that if nothing prevents the execution of the "code" above it serves both cases. Do you see any problem with that?
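The cleanup loop described above can be sketched in a small user-space model. This is only an illustration under stated assumptions: struct example_dev, struct example_neigh and example_flush_master_neighs() are made-up stand-ins, not the kernel's net_device, neighbour, or neighbour-table API, and a real implementation would also need the neighbour table locking.

```c
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct example_dev {
    struct example_dev *master;   /* bonding master, or NULL if not enslaved */
};

struct example_neigh {
    struct example_dev *dev;      /* device this neighbour entry belongs to */
    struct example_neigh *next;
};

/*
 * Remove every neighbour whose dev is the master of the given slave.
 * Intended to run before the slave device is unregistered.
 * Returns the number of entries removed.
 */
static int example_flush_master_neighs(struct example_neigh **head,
                                       const struct example_dev *slave)
{
    struct example_neigh **pp = head;
    int removed = 0;

    if (slave->master == NULL)
        return 0;

    while (*pp != NULL) {
        struct example_neigh *n = *pp;

        if (n->dev == slave->master) {
            *pp = n->next;        /* unlink before freeing */
            free(n);
            removed++;
        } else {
            pp = &n->next;
        }
    }
    return removed;
}
```

The pointer-to-pointer walk lets entries be unlinked while iterating, which is the same concern list_for_each_safe() addresses in kernel code.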
Do I have to maintain my own list of neighbours or use the kernel's arp table for that? I am trying to study the neighbour cleanup function and do something like that but I would be happy to learn from others as well. >>>> Furthermore, bond_setup_by_slave is called only for non >>>> Ethernet devices (we consider to change the logic to "called only for >>>> IPoIB devices just for safety). >>> Why is this necessary, BTW? >>> >> If we don't do that, we get a memory leak because the neigh destructor will >> never be called for non IPoIB devices although they carry ipoib_neigh >> with them. > > How can this happen? If it does, I think we are back to where we started: > to_ipoib_neigh is broken for non-IPoIB device. > I thought you said only devices of the same type can be paired? > > The scenario is: 1. kernel allocates a neighbour structure for bond0, puts it on a skb and passed it to bond xmit function. 2. bond0 passes the skb to ipoib 3. ipoib allocates ipoib_neigh and hangs it on linux neighbour. 4. a while after that, the kernel wants to destroy the neighbour (cleanup) but doesn't call ipoib_neigh_destructor because it the neigh setup registered the destructor for ibX device. From pw at osc.edu Thu Feb 8 07:24:09 2007 From: pw at osc.edu (Pete Wyckoff) Date: Thu, 8 Feb 2007 10:24:09 -0500 Subject: [openib-general] sharing qp between user and kernel In-Reply-To: References: <20070207223146.GA28637@osc.edu> Message-ID: <20070208152409.GC31079@osc.edu> rdreier at cisco.com wrote on Wed, 07 Feb 2007 15:50 -0800: > Pete> Before I dig into this anymore, do you expect this to work? > Pete> Are there fundamental problems with QP sharing between user > Pete> and kernel? It would sure be nice not to have to stick the > Pete> connection management aspects into the kernel. > > No, I wouldn't expect this to work. At first glance at least, yes, > there are fundamental problems. 
Sharing a QP between user and > kernelspace, where userspace is doing full kernel bypass (as eg mthca > does -- there are NO system calls when doing post work request, poll > CQ and request CQ notification operations), seems like a huge > problem. I don't see any way that the kernel can keep a consistent > view of the QP state unless userspace has to call into the kernel for > every operation, which would kill performance. My hope was not to allow full QP sharing between user and kernel, but just a limited interface to "send this kernel data now". It requires that the kernel register some physical memory, and submit the work requests. Perhaps the kernel can invoke the equivalent of the userspace post function instead of trying to use the kernel API for sending. Thanks to all for the comments. We'll keep working with non-bypass devices to see if the approach offers any advantages first. -- Pete From mst at mellanox.co.il Thu Feb 8 07:29:47 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Feb 2007 17:29:47 +0200 Subject: [openib-general] [PATCH] IB/ipoib_cm: fix up issues from code review In-Reply-To: References: Message-ID: <20070208152947.GA6560@mellanox.co.il> The following lightly tested patch addresses Roland's comments on IPoIB CM. Applies on top of PATCHv6: - Randomise RQ PSN - Fix for modular IPv6 - MTU endian-ness fix for ICMPs - Cosmetics Signed-off-by: Michael S. Tsirkin --- Roland, do you want me to report the full fixed-up patch instead? Pls let me know when IPoIB CM is in for-2.6.21, I'll switch to that for my testing. 
diff --git a/drivers/infiniband/ulp/ipoib/Kconfig b/drivers/infiniband/ulp/ipoib/Kconfig index 0ffca11..af78ccc 100644 --- a/drivers/infiniband/ulp/ipoib/Kconfig +++ b/drivers/infiniband/ulp/ipoib/Kconfig @@ -1,6 +1,6 @@ config INFINIBAND_IPOIB tristate "IP-over-InfiniBand" - depends on INFINIBAND && NETDEVICES && INET + depends on INFINIBAND && NETDEVICES && INET && (IPV6 || IPV6=n) ---help--- Support for the IP-over-InfiniBand protocol (IPoIB). This transports IP packets over InfiniBand so you can use your IB diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 8082d50..eb885ee 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -127,7 +127,6 @@ struct ipoib_tx_buf { u64 mapping; }; -#ifdef CONFIG_INFINIBAND_IPOIB_CM struct ib_cm_id; struct ipoib_cm_data { @@ -181,7 +180,6 @@ struct ipoib_cm_dev_priv { struct ib_recv_wr rx_wr; }; -#endif /* * Device private locking: tx_lock protects members used in TX fast * path (and we use LLTX so upper layers don't do extra locking). 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index e7e7cc0..8ee6f06 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -37,7 +37,7 @@ #include #include -#ifdef CONFIG_IPV6 +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) #include #endif @@ -170,7 +170,8 @@ static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev, } static int ipoib_cm_modify_rx_qp(struct net_device *dev, - struct ib_cm_id *cm_id, struct ib_qp *qp) + struct ib_cm_id *cm_id, struct ib_qp *qp, + unsigned psn) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; @@ -193,7 +194,7 @@ static int ipoib_cm_modify_rx_qp(struct net_device *dev, ipoib_warn(priv, "failed to init QP attr for RTR: %d\n", ret); return ret; } - qp_attr.rq_psn = 0 /* FIXME */; + qp_attr.rq_psn = psn; ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask); if (ret) { ipoib_warn(priv, "failed to modify QP to RTR: %d\n", ret); @@ -203,7 +204,8 @@ static int ipoib_cm_modify_rx_qp(struct net_device *dev, } static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id, - struct ib_qp *qp, struct ib_cm_req_event_param *req) + struct ib_qp *qp, struct ib_cm_req_event_param *req, + unsigned psn) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_cm_data data = {}; @@ -219,7 +221,7 @@ static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id, rep.target_ack_delay = 20; /* FIXME */ rep.srq = 1; rep.qp_num = qp->qp_num; - rep.starting_psn = 0 /* FIXME */; + rep.starting_psn = psn; return ib_send_cm_rep(cm_id, &rep); } @@ -229,6 +231,7 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_cm_rx *p; unsigned long flags; + unsigned psn; int ret; ipoib_dbg(priv, "REQ arrived\n"); @@ -243,11 +246,12 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even 
goto err_qp; } - ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp); + psn = random32() & 0xffffff; + ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); if (ret) goto err_modify; - ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd); + ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); if (ret) { ipoib_warn(priv, "failed to send REP: %d\n", ret); goto err_rep; @@ -742,7 +746,7 @@ static int ipoib_cm_send_req(struct net_device *dev, req.retry_count = 0; /* RFC draft warns against retries */ req.rnr_retry_count = 0; /* RFC draft warns against retries */ req.max_cm_retries = 15; - req.srq = 15; + req.srq = 1; return ib_send_cm_req(id, &req); } @@ -1041,7 +1045,7 @@ static void ipoib_cm_skb_reap(struct work_struct *work) struct sk_buff *skb; unsigned long flags; - __be32 mtu = cpu_to_be32(priv->mcast_mtu); + unsigned mtu = priv->mcast_mtu; spin_lock_irqsave(&priv->tx_lock, flags); spin_lock(&priv->lock); @@ -1050,7 +1054,7 @@ static void ipoib_cm_skb_reap(struct work_struct *work) spin_unlock_irqrestore(&priv->tx_lock, flags); if (skb->protocol == htons(ETH_P_IP)) icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); -#ifdef CONFIG_IPV6 +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) else if (skb->protocol == htons(ETH_P_IPV6)) icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev); #endif -- MST From swise at opengridcomputing.com Thu Feb 8 07:32:11 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 08 Feb 2007 09:32:11 -0600 Subject: [openib-general] sharing qp between user and kernel In-Reply-To: <20070208152409.GC31079@osc.edu> References: <20070207223146.GA28637@osc.edu> <20070208152409.GC31079@osc.edu> Message-ID: <1170948731.3049.24.camel@stevo-desktop> On Thu, 2007-02-08 at 10:24 -0500, Pete Wyckoff wrote: > rdreier at cisco.com wrote on Wed, 07 Feb 2007 15:50 -0800: > > Pete> Before I dig into this anymore, do you expect this to work? 
> > Pete> Are there fundamental problems with QP sharing between user > > Pete> and kernel? It would sure be nice not to have to stick the > > Pete> connection management aspects into the kernel. > > > > No, I wouldn't expect this to work. At first glance at least, yes, > > there are fundamental problems. Sharing a QP between user and > > kernelspace, where userspace is doing full kernel bypass (as eg mthca > > does -- there are NO system calls when doing post work request, poll > > CQ and request CQ notification operations), seems like a huge > > problem. I don't see any way that the kernel can keep a consistent > > view of the QP state unless userspace has to call into the kernel for > > every operation, which would kill performance. > > My hope was not to allow full QP sharing between user and kernel, > but just a limited interface to "send this kernel data now". It > requires that the kernel register some physical memory, and submit > the work requests. Perhaps the kernel can invoke the equivalent of > the userspace post function instead of trying to use the kernel API > for sending. > You could map the kernel data into the user process and then have the user process post the WR. But the user process would have to have that memory registered as part of a MR to post it. It could be done though. So basically instead of sharing QP, give your kernel module access memory from a registered MR. The kernel module can produce the data in that memory then tell the user process to post the WR... Steve From mst at mellanox.co.il Thu Feb 8 07:34:10 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Feb 2007 17:34:10 +0200 Subject: [openib-general] more comments on cxgb3 In-Reply-To: <1170946639.3049.10.camel@stevo-desktop> References: <1170946639.3049.10.camel@stevo-desktop> Message-ID: <20070208153410.GB6560@mellanox.co.il> > > - It seems that by passing in huge resource sizes, userspace will be able to > > drink up unlimited amounts of kernel memory. 
> > mthca handles this by using the mlock rlimit, should something be done here > > as well? > > Hmm. That's a good point. I'll put this on the todo as well. So mthca > adds to the process's rlimit value as things are allocated out of kernel > memory (or maybe even anything that gets pinned)? Yes. The code is actually in uverbs core, mthca uses that. > > - I wonder about the names like get_mhp - what does "mhp" mean? > > static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid) > > { > > return idr_find(&rhp->mmidr, mmid); > > } > > > > Memory Handle Pointer. hmm, what's a Handle? Maybe a better name can be found. -- MST From Arkady.Kanevsky at netapp.com Thu Feb 8 07:43:16 2007 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Thu, 8 Feb 2007 10:43:16 -0500 Subject: [openib-general] dapl broken for iWARP Message-ID: That is correct. I am working with Krishna on it. Expect patches soon. By the way, the problem is not DAPL-specific, and neither is the proposed solution. There are 3 aspects of the solution. The first is APIs. We suggest that we do not augment these. That is, a connection requestor sets its QP RDMA ORD and IRD. When the connection is established, the user can check the QP RDMA ORD and IRD to see what is available for use over the connection. We may consider extending QP attributes to support passing transport-specific parameters in the future. For example, iWARP MPA CRC request. The second is the semantics that the CM provides. The proposal is to match IBCM semantics. That is, the CM guarantees that local IRD is >= remote ORD. This guarantees that incoming RDMA Read requests will not overwhelm the QP RDMA Read capabilities. Again, there are no changes to IBCM, only to IWCM. Note that as part of this, IWCM will pass the needed info down to the driver and extract it from the driver. The final part is an iWARP CM extension to exchange RDMA ORD and IRD. This is similar to the IBTA Annex for IP Addressing.
The harder part that this will eventually require IETF MPA spec extension, and the fact that MPA protocol is implemented in RNIC HW by many vendors, and hence can not be done by IWCM itself. Thanks, Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Wednesday, February 07, 2007 6:12 PM > To: Arlin Davis > Cc: openib-general > Subject: Re: [openib-general] dapl broken for iWARP > > On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote: > > Steve Wise wrote: > > > > >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: > > > > > > > > >>Arlin, > > >> > > >>The OFED dapl code is assuming the responder_resources and > > >>initiator_depth passed up on a connection request event > are from the > > >>remote peer. This doesn't happen for iWARP. In the > current iWARP > > >>specifications, its up to the application to exchange this > > >>information somehow. So these are defaulting to 0 on the > server side > > >>of any dapl connection over iWARP. > > >> > > >>This is a fairly recent change, I think. We need to come up with > > >>some way to deal with this for OFED 1.2 IMO. > > >> > > >> > > Yes, this was changed recently to sync up with the rdma_cm changes > > that exposed the values. > > > > >> > > >> > > > > > >The IWCM could set these to the device max values for instance. > > > > > > > > That would work fine as long as you know the remote > settings will be > > equal or better. The provider just sets the min of local device max > > values and the remote values provided with the request. > > > > I know Krishna Kumar is working on a solution for exchanging > this info in private data so the IWCM can "do the right > thing". Stay tuned for a patch series to review for this. > But this functionality is definitely post OFED-1.2. 
> > > So for the OFED-1.2, I will set these to the device max in the IWCM. > Assuming the other side is OFED 1.2 DAPL, then it will work fine. > > Steve. > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Thu Feb 8 07:49:16 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 08 Feb 2007 09:49:16 -0600 Subject: [openib-general] more comments on cxgb3 In-Reply-To: <20070208064055.GR12140@mellanox.co.il> References: <20070208064055.GR12140@mellanox.co.il> Message-ID: <1170949756.3049.32.camel@stevo-desktop> > - Consider a user that does e.g. create QP, but never calls mmap. > Is there some code that will clean out the unclamed mmap object? > I couldn't find it, and iwch_dealloc_ucontext does not seem to > do anything with it. BTW: Here is my fix for this. ----- Clean up pending mmaps on ucontext deallocation. From: Steve Wise Free all pending mmap structs when the ucontext is deallocated. 
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 1 + drivers/infiniband/hw/cxgb3/iwch_provider.h | 15 +++++++++++++++ 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index db2b0a8..98568ee 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -99,6 +99,7 @@ static int iwch_dealloc_ucontext(struct struct iwch_dev *rhp = to_iwch_dev(context->device); struct iwch_ucontext *ucontext = to_iwch_ucontext(context); PDBG("%s context %p\n", __FUNCTION__, context); + free_mmaps(ucontext); cxio_release_ucontext(&rhp->rdev, &ucontext->uctx); kfree(ucontext); return 0; diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index 1ede8a7..c8c07ee 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -199,6 +199,21 @@ struct iwch_mm_entry { unsigned len; }; +static inline void free_mmaps(struct iwch_ucontext *ucontext) +{ + struct list_head *pos, *nxt; + struct iwch_mm_entry *mm; + + spin_lock(&ucontext->mmap_lock); + list_for_each_safe(pos, nxt, &ucontext->mmaps) { + mm = list_entry(pos, struct iwch_mm_entry, entry); + list_del(&mm->entry); + kfree(mm); + } + spin_unlock(&ucontext->mmap_lock); + return; +} + static inline struct iwch_mm_entry *remove_mmap(struct iwch_ucontext *ucontext, u64 addr, unsigned len) { From ossrosch at linux.vnet.ibm.com Thu Feb 8 09:20:10 2007 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Thu, 8 Feb 2007 18:20:10 +0100 Subject: [openib-general] [PATCH ofed-1.2] libehca: fix build error with disable-libcheck option Message-ID: <200702081820.10992.ossrosch@linux.vnet.ibm.com> This patch fixes libehca build errors if the disable-libcheck option is chosen.
Signed-off-by: Stefan Roscher --- configure.in | 3 +-- src/ehca_uinit.c | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) diff -Nurp libehca_old/configure.in libehca_new/configure.in --- libehca_old/configure.in 2007-02-08 17:42:09.000000000 +0100 +++ libehca_new/configure.in 2007-02-08 17:18:20.000000000 +0100 @@ -28,6 +28,7 @@ AC_CHECK_LIB(ibverbs, dnl Checks for header files. AC_CHECK_HEADER(infiniband/driver.h, [], AC_MSG_ERROR([ not found. libehca requires libibverbs.])) +fi dnl Checks for library functions AC_CHECK_FUNCS(ibv_read_sysfs_file ibv_register_driver) @@ -43,7 +44,6 @@ rm -f $dummy.c AM_CONDITIONAL(HAVE_IBV_DEVICE_LIBRARY_EXTENSION, test $IBV_DEVICE_LIBRARY_EXTENSION != IBV_DEVICE_LIBRARY_EXTENSION) AC_SUBST(IBV_DEVICE_LIBRARY_EXTENSION) -fi dnl Checks for programs. AC_PROG_CC @@ -55,4 +55,3 @@ if test "$disable_libcheck" == "yes" then echo "#define HAVE_IBV_READ_SYSFS_FILE 1" >> config.h fi - diff -Nurp libehca_old/src/ehca_uinit.c libehca_new/src/ehca_uinit.c --- libehca_old/src/ehca_uinit.c 2007-02-08 17:42:09.000000000 +0100 +++ libehca_new/src/ehca_uinit.c 2007-02-08 17:18:20.000000000 +0100 @@ -55,7 +55,7 @@ #include #endif -#ifdef HAVE_SYSFS_LIBSYSFS_H +#ifndef HAVE_IBV_REGISTER_DRIVER #include #endif From ossrosch at linux.vnet.ibm.com Thu Feb 8 09:32:14 2007 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Thu, 8 Feb 2007 18:32:14 +0100 Subject: [openib-general] [PATCH ofed-1.2] ofa_user.spec: fix libehca directory structure Message-ID: <200702081832.14862.ossrosch@linux.vnet.ibm.com> Correct directory structure according to new driver loading scheme from libibverbs Signed-off-by: Stefan Roscher --- --- ofa_user.spec_old 2007-02-08 09:03:33.000000000 -0800 +++ ofa_user.spec_new 2007-02-08 09:07:32.000000000 -0800 @@ -693,11 +693,11 @@ touch libosmvendor-devel-files /bin/ls -1 $RPM_BUILD_ROOT%{_libdir32}/ipathverbs*.a | sed -e "s@$RPM_BUILD_ROOT@@g" >> libipathverbs-devel-files fi - if ( /bin/ls 
$RPM_BUILD_ROOT%{_libdir32}/infiniband/libehca*so* > /dev/null 2>&1 ); then - /bin/ls -1 $RPM_BUILD_ROOT%{_libdir32}/infiniband/libehca*so* | sed -e "s@$RPM_BUILD_ROOT@@g" > libehca-files + if ( /bin/ls $RPM_BUILD_ROOT%{_libdir32}/libehca*so* > /dev/null 2>&1 ); then + /bin/ls -1 $RPM_BUILD_ROOT%{_libdir32}/libehca*so* | sed -e "s@$RPM_BUILD_ROOT@@g" > libehca-files fi - if ( /bin/ls $RPM_BUILD_ROOT%{_libdir32}/infiniband/libehca*.a > /dev/null 2>&1 ); then - /bin/ls -1 $RPM_BUILD_ROOT%{_libdir32}/infiniband/libehca*.a | sed -e "s@$RPM_BUILD_ROOT@@g" >> libehca-devel-files + if ( /bin/ls $RPM_BUILD_ROOT%{_libdir32}/libehca*.a > /dev/null 2>&1 ); then + /bin/ls -1 $RPM_BUILD_ROOT%{_libdir32}/libehca*.a | sed -e "s@$RPM_BUILD_ROOT@@g" >> libehca-devel-files fi if ( /bin/ls $RPM_BUILD_ROOT%{_libdir32}/libibcommon*so.* > /dev/null 2>&1 ); then @@ -1165,14 +1165,14 @@ fi %if %{build_libehca} %files -n libehca -f libehca-files %defattr(-,root,root,-) -%{_libdir}/infiniband/libehca*.so* +%{_libdir}/libehca*.so* # %doc AUTHORS COPYING ChangeLog README %endif %if %{build_libehca_devel} %files -n libehca-devel -f libehca-devel-files %defattr(-,root,root,-) -%{_libdir}/infiniband/libehca*.a +%{_libdir}/libehca*.a %endif %if %{build_libsdp} From sashak at voltaire.com Thu Feb 8 09:45:53 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 8 Feb 2007 19:45:53 +0200 Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]] In-Reply-To: <45B33135.4010606@dev.mellanox.co.il> References: <45B33135.4010606@dev.mellanox.co.il> Message-ID: <20070208174553.GT22807@sashak.voltaire.com> On 11:24 Sun 21 Jan , Yevgeny Kliteynik wrote: > Tzachi, Yossi, please join the thread. > What do you think about distributing a copy of the pthread DLL > with opensm? Any news here? Thanks. Sasha > > -- Yevgeny. 
> > -------- Original Message -------- > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes] > Date: Fri, 19 Jan 2007 00:20:32 +0200 > From: Sasha Khapyorsky > To: Michael S. Tsirkin > CC: Yevgeny Kliteynik , OPENIB > References: <20070118194403.GA23783 at sashak.voltaire.com> <20070118215023.GP9890 at mellanox.co.il> > > On 23:50 Thu 18 Jan , Michael S. Tsirkin wrote: > > > Quoting Sasha Khapyorsky : > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes] > > > > > > On 07:00 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > > What about pure opensource - http://sourceware.org/pthreads-win32/? It > > > > > is licensed under LGPL, I see on the net many positive reports about > > > > > stability and usability. > > > > > > > > I used it to do a windows port of linux complib at some point and opensm > > > > seemed to work fine with it. What it was lacking at that point was > > > > support for 64 bit applications, and for some reason (which is > > > > still unclear to me) there was a strong desire to run opensm in 64 bit mode. > > > > Seems to have been fixed now, BTW. > > > > > > So this seems to be good option for OpenSM on Windows. Right? > > > > No idea. Distributing a copy of the pthread DLL with opensm does not > > look like a problem. But is it worth it? > > Sure, it makes windows porting much more transparent and let us to > use standard *nix stuff w/out #ifndef WIN32. Other (generic) benefit > is that posix is more standard and powerful than wrappers like complib. > > Sasha >
From mshefty at ichips.intel.com Thu Feb 8 10:23:11 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 08 Feb 2007 10:23:11 -0800 Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when appropriate for unicast packets In-Reply-To: <1170894459.31538.23768.camel@hal.voltaire.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> Message-ID: <45CB6A8F.2030705@ichips.intel.com> >>The active side clearly cannot learn what the SLID of the passive >>side's router should be. >> >>We don't want to have the routers snoop and alter CM GMPs. >> >>The passive side cannot use information from the LRH to get the router >>LID since the LRH may not be reversible. >> >>The only option seems to be to have the passive side do a path record >>query on a SGID in the CM REQ... >> >>This is a spec problem unfortunately. > > > Yes and I would expect that this would be changed. Looking at the problem more, I think that the issue extends to the remote port LID as well. My expectation with a local path record query is that the SLID is the local port, and the DLID is the local router. This should be sufficient for one-way UD traffic, but for connected traffic we still need to discover the remote router and remote port LIDs. I think that we need a way for the local node to query the remote SA to obtain this information. Or we need a new path record for routable paths that includes this information. 
- Sean From tzachid at mellanox.co.il Thu Feb 8 10:31:17 2007 From: tzachid at mellanox.co.il (Tzachi Dar) Date: Thu, 8 Feb 2007 20:31:17 +0200 Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]] Message-ID: <6C2C79E72C305246B504CBA17B5500C9C41DBF@mtlexch01.mtl.com> The windows open IB has decided on using a BSD only license. The common implementation of pthreads as far as I know is LGPL, which means that it can not be used in open IB. The only two ways that I see around this are 1) Change the license of open IB windows which might be a complicated thing. 2) Find an implementation of pthreads that is BSD. Thanks Tzachi > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Thursday, February 08, 2007 7:46 PM > To: Tzachi Dar; Yossi Leybovich > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal Rosenstock > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] > opensm: sigusr1: syslog() fixes]] > > On 11:24 Sun 21 Jan , Yevgeny Kliteynik wrote: > > Tzachi, Yossi, please join the thread. > > What do you think about distributing a copy of the pthread DLL with > > opensm? > > Any news here? Thanks. > > Sasha > > > > > -- Yevgeny. > > > > -------- Original Message -------- > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: > > syslog() fixes] > > Date: Fri, 19 Jan 2007 00:20:32 +0200 > > From: Sasha Khapyorsky > > To: Michael S. Tsirkin > > CC: Yevgeny Kliteynik , > OPENIB > > References: <20070118194403.GA23783 at sashak.voltaire.com> > > <20070118215023.GP9890 at mellanox.co.il> > > > > On 23:50 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > Quoting Sasha Khapyorsky : > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: > > > > syslog() fixes] > > > > > > > > On 07:00 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > > > What about pure opensource - > > > > > > http://sourceware.org/pthreads-win32/? 
It is licensed under > > > > > > LGPL, I see on the net many positive reports about > stability and usability. > > > > > > > > > > I used it to do a windows port of linux complib at some point > > > > > and opensm seemed to work fine with it. What it was > lacking at > > > > > that point was support for 64 bit applications, and for some > > > > > reason (which is still unclear to me) there was a > strong desire to run opensm in 64 bit mode. > > > > > Seems to have been fixed now, BTW. > > > > > > > > So this seems to be good option for OpenSM on Windows. Right? > > > > > > No idea. Distributing a copy of the pthread DLL with > opensm does not > > > look like a problem. But is it worth it? > > > > Sure, it makes windows porting much more transparent and > let us to use > > standard *nix stuff w/out #ifndef WIN32. Other (generic) benefit is > > that posix is more standard and powerful than wrappers like complib. > > > > Sasha > > > From jgunthorpe at obsidianresearch.com Thu Feb 8 11:08:09 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Thu, 8 Feb 2007 12:08:09 -0700 Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when appropriate for unicast packets In-Reply-To: <45CB6A8F.2030705@ichips.intel.com> References: <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> Message-ID: <20070208190809.GL11411@obsidianresearch.com> On Thu, Feb 08, 2007 at 10:23:11AM -0800, Sean Hefty wrote: > >>The active side clearly cannot learn what the SLID of the passive > >>side's router should be. > >> > >>We don't want to have the routers snoop and alter CM GMPs. 
> >> > >>The passive side cannot use information from the LRH to get the router > >>LID since the LRH may not be reversible. > >> > >>The only option seems to be to have the passive side do a path record > >>query on a SGID in the CM REQ... > >> > >>This is a spec problem unfortunately. > > > > > >Yes and I would expect that this would be changed. > > Looking at the problem more, I think that the issue extends to the remote > port LID as well. My expectation with a local path record query is that > the SLID is the local port, and the DLID is the local router. This should > be sufficient for one-way UD traffic, but for connected traffic we still > need to discover the remote router and remote port LIDs. Hum, you mean to meet the LID validation rules of 9.6.1.5? That is a huge PITA.. [IMHO, 9.6.1.5 C9-54 is a mistake, if there is a GRH then the LRH.SLID should not be validated against the QP context since it makes it extra hard for multipath routing and QoS to work...] Here is one thought on how to do this: To meet this rule each side of the CM must take the SLID from the incoming LRH as the DLID for the connection. This SLID will be one of the SLIDs for the local router. The other side doesn't need to know what it is. The passive side will get the router SLID from the REQ and the active side gets it from the ACK. The passive side is easy, it just path record queries the DGID and requests the DLID == the incoming LRH.SLID. The nasty problem is with the active side - CMA will select a router lid it uses as the DLID and the router may select a different LID for it to use as the SLID when it processes the ACK. By C9-54 they have to be the same :< So the active side might have to do another path record query to move its DLID and SL to match the routers choosen SLID. Double suck :P Overarching all of this is some mechanism where the SM and all the routers collaborate to keep the router SLID the same for the duration of every RC flow. 
(One simple way would be to have the SM encode the SLID it wants to router to pick in the Flow Label or TClass..) Suck. Another idea would be to encode the local router SLID in the flow label and have the CM exchange and use asymetric flow labels.. That would move control over SL selection into the endpoints and remove the possible 2nd pathrecord query from the active side - but I haven't looked if CM can exchange flow labels in the ACK.. > I think that we need a way for the local node to query the remote SA to > obtain this information. Or we need a new path record for routable paths > that includes this information. Being able to query doesn't really help matters since you still can't tell the router what SLID to use.. The main idea is that the router lid is only useful to the endpoint on the same subnet so there is no reason to make the non-local side fetch it. Jason From sashak at voltaire.com Thu Feb 8 11:46:56 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 8 Feb 2007 21:46:56 +0200 Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]] In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9C41DBF@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C9C41DBF@mtlexch01.mtl.com> Message-ID: <20070208194656.GV22807@sashak.voltaire.com> On 20:31 Thu 08 Feb , Tzachi Dar wrote: > The windows open IB has decided on using a BSD only license. > The common implementation of pthreads as far as I know is LGPL, which > means that it can not be used in open IB. Why not? AFAIK it works perfectly (see (5,6 and Preamble)): http://www.gnu.org/copyleft/lesser.html And of course there are tons of examples when BSD software links against LGPLed glibc. > The only two ways that I see around this are 1) Change the license of > open IB windows which might be a complicated thing. 2) Find an > implementation of pthreads that is BSD. BTW, just wondering... 
What is relation between windows open IB and OFA (and OFA's "dual-license rule")? Sasha > > Thanks > Tzachi > > > -----Original Message----- > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > Sent: Thursday, February 08, 2007 7:46 PM > > To: Tzachi Dar; Yossi Leybovich > > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal Rosenstock > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] > > opensm: sigusr1: syslog() fixes]] > > > > On 11:24 Sun 21 Jan , Yevgeny Kliteynik wrote: > > > Tzachi, Yossi, please join the thread. > > > What do you think about distributing a copy of the pthread DLL with > > > opensm? > > > > Any news here? Thanks. > > > > Sasha > > > > > > > > -- Yevgeny. > > > > > > -------- Original Message -------- > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: > > > syslog() fixes] > > > Date: Fri, 19 Jan 2007 00:20:32 +0200 > > > From: Sasha Khapyorsky > > > To: Michael S. Tsirkin > > > CC: Yevgeny Kliteynik , > > OPENIB > > > References: <20070118194403.GA23783 at sashak.voltaire.com> > > > <20070118215023.GP9890 at mellanox.co.il> > > > > > > On 23:50 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > > Quoting Sasha Khapyorsky : > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: > > > > > syslog() fixes] > > > > > > > > > > On 07:00 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > > > > What about pure opensource - > > > > > > > http://sourceware.org/pthreads-win32/? It is licensed under > > > > > > > LGPL, I see on the net many positive reports about > > stability and usability. > > > > > > > > > > > > I used it to do a windows port of linux complib at some point > > > > > > and opensm seemed to work fine with it. What it was > > lacking at > > > > > > that point was support for 64 bit applications, and for some > > > > > > reason (which is still unclear to me) there was a > > strong desire to run opensm in 64 bit mode. > > > > > > Seems to have been fixed now, BTW. 
> > > > > > > > > > So this seems to be good option for OpenSM on Windows. Right? > > > > > > > > No idea. Distributing a copy of the pthread DLL with > > opensm does not > > > > look like a problem. But is it worth it? > > > > > > Sure, it makes windows porting much more transparent and > > let us to use > > > standard *nix stuff w/out #ifndef WIN32. Other (generic) benefit is > > > that posix is more standard and powerful than wrappers like complib. > > > > > > Sasha > > > > > From rdreier at cisco.com Thu Feb 8 11:47:22 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Feb 2007 11:47:22 -0800 Subject: [openib-general] more comments on cxgb3 In-Reply-To: <1170949756.3049.32.camel@stevo-desktop> (Steve Wise's message of "Thu, 08 Feb 2007 09:49:16 -0600") References: <20070208064055.GR12140@mellanox.co.il> <1170949756.3049.32.camel@stevo-desktop> Message-ID: > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c > index db2b0a8..98568ee 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c > @@ -99,6 +99,7 @@ static int iwch_dealloc_ucontext(struct > struct iwch_dev *rhp = to_iwch_dev(context->device); > struct iwch_ucontext *ucontext = to_iwch_ucontext(context); > PDBG("%s context %p\n", __FUNCTION__, context); > + free_mmaps(ucontext); > cxio_release_ucontext(&rhp->rdev, &ucontext->uctx); > kfree(ucontext); > return 0; > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h > index 1ede8a7..c8c07ee 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h > +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h > @@ -199,6 +199,21 @@ struct iwch_mm_entry { > unsigned len; > }; > > +static inline void free_mmaps(struct iwch_ucontext *ucontext) > +{ > + struct list_head *pos, *nxt; > + struct iwch_mm_entry *mm; > + > + spin_lock(&ucontext->mmap_lock); > + list_for_each_safe(pos, nxt, &ucontext->mmaps) { 
> + mm = list_entry(pos, struct iwch_mm_entry, entry); > + list_del(&mm->entry); > + kfree(mm); > + } > + spin_unlock(&ucontext->mmap_lock); > + return; > +} Since you only have one caller, I would suggest just open-coding the deletion at the call-site (since that function is really too big to inline if it ever grows another caller). And I don't think you need the locking either, since there better be no one else looking at the context structure while you're in the process of freeing it. Something like: struct iwch_dev *rhp = to_iwch_dev(context->device); struct iwch_ucontext *ucontext = to_iwch_ucontext(context); struct iwch_mm_entry *mm, *tmp; PDBG("%s context %p\n", __FUNCTION__, context); list_for_each_entry_safe(mm, tmp, &ucontext->mmaps, entry) kfree(mm); cxio_release_ucontext(&rhp->rdev, &ucontext->uctx); kfree(ucontext); return 0; - R. From sean.hefty at intel.com Thu Feb 8 11:54:38 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 8 Feb 2007 11:54:38 -0800 Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when appropriate for unicast packets In-Reply-To: <20070208190809.GL11411@obsidianresearch.com> Message-ID: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com> >Hum, you mean to meet the LID validation rules of 9.6.1.5? That is a >huge PITA.. > >[IMHO, 9.6.1.5 C9-54 is a mistake, if there is a GRH then the LRH.SLID > should not be validated against the QP context since it makes it > extra hard for multipath routing and QoS to work...] Yes - this gets messy. >Here is one thought on how to do this: >To meet this rule each side of the CM must take the SLID from >the incoming LRH as the DLID for the connection. This SLID will be >one of the SLIDs for the local router. The other side doesn't need to >know what it is. The passive side will get the router SLID from the >REQ and the active side gets it from the ACK. > >The passive side is easy, it just path record queries the DGID and >requests the DLID == the incoming LRH.SLID.
This requires that the passive side be able to issue path record queries, but I think that it could work for static routes. A point was made to me that the remote side could be a TCA without query capabilities. There's still the issue of what value is carried in the remote port LID in the CM REQ (12.7.21), and I haven't even gotten to APM yet... >The nasty problem is with the active side - CMA will select a router >lid it uses as the DLID and the router may select a different LID for >it to use as the SLID when it processes the ACK. By C9-54 they have to >be the same :< So the active side might have to do another path record >query to move its DLID and SL to match the routers choosen >SLID. Double suck :P As long as the SA and local routers are in sync, we may be okay here without a second path record query. - Sean From hnguyen at linux.vnet.ibm.com Thu Feb 8 12:00:51 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Thu, 8 Feb 2007 21:00:51 +0100 Subject: [openib-general] [PATCH ofed-1.2] libehca: fix build error with disable-libcheck option In-Reply-To: <200702081820.10992.ossrosch@linux.vnet.ibm.com> References: <200702081820.10992.ossrosch@linux.vnet.ibm.com> Message-ID: <200702082100.51705.hnguyen@linux.vnet.ibm.com> > This patch fix libehca build errors if disable-libcheck option is choosen. Applied From swise at opengridcomputing.com Thu Feb 8 12:26:38 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 08 Feb 2007 14:26:38 -0600 Subject: [openib-general] [PATCH 2/5] No need to disable interrupts for mmap locking. In-Reply-To: <20070208202634.4382.15287.stgit@dell3.ogc.int> References: <20070208202634.4382.15287.stgit@dell3.ogc.int> Message-ID: <20070208202638.4382.98241.stgit@dell3.ogc.int> From: Steve Wise Lock mmap_lock is never taken from non-process context, so just use bare spin_lock()/spin_unlock(). 
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_provider.h | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index a8cfeaf..1ede8a7 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -205,29 +205,29 @@ static inline struct iwch_mm_entry *remo struct list_head *pos, *nxt; struct iwch_mm_entry *mm; - spin_lock_irq(&ucontext->mmap_lock); + spin_lock(&ucontext->mmap_lock); list_for_each_safe(pos, nxt, &ucontext->mmaps) { mm = list_entry(pos, struct iwch_mm_entry, entry); if (mm->addr == addr && mm->len == len) { list_del_init(&mm->entry); - spin_unlock_irq(&ucontext->mmap_lock); + spin_unlock(&ucontext->mmap_lock); PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr, mm->len); return mm; } } - spin_unlock_irq(&ucontext->mmap_lock); + spin_unlock(&ucontext->mmap_lock); return NULL; } static inline void insert_mmap(struct iwch_ucontext *ucontext, struct iwch_mm_entry *mm) { - spin_lock_irq(&ucontext->mmap_lock); + spin_lock(&ucontext->mmap_lock); PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr, mm->len); list_add_tail(&mm->entry, &ucontext->mmaps); - spin_unlock_irq(&ucontext->mmap_lock); + spin_unlock(&ucontext->mmap_lock); } enum iwch_qp_attr_mask { From swise at opengridcomputing.com Thu Feb 8 12:26:34 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 08 Feb 2007 14:26:34 -0600 Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes Message-ID: <20070208202634.4382.15287.stgit@dell3.ogc.int> Here are some fixes to address various comments from Michael and Roland. This is _not_ for ofed_1_2, but rather for merging into 2.6.21. Steve. 
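The teardown pattern discussed earlier in this thread for iwch_dealloc_ucontext() (free every pending mmap entry, without taking the lock because nothing else may look at the context while it is being freed) can be sketched in plain userspace C. This is only an illustration under stated assumptions: list_node, mm_entry, add_entry() and drain_entries() are hypothetical stand-ins and not the driver's code; the kernel uses struct list_head and list_for_each_entry_safe() for the same job.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* An empty circular list points back at its own head. */
static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

/* Link node n in just before head, i.e. at the tail of the list. */
static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Recover the containing structure from a pointer to its embedded node. */
#define node_to_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical entry type, loosely modeled on the thread's iwch_mm_entry. */
struct mm_entry {
	struct list_node entry;
	unsigned long long addr;
	unsigned len;
};

/* Allocate one entry and queue it; returns 0 on success, -1 on failure. */
static int add_entry(struct list_node *head, unsigned long long addr,
		     unsigned len)
{
	struct mm_entry *mm = malloc(sizeof(*mm));

	if (!mm)
		return -1;
	mm->addr = addr;
	mm->len = len;
	list_add_tail_node(&mm->entry, head);
	return 0;
}

/*
 * Free every queued entry and return how many were freed.
 * The successor is read before the current node is freed, which is the
 * point of the "safe" iteration variants. No lock is taken, on the
 * assumption that the caller is the only remaining user of the list.
 */
static int drain_entries(struct list_node *head)
{
	struct list_node *pos = head->next, *nxt;
	int freed = 0;

	while (pos != head) {
		nxt = pos->next;
		free(node_to_entry(pos, struct mm_entry, entry));
		freed++;
		pos = nxt;
	}
	list_init(head);	/* leave the head empty and still usable */
	return freed;
}
```

Unlike this sketch, a teardown path that frees the owning structure right afterwards has no need to reinitialize the list head; it is done here only so the head stays valid for reuse.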
From swise at opengridcomputing.com Thu Feb 8 12:26:40 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 08 Feb 2007 14:26:40 -0600 Subject: [openib-general] [PATCH 3/5] Clean up pending mmaps on ucontext deallocation. In-Reply-To: <20070208202634.4382.15287.stgit@dell3.ogc.int> References: <20070208202634.4382.15287.stgit@dell3.ogc.int> Message-ID: <20070208202640.4382.90592.stgit@dell3.ogc.int> From: Steve Wise Free all pending mmap structs when the ucontext is deallocated. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index db2b0a8..85484ac 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -98,7 +98,11 @@ static int iwch_dealloc_ucontext(struct { struct iwch_dev *rhp = to_iwch_dev(context->device); struct iwch_ucontext *ucontext = to_iwch_ucontext(context); + struct iwch_mm_entry *mm, *tmp; + PDBG("%s context %p\n", __FUNCTION__, context); + list_for_each_entry_safe(mm, tmp, &ucontext->mmaps, entry) + kfree(mm); cxio_release_ucontext(&rhp->rdev, &ucontext->uctx); kfree(ucontext); return 0; From swise at opengridcomputing.com Thu Feb 8 12:26:42 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 08 Feb 2007 14:26:42 -0600 Subject: [openib-general] [PATCH 4/5] Get rid of static rdev table. In-Reply-To: <20070208202634.4382.15287.stgit@dell3.ogc.int> References: <20070208202634.4382.15287.stgit@dell3.ogc.int> Message-ID: <20070208202642.4382.43612.stgit@dell3.ogc.int> From: Steve Wise Use a linked list.
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 57 +++++++++------------------ drivers/infiniband/hw/cxgb3/core/cxio_hal.h | 2 - 2 files changed, 19 insertions(+), 40 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c index 2c4e351..acffe16 100644 --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c @@ -43,49 +43,28 @@ #include "cxio_hal.h" #include "cxgb3_offload.h" #include "sge_defs.h" -static struct cxio_rdev *rdev_tbl[T3_MAX_NUM_RNIC]; +static LIST_HEAD(rdev_list); static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL; static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) { - int i; - for (i = 0; i < T3_MAX_NUM_RNIC; i++) - if (rdev_tbl[i]) - if (!strcmp(rdev_tbl[i]->dev_name, dev_name)) - return rdev_tbl[i]; + struct cxio_rdev *rdev; + + list_for_each_entry(rdev, &rdev_list, entry) + if (!strcmp(rdev->dev_name, dev_name)) + return rdev; return NULL; } static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev) { - int i; - for (i = 0; i < T3_MAX_NUM_RNIC; i++) - if (rdev_tbl[i]) - if (rdev_tbl[i]->t3cdev_p == tdev) - return rdev_tbl[i]; - return NULL; -} - -static inline int cxio_hal_add_rdev(struct cxio_rdev *rdev_p) -{ - int i; - for (i = 0; i < T3_MAX_NUM_RNIC; i++) - if (!rdev_tbl[i]) { - rdev_tbl[i] = rdev_p; - break; - } - return (i == T3_MAX_NUM_RNIC); -} + struct cxio_rdev *rdev; -static inline void cxio_hal_delete_rdev(struct cxio_rdev *rdev_p) -{ - int i; - for (i = 0; i < T3_MAX_NUM_RNIC; i++) - if (rdev_tbl[i] == rdev_p) { - rdev_tbl[i] = NULL; - break; - } + list_for_each_entry(rdev, &rdev_list, entry) + if (rdev->t3cdev_p == tdev) + return rdev; + return NULL; } int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq, @@ -937,8 +916,7 @@ int cxio_rdev_open(struct cxio_rdev *rde return -EINVAL; } - if (cxio_hal_add_rdev(rdev_p)) - return -ENOMEM; + 
list_add_tail(&rdev_p->entry, &rdev_list); PDBG("%s opening rnic dev %s\n", __FUNCTION__, rdev_p->dev_name); memset(&rdev_p->ctrl_qp, 0, sizeof(rdev_p->ctrl_qp)); @@ -1018,7 +996,7 @@ err3: err2: cxio_hal_destroy_ctrl_qp(rdev_p); err1: - cxio_hal_delete_rdev(rdev_p); + list_del(&rdev_p->entry); return err; } @@ -1027,7 +1005,7 @@ void cxio_rdev_close(struct cxio_rdev *r if (rdev_p) { cxio_hal_pblpool_destroy(rdev_p); cxio_hal_rqtpool_destroy(rdev_p); - cxio_hal_delete_rdev(rdev_p); + list_del(&rdev_p->entry); rdev_p->t3cdev_p->ulp = NULL; cxio_hal_destroy_ctrl_qp(rdev_p); cxio_hal_destroy_resource(rdev_p->rscp); @@ -1038,7 +1016,6 @@ int __init cxio_hal_init(void) { if (cxio_hal_init_rhdl_resource(T3_MAX_NUM_RI)) return -ENOMEM; - memset(rdev_tbl, 0, T3_MAX_NUM_RNIC * sizeof(void *)); t3_register_cpl_handler(CPL_ASYNC_NOTIF, cxio_hal_ev_handler); return 0; } @@ -1046,9 +1023,11 @@ int __init cxio_hal_init(void) void __exit cxio_hal_exit(void) { int i; + struct cxio_rdev *rdev, *tmp; + t3_register_cpl_handler(CPL_ASYNC_NOTIF, NULL); - for (i = 0; i < T3_MAX_NUM_RNIC; i++) - cxio_rdev_close(rdev_tbl[i]); + list_for_each_entry_safe(rdev, tmp, &rdev_list, entry) + cxio_rdev_close(rdev); cxio_hal_destroy_rhdl_resource(); } diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h index d5ae282..8fb2999 100644 --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h @@ -47,7 +47,6 @@ #define T3_CTRL_QP_SIZE_LOG2 8 #define T3_CTRL_CQ_ID 0 /* TBD */ -#define T3_MAX_NUM_RNIC 8 #define T3_MAX_NUM_RI (1<<15) #define T3_MAX_NUM_QP (1<<15) #define T3_MAX_NUM_CQ (1<<15) @@ -106,6 +105,7 @@ struct cxio_rdev { struct cxio_ucontext uctx; struct gen_pool *pbl_pool; struct gen_pool *rqt_pool; + struct list_head entry; }; static inline int cxio_num_stags(struct cxio_rdev *rdev_p) From swise at opengridcomputing.com Thu Feb 8 12:26:44 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 
08 Feb 2007 14:26:44 -0600 Subject: [openib-general] [PATCH 5/5] Hold the iwch device mutex around cxio_rdev_open(). In-Reply-To: <20070208202634.4382.15287.stgit@dell3.ogc.int> References: <20070208202634.4382.15287.stgit@dell3.ogc.int> Message-ID: <20070208202644.4382.75136.stgit@dell3.ogc.int> From: Steve Wise Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c index 0c95f2c..c353a9b 100644 --- a/drivers/infiniband/hw/cxgb3/iwch.c +++ b/drivers/infiniband/hw/cxgb3/iwch.c @@ -119,7 +119,10 @@ static void open_rnic_dev(struct t3cdev rnicp->rdev.ulp = rnicp; rnicp->rdev.t3cdev_p = tdev; + mutex_lock(&dev_mutex); + if (cxio_rdev_open(&rnicp->rdev)) { + mutex_unlock(&dev_mutex); printk(KERN_ERR MOD "Unable to open CXIO rdev\n"); ib_dealloc_device(&rnicp->ibdev); return; @@ -127,7 +130,6 @@ static void open_rnic_dev(struct t3cdev rnic_init(rnicp); - mutex_lock(&dev_mutex); list_add_tail(&rnicp->entry, &dev_list); mutex_unlock(&dev_mutex); From halr at voltaire.com Thu Feb 8 12:39:44 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Feb 2007 15:39:44 -0500 Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when appropriate for unicast packets In-Reply-To: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com> References: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com> Message-ID: <1170967182.31538.96962.camel@hal.voltaire.com> On Thu, 2007-02-08 at 14:54, Sean Hefty wrote: > >Hum, you mean to meet the LID validation rules of 9.6.1.5? That is a > >huge PITA.. > > > >[IMHO, 9.6.1.5 C9-54 is a mistake, if there is a GRH then the LRH.SLID > > should not be validated against the QP context since it makes it > > extra hard for multipath routing and QoS to work...] > > Yes - this gets messy. 
> > >Here is one thought on how to do this: > >To meet this rule each side of the CM must take the SLID from > >the incoming LRH as the DLID for the connection. This SLID will be > >one of the SLIDs for the local router. The other side doesn't need to > >know what it is. The passive side will get the router SLID from the > >REQ and the active side gets it from the ACK. > > > >The passive side is easy, it just path record queries the DGID and > >requests the DLID == the incoming LRH.SLID. > > This requires that the passive side be able to issue path record queries, but I > think that it could work for static routes. A point was made to me that the > remote side could be a TCA without query capabilities. Are you referring to SA query capabilities ? Would such a device just be expected to work without change in an IB routed environment anyway ? -- Hal > > There's still the issue of what value is carried in the remote port LID in the > CM REQ (12.7.21), and I haven't even gotten to APM yet... > > >The nasty problem is with the active side - CMA will select a router > >lid it uses as the DLID and the router may select a different LID for > >it to use as the SLID when it processes the ACK. By C9-54 they have to > >be the same :< So the active side might have to do another path record > >query to move its DLID and SL to match the routers choosen > >SLID. Double suck :P > > As long as the SA and local routers are in sync, we may be okay here without a > second path record query. 
> > - Sean > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From tziporet at mellanox.co.il Thu Feb 8 13:18:06 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 08 Feb 2007 23:18:06 +0200 Subject: [openib-general] new OFED 1.2 package Message-ID: <45CB938E.5040305@mellanox.co.il> A new OFED package was uploaded to the OFA server: http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070208-1508.tgz Many of the issues reported on the previous version are resolved (bugzilla will be updated next week). Since we had lab restructuring we did only basic tests on RHEL up4 and SLES10 (x86 and x86_64). All - we are going for our weekend now. Please report all issues you encounter so we will be able to fix them and do the alpha release on Monday. Thanks, Tziporet & Vlad From tzachid at mellanox.co.il Thu Feb 8 13:24:08 2007 From: tzachid at mellanox.co.il (Tzachi Dar) Date: Thu, 8 Feb 2007 23:24:08 +0200 Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]] Message-ID: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com> See below. Thanks Tzachi > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Thursday, February 08, 2007 9:47 PM > To: Tzachi Dar > Cc: Yossi Leybovich; Gilad Shainer; Yevgeny Kliteynik; > OPENIB; Michael S. Tsirkin; Hal Rosenstock > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] > opensm: sigusr1: syslog() fixes]] > > On 20:31 Thu 08 Feb , Tzachi Dar wrote: > > The windows open IB has decided on using a BSD only license. > > The common implementation of pthreads as far as I know is > LGPL, which > > means that it can not be used in open IB. > > Why not? 
AFAIK it works perfectly (see (5,6 and Preamble)): > http://www.gnu.org/copyleft/lesser.html > > And of course there are tons of examples when BSD software > links against LGPLed glibc. I can of course write you an answer that will be more than 5 pages long of why *I* don't think that using GPL software is bad for everyone, but I guess that my opinion doesn't really matter, so I won't do it. The page that you have referenced is of the GNU org, and even there it is hard to say that they are trying to encourage you to use the LGPL license. In any case, the main point is that when open IB windows was formed there was a general decision that it will use the BSD license. If we start having components with the LGPL this will break that decision, and therefore this requires some voting of the open IB organization. > > The only two ways that I see around this are 1) Change the > license of > > open IB windows which might be a complicated thing. 2) Find an > > implementation of pthreads that is BSD. > > BTW, just wondering... What is relation between windows open > IB and OFA (and OFA's "dual-license rule")? Well, the way I see it one can take code from the Linux part under the BSD license and use it in the Windows part. The other way around seems fine to me, but some say that since the Windows BSD license requires that some text will always remain there, the other way around is not possible. As I'm not an expert in that area I don't know who is right. > Sasha > > > > > Thanks > > Tzachi > > > > > -----Original Message----- > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > > Sent: Thursday, February 08, 2007 7:46 PM > > > To: Tzachi Dar; Yossi Leybovich > > > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal Rosenstock > > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] > > > opensm: sigusr1: syslog() fixes]] > > > > > > On 11:24 Sun 21 Jan , Yevgeny Kliteynik wrote: > > > > Tzachi, Yossi, please join the thread. 
> > > > What do you think about distributing a copy of the pthread DLL > > > > with opensm? > > > > > > Any news here? Thanks. > > > > > > Sasha > > > > > > > > > > > -- Yevgeny. > > > > > > > > -------- Original Message -------- > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: > > > > syslog() fixes] > > > > Date: Fri, 19 Jan 2007 00:20:32 +0200 > > > > From: Sasha Khapyorsky > > > > To: Michael S. Tsirkin > > > > CC: Yevgeny Kliteynik , > > > OPENIB > > > > References: <20070118194403.GA23783 at sashak.voltaire.com> > > > > <20070118215023.GP9890 at mellanox.co.il> > > > > > > > > On 23:50 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > > > Quoting Sasha Khapyorsky : > > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] > opensm: sigusr1: > > > > > > syslog() fixes] > > > > > > > > > > > > On 07:00 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > > > > > What about pure opensource - > > > > > > > > http://sourceware.org/pthreads-win32/? It is licensed > > > > > > > > under LGPL, I see on the net many positive reports about > > > stability and usability. > > > > > > > > > > > > > > I used it to do a windows port of linux complib at some > > > > > > > point and opensm seemed to work fine with it. What it was > > > lacking at > > > > > > > that point was support for 64 bit applications, > and for some > > > > > > > reason (which is still unclear to me) there was a > > > strong desire to run opensm in 64 bit mode. > > > > > > > Seems to have been fixed now, BTW. > > > > > > > > > > > > So this seems to be good option for OpenSM on > Windows. Right? > > > > > > > > > > No idea. Distributing a copy of the pthread DLL with > > > opensm does not > > > > > look like a problem. But is it worth it? > > > > > > > > Sure, it makes windows porting much more transparent and > > > let us to use > > > > standard *nix stuff w/out #ifndef WIN32. Other > (generic) benefit > > > > is that posix is more standard and powerful than > wrappers like complib. 
> > > > > > > > Sasha > > > > > > > > From Shainer at Mellanox.com Thu Feb 8 13:34:37 2007 From: Shainer at Mellanox.com (Gilad Shainer) Date: Thu, 8 Feb 2007 13:34:37 -0800 Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]] Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F618167@mtiexch01.mti.com> Windows Open IB is part of OpenFabrics. OpenFabrics includes Linux and Windows communities. The Linux code is dual license while the Windows code is BSD only. Gilad. From krause at cup.hp.com Thu Feb 8 13:19:38 2007 From: krause at cup.hp.com (Michael Krause) Date: Thu, 08 Feb 2007 13:19:38 -0800 Subject: [openib-general] Immediate data question In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> Message-ID: <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com> At 03:41 PM 2/7/2007, Roland Dreier wrote: > Changqing> What I mean is that, is there any performance penalty > Changqing> for receiver's overall performance if RNR happens > Changqing> continuously on one of the QP ? > >Not for the receiver, but the sender will be severely slowed down by >having to wait for the RNR timeouts. RNR = Receiver Not Ready so by definition, the data flow isn't going to progress until the receiver is ready to receive data. If a receive QP enters RNR for a RC, then it is likely not progressing as desired. RNR was initially put in place to enable a receiver to create back pressure to the sender without causing a fatal error condition. It should rarely be entered and therefore should have negligible impact on overall performance however when a RNR occurs, no forward progress will occur so performance is essentially zero. Mike From krause at cup.hp.com Thu Feb 8 13:26:49 2007 From: krause at cup.hp.com (Michael Krause) Date: Thu, 08 Feb 2007 13:26:49 -0800 Subject: [openib-general] dapl broken for iWARP In-Reply-To: References: Message-ID: <6.2.0.14.2.20070208132315.08989298@esmail.cup.hp.com> At 07:43 AM 2/8/2007, Kanevsky, Arkady wrote: >That is correct. >I am working with Krishna on it. >Expect patches soon. > >By the way the problem is not DAPL specific >and so is a proposed solution. 
> >There are 3 aspects of the solution. >One is APIs. We suggest that we do not augment these. >That is a connection requestor sets its QP >RDMA ORD and IRD. >When connection is established user can check the QP RDMA ORD and IRD >to see what he has now to use over the connection. >We may consider to extend QP attributes to support transport specific >parameters passing in the future. >For example, iWARP MPA CRC request. > >Second is the semantic that CM provides. >The proposal is to match IBCM semantic. >That is CM guarantee that local IRD is >= remote ORD. >This guarantees that incoming RDMA Read requests will not overwhelm >the QP RDMA Read capabilities. >Again there is not changes to IBCM only to IWCM. >Notice that as part of this IWCM will pass down to driver and extract >from driver >needed info. > >The final part is iWARP CM extension to exchange RDMA ORD, IRD. >This is similar to IBTA Annex for IP Addressing. >The harder part that this will eventually require IETF MPA spec extension, >and the fact that MPA protocol is implemented in RNIC HW by many vendors, >and hence can not be done by IWCM itself. We looked at this quite a bit during the creation of the specification. All of the targeted usage models exchange this information as part of their "hello" or login exchanges. As such, the "hum" was to not change MPA to communicate such information and leave it to software to exchange these values through existing mechanisms. I seriously doubt there will be much support for modifying the MPA specification at this stage since the implementations are largely complete and a modification would have to deal with the legacy interoperability issue which likely would be solved in software any way. It would be simpler to simply modify the underlying DAPL implementation to exchange the information and keep this hidden from both the application and the RNIC providers. Mike >Thanks, > >Arkady Kanevsky email: arkady at netapp.com >Network Appliance Inc. 
phone: 781-768-5395 >1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 >Waltham, MA 02451 central phone: 781-768-5300 > > > > -----Original Message----- > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > Sent: Wednesday, February 07, 2007 6:12 PM > > To: Arlin Davis > > Cc: openib-general > > Subject: Re: [openib-general] dapl broken for iWARP > > > > On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote: > > > Steve Wise wrote: > > > > > > >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: > > > > > > > > > > > >>Arlin, > > > >> > > > >>The OFED dapl code is assuming the responder_resources and > > > >>initiator_depth passed up on a connection request event > > are from the > > > >>remote peer. This doesn't happen for iWARP. In the > > current iWARP > > > >>specifications, its up to the application to exchange this > > > >>information somehow. So these are defaulting to 0 on the > > server side > > > >>of any dapl connection over iWARP. > > > >> > > > >>This is a fairly recent change, I think. We need to come up with > > > >>some way to deal with this for OFED 1.2 IMO. > > > >> > > > >> > > > Yes, this was changed recently to sync up with the rdma_cm changes > > > that exposed the values. > > > > > > >> > > > >> > > > > > > > >The IWCM could set these to the device max values for instance. > > > > > > > > > > > That would work fine as long as you know the remote > > settings will be > > > equal or better. The provider just sets the min of local device max > > > values and the remote values provided with the request. > > > > > > > I know Krishna Kumar is working on a solution for exchanging > > this info in private data so the IWCM can "do the right > > thing". Stay tuned for a patch series to review for this. > > But this functionality is definitely post OFED-1.2. > > > > > > So for the OFED-1.2, I will set these to the device max in the IWCM. > > Assuming the other side is OFED 1.2 DAPL, then it will work fine. > > > > Steve. 
From krause at cup.hp.com Thu Feb 8 13:36:34 2007 From: krause at cup.hp.com (Michael Krause) Date: Thu, 08 Feb 2007 13:36:34 -0800 Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when appropriate for unicast packets In-Reply-To: <1170967182.31538.96962.camel@hal.voltaire.com> References: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com> <1170967182.31538.96962.camel@hal.voltaire.com> Message-ID: <6.2.0.14.2.20070208133129.084a01e0@esmail.cup.hp.com> At 12:39 PM 2/8/2007, Hal Rosenstock wrote: >On Thu, 2007-02-08 at 14:54, Sean Hefty wrote: > > >Hum, you mean to meet the LID validation rules of 9.6.1.5? That is a > > >huge PITA.. > > > > > >[IMHO, 9.6.1.5 C9-54 is a mistake, if there is a GRH then the LRH.SLID > > > should not be validated against the QP context since it makes it > > > extra hard for multipath routing and QoS to work...] If you examine the prior diagram, the packet validation is quite precise and intent on catching any misrouted packets as early in the validation process as possible. This particular compliance statement makes it clear as to the type of connection and how to pattern match. The protocol was designed to work within a single subnet as well as across subnets. Hence, the GRH must be validated in conjunction with the LRH and the QP context in order to ensure an intermediate component did not misroute the packet. 
As described, a RC QP must flow through at most a single path at any given time in order to insure packet ordering is maintained (IB requires strong ordering so multi-path within a single RC is not allowed). As for QoS, one can arbitrate a packet for a RC QP relative to other flows without any additional complexity. If one wants to segregate a set of RC QP onto different paths as well as arbitration slots that is allowed and supported by the architecture even if going between the same set of ports - simply use multiple LID and SL during connection establishment. Mike > > > > Yes - this gets messy. > > > > >Here is one thought on how to do this: > > >To meet this rule each side of the CM must take the SLID from > > >the incoming LRH as the DLID for the connection. This SLID will be > > >one of the SLIDs for the local router. The other side doesn't need to > > >know what it is. The passive side will get the router SLID from the > > >REQ and the active side gets it from the ACK. > > > > > >The passive side is easy, it just path record queries the DGID and > > >requests the DLID == the incoming LRH.SLID. > > > > This requires that the passive side be able to issue path record > queries, but I > > think that it could work for static routes. A point was made to me > that the > > remote side could be a TCA without query capabilities. > >Are you referring to SA query capabilities ? Would such a device just be >expected to work without change in an IB routed environment anyway ? > >-- Hal > > > > > There's still the issue of what value is carried in the remote port LID > in the > > CM REQ (12.7.21), and I haven't even gotten to APM yet... > > > > >The nasty problem is with the active side - CMA will select a router > > >lid it uses as the DLID and the router may select a different LID for > > >it to use as the SLID when it processes the ACK. 
By C9-54 they have to > > >be the same :< So the active side might have to do another path record > > >query to move its DLID and SL to match the router's chosen > > >SLID. Double suck :P > > > > As long as the SA and local routers are in sync, we may be okay here > without a > > second path record query. > > > > - Sean From sashak at voltaire.com Thu Feb 8 14:09:20 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 9 Feb 2007 00:09:20 +0200 Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]] In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com> Message-ID: <20070208220920.GY22807@sashak.voltaire.com> On 23:24 Thu 08 Feb , Tzachi Dar wrote: > > > The windows open IB has decided on using a BSD only license. > > > The common implementation of pthreads as far as I know is > > LGPL, which > > > means that it can not be used in open IB. > > > > Why not? AFAIK it works perfectly (see (5,6 and Preamble)): > > http://www.gnu.org/copyleft/lesser.html > > > > And of course there are tons of examples when BSD software > > links against LGPLed glibc. > > I can of course write you an answer that will be more than 5 pages long > of why *I* don't think that > using GPL software is bad for everyone, but I guess that my opinion > doesn't really matter, so I > won't do it. I didn't mean to take it in this direction, sorry. 
I referred to the original LGPL text, where it is stated explicitly that non-(L)GPL programs can be linked against LGPLed libraries. And again, there are a lot of examples (Apache, Mozilla, Xorg, etc.) where this works. > The page that you have referenced is of the GNU org, and even there it > is hard to say that they > are trying to encourage you to use the LGPL license. In any case, the > main point is that > when open IB windows was formed there was a general decision that it > will use the BSD license. If we > start having components with the LGPL this will break that decision, and > therefore this requires > some voting of the open IB organization. You are not going to maintain pthreads-win32 as an OpenIB component, but use it as a third-party library. I think this should not be very different from using the native windows thread dll (which I guess is not BSD either). I don't see any LGPL issue here. Makes sense? Sasha From mshefty at ichips.intel.com Thu Feb 8 14:02:02 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 08 Feb 2007 14:02:02 -0800 Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when appropriate for unicast packets In-Reply-To: <1170967182.31538.96962.camel@hal.voltaire.com> References: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com> <1170967182.31538.96962.camel@hal.voltaire.com> Message-ID: <45CB9DDA.8020303@ichips.intel.com> >>This requires that the passive side be able to issue path record queries, but I >>think that it could work for static routes. A point was made to me that the >>remote side could be a TCA without query capabilities. > > Are you referring to SA query capabilities ? Would such a device just be > expected to work without change in an IB routed environment anyway ? Yes, I was referring to SA query capability, such as a path record query. Since the spec requires that the path information be provided by the active side, I think that such a device could work without change. 
(But it does mean that the active side has to provide some way to obtain the necessary information to put into a CM REQ, plus know what the remote router will do.) - Sean From mst at mellanox.co.il Thu Feb 8 14:20:31 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 9 Feb 2007 00:20:31 +0200 Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]] In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com> Message-ID: <20070208222031.GD6560@mellanox.co.il> > Well, the way I see it one can take code from the Linux part under the BSD > license and use it in the Windows part. The other way around seems fine to me, but > some say that since the Windows BSD license requires that some text will always > remain there, the other way around is not possible. As I'm not an expert in > that area I don't know who is right. Interesting. Where does this idea come from? AFAIK the BSD license is well known to be GPL-compatible, so there should be no problem moving code in either direction. -- MST From mst at mellanox.co.il Thu Feb 8 14:24:43 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 9 Feb 2007 00:24:43 +0200 Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]] In-Reply-To: <20070208222031.GD6560@mellanox.co.il> References: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com> <20070208222031.GD6560@mellanox.co.il> Message-ID: <20070208222443.GE6560@mellanox.co.il> > Quoting r. Michael S. Tsirkin : > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]] > > > Well, the way I see it one can take code from the Linux part under the BSD > > license and use it in the Windows part. 
The other way around seems fine to me, but > > some say that since the Windows BSD license requires that some text will always > > remain there, the other way around is not possible. As I'm not an expert in > > that area I don't know who is right. > > Interesting. Where does this idea come from? Maybe this? http://www.gnu.org/philosophy/bsd.html Note that the openib license does not include the advertising clause. > AFAIK the BSD license is well known to be > GPL-compatible, so there should be no problem moving code in either direction. -- MST From dledford at redhat.com Thu Feb 8 14:28:13 2007 From: dledford at redhat.com (Doug Ledford) Date: Thu, 08 Feb 2007 17:28:13 -0500 Subject: [openib-general] issues with compilation of ofed 1.2 In-Reply-To: <6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com> References: <45C9EE31.2040602@voltaire.com> <6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com> Message-ID: <1170973693.19297.2.camel@firewall.xsintricity.com> On Thu, 2007-02-08 at 09:02 +0200, Moni Levy wrote: > Doug, > On 2/7/07, Yosef Etigin wrote: > > 7. On RHAS5 beta 2, the setup requires sysfsutils-devel RPM which is not included in this distro. > > Can you please help us with that ? The value of the sysfsutils is far overshadowed by the value of libsysfs (and libsysfs is far more commonly used). So, in RHEL5, the rpm package names reflect this: libsysfs sysfsutils (I think, might be libsysfs-utils) libsysfs-devel It's all still there, just a different name. > -- Moni > > > > > -- > > Yosef Etigin > > Alex Tabachnik > > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rowland at cse.ohio-state.edu Thu Feb 8 14:24:24 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Thu, 08 Feb 2007 17:24:24 -0500 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: <20070208134305.GC20183@mellanox.co.il> References: <1A992437-34C5-4F97-8963-1C99876E0A50@cisco.com> <20070208134305.GC20183@mellanox.co.il> Message-ID: <45CBA318.9030304@cse.ohio-state.edu> Michael S. Tsirkin wrote: >> Quoting Jeff Squyres : >> 2) we're trying to *use* the software when it is installed in the >> DESTDIR >> --> this means that you have to put special-case in the software so >> that they look for support files in both the DESTDIR *and* the final >> installation directory Either that, or fix your resulting package so that it will work with the final installation directory case (not work with both), and then setup a temporary environment that will allow it to work for the rest of the build process. For the mpitests package being built against our RPM result, this is the approach I took. It took me a little time to figure out how to do this, because it is odd. > How do you mean, use? I assume this means linking against the libraries. In the mpitests RPM build, it could mean using mpicc, etc. from the MPI package while it itself is not working in its final destination directory. No one does this sort of thing normally when building software packages from source code. > Hmm. I guess my question is - this works fine when I run OFED's > configure script, why is SRPM so much more difficult? Anyone can correct me if I am reading this wrong, but I've commented on this at least once before - somewhat indirectly. I've built and supported open source software at the OSU CSE department for a long time - so I've built many different packages from source about a million times. 
When you build a source code package, obviously you make sure the necessary libraries can be found. These libraries are in some system location - their "final installation directory". If this location is not in the default search path, you can deal with that various ways or add the path to the system's default. If your package builds its own libraries and also uses them, then you deal with that yourself in your package's build system. When you say OFED's configure script, this is the situation I see. Never have I purposely built the libraries required for a package in a temporary location, built the package against those libraries, and then moved everything to a final location. It makes things more difficult. Take our SRPM for example. If you have OFED installed, I am mainly concerned with the stack prefix, by default /usr/local/ofed. If I build our code with the libraries in their final location, I don't have to worry about subtle things like the various scripts having this path hard coded in them. Most packages rightly make the assumption that these paths you use are to dependencies that have already been installed, and if there is some need to incorporate those paths into the package build result, for whatever reason, it's all right to do that. In our SRPM, I need to fix some things in the OFED installer case because the libraries I am building against are not in their final location. These are things that I do not have to do normally, however to be fair - I already have to fix some things because of the RPM BuildRoot usage anyway because our package is not installed in its final destination either in any RPM building scenario. It only goes into its final destination directory when you install the actual RPM. With RPMs, this is a good, safe way to build _individual_ packages. In the SuSE case, the %build section is assuming this and cleaning things up before you start, because - why would there be anything in there BuildRoot already? Is this right or wrong? 
That's a matter of opinion. There's information in various RPM building resources that mention some of this stuff. Luckily this is not a big deal for me to handle in our case. However, it could have been.

This is like a "bootstrap" situation, but it's not normally how one would go and build some source code package on their system, and RPMs are all about reproducible source code builds. Again, to be totally fair, you wouldn't normally install your package into a BuildRoot prefixed "prefix" location either, but that seems easier to deal with than the location of libraries your package may depend on. And to go even further, as in the mpitests RPM build case, would one normally expect the RPM build result that is installed in a BuildRoot prefixed "prefix" and just left there to even work? I would say it is absolutely not safe to just assume that for any given RPM build. This all depends on the source code you are trying to build and what it does exactly.

Any time you are using paths that don't reflect the final destination of whatever dependency, you have the potential to have to deal with extra work to fix the final result. In the usual SRPM building case, the packages that your package depends on would already be installed on the system in their final destination directory. I could even require these RPM packages as build requirements - something I am not doing in the spec file itself, yes? This would mean that I could take a few steps out of my SRPM spec file.

From what I am reading here, this would be one reason to think of a chroot situation. But it seems to me that this would make things potentially difficult. Another option could be to go ahead and install the OFA packages before the MPI packages. Either way would work for me because I've already handled this DESTDIR situation (even for the mpitests being built against our RPM result - because I leave it in DESTDIR after the RPM build is done if this is all being done by the OFED installer).
In addition, if you were to follow this logic, the MPI packages would be installed in their final location before the mpitests packages were built against them. If this were done instead, I don't think chroot would be required. However, it does mean the packages have to be installed, and from what I've seen - in a 3 step process. When I was first trying to make our SRPM work with the OFED install scripts, this was the first thing I had to fix. And I definitely was not expecting this type of situation.

Now, perhaps I have misread this thread and applied it to my own experience. If I have not misread it, then I understand what the point is. This does not mean I advocate changing it personally. From my testing, I've made this aspect of our build work. I had no problems moving my %build section code to the %install section. To me the question is if this is too difficult to deal with. It depends on the package being built, but no standalone packages I am aware of contain logic for "temporary paths" to dependencies, or again to be fair in the RPM building scenario, a "temporary prefix". None of what I am saying applies to a package that builds and uses its own libraries though. In those cases, the developers obviously have to deal with that.

--
Shaun Rowland rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/

From rdreier at cisco.com Thu Feb 8 14:41:05 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 08 Feb 2007 14:41:05 -0800
Subject: [openib-general] [PATCH] IB/ipoib_cm: fix up issues from code review
In-Reply-To: <20070208152947.GA6560@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 8 Feb 2007 17:29:47 +0200")
References: <20070208152947.GA6560@mellanox.co.il>
Message-ID:

OK, I pulled this in and fixed it to build with the netdevice class_device-ectomy that just went upstream, and pushed it out on my for-2.6.21 branch like this.
diff --git a/drivers/infiniband/ulp/ipoib/Kconfig b/drivers/infiniband/ulp/ipoib/Kconfig index c75322d..af78ccc 100644 --- a/drivers/infiniband/ulp/ipoib/Kconfig +++ b/drivers/infiniband/ulp/ipoib/Kconfig @@ -1,6 +1,6 @@ config INFINIBAND_IPOIB tristate "IP-over-InfiniBand" - depends on INFINIBAND && NETDEVICES && INET + depends on INFINIBAND && NETDEVICES && INET && (IPV6 || IPV6=n) ---help--- Support for the IP-over-InfiniBand protocol (IPoIB). This transports IP packets over InfiniBand so you can use your IB @@ -8,6 +8,20 @@ config INFINIBAND_IPOIB See Documentation/infiniband/ipoib.txt for more information +config INFINIBAND_IPOIB_CM + bool "IP-over-InfiniBand Connected Mode support" + depends on INFINIBAND_IPOIB && EXPERIMENTAL + default n + ---help--- + This option enables experimental support for IPoIB connected mode. + After enabling this option, you need to switch to connected mode through + /sys/class/net/ibXXX/mode to actually create connections, and then increase + the interface MTU with e.g. ifconfig ib0 mtu 65520. + + WARNING: Enabling connected mode will trigger some + packet drops for multicast and UD mode traffic from this interface, + unless you limit mtu for these destinations to 2044. 
+ config INFINIBAND_IPOIB_DEBUG bool "IP-over-InfiniBand debugging" if EMBEDDED depends on INFINIBAND_IPOIB diff --git a/drivers/infiniband/ulp/ipoib/Makefile b/drivers/infiniband/ulp/ipoib/Makefile index 8935e74..98ee38e 100644 --- a/drivers/infiniband/ulp/ipoib/Makefile +++ b/drivers/infiniband/ulp/ipoib/Makefile @@ -5,5 +5,6 @@ ib_ipoib-y := ipoib_main.o \ ipoib_multicast.o \ ipoib_verbs.o \ ipoib_vlan.o +ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_CM) += ipoib_cm.o ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_DEBUG) += ipoib_fs.o diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 07deee8..eb885ee 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -62,6 +62,10 @@ enum { IPOIB_ENCAP_LEN = 4, + IPOIB_CM_MTU = 0x10000 - 0x10, /* padding to align header to 16 */ + IPOIB_CM_BUF_SIZE = IPOIB_CM_MTU + IPOIB_ENCAP_LEN, + IPOIB_CM_HEAD_SIZE = IPOIB_CM_BUF_SIZE % PAGE_SIZE, + IPOIB_CM_RX_SG = ALIGN(IPOIB_CM_BUF_SIZE, PAGE_SIZE) / PAGE_SIZE, IPOIB_RX_RING_SIZE = 128, IPOIB_TX_RING_SIZE = 64, IPOIB_MAX_QUEUE_SIZE = 8192, @@ -81,6 +85,8 @@ enum { IPOIB_MCAST_RUN = 6, IPOIB_STOP_REAPER = 7, IPOIB_MCAST_STARTED = 8, + IPOIB_FLAG_NETIF_STOPPED = 9, + IPOIB_FLAG_ADMIN_CM = 10, IPOIB_MAX_BACKOFF_SECONDS = 16, @@ -90,6 +96,14 @@ enum { IPOIB_MCAST_FLAG_ATTACHED = 3, }; + +#define IPOIB_OP_RECV (1ul << 31) +#ifdef CONFIG_INFINIBAND_IPOIB_CM +#define IPOIB_CM_OP_SRQ (1ul << 30) +#else +#define IPOIB_CM_OP_SRQ (0) +#endif + /* structs */ struct ipoib_header { @@ -113,6 +127,59 @@ struct ipoib_tx_buf { u64 mapping; }; +struct ib_cm_id; + +struct ipoib_cm_data { + __be32 qpn; /* High byte MUST be ignored on receive */ + __be32 mtu; +}; + +struct ipoib_cm_rx { + struct ib_cm_id *id; + struct ib_qp *qp; + struct list_head list; + struct net_device *dev; + unsigned long jiffies; +}; + +struct ipoib_cm_tx { + struct ib_cm_id *id; + struct ib_cq *cq; + struct ib_qp *qp; + struct list_head list; + struct net_device *dev; + 
struct ipoib_neigh *neigh; + struct ipoib_path *path; + struct ipoib_tx_buf *tx_ring; + unsigned tx_head; + unsigned tx_tail; + unsigned long flags; + u32 mtu; + struct ib_wc ibwc[IPOIB_NUM_WC]; +}; + +struct ipoib_cm_rx_buf { + struct sk_buff *skb; + u64 mapping[IPOIB_CM_RX_SG]; +}; + +struct ipoib_cm_dev_priv { + struct ib_srq *srq; + struct ipoib_cm_rx_buf *srq_ring; + struct ib_cm_id *id; + struct list_head passive_ids; + struct work_struct start_task; + struct work_struct reap_task; + struct work_struct skb_task; + struct delayed_work stale_task; + struct sk_buff_head skb_queue; + struct list_head start_list; + struct list_head reap_list; + struct ib_wc ibwc[IPOIB_NUM_WC]; + struct ib_sge rx_sge[IPOIB_CM_RX_SG]; + struct ib_recv_wr rx_wr; +}; + /* * Device private locking: tx_lock protects members used in TX fast * path (and we use LLTX so upper layers don't do extra locking). @@ -179,6 +246,10 @@ struct ipoib_dev_priv { struct list_head child_intfs; struct list_head list; +#ifdef CONFIG_INFINIBAND_IPOIB_CM + struct ipoib_cm_dev_priv cm; +#endif + #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG struct list_head fs_list; struct dentry *mcg_dentry; @@ -212,6 +283,9 @@ struct ipoib_path { struct ipoib_neigh { struct ipoib_ah *ah; +#ifdef CONFIG_INFINIBAND_IPOIB_CM + struct ipoib_cm_tx *cm; +#endif union ib_gid dgid; struct sk_buff_head queue; @@ -315,6 +389,146 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey); void ipoib_pkey_poll(struct work_struct *work); int ipoib_pkey_dev_delay_open(struct net_device *dev); +#ifdef CONFIG_INFINIBAND_IPOIB_CM + +#define IPOIB_FLAGS_RC 0x80 +#define IPOIB_FLAGS_UC 0x40 + +/* We don't support UC connections at the moment */ +#define IPOIB_CM_SUPPORTED(ha) (ha[0] & (IPOIB_FLAGS_RC)) + +static inline int ipoib_cm_admin_enabled(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + return IPOIB_CM_SUPPORTED(dev->dev_addr) && + test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); +} + +static inline int 
ipoib_cm_enabled(struct net_device *dev, struct neighbour *n) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + return IPOIB_CM_SUPPORTED(n->ha) && + test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); +} + +static inline int ipoib_cm_up(struct ipoib_neigh *neigh) + +{ + return test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags); +} + +static inline struct ipoib_cm_tx *ipoib_cm_get(struct ipoib_neigh *neigh) +{ + return neigh->cm; +} + +static inline void ipoib_cm_set(struct ipoib_neigh *neigh, struct ipoib_cm_tx *tx) +{ + neigh->cm = tx; +} + +void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx); +int ipoib_cm_dev_open(struct net_device *dev); +void ipoib_cm_dev_stop(struct net_device *dev); +int ipoib_cm_dev_init(struct net_device *dev); +int ipoib_cm_add_mode_attr(struct net_device *dev); +void ipoib_cm_dev_cleanup(struct net_device *dev); +struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, + struct ipoib_neigh *neigh); +void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx); +void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, + unsigned int mtu); +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc); +#else + +struct ipoib_cm_tx; + +static inline int ipoib_cm_admin_enabled(struct net_device *dev) +{ + return 0; +} +static inline int ipoib_cm_enabled(struct net_device *dev, struct neighbour *n) + +{ + return 0; +} + +static inline int ipoib_cm_up(struct ipoib_neigh *neigh) + +{ + return 0; +} + +static inline struct ipoib_cm_tx *ipoib_cm_get(struct ipoib_neigh *neigh) +{ + return NULL; +} + +static inline void ipoib_cm_set(struct ipoib_neigh *neigh, struct ipoib_cm_tx *tx) +{ +} + +static inline +void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx) +{ + return; +} + +static inline +int ipoib_cm_dev_open(struct net_device *dev) +{ + return 0; +} + +static inline +void ipoib_cm_dev_stop(struct net_device *dev) +{ + return; +} + 
+static inline +int ipoib_cm_dev_init(struct net_device *dev) +{ + return -ENOSYS; +} + +static inline +void ipoib_cm_dev_cleanup(struct net_device *dev) +{ + return; +} + +static inline +struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, + struct ipoib_neigh *neigh) +{ + return NULL; +} + +static inline +void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx) +{ + return; +} + +static inline +int ipoib_cm_add_mode_attr(struct net_device *dev) +{ + return 0; +} + +static inline void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, + unsigned int mtu) +{ + dev_kfree_skb_any(skb); +} + +static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) +{ +} + +#endif + #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG void ipoib_create_debug_files(struct net_device *dev); void ipoib_delete_debug_files(struct net_device *dev); @@ -392,4 +606,6 @@ extern int ipoib_debug_level; #define IPOIB_GID_ARG(gid) IPOIB_GID_RAW_ARG((gid).raw) +#define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff) + #endif /* _IPOIB_H */ diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c new file mode 100644 index 0000000..2d48387 --- /dev/null +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -0,0 +1,1237 @@ +/* + * Copyright (c) 2006 Mellanox Technologies. All rights reserved + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id$ + */ + +#include +#include +#include +#include +#include + +#ifdef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA +static int data_debug_level; + +module_param_named(cm_data_debug_level, data_debug_level, int, 0644); +MODULE_PARM_DESC(cm_data_debug_level, + "Enable data path debug tracing for connected mode if > 0"); +#endif + +#include "ipoib.h" + +#define IPOIB_CM_IETF_ID 0x1000000000000000ULL + +#define IPOIB_CM_RX_UPDATE_TIME (256 * HZ) +#define IPOIB_CM_RX_TIMEOUT (2 * 256 * HZ) +#define IPOIB_CM_RX_DELAY (3 * 256 * HZ) +#define IPOIB_CM_RX_UPDATE_MASK (0x3) + +struct ipoib_cm_id { + struct ib_cm_id *id; + int flags; + u32 remote_qpn; + u32 remote_mtu; +}; + +static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, + struct ib_cm_event *event); + +static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, + u64 mapping[IPOIB_CM_RX_SG]) +{ + int i; + + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE); + + for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i) + ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); +} + +static int ipoib_cm_post_receive(struct net_device *dev, int id) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_recv_wr *bad_wr; + int i, ret; + + priv->cm.rx_wr.wr_id = id | 
IPOIB_CM_OP_SRQ; + + for (i = 0; i < IPOIB_CM_RX_SG; ++i) + priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i]; + + ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr); + if (unlikely(ret)) { + ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret); + ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[id].mapping); + dev_kfree_skb_any(priv->cm.srq_ring[id].skb); + priv->cm.srq_ring[id].skb = NULL; + } + + return ret; +} + +static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, + u64 mapping[IPOIB_CM_RX_SG]) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct sk_buff *skb; + int i; + + skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12); + if (unlikely(!skb)) + return -ENOMEM; + + /* + * IPoIB adds a 4 byte header. So we need 12 more bytes to align the + * IP header to a multiple of 16. + */ + skb_reserve(skb, 12); + + mapping[0] = ib_dma_map_single(priv->ca, skb->data, IPOIB_CM_HEAD_SIZE, + DMA_FROM_DEVICE); + if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) { + dev_kfree_skb_any(skb); + return -EIO; + } + + for (i = 0; i < IPOIB_CM_RX_SG - 1; i++) { + struct page *page = alloc_page(GFP_ATOMIC); + + if (!page) + goto partial_error; + skb_fill_page_desc(skb, i, page, 0, PAGE_SIZE); + + mapping[i + 1] = ib_dma_map_page(priv->ca, skb_shinfo(skb)->frags[i].page, + 0, PAGE_SIZE, DMA_FROM_DEVICE); + if (unlikely(ib_dma_mapping_error(priv->ca, mapping[i + 1]))) + goto partial_error; + } + + priv->cm.srq_ring[id].skb = skb; + return 0; + +partial_error: + + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE); + + for (; i > 0; --i) + ib_dma_unmap_single(priv->ca, mapping[i], PAGE_SIZE, DMA_FROM_DEVICE); + + kfree_skb(skb); + return -ENOMEM; +} + +static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev, + struct ipoib_cm_rx *p) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_qp_init_attr attr = { + .send_cq = priv->cq, /* does not matter, we never send anything */ + .recv_cq
= priv->cq, + .srq = priv->cm.srq, + .cap.max_send_wr = 1, /* FIXME: 0 Seems not to work */ + .cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */ + .sq_sig_type = IB_SIGNAL_ALL_WR, + .qp_type = IB_QPT_RC, + .qp_context = p, + }; + return ib_create_qp(priv->pd, &attr); +} + +static int ipoib_cm_modify_rx_qp(struct net_device *dev, + struct ib_cm_id *cm_id, struct ib_qp *qp, + unsigned psn) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; + + qp_attr.qp_state = IB_QPS_INIT; + ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to init QP attr for INIT: %d\n", ret); + return ret; + } + ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to modify QP to INIT: %d\n", ret); + return ret; + } + qp_attr.qp_state = IB_QPS_RTR; + ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to init QP attr for RTR: %d\n", ret); + return ret; + } + qp_attr.rq_psn = psn; + ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to modify QP to RTR: %d\n", ret); + return ret; + } + return 0; +} + +static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id, + struct ib_qp *qp, struct ib_cm_req_event_param *req, + unsigned psn) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_data data = {}; + struct ib_cm_rep_param rep = {}; + + data.qpn = cpu_to_be32(priv->qp->qp_num); + data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE); + + rep.private_data = &data; + rep.private_data_len = sizeof data; + rep.flow_control = 0; + rep.rnr_retry_count = req->rnr_retry_count; + rep.target_ack_delay = 20; /* FIXME */ + rep.srq = 1; + rep.qp_num = qp->qp_num; + rep.starting_psn = psn; + return ib_send_cm_rep(cm_id, &rep); +} + +static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) +{ + struct net_device *dev = cm_id->context; + 
struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_rx *p; + unsigned long flags; + unsigned psn; + int ret; + + ipoib_dbg(priv, "REQ arrived\n"); + p = kzalloc(sizeof *p, GFP_KERNEL); + if (!p) + return -ENOMEM; + p->dev = dev; + p->id = cm_id; + p->qp = ipoib_cm_create_rx_qp(dev, p); + if (IS_ERR(p->qp)) { + ret = PTR_ERR(p->qp); + goto err_qp; + } + + psn = random32() & 0xffffff; + ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); + if (ret) + goto err_modify; + + ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); + if (ret) { + ipoib_warn(priv, "failed to send REP: %d\n", ret); + goto err_rep; + } + + cm_id->context = p; + p->jiffies = jiffies; + spin_lock_irqsave(&priv->lock, flags); + list_add(&p->list, &priv->cm.passive_ids); + spin_unlock_irqrestore(&priv->lock, flags); + queue_delayed_work(ipoib_workqueue, + &priv->cm.stale_task, IPOIB_CM_RX_DELAY); + return 0; + +err_rep: +err_modify: + ib_destroy_qp(p->qp); +err_qp: + kfree(p); + return ret; +} + +static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id, + struct ib_cm_event *event) +{ + struct ipoib_cm_rx *p; + struct ipoib_dev_priv *priv; + unsigned long flags; + int ret; + + switch (event->event) { + case IB_CM_REQ_RECEIVED: + return ipoib_cm_req_handler(cm_id, event); + case IB_CM_DREQ_RECEIVED: + p = cm_id->context; + ib_send_cm_drep(cm_id, NULL, 0); + /* Fall through */ + case IB_CM_REJ_RECEIVED: + p = cm_id->context; + priv = netdev_priv(p->dev); + spin_lock_irqsave(&priv->lock, flags); + if (list_empty(&p->list)) + ret = 0; /* Connection is going away already. 
*/ + else { + list_del_init(&p->list); + ret = -ECONNRESET; + } + spin_unlock_irqrestore(&priv->lock, flags); + if (ret) { + ib_destroy_qp(p->qp); + kfree(p); + return ret; + } + return 0; + default: + return 0; + } +} +/* Adjust length of skb with fragments to match received data */ +static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space, + unsigned int length) +{ + int i, num_frags; + unsigned int size; + + /* put header into skb */ + size = min(length, hdr_space); + skb->tail += size; + skb->len += size; + length -= size; + + num_frags = skb_shinfo(skb)->nr_frags; + for (i = 0; i < num_frags; i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + + if (length == 0) { + /* don't need this page */ + __free_page(frag->page); + --skb_shinfo(skb)->nr_frags; + } else { + size = min(length, (unsigned) PAGE_SIZE); + + frag->size = size; + skb->data_len += size; + skb->truesize += size; + skb->len += size; + length -= size; + } + } +} + +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ; + struct sk_buff *skb; + struct ipoib_cm_rx *p; + unsigned long flags; + u64 mapping[IPOIB_CM_RX_SG]; + + ipoib_dbg_data(priv, "cm recv completion: id %d, op %d, status: %d\n", + wr_id, wc->opcode, wc->status); + + if (unlikely(wr_id >= ipoib_recvq_size)) { + ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n", + wr_id, ipoib_recvq_size); + return; + } + + skb = priv->cm.srq_ring[wr_id].skb; + + if (unlikely(wc->status != IB_WC_SUCCESS)) { + ipoib_dbg(priv, "cm recv error " + "(status=%d, wrid=%d vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); + ++priv->stats.rx_dropped; + goto repost; + } + + if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { + p = wc->qp->qp_context; + if (time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { + spin_lock_irqsave(&priv->lock, flags); + p->jiffies = jiffies; + /* Move this entry to list 
head, but do + * not re-add it if it has been removed. */ + if (!list_empty(&p->list)) + list_move(&p->list, &priv->cm.passive_ids); + spin_unlock_irqrestore(&priv->lock, flags); + queue_delayed_work(ipoib_workqueue, + &priv->cm.stale_task, IPOIB_CM_RX_DELAY); + } + } + + if (unlikely(ipoib_cm_alloc_rx_skb(dev, wr_id, mapping))) { + /* + * If we can't allocate a new RX buffer, dump + * this packet and reuse the old buffer. + */ + ipoib_dbg(priv, "failed to allocate receive buffer %d\n", wr_id); + ++priv->stats.rx_dropped; + goto repost; + } + + ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[wr_id].mapping); + memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, sizeof mapping); + + ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", + wc->byte_len, wc->slid); + + skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len); + + skb->protocol = ((struct ipoib_header *) skb->data)->proto; + skb->mac.raw = skb->data; + skb_pull(skb, IPOIB_ENCAP_LEN); + + dev->last_rx = jiffies; + ++priv->stats.rx_packets; + priv->stats.rx_bytes += skb->len; + + skb->dev = dev; + /* XXX get correct PACKET_ type here */ + skb->pkt_type = PACKET_HOST; + netif_rx_ni(skb); + +repost: + if (unlikely(ipoib_cm_post_receive(dev, wr_id))) + ipoib_warn(priv, "ipoib_cm_post_receive failed " + "for buf %d\n", wr_id); +} + +static inline int post_send(struct ipoib_dev_priv *priv, + struct ipoib_cm_tx *tx, + unsigned int wr_id, + u64 addr, int len) +{ + struct ib_send_wr *bad_wr; + + priv->tx_sge.addr = addr; + priv->tx_sge.length = len; + + priv->tx_wr.wr_id = wr_id; + + return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr); +} + +void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_tx_buf *tx_req; + u64 addr; + + if (unlikely(skb->len > tx->mtu)) { + ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", + skb->len, tx->mtu); + ++priv->stats.tx_dropped; + ++priv->stats.tx_errors; + 
ipoib_cm_skb_too_long(dev, skb, tx->mtu - INFINIBAND_ALEN); + return; + } + + ipoib_dbg_data(priv, "sending packet: head 0x%x length %d connection 0x%x\n", + tx->tx_head, skb->len, tx->qp->qp_num); + + /* + * We put the skb into the tx_ring _before_ we call post_send() + * because it's entirely possible that the completion handler will + * run before we execute anything after the post_send(). That + * means we have to make sure everything is properly recorded and + * our state is consistent before we call post_send(). + */ + tx_req = &tx->tx_ring[tx->tx_head & (ipoib_sendq_size - 1)]; + tx_req->skb = skb; + addr = ib_dma_map_single(priv->ca, skb->data, skb->len, DMA_TO_DEVICE); + if (unlikely(ib_dma_mapping_error(priv->ca, addr))) { + ++priv->stats.tx_errors; + dev_kfree_skb_any(skb); + return; + } + + tx_req->mapping = addr; + + if (unlikely(post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1), + addr, skb->len))) { + ipoib_warn(priv, "post_send failed\n"); + ++priv->stats.tx_errors; + ib_dma_unmap_single(priv->ca, addr, skb->len, DMA_TO_DEVICE); + dev_kfree_skb_any(skb); + } else { + dev->trans_start = jiffies; + ++tx->tx_head; + + if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) { + ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n", + tx->qp->qp_num); + netif_stop_queue(dev); + set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags); + } + } +} + +static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx, + struct ib_wc *wc) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned int wr_id = wc->wr_id; + struct ipoib_tx_buf *tx_req; + unsigned long flags; + + ipoib_dbg_data(priv, "cm send completion: id %d, op %d, status: %d\n", + wr_id, wc->opcode, wc->status); + + if (unlikely(wr_id >= ipoib_sendq_size)) { + ipoib_warn(priv, "cm send completion event with wrid %d (> %d)\n", + wr_id, ipoib_sendq_size); + return; + } + + tx_req = &tx->tx_ring[wr_id]; + + ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, 
DMA_TO_DEVICE); + + /* FIXME: is this right? Shouldn't we only increment on success? */ + ++priv->stats.tx_packets; + priv->stats.tx_bytes += tx_req->skb->len; + + dev_kfree_skb_any(tx_req->skb); + + spin_lock_irqsave(&priv->tx_lock, flags); + ++tx->tx_tail; + if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) && + tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) { + clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags); + netif_wake_queue(dev); + } + + if (wc->status != IB_WC_SUCCESS && + wc->status != IB_WC_WR_FLUSH_ERR) { + struct ipoib_neigh *neigh; + + ipoib_dbg(priv, "failed cm send event " + "(status=%d, wrid=%d vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); + + spin_lock(&priv->lock); + neigh = tx->neigh; + + if (neigh) { + neigh->cm = NULL; + list_del(&neigh->list); + if (neigh->ah) + ipoib_put_ah(neigh->ah); + ipoib_neigh_free(dev, neigh); + + tx->neigh = NULL; + } + + /* queue would be re-started anyway when TX is destroyed, + * but it makes sense to do it ASAP here. 
*/ + if (test_and_clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) + netif_wake_queue(dev); + + if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { + list_move(&tx->list, &priv->cm.reap_list); + queue_work(ipoib_workqueue, &priv->cm.reap_task); + } + + clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags); + + spin_unlock(&priv->lock); + } + + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +static void ipoib_cm_tx_completion(struct ib_cq *cq, void *tx_ptr) +{ + struct ipoib_cm_tx *tx = tx_ptr; + int n, i; + + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + do { + n = ib_poll_cq(cq, IPOIB_NUM_WC, tx->ibwc); + for (i = 0; i < n; ++i) + ipoib_cm_handle_tx_wc(tx->dev, tx, tx->ibwc + i); + } while (n == IPOIB_NUM_WC); +} + +int ipoib_cm_dev_open(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + int ret; + + if (!IPOIB_CM_SUPPORTED(dev->dev_addr)) + return 0; + + priv->cm.id = ib_create_cm_id(priv->ca, ipoib_cm_rx_handler, dev); + if (IS_ERR(priv->cm.id)) { + printk(KERN_WARNING "%s: failed to create CM ID\n", priv->ca->name); + return PTR_ERR(priv->cm.id); + } + + ret = ib_cm_listen(priv->cm.id, cpu_to_be64(IPOIB_CM_IETF_ID | priv->qp->qp_num), + 0, NULL); + if (ret) { + printk(KERN_WARNING "%s: failed to listen on ID 0x%llx\n", priv->ca->name, + IPOIB_CM_IETF_ID | priv->qp->qp_num); + ib_destroy_cm_id(priv->cm.id); + return ret; + } + return 0; +} + +void ipoib_cm_dev_stop(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_rx *p; + unsigned long flags; + + if (!IPOIB_CM_SUPPORTED(dev->dev_addr)) + return; + + ib_destroy_cm_id(priv->cm.id); + spin_lock_irqsave(&priv->lock, flags); + while (!list_empty(&priv->cm.passive_ids)) { + p = list_entry(priv->cm.passive_ids.next, typeof(*p), list); + list_del_init(&p->list); + spin_unlock_irqrestore(&priv->lock, flags); + ib_destroy_cm_id(p->id); + ib_destroy_qp(p->qp); + kfree(p); + spin_lock_irqsave(&priv->lock, flags); + } + spin_unlock_irqrestore(&priv->lock,
flags); + + cancel_delayed_work(&priv->cm.stale_task); +} + +static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) +{ + struct ipoib_cm_tx *p = cm_id->context; + struct ipoib_dev_priv *priv = netdev_priv(p->dev); + struct ipoib_cm_data *data = event->private_data; + struct sk_buff_head skqueue; + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; + struct sk_buff *skb; + unsigned long flags; + + p->mtu = be32_to_cpu(data->mtu); + + if (p->mtu < priv->dev->mtu + IPOIB_ENCAP_LEN) { + ipoib_warn(priv, "Rejecting connection: mtu %d < device mtu %d + 4\n", + p->mtu, priv->dev->mtu); + return -EINVAL; + } + + qp_attr.qp_state = IB_QPS_RTR; + ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to init QP attr for RTR: %d\n", ret); + return ret; + } + + qp_attr.rq_psn = 0 /* FIXME */; + ret = ib_modify_qp(p->qp, &qp_attr, qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to modify QP to RTR: %d\n", ret); + return ret; + } + + qp_attr.qp_state = IB_QPS_RTS; + ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to init QP attr for RTS: %d\n", ret); + return ret; + } + ret = ib_modify_qp(p->qp, &qp_attr, qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to modify QP to RTS: %d\n", ret); + return ret; + } + + skb_queue_head_init(&skqueue); + + spin_lock_irqsave(&priv->lock, flags); + set_bit(IPOIB_FLAG_OPER_UP, &p->flags); + if (p->neigh) + while ((skb = __skb_dequeue(&p->neigh->queue))) + __skb_queue_tail(&skqueue, skb); + spin_unlock_irqrestore(&priv->lock, flags); + + while ((skb = __skb_dequeue(&skqueue))) { + skb->dev = p->dev; + if (dev_queue_xmit(skb)) + ipoib_warn(priv, "dev_queue_xmit failed " + "to requeue packet\n"); + } + + ret = ib_send_cm_rtu(cm_id, NULL, 0); + if (ret) { + ipoib_warn(priv, "failed to send RTU: %d\n", ret); + return ret; + } + return 0; +} + +static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct 
ib_cq *cq) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_qp_init_attr attr = {}; + attr.recv_cq = priv->cq; + attr.srq = priv->cm.srq; + attr.cap.max_send_wr = ipoib_sendq_size; + attr.cap.max_send_sge = 1; + attr.sq_sig_type = IB_SIGNAL_ALL_WR; + attr.qp_type = IB_QPT_RC; + attr.send_cq = cq; + return ib_create_qp(priv->pd, &attr); +} + +static int ipoib_cm_send_req(struct net_device *dev, + struct ib_cm_id *id, struct ib_qp *qp, + u32 qpn, + struct ib_sa_path_rec *pathrec) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_data data = {}; + struct ib_cm_req_param req = {}; + + data.qpn = cpu_to_be32(priv->qp->qp_num); + data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE); + + req.primary_path = pathrec; + req.alternate_path = NULL; + req.service_id = cpu_to_be64(IPOIB_CM_IETF_ID | qpn); + req.qp_num = qp->qp_num; + req.qp_type = qp->qp_type; + req.private_data = &data; + req.private_data_len = sizeof data; + req.flow_control = 0; + + req.starting_psn = 0; /* FIXME */ + + /* + * Pick some arbitrary defaults here; we could make these + * module parameters if anyone cared about setting them. 
+ */ + req.responder_resources = 4; + req.remote_cm_response_timeout = 20; + req.local_cm_response_timeout = 20; + req.retry_count = 0; /* RFC draft warns against retries */ + req.rnr_retry_count = 0; /* RFC draft warns against retries */ + req.max_cm_retries = 15; + req.srq = 1; + return ib_send_cm_req(id, &req); +} + +static int ipoib_cm_modify_tx_init(struct net_device *dev, + struct ib_cm_id *cm_id, struct ib_qp *qp) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; + ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index); + if (ret) { + ipoib_warn(priv, "pkey 0x%x not in cache: %d\n", priv->pkey, ret); + return ret; + } + + qp_attr.qp_state = IB_QPS_INIT; + qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE; + qp_attr.port_num = priv->port; + qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS | IB_QP_PKEY_INDEX | IB_QP_PORT; + + ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask); + if (ret) { + ipoib_warn(priv, "failed to modify tx QP to INIT: %d\n", ret); + return ret; + } + return 0; +} + +static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn, + struct ib_sa_path_rec *pathrec) +{ + struct ipoib_dev_priv *priv = netdev_priv(p->dev); + int ret; + + p->tx_ring = kzalloc(ipoib_sendq_size * sizeof *p->tx_ring, + GFP_KERNEL); + if (!p->tx_ring) { + ipoib_warn(priv, "failed to allocate tx ring\n"); + ret = -ENOMEM; + goto err_tx; + } + + p->cq = ib_create_cq(priv->ca, ipoib_cm_tx_completion, NULL, p, + ipoib_sendq_size + 1); + if (IS_ERR(p->cq)) { + ret = PTR_ERR(p->cq); + ipoib_warn(priv, "failed to allocate tx cq: %d\n", ret); + goto err_cq; + } + + ret = ib_req_notify_cq(p->cq, IB_CQ_NEXT_COMP); + if (ret) { + ipoib_warn(priv, "failed to request completion notification: %d\n", ret); + goto err_req_notify; + } + + p->qp = ipoib_cm_create_tx_qp(p->dev, p->cq); + if (IS_ERR(p->qp)) { + ret = PTR_ERR(p->qp); + ipoib_warn(priv, "failed to allocate tx qp: %d\n", ret); + goto err_qp; + } + + 
p->id = ib_create_cm_id(priv->ca, ipoib_cm_tx_handler, p); + if (IS_ERR(p->id)) { + ret = PTR_ERR(p->id); + ipoib_warn(priv, "failed to create tx cm id: %d\n", ret); + goto err_id; + } + + ret = ipoib_cm_modify_tx_init(p->dev, p->id, p->qp); + if (ret) { + ipoib_warn(priv, "failed to modify tx qp to init: %d\n", ret); + goto err_modify; + } + + ret = ipoib_cm_send_req(p->dev, p->id, p->qp, qpn, pathrec); + if (ret) { + ipoib_warn(priv, "failed to send cm req: %d\n", ret); + goto err_send_cm; + } + + ipoib_dbg(priv, "Request connection 0x%x for gid " IPOIB_GID_FMT " qpn 0x%x\n", + p->qp->qp_num, IPOIB_GID_ARG(pathrec->dgid), qpn); + + return 0; + +err_send_cm: +err_modify: + ib_destroy_cm_id(p->id); +err_id: + p->id = NULL; + ib_destroy_qp(p->qp); +err_req_notify: +err_qp: + p->qp = NULL; + ib_destroy_cq(p->cq); +err_cq: + p->cq = NULL; +err_tx: + return ret; +} + +static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) +{ + struct ipoib_dev_priv *priv = netdev_priv(p->dev); + struct ipoib_tx_buf *tx_req; + + ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n", + p->qp ? 
p->qp->qp_num : 0, p->tx_head, p->tx_tail); + + if (p->id) + ib_destroy_cm_id(p->id); + + if (p->qp) + ib_destroy_qp(p->qp); + + if (p->cq) + ib_destroy_cq(p->cq); + + if (test_bit(IPOIB_FLAG_NETIF_STOPPED, &p->flags)) + netif_wake_queue(p->dev); + + if (p->tx_ring) { + while ((int) p->tx_tail - (int) p->tx_head < 0) { + tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)]; + ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, + DMA_TO_DEVICE); + dev_kfree_skb_any(tx_req->skb); + ++p->tx_tail; + } + + kfree(p->tx_ring); + } + + kfree(p); +} + +static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, + struct ib_cm_event *event) +{ + struct ipoib_cm_tx *tx = cm_id->context; + struct ipoib_dev_priv *priv = netdev_priv(tx->dev); + struct net_device *dev = priv->dev; + struct ipoib_neigh *neigh; + unsigned long flags; + int ret; + + switch (event->event) { + case IB_CM_DREQ_RECEIVED: + ipoib_dbg(priv, "DREQ received.\n"); + ib_send_cm_drep(cm_id, NULL, 0); + break; + case IB_CM_REP_RECEIVED: + ipoib_dbg(priv, "REP received.\n"); + ret = ipoib_cm_rep_handler(cm_id, event); + if (ret) + ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, + NULL, 0, NULL, 0); + break; + case IB_CM_REQ_ERROR: + case IB_CM_REJ_RECEIVED: + case IB_CM_TIMEWAIT_EXIT: + ipoib_dbg(priv, "CM error %d.\n", event->event); + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + neigh = tx->neigh; + + if (neigh) { + neigh->cm = NULL; + list_del(&neigh->list); + if (neigh->ah) + ipoib_put_ah(neigh->ah); + ipoib_neigh_free(dev, neigh); + + tx->neigh = NULL; + } + + if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { + list_move(&tx->list, &priv->cm.reap_list); + queue_work(ipoib_workqueue, &priv->cm.reap_task); + } + + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); + break; + default: + break; + } + + return 0; +} + +struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, + struct ipoib_neigh *neigh) +{ 
+ struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_tx *tx; + + tx = kzalloc(sizeof *tx, GFP_ATOMIC); + if (!tx) + return NULL; + + neigh->cm = tx; + tx->neigh = neigh; + tx->path = path; + tx->dev = dev; + list_add(&tx->list, &priv->cm.start_list); + set_bit(IPOIB_FLAG_INITIALIZED, &tx->flags); + queue_work(ipoib_workqueue, &priv->cm.start_task); + return tx; +} + +void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx) +{ + struct ipoib_dev_priv *priv = netdev_priv(tx->dev); + if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { + list_move(&tx->list, &priv->cm.reap_list); + queue_work(ipoib_workqueue, &priv->cm.reap_task); + ipoib_dbg(priv, "Reap connection for gid " IPOIB_GID_FMT "\n", + IPOIB_GID_ARG(tx->neigh->dgid)); + tx->neigh = NULL; + } +} + +static void ipoib_cm_tx_start(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv, + cm.start_task); + struct net_device *dev = priv->dev; + struct ipoib_neigh *neigh; + struct ipoib_cm_tx *p; + unsigned long flags; + int ret; + + struct ib_sa_path_rec pathrec; + u32 qpn; + + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + while (!list_empty(&priv->cm.start_list)) { + p = list_entry(priv->cm.start_list.next, typeof(*p), list); + list_del_init(&p->list); + neigh = p->neigh; + qpn = IPOIB_QPN(neigh->neighbour->ha); + memcpy(&pathrec, &p->path->pathrec, sizeof pathrec); + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); + ret = ipoib_cm_tx_init(p, qpn, &pathrec); + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + if (ret) { + neigh = p->neigh; + if (neigh) { + neigh->cm = NULL; + list_del(&neigh->list); + if (neigh->ah) + ipoib_put_ah(neigh->ah); + ipoib_neigh_free(dev, neigh); + } + list_del(&p->list); + kfree(p); + } + } + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +static void ipoib_cm_tx_reap(struct work_struct *work) +{ + struct ipoib_dev_priv 
*priv = container_of(work, struct ipoib_dev_priv, + cm.reap_task); + struct ipoib_cm_tx *p; + unsigned long flags; + + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + while (!list_empty(&priv->cm.reap_list)) { + p = list_entry(priv->cm.reap_list.next, typeof(*p), list); + list_del(&p->list); + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); + ipoib_cm_tx_destroy(p); + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + } + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +static void ipoib_cm_skb_reap(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv, + cm.skb_task); + struct net_device *dev = priv->dev; + struct sk_buff *skb; + unsigned long flags; + + unsigned mtu = priv->mcast_mtu; + + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + while ((skb = skb_dequeue(&priv->cm.skb_queue))) { + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); + if (skb->protocol == htons(ETH_P_IP)) + icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) + else if (skb->protocol == htons(ETH_P_IPV6)) + icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev); +#endif + dev_kfree_skb_any(skb); + spin_lock_irqsave(&priv->tx_lock, flags); + spin_lock(&priv->lock); + } + spin_unlock(&priv->lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, + unsigned int mtu) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + int e = skb_queue_empty(&priv->cm.skb_queue); + + if (skb->dst) + skb->dst->ops->update_pmtu(skb->dst, mtu); + + skb_queue_tail(&priv->cm.skb_queue, skb); + if (e) + queue_work(ipoib_workqueue, &priv->cm.skb_task); +} + +static void ipoib_cm_stale_task(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = container_of(work, struct 
ipoib_dev_priv, + cm.stale_task.work); + struct ipoib_cm_rx *p; + unsigned long flags; + + spin_lock_irqsave(&priv->lock, flags); + while (!list_empty(&priv->cm.passive_ids)) { + /* List is sorted by LRU, start from tail, + * stop when we see a recently used entry */ + p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list); + if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) + break; + list_del_init(&p->list); + spin_unlock_irqrestore(&priv->lock, flags); + ib_destroy_cm_id(p->id); + ib_destroy_qp(p->qp); + kfree(p); + spin_lock_irqsave(&priv->lock, flags); + } + spin_unlock_irqrestore(&priv->lock, flags); +} + + +static ssize_t show_mode(struct device *d, struct device_attribute *attr, + char *buf) +{ + struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(d)); + + if (test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags)) + return sprintf(buf, "connected\n"); + else + return sprintf(buf, "datagram\n"); +} + +static ssize_t set_mode(struct device *d, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct net_device *dev = to_net_dev(d); + struct ipoib_dev_priv *priv = netdev_priv(dev); + + /* flush paths if we switch modes so that connections are restarted */ + if (IPOIB_CM_SUPPORTED(dev->dev_addr) && !strcmp(buf, "connected\n")) { + set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); + ipoib_warn(priv, "enabling connected mode " + "will cause multicast packet drops\n"); + ipoib_flush_paths(dev); + return count; + } + + if (!strcmp(buf, "datagram\n")) { + clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); + dev->mtu = min(priv->mcast_mtu, dev->mtu); + ipoib_flush_paths(dev); + return count; + } + + return -EINVAL; +} + +static DEVICE_ATTR(mode, S_IWUGO | S_IRUGO, show_mode, set_mode); + +int ipoib_cm_add_mode_attr(struct net_device *dev) +{ + return device_create_file(&dev->dev, &dev_attr_mode); +} + +int ipoib_cm_dev_init(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_srq_init_attr srq_init_attr = { + .attr 
= { + .max_wr = ipoib_recvq_size, + .max_sge = IPOIB_CM_RX_SG + } + }; + int ret, i; + + INIT_LIST_HEAD(&priv->cm.passive_ids); + INIT_LIST_HEAD(&priv->cm.reap_list); + INIT_LIST_HEAD(&priv->cm.start_list); + INIT_WORK(&priv->cm.start_task, ipoib_cm_tx_start); + INIT_WORK(&priv->cm.reap_task, ipoib_cm_tx_reap); + INIT_WORK(&priv->cm.skb_task, ipoib_cm_skb_reap); + INIT_DELAYED_WORK(&priv->cm.stale_task, ipoib_cm_stale_task); + + skb_queue_head_init(&priv->cm.skb_queue); + + priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); + if (IS_ERR(priv->cm.srq)) { + ret = PTR_ERR(priv->cm.srq); + priv->cm.srq = NULL; + return ret; + } + + priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring, + GFP_KERNEL); + if (!priv->cm.srq_ring) { + printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n", + priv->ca->name, ipoib_recvq_size); + ipoib_cm_dev_cleanup(dev); + return -ENOMEM; + } + + for (i = 0; i < IPOIB_CM_RX_SG; ++i) + priv->cm.rx_sge[i].lkey = priv->mr->lkey; + + priv->cm.rx_sge[0].length = IPOIB_CM_HEAD_SIZE; + for (i = 1; i < IPOIB_CM_RX_SG; ++i) + priv->cm.rx_sge[i].length = PAGE_SIZE; + priv->cm.rx_wr.next = NULL; + priv->cm.rx_wr.sg_list = priv->cm.rx_sge; + priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG; + + for (i = 0; i < ipoib_recvq_size; ++i) { + if (ipoib_cm_alloc_rx_skb(dev, i, priv->cm.srq_ring[i].mapping)) { + ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); + ipoib_cm_dev_cleanup(dev); + return -ENOMEM; + } + if (ipoib_cm_post_receive(dev, i)) { + ipoib_warn(priv, "ipoib_ib_post_receive failed for buf %d\n", i); + ipoib_cm_dev_cleanup(dev); + return -EIO; + } + } + + priv->dev->dev_addr[0] = IPOIB_FLAGS_RC; + return 0; +} + +void ipoib_cm_dev_cleanup(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + int i, ret; + + if (!priv->cm.srq) + return; + + ipoib_dbg(priv, "Cleanup ipoib connected mode.\n"); + + ret = ib_destroy_srq(priv->cm.srq); + if (ret) + ipoib_warn(priv, "ib_destroy_srq 
failed: %d\n", ret); + + priv->cm.srq = NULL; + if (!priv->cm.srq_ring) + return; + for (i = 0; i < ipoib_recvq_size; ++i) + if (priv->cm.srq_ring[i].skb) { + ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[i].mapping); + dev_kfree_skb_any(priv->cm.srq_ring[i].skb); + priv->cm.srq_ring[i].skb = NULL; + } + kfree(priv->cm.srq_ring); + priv->cm.srq_ring = NULL; +} diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 59d9594..f2aa923 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -50,8 +50,6 @@ MODULE_PARM_DESC(data_debug_level, "Enable data path debug tracing if > 0"); #endif -#define IPOIB_OP_RECV (1ul << 31) - static DEFINE_MUTEX(pkey_mutex); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, @@ -268,10 +266,11 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) spin_lock_irqsave(&priv->tx_lock, flags); ++priv->tx_tail; - if (netif_queue_stopped(dev) && - test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && - priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) + if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) && + priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) { + clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); netif_wake_queue(dev); + } spin_unlock_irqrestore(&priv->tx_lock, flags); if (wc->status != IB_WC_SUCCESS && @@ -283,7 +282,9 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc) { - if (wc->wr_id & IPOIB_OP_RECV) + if (wc->wr_id & IPOIB_CM_OP_SRQ) + ipoib_cm_handle_rx_wc(dev, wc); + else if (wc->wr_id & IPOIB_OP_RECV) ipoib_ib_handle_rx_wc(dev, wc); else ipoib_ib_handle_tx_wc(dev, wc); @@ -327,12 +328,12 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_tx_buf *tx_req; u64 addr; - if (unlikely(skb->len > dev->mtu + INFINIBAND_ALEN)) { + if (unlikely(skb->len > priv->mcast_mtu + 
INFINIBAND_ALEN)) { ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", - skb->len, dev->mtu + INFINIBAND_ALEN); + skb->len, priv->mcast_mtu + INFINIBAND_ALEN); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; - dev_kfree_skb_any(skb); + ipoib_cm_skb_too_long(dev, skb, priv->mcast_mtu); return; } @@ -372,6 +373,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); netif_stop_queue(dev); + set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); } } } @@ -424,6 +426,13 @@ int ipoib_ib_dev_open(struct net_device *dev) return -1; } + ret = ipoib_cm_dev_open(dev); + if (ret) { + ipoib_warn(priv, "ipoib_cm_dev_open returned %d\n", ret); + ipoib_ib_dev_stop(dev); + return -1; + } + clear_bit(IPOIB_STOP_REAPER, &priv->flags); queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task, HZ); @@ -509,6 +518,8 @@ int ipoib_ib_dev_stop(struct net_device *dev) clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); + ipoib_cm_dev_stop(dev); + /* * Move our QP to the error state and then reinitialize in * when all work requests have completed or have been flushed. diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index af5ee2e..18d27fd 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -49,8 +49,6 @@ #include -#define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff) - MODULE_AUTHOR("Roland Dreier"); MODULE_DESCRIPTION("IP-over-InfiniBand net driver"); MODULE_LICENSE("Dual BSD/GPL"); @@ -145,6 +143,8 @@ static int ipoib_stop(struct net_device *dev) netif_stop_queue(dev); + clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); + /* * Now flush workqueue to make sure a scheduled task doesn't * bring our internal state back up. 
@@ -178,8 +178,18 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu) { struct ipoib_dev_priv *priv = netdev_priv(dev); - if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) + /* dev->mtu > 2K ==> connected mode */ + if (ipoib_cm_admin_enabled(dev) && new_mtu <= IPOIB_CM_MTU) { + if (new_mtu > priv->mcast_mtu) + ipoib_warn(priv, "mtu > %d will cause multicast packet drops.\n", + priv->mcast_mtu); + dev->mtu = new_mtu; + return 0; + } + + if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) { return -EINVAL; + } priv->admin_mtu = new_mtu; @@ -414,6 +424,20 @@ static void path_rec_completion(int status, memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, sizeof(union ib_gid)); + if (ipoib_cm_enabled(dev, neigh->neighbour)) { + if (!ipoib_cm_get(neigh)) + ipoib_cm_set(neigh, ipoib_cm_create_tx(dev, + path, + neigh)); + if (!ipoib_cm_get(neigh)) { + list_del(&neigh->list); + if (neigh->ah) + ipoib_put_ah(neigh->ah); + ipoib_neigh_free(dev, neigh); + continue; + } + } + while ((skb = __skb_dequeue(&neigh->queue))) __skb_queue_tail(&skqueue, skb); } @@ -520,7 +544,25 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, sizeof(union ib_gid)); - ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha)); + if (ipoib_cm_enabled(dev, neigh->neighbour)) { + if (!ipoib_cm_get(neigh)) + ipoib_cm_set(neigh, ipoib_cm_create_tx(dev, path, neigh)); + if (!ipoib_cm_get(neigh)) { + list_del(&neigh->list); + if (neigh->ah) + ipoib_put_ah(neigh->ah); + ipoib_neigh_free(dev, neigh); + goto err_drop; + } + if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) + __skb_queue_tail(&neigh->queue, skb); + else { + ipoib_warn(priv, "queue length limit %d. 
Packet drop.\n", + skb_queue_len(&neigh->queue)); + goto err_drop; + } + } else + ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha)); } else { neigh->ah = NULL; @@ -538,6 +580,7 @@ err_list: err_path: ipoib_neigh_free(dev, neigh); +err_drop: ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -640,7 +683,12 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) neigh = *to_ipoib_neigh(skb->dst->neighbour); - if (likely(neigh->ah)) { + if (ipoib_cm_get(neigh)) { + if (ipoib_cm_up(neigh)) { + ipoib_cm_send(dev, skb, ipoib_cm_get(neigh)); + goto out; + } + } else if (neigh->ah) { if (unlikely(memcmp(&neigh->dgid.raw, skb->dst->neighbour->ha + 4, sizeof(union ib_gid)))) { @@ -805,6 +853,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) neigh->neighbour = neighbour; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(&neigh->queue); + ipoib_cm_set(neigh, NULL); return neigh; } @@ -818,6 +867,8 @@ void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh) ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); } + if (ipoib_cm_get(neigh)) + ipoib_cm_destroy_tx(ipoib_cm_get(neigh)); kfree(neigh); } @@ -1080,6 +1131,8 @@ static struct net_device *ipoib_add_port(const char *format, ipoib_create_debug_files(priv->dev); + if (ipoib_cm_add_mode_attr(priv->dev)) + goto sysfs_failed; if (ipoib_add_pkey_attr(priv->dev)) goto sysfs_failed; if (device_create_file(&priv->dev->dev, &dev_attr_create_child)) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index b04b72c..fea737f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -597,7 +597,9 @@ void ipoib_mcast_join_task(struct work_struct *work) priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu) - IPOIB_ENCAP_LEN; - dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); + + if (!ipoib_cm_admin_enabled(dev)) + dev->mtu = 
min(priv->mcast_mtu, priv->admin_mtu); ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 7b717c6..3cb551b 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -168,35 +168,41 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) .qp_type = IB_QPT_UD }; + int ret, size; + priv->pd = ib_alloc_pd(priv->ca); if (IS_ERR(priv->pd)) { printk(KERN_WARNING "%s: failed to allocate PD\n", ca->name); return -ENODEV; } - priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, - ipoib_sendq_size + ipoib_recvq_size + 1); + priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(priv->mr)) { + printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name); + goto out_free_pd; + } + + size = ipoib_sendq_size + ipoib_recvq_size + 1; + ret = ipoib_cm_dev_init(dev); + if (!ret) + size += ipoib_recvq_size; + + priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size); if (IS_ERR(priv->cq)) { printk(KERN_WARNING "%s: failed to create CQ\n", ca->name); - goto out_free_pd; + goto out_free_mr; } if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP)) goto out_free_cq; - priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE); - if (IS_ERR(priv->mr)) { - printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name); - goto out_free_cq; - } - init_attr.send_cq = priv->cq; init_attr.recv_cq = priv->cq, priv->qp = ib_create_qp(priv->pd, &init_attr); if (IS_ERR(priv->qp)) { printk(KERN_WARNING "%s: failed to create QP\n", ca->name); - goto out_free_mr; + goto out_free_cq; } priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff; @@ -212,12 +218,12 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) return 0; -out_free_mr: - ib_dereg_mr(priv->mr); - out_free_cq: ib_destroy_cq(priv->cq); +out_free_mr: + ib_dereg_mr(priv->mr); + 
out_free_pd: ib_dealloc_pd(priv->pd); return -ENODEV; @@ -235,12 +241,14 @@ void ipoib_transport_dev_cleanup(struct net_device *dev) clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); } - if (ib_dereg_mr(priv->mr)) - ipoib_warn(priv, "ib_dereg_mr failed\n"); - if (ib_destroy_cq(priv->cq)) ipoib_warn(priv, "ib_cq_destroy failed\n"); + ipoib_cm_dev_cleanup(dev); + + if (ib_dereg_mr(priv->mr)) + ipoib_warn(priv, "ib_dereg_mr failed\n"); + if (ib_dealloc_pd(priv->pd)) ipoib_warn(priv, "ib_dealloc_pd failed\n"); } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 085eafe..6762988 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -115,6 +115,8 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) ipoib_create_debug_files(priv->dev); + if (ipoib_cm_add_mode_attr(priv->dev)) + goto sysfs_failed; if (ipoib_add_pkey_attr(priv->dev)) goto sysfs_failed; From mst at mellanox.co.il Thu Feb 8 14:54:50 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 9 Feb 2007 00:54:50 +0200 Subject: [openib-general] [PATCH] IB/ipoib_cm: fix up issues from code review In-Reply-To: References: Message-ID: <20070208225450.GJ6560@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH] IB/ipoib_cm: fix up issues from code review > > OK, I pulled this in and fixed it to build with the netdevice > class_device-ectomy that just went upstream, and pushed it out on my > for-2.6.21 branch like this. Thanks! -- MST From sashak at voltaire.com Thu Feb 8 15:14:12 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 9 Feb 2007 01:14:12 +0200 Subject: [openib-general] [PATCH TRIVIAL] opensm: remove #ifdef __WIN__ in not shared file. Message-ID: <20070208231412.GA22807@sashak.voltaire.com> opensm/main.c is not shared by win OpenSM, and #ifdef __WIN__ is not needed here. 
Signed-off-by: Sasha Khapyorsky --- osm/opensm/main.c | 5 ----- 1 files changed, 0 insertions(+), 5 deletions(-) diff --git a/osm/opensm/main.c b/osm/opensm/main.c index 69c940c..fa09360 100644 --- a/osm/opensm/main.c +++ b/osm/opensm/main.c @@ -65,10 +65,6 @@ static volatile unsigned int osm_usr1_flag = 0; #define GUID_ARRAY_SIZE 64 #define INVALID_GUID (0xFFFFFFFFFFFFFFFFULL) -#ifdef __WIN__ -#define block_signals() -#define setup_signals() -#else static void mark_exit_flag(int signum) { if(!osm_exit_flag) @@ -119,7 +115,6 @@ static void setup_signals() #endif pthread_sigmask(SIG_SETMASK, &saved_sigset, NULL); } -#endif /* __WIN__ */ /********************************************************************** **********************************************************************/ -- 1.5.0.rc2.g11a3 From sashak at voltaire.com Thu Feb 8 15:16:18 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 9 Feb 2007 01:16:18 +0200 Subject: [openib-general] [PATCH TRIVIAL] osmtest: use more descriptive constant names Message-ID: <20070208231618.GB22807@sashak.voltaire.com> Use more descriptive constant names for osmtest flows. 
Signed-off-by: Sasha Khapyorsky --- osm/osmtest/include/osmtest.h | 12 ++++++++++++ osm/osmtest/main.c | 20 ++++++++++---------- osm/osmtest/osmtest.c | 22 +++++++++++++--------- 3 files changed, 35 insertions(+), 19 deletions(-) diff --git a/osm/osmtest/include/osmtest.h b/osm/osmtest/include/osmtest.h index 39afbaf..13131dd 100644 --- a/osm/osmtest/include/osmtest.h +++ b/osm/osmtest/include/osmtest.h @@ -56,6 +56,18 @@ #include "osmtest_base.h" #include "osmtest_subnet.h" +enum OSMT_FLOWS { + OSMT_FLOW_ALL = 0, + OSMT_FLOW_CREATE_INVENTORY, + OSMT_FLOW_VALIDATE_INVENTORY, + OSMT_FLOW_SERVICE_REGISTRATION, + OSMT_FLOW_EVENT_FORWARDING, + OSMT_FLOW_STRESS_SA, + OSMT_FLOW_MULTICAST, + OSMT_FLOW_QOS, + OSMT_FLOW_TRAP, +}; + /****s* OpenSM: Subnet/osmtest_opt_t * NAME * osmtest_opt_t diff --git a/osm/osmtest/main.c b/osm/osmtest/main.c index ca5805b..5f402b7 100644 --- a/osm/osmtest/main.c +++ b/osm/osmtest/main.c @@ -354,7 +354,7 @@ main( int argc, opt.create = FALSE; opt.mmode = 1; opt.ignore_path_records = FALSE; /* Do path Records too */ - opt.flow = 0; /* run all validation tests */ + opt.flow = OSMT_FLOW_ALL; /* run all validation tests */ strcpy(flow_name, "All Validations"); strcpy( opt.file_name, "osmtest.dat" ); @@ -396,31 +396,31 @@ main( int argc, if (!strcmp("c", optarg)) { strcpy(flow_name, "Create Inventory"); - opt.flow = 1; + opt.flow = OSMT_FLOW_CREATE_INVENTORY; } else if (!strcmp("v", optarg)) { strcpy(flow_name, "Validate Inventory"); - opt.flow = 2; + opt.flow = OSMT_FLOW_VALIDATE_INVENTORY; } else if (!strcmp("s", optarg)) { strcpy(flow_name, "Services Registration"); - opt.flow = 3; + opt.flow = OSMT_FLOW_SERVICE_REGISTRATION; } else if (!strcmp("e", optarg)) { strcpy(flow_name, "Event Forwarding"); - opt.flow = 4; + opt.flow = OSMT_FLOW_EVENT_FORWARDING; } else if (!strcmp("f", optarg)) { strcpy(flow_name, "Stress SA"); - opt.flow = 5; + opt.flow = OSMT_FLOW_STRESS_SA; } else if (!strcmp("m", optarg)) { strcpy(flow_name, "Multicast"); - 
opt.flow = 6; + opt.flow = OSMT_FLOW_MULTICAST; } else if (!strcmp("q", optarg)) { strcpy(flow_name, "QoS: VLArb and SLtoVL"); - opt.flow = 7; + opt.flow = OSMT_FLOW_QOS; } else if (!strcmp("t", optarg)) { strcpy(flow_name, "Trap 64/65"); - opt.flow = 8; + opt.flow = OSMT_FLOW_TRAP; } else if (!strcmp("a", optarg)) { strcpy(flow_name, "All Validations"); - opt.flow = 0; + opt.flow = OSMT_FLOW_ALL; } else { printf( "\nError: unknown flow %s\n",flow_name); exit(2); diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c index 3c16a6f..ce185ec 100644 --- a/osm/osmtest/osmtest.c +++ b/osm/osmtest/osmtest.c @@ -7948,7 +7948,7 @@ osmtest_run( IN osmtest_t * const p_osmt ) goto Exit; } - if( p_osmt->opt.flow == 1 ) + if( p_osmt->opt.flow == OSMT_FLOW_CREATE_INVENTORY ) { /* * Creating an inventory file with all nodes, ports and paths @@ -7965,7 +7965,7 @@ osmtest_run( IN osmtest_t * const p_osmt ) } else { - if( p_osmt->opt.flow == 5 ) + if( p_osmt->opt.flow == OSMT_FLOW_STRESS_SA ) { /* * Stress SA - flood the it with queries @@ -8030,7 +8030,8 @@ osmtest_run( IN osmtest_t * const p_osmt ) /* * Run normal validition tests. 
*/ - if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 2) + if (p_osmt->opt.flow == OSMT_FLOW_ALL || + p_osmt->opt.flow == OSMT_FLOW_VALIDATE_INVENTORY) { /* * Only validate the given inventory file @@ -8056,7 +8057,7 @@ osmtest_run( IN osmtest_t * const p_osmt ) } } - if (p_osmt->opt.flow == 0) + if (p_osmt->opt.flow == OSMT_FLOW_ALL) { status = osmtest_wrong_sm_key_ignored( p_osmt ); if( status != IB_SUCCESS ) @@ -8069,7 +8070,8 @@ osmtest_run( IN osmtest_t * const p_osmt ) } } - if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 3) + if (p_osmt->opt.flow == OSMT_FLOW_ALL || + p_osmt->opt.flow == OSMT_FLOW_SERVICE_REGISTRATION) { /* * run service registration, deregistration, and lease test @@ -8085,7 +8087,8 @@ osmtest_run( IN osmtest_t * const p_osmt ) } } - if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 4) + if (p_osmt->opt.flow == OSMT_FLOW_ALL || + p_osmt->opt.flow == OSMT_FLOW_EVENT_FORWARDING) { /* * Run event forwarding test @@ -8110,7 +8113,7 @@ osmtest_run( IN osmtest_t * const p_osmt ) #endif } - if (p_osmt->opt.flow == 7) + if (p_osmt->opt.flow == OSMT_FLOW_QOS) { /* * QoS info: dump VLArb and SLtoVL tables. 
@@ -8138,7 +8141,7 @@ osmtest_run( IN osmtest_t * const p_osmt ) } } - if (p_osmt->opt.flow == 8) + if (p_osmt->opt.flow == OSMT_FLOW_TRAP) { /* * Run trap 64/65 flow (this flow requires running of external tool) @@ -8162,7 +8165,8 @@ osmtest_run( IN osmtest_t * const p_osmt ) #endif } - if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 6) + if (p_osmt->opt.flow == OSMT_FLOW_ALL || + p_osmt->opt.flow == OSMT_FLOW_MULTICAST) { /* * Multicast flow -- 1.5.0.rc2.g11a3 From swise at opengridcomputing.com Thu Feb 8 15:11:21 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 08 Feb 2007 17:11:21 -0600 Subject: [openib-general] dapl broken for iWARP In-Reply-To: <1170885460.31481.0.camel@stevo-desktop> References: <1170878543.30334.52.camel@stevo-desktop> <1170885460.31481.0.camel@stevo-desktop> Message-ID: <1170976281.3049.122.camel@stevo-desktop> On Wed, 2007-02-07 at 15:57 -0600, Steve Wise wrote: > On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: > > Arlin, > > > > The OFED dapl code is assuming the responder_resources and > > initiator_depth passed up on a connection request event are from the > > remote peer. This doesn't happen for iWARP. In the current iWARP > > specifications, it's up to the application to exchange this information > > somehow. So these are defaulting to 0 on the server side of any dapl > > connection over iWARP. > > > > This is a fairly recent change, I think. We need to come up with some > > way to deal with this for OFED 1.2 IMO. > > > > The IWCM could set these to the device max values for instance. > > Steve. > There is a slight problem with all this. There are no device attributes currently for ORD and IRD. The ammasso driver maps these to max_qp_rd_atom (IRD) and max_qp_init_rd_atom (ORD). But this is screwy. We need new attributes for these. For OFED 1.2, I think I should just have the IWCM set them to 8. The only RNIC in OFED is cxgb3 and it supports 8... Steve.
From mshefty at ichips.intel.com Thu Feb 8 15:43:24 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 08 Feb 2007 15:43:24 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45CB6A8F.2030705@ichips.intel.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> Message-ID: <45CBB59C.4010709@ichips.intel.com> > Looking at the problem more, I think that the issue extends to the remote port > LID as well. My expectation with a local path record query is that the SLID is > the local port, and the DLID is the local router. This should be sufficient for > one-way UD traffic, but for connected traffic we still need to discover the > remote router and remote port LIDs. Given a path record query for: SGID - local DGID - remote What would be the SLID and DLID? And if the query is reversed, such that: SGID - remote DGID - local Are the SLID/DLID values simply reversed? What if the DGID in the second case were a multicast GID? What does the SLID become in this case? 
- Sean From halr at voltaire.com Thu Feb 8 15:18:03 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Feb 2007 18:18:03 -0500 Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when appropriate for unicast packets In-Reply-To: <45CB9DDA.8020303@ichips.intel.com> References: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com> <1170967182.31538.96962.camel@hal.voltaire.com> <45CB9DDA.8020303@ichips.intel.com> Message-ID: <1170976680.31538.106389.camel@hal.voltaire.com> On Thu, 2007-02-08 at 17:02, Sean Hefty wrote: > >>This requires that the passive side be able to issue path record queries, but I > >>think that it could work for static routes. A point was made to me that the > >>remote side could be a TCA without query capabilities. > > > > Are you referring to SA query capabilities ? Would such a device just be > > expected to work without change in an IB routed environment anyway ? > > Yes I was referring to SA query capability, such as a path record query. Since > the spec requires that the path information be provided by the active side, I > think that such a device could work without change. (But it does mean that the > active side has to provide some way to obtain the necessary information to put > into a CM REQ, plus know what the remote router will do.) It also means it needs to be able to put a GRH in on the sending side. Not sure that is "free" in implementations as you have been noting for OpenIB recently. 
-- Hal > - Sean From rdreier at cisco.com Thu Feb 8 15:50:42 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Feb 2007 15:50:42 -0800 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: <1170860483.11491.21.camel@trinity.ogc.int> (Tom Tucker's message of "Wed, 07 Feb 2007 09:01:23 -0600") References: <20070207065650.24166.6979.sendpatchset@localhost.localdomain> <1170858272.14381.1.camel@stevo-desktop> <1170860483.11491.21.camel@trinity.ogc.int> Message-ID: Hmm, Steve likes it, Tom doesn't. Can you guys arm wrestle or something and tell me if this patch is correct or not? - R. From rdreier at cisco.com Thu Feb 8 15:56:28 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Feb 2007 15:56:28 -0800 Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support In-Reply-To: <45CB3537.8060508@voltaire.com> (Or Gerlitz's message of "Thu, 08 Feb 2007 16:35:35 +0200") References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com> Message-ID: I merged the "increment port number" and "remove redundant '_wq'" patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland I plan to review to multicast stuff next week and I hope to merge it for 2.6.21. Or, have you or anyone else at Voltaire read over the code in addition to using it? Do you see anything that should be cleaned up? - R. From rdreier at cisco.com Thu Feb 8 16:26:25 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Feb 2007 16:26:25 -0800 Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes In-Reply-To: <20070208202634.4382.15287.stgit@dell3.ogc.int> (Steve Wise's message of "Thu, 08 Feb 2007 14:26:34 -0600") References: <20070208202634.4382.15287.stgit@dell3.ogc.int> Message-ID: OK, I've pulled the cxgb3 stuff into a single commit in my for-2.6.21 branch. 
I took the liberty of cleaning up some sparse warnings, etc.
There's still a few other obvious things to fix up:

drivers/infiniband/hw/cxgb3/iwch_ev.c:102:6: warning: symbol 'iwch_ev_dispatch' was not declared. Should it be static?

Rather than putting an extern in iwch.c, please put a proper
definition in an appropriate header file included from iwch.c.

Also I agree with MST, I would like to see the core/ subdirectory die
completely.

 - R.

From swise at opengridcomputing.com Thu Feb 8 16:39:10 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 18:39:10 -0600
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To:
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
Message-ID: <1170981550.3049.130.camel@stevo-desktop>

On Thu, 2007-02-08 at 16:26 -0800, Roland Dreier wrote:
> OK, I've pulled the cxgb3 stuff into a single commit in my for-2.6.21
> branch. I took the liberty of cleaning up some sparse warnings, etc.
> There's still a few other obvious things to fix up:
>
> drivers/infiniband/hw/cxgb3/iwch_ev.c:102:6: warning: symbol 'iwch_ev_dispatch' was not declared. Should it be static?
>
> Rather than putting an extern in iwch.c, please put a proper
> definition in an appropriate header file included from iwch.c.
>

ok.

> Also I agree with MST, I would like to see the core/ subdirectory die
> completely.
>

ok ok...I'll kill the subdir...

From rdreier at cisco.com Thu Feb 8 16:40:09 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 08 Feb 2007 16:40:09 -0800
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: (Roland Dreier's message of "Thu, 08 Feb 2007 16:26:25 -0800")
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
Message-ID:

Oh yeah -- Steve, please keep sending cleanup patches based on my tree now.
I'm planning on asking Linus to merge what's in for-2.6.21 in the next couple of days, but there's still more than a week before the merge window closes, and even after the merge window closes I'll still accept fixes/cleanups for stuff already upstream. And here's what I have pending in for-2.6.21 so far: Ahmed S. Darwish (1): IB/core: Use ARRAY_SIZE macro for mandatory_table Akinobu Mita (1): IB/ehca: Fix memleak on module unloading David Howells (1): IB/mthca: Work around gcc bug on sparc64 Michael S. Tsirkin (1): IPoIB: Connected mode experimental support Roland Dreier (1): IB/mthca: Use correct structure size in call to memset() Sean Hefty (2): RDMA/cma: Increment port number after close to avoid re-use IB: Remove redundant "_wq" from workqueue names Steve Wise (1): RDMA/cxgb3: Add driver for Chelsio T3 Rnic From swise at opengridcomputing.com Thu Feb 8 16:40:53 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 08 Feb 2007 18:40:53 -0600 Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug. In-Reply-To: <1170805153.19662.155.camel@stevo-desktop> References: <1170799320.19662.124.camel@stevo-desktop> <20070206221232.GO24372@mellanox.co.il> <1170805153.19662.155.camel@stevo-desktop> Message-ID: <1170981653.3049.133.camel@stevo-desktop> Michael, >From your email, it sounded like you would regression test this. Is it ready to pull in? Thanks! Steve. On Tue, 2007-02-06 at 17:39 -0600, Steve Wise wrote: > Here it is (only tested with rping over iWARP on sles9sp3): > > ---------------- > > > xxx_ip_dev_find() must use scope HOST. > > From: Steve Wise > > Function xxx_ip_dev_find(RT_SCOPE_LINK) returns the wrong interface on > some kernels. The correct scope is RT_SCOPE_HOST. 
> > Signed-off-by: Steve Wise > --- > > .../backport/2.6.11/include/linux/inetdevice.h | 2 +- > .../backport/2.6.11_FC4/include/linux/inetdevice.h | 2 +- > .../backport/2.6.12/include/linux/inetdevice.h | 2 +- > .../backport/2.6.13/include/linux/inetdevice.h | 2 +- > .../2.6.13_suse10_0_u/include/linux/inetdevice.h | 2 +- > .../backport/2.6.14/include/linux/inetdevice.h | 2 +- > .../backport/2.6.15/include/linux/inetdevice.h | 2 +- > .../2.6.15_ubuntu606/include/linux/inetdevice.h | 2 +- > .../backport/2.6.16/include/linux/inetdevice.h | 2 +- > .../backport/2.6.17/include/linux/inetdevice.h | 2 +- > .../2.6.5_sles9_sp3/include/linux/inetdevice.h | 2 +- > .../backport/2.6.9_U2/include/linux/inetdevice.h | 2 +- > .../backport/2.6.9_U3/include/linux/inetdevice.h | 2 +- > .../backport/2.6.9_U4/include/linux/inetdevice.h | 2 +- > 14 files changed, 14 insertions(+), 14 deletions(-) > > diff --git a/kernel_addons/backport/2.6.11/include/linux/inetdevice.h b/kernel_addons/backport/2.6.11/include/linux/inetdevice.h > index 7244487..2d3c50f 100644 > --- a/kernel_addons/backport/2.6.11/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.11/include/linux/inetdevice.h > @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h b/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h > index 7244487..2d3c50f 100644 > --- a/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h > @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 
0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.12/include/linux/inetdevice.h b/kernel_addons/backport/2.6.12/include/linux/inetdevice.h > index 7244487..2d3c50f 100644 > --- a/kernel_addons/backport/2.6.12/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.12/include/linux/inetdevice.h > @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.13/include/linux/inetdevice.h b/kernel_addons/backport/2.6.13/include/linux/inetdevice.h > index 7a32313..fd0aa36 100644 > --- a/kernel_addons/backport/2.6.13/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.13/include/linux/inetdevice.h > @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h b/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h > index 7a32313..fd0aa36 100644 > --- a/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h > @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.14/include/linux/inetdevice.h b/kernel_addons/backport/2.6.14/include/linux/inetdevice.h > index 7a32313..fd0aa36 
100644 > --- a/kernel_addons/backport/2.6.14/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.14/include/linux/inetdevice.h > @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.15/include/linux/inetdevice.h b/kernel_addons/backport/2.6.15/include/linux/inetdevice.h > index 7a32313..fd0aa36 100644 > --- a/kernel_addons/backport/2.6.15/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.15/include/linux/inetdevice.h > @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h b/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h > index 7a32313..fd0aa36 100644 > --- a/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h > @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.16/include/linux/inetdevice.h b/kernel_addons/backport/2.6.16/include/linux/inetdevice.h > index 7a32313..fd0aa36 100644 > --- a/kernel_addons/backport/2.6.16/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.16/include/linux/inetdevice.h > @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ > > 
read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.17/include/linux/inetdevice.h b/kernel_addons/backport/2.6.17/include/linux/inetdevice.h > index 7a32313..fd0aa36 100644 > --- a/kernel_addons/backport/2.6.17/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.17/include/linux/inetdevice.h > @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h b/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h > index 7244487..2d3c50f 100644 > --- a/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h > @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h > index 7244487..2d3c50f 100644 > --- a/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h > @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > 
diff --git a/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h > index 7244487..2d3c50f 100644 > --- a/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h > @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > diff --git a/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h > index 7244487..2d3c50f 100644 > --- a/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h > +++ b/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h > @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_ > > read_lock(&dev_base_lock); > for (dev = dev_base; dev; dev = dev->next) { > - ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); > + ip = inet_select_addr(dev, 0, RT_SCOPE_HOST); > if (ip == addr) { > dev_hold(dev); > break; > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From michael.arndt at informatik.tu-chemnitz.de Thu Feb 8 16:39:06 2007 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Fri, 9 Feb 2007 01:39:06 +0100 Subject: [openib-general] Unknown SMP Recv References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> Message-ID: <001e01c74be2$b4889310$21606d86@one7> Hi, I think I have found the problem. 
It is the timeout parameter on the umad_send function. How exactly do I have to handle this parameter? It seems that it should be zero if there is no response expected. But what value should it be if there is a response expected? In a test I used zero for SubnGetResp packets because there shouldn't be more packets and 100 for SubnGet or SubnSet. But if the router is stressed the umad_send function breaks down and gives an error -5 every thirtieth packet. Any idea or advice?

Thanks
Michael

From bsharp at NetEffect.com Thu Feb 8 17:19:46 2007
From: bsharp at NetEffect.com (Bob Sharp)
Date: Thu, 8 Feb 2007 19:19:46 -0600
Subject: [openib-general] dapl broken for iWARP
Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC06AD1A81@venom2>

> For OFED 1.2, I think I should just have the IWCM set them to 8. The
> only RNIC in ofed is cxgb3 and it supports 8...
>

Steve,

If we can't create the new attributes for RNICs, it seems like it would be
better to agree on the mapping of IRD/ORD to IB parameters than it would
be to limit these parameters to 8. That number seems a bit low.

Bob

From swise at opengridcomputing.com Thu Feb 8 17:41:22 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Thu, 08 Feb 2007 19:41:22 -0600
Subject: [openib-general] dapl broken for iWARP
In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC06AD1A81@venom2>
References: <5E701717F2B2ED4EA60F87C8AA57B7CC06AD1A81@venom2>
Message-ID: <1170985282.25474.2.camel@linux-q667.site>

On Thu, 2007-02-08 at 19:19 -0600, Bob Sharp wrote:
> > For OFED 1.2, I think I should just have the IWCM set them to 8. The
> > only RNIC in ofed is cxgb3 and it supports 8...
> >
> Steve,
>
> If we can't create the new attributes for RNICs, it seems like it would be
> better to agree on the mapping of IRD/ORD to IB parameters than it would
> be to limit these parameters to 8. That number seems a bit low.
>

Hey Bob,

This is for the OFED 1.2 release only and it's too late to be adding new features methinks since we're at feature freeze.
For the upstream kernel (ie 2.6.21) we can define the attributes. > Bob From bsharp at NetEffect.com Thu Feb 8 17:51:34 2007 From: bsharp at NetEffect.com (Bob Sharp) Date: Thu, 8 Feb 2007 19:51:34 -0600 Subject: [openib-general] dapl broken for iWARP References: <5E701717F2B2ED4EA60F87C8AA57B7CC06AD1A81@venom2> <1170985282.25474.2.camel@linux-q667.site> Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC01E5DDD5@venom2> > > > For OFED 1.2, I think I should just have the IWCM set them to 8. The > > > only RNIC in ofed is cxgb3 and it supports 8... > > > > > Steve, > > > > If we can't create the new attributes for RNICs, it seems like it would be > > better to agree on the mapping of IRD/ORD to IB parameters than it would > > be to limit these parameters to 8. That number seems a bit low. > > > > Hey Bob, > > This is for the OFED 1.2 release only and its too late to be adding new > features methinks since we're at feature freeze. For the upstream > kernel (ie 2.6.21) we can define the attributes. > I figured as much. So lets just go with your Ammasso mapping of IRD/ORD to the IB parameters that RNICs don't use for now. From krkumar2 at in.ibm.com Thu Feb 8 19:29:20 2007 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Fri, 9 Feb 2007 08:59:20 +0530 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: Message-ID: Roland, Yes, we will do some "arm wrestling" today :) thanks, KK Roland Dreier wrote on 02/09/2007 05:20:42 AM: > Hmm, Steve likes it, Tom doesn't. Can you guys arm wrestle or > something and tell me if this patch is correct or not? > > - R. 
From halr at voltaire.com Thu Feb 8 20:15:36 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 08 Feb 2007 23:15:36 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <001e01c74be2$b4889310$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7>
Message-ID: <1170994529.31538.124584.camel@hal.voltaire.com>

On Thu, 2007-02-08 at 19:39, Michael Arndt wrote:
> Hi,
>
> I think I have found the problem. It is the timeout parameter on the
> umad_send function. How exactly do I have to handle this parameter? It seems
> that it should be zero if there is no response expected. But what value
> should it be if there is a response expected? In a test I used zero for
> SubnGetResp packets because there shouldn't be more packets and 100 for
> SubnGet or SubnSet. But if the router is stressed the umad_send function
> breaks down and gives an error -5 every thirtieth packet. Any idea or advice?

umad_send takes the timeout in msec. 100 msec is too short. Try
something on the order of seconds.

Note also that a negative 'timeout_ms' value makes the kernel wait for the
reply forever.
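
The timeout advice above can be sketched as a small helper. This is only an illustration: pick_timeout_ms(), the enum names and the 2000 msec figure are assumptions made for this sketch and are not part of libibumad. The method codes are the usual MAD values (Get 0x01, Set 0x02, GetResp 0x81). The result is meant to be handed to umad_send as its timeout in msec.

```c
#include <assert.h>

/* Hypothetical names; the numeric method codes are the standard MAD ones. */
enum {
	MAD_METHOD_GET      = 0x01,	/* SubnGet: a reply is expected */
	MAD_METHOD_SET      = 0x02,	/* SubnSet: a reply is expected */
	MAD_METHOD_GET_RESP = 0x81	/* SubnGetResp: nothing comes back */
};

/* Assumed value, chosen only to be "on the order of seconds". */
enum { REPLY_TIMEOUT_MS = 2000 };

/*
 * Hypothetical helper: choose the timeout (in msec) for a MAD send.
 * Requests wait for their reply; anything else is sent without waiting.
 */
static int pick_timeout_ms(unsigned char method)
{
	switch (method) {
	case MAD_METHOD_GET:
	case MAD_METHOD_SET:
		return REPLY_TIMEOUT_MS;
	default:
		return 0;
	}
}
```

Whether 2 seconds is right for a stressed router is a tuning question; the point is only that requests get a timeout well above 100 msec and responses get zero.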
-- Hal

> Thanks Michael

From rdreier at cisco.com Thu Feb 8 20:23:18 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 08 Feb 2007 20:23:18 -0800
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()
In-Reply-To: (Krishna Kumar2's message of "Fri, 9 Feb 2007 08:59:20 +0530")
References:
Message-ID:

BTW, while looking at iwcm.c, I noticed the following highly dubious
code for the first time:

static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
{
	int ret = 0;

	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
		BUG_ON(!list_empty(&cm_id_priv->work_list));
		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
					&cm_id_priv->flags));
			ret = 1;
		}
		complete(&cm_id_priv->destroy_comp);
	}

	return ret;
}

The test of waitqueue_active on destroy_comp.wait looks really bad for
two reasons: first, it is relying on an internal implementation detail
of struct completion that really shouldn't be used by generic code.
And second, it seems to me that this doesn't even work right, since
there is a race something like the following:

	iw_destroy_cm_id():
	destroy_cm_id(cm_id); // still 1 ref left

					cm_work_handler():
					if (iwcm_deref_id()) // drop last ref
						return;
					// no one waiting yet, doesn't
					// return, but destroy_comp is
					// signaled

	wait_for_completion(&cm_id_priv->destroy_comp);
	// destroy_comp is signaled, proceed
	kfree(cm_id_priv);

					// continue using cm_id_priv
					// OOPS

I don't understand this code well enough for the fix to be obvious.

 - R.
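
For readers following the race above, here is a minimal userspace sketch of one conventional arrangement that avoids looking inside the completion. It is not the iwcm code and not a proposed patch: struct obj, obj_create(), obj_get(), obj_deref() and obj_destroy() are hypothetical names, and the completion is a spin-on-a-flag stand-in for the kernel's struct completion. It assumes the creator's reference is only ever dropped by the destroyer, so the last reference cannot disappear before destruction has started.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct completion: a flag set once, spun on. */
struct completion {
	atomic_bool done;
};

static void complete(struct completion *c)
{
	atomic_store(&c->done, true);
}

static void wait_for_completion(struct completion *c)
{
	while (!atomic_load(&c->done))
		;	/* busy-wait; real code would sleep */
}

/* Hypothetical refcounted object, loosely modelled on iwcm_id_private. */
struct obj {
	atomic_int refcount;
	struct completion destroy_comp;
};

static struct obj *obj_create(void)
{
	struct obj *o = malloc(sizeof *o);

	if (!o)
		return NULL;
	atomic_init(&o->refcount, 1);	/* the creator's reference */
	atomic_init(&o->destroy_comp.done, false);
	return o;
}

static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/*
 * Drop one reference. Returns 1 if it was the last one, in which case the
 * caller must not touch the object again. The completion is signalled
 * unconditionally, without asking whether anybody is waiting yet.
 */
static int obj_deref(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);	/* plays the role of BUG_ON() */
	if (old == 1) {
		complete(&o->destroy_comp);
		return 1;
	}
	return 0;
}

/* Only the destroyer frees: drop the creator's reference, wait, free. */
static void obj_destroy(struct obj *o)
{
	obj_deref(o);
	wait_for_completion(&o->destroy_comp);
	free(o);
}
```

In this arrangement a work handler that gets 1 back from obj_deref() returns immediately, whether or not the destroyer has reached wait_for_completion() yet, which closes the window shown in the diagram. Whether the same rule is safe for every iwcm caller is exactly the open question in this thread.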
From jgunthorpe at obsidianresearch.com Thu Feb 8 20:37:27 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Thu, 8 Feb 2007 21:37:27 -0700 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45CBB59C.4010709@ichips.intel.com> References: <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> Message-ID: <20070209043727.GN11411@obsidianresearch.com> On Thu, Feb 08, 2007 at 03:43:24PM -0800, Sean Hefty wrote: > > Looking at the problem more, I think that the issue extends to the remote port > > LID as well. My expectation with a local path record query is that the SLID is > > the local port, and the DLID is the local router. This should be sufficient for > > one-way UD traffic, but for connected traffic we still need to discover the > > remote router and remote port LIDs. > > Given a path record query for: > > SGID - local > DGID - remote > > What would be the SLID and DLID? > > And if the query is reversed, such that: > > SGID - remote > DGID - local > > Are the SLID/DLID values simply reversed? I have a follow up question to this.. With CM how is the SL for each side determined? I'm looking through the code here and it looks like the SL of the active side is passed in the REQ to the passive side (ie both sides are the same) But cma_query_ib_route does not set the reversible bit when it asks for the path. If you don't set the reversible bit isn't it necessary to make a 2nd path query to get the reverse path's SL? 
[Path responses without the reversible bit set are actually simplex paths and reversing them probably will run into SL2VL mapping tables that cause the packets to be dropped ie o7-8] Infact, to get an optimal path aren't 3 path records required: 1) A reversible path from active to passive from the CM GMPs (required by C12-5.1.3) 2) An optimal non-reversible path from active to passive 3) An optimal non-reversible path from passive to active Jason From krkumar2 at in.ibm.com Thu Feb 8 21:01:23 2007 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Fri, 9 Feb 2007 10:31:23 +0530 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: Message-ID: Regarding the race - can this and the other problem (of using internal data-structure) both be taken care of by changing iw_deref_id to return 1 if atomic_dec_and_test finds the last reference ? Then the waitqueue_active() code can be removed, just do the completion (reaching here implies that someone is in the middle of iw_destroy_cm_id). The question is what is the issue if we return 1 even if no one is waiting in iw_destroy_cm_id() and which results in cm_work_handler() returning out ? 
thanks, - KK > BTW, while looking at iwcm.c, I noticed the following highly dubious > code for the first time: > > static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv) > { > int ret = 0; > > BUG_ON(atomic_read(&cm_id_priv->refcount)==0); > if (atomic_dec_and_test(&cm_id_priv->refcount)) { > BUG_ON(!list_empty(&cm_id_priv->work_list)); > if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) { > BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING); > BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, > &cm_id_priv->flags)); > ret = 1; > } > complete(&cm_id_priv->destroy_comp); > } > > return ret; > } > > The test of waitqueue_active on destroy_comp.wait looks really bad for > two reasons: first, it is relying on an internal implementation detail > of struct completion that really shouldn't be used by generic code. > And second, it seems to me that this doesn't even work right, since > there is a race something like the following: > > iw_destroy_cm_id(): > destroy_cm_id(cm_id); // still 1 ref left > > cm_work_handler(): > if (iwcm_deref_id()) // drop last ref > return; > // no one waiting yet, doesn't > // return, but destroy_comp is > // signaled > > wait_for_completion(&cm_id_priv->destroy_comp); > // destroy_comp is signaled, proceed > kfree(cm_id_priv); > > // continue using cm_id_priv > // OOPS > > I don't understand this code well enough for the fix to be obvious. > > - R. From mst at mellanox.co.il Thu Feb 8 22:51:49 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 9 Feb 2007 08:51:49 +0200 Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes In-Reply-To: <1170981550.3049.130.camel@stevo-desktop> References: <1170981550.3049.130.camel@stevo-desktop> Message-ID: <20070209065149.GL6560@mellanox.co.il> > > Also I agree with MST, I would like to see the core/ subdirectory die > > completely. > > > > ok ok...I'll kill the subdir... It's not just the directory BTW. 
Stuff like building completions in t3_cqe format and then reformatting to ib_wc
seems to be much more confusing (and some of it is actually on datapath).
Same goes for t3_wq and I suspect everything else defined in cxio_wr.h -
please, use the native types from include/rdma/.

Having to wade through 3 driver-specific layers of abstractions just because I
want to for example change API in ib_verbs.h and need to update all drivers
will be very taxing. I understand your design calls for 2 layers, but at least
the API exposed by code in drivers/net is fairly small, while cxio_wr.h
declares 27 structures which seem to just duplicate ib_verbs.h.

-- MST

From mst at mellanox.co.il Thu Feb 8 23:19:15 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 09:19:15 +0200
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport-IWCM workaroundfor ip_dev_find() bug.
In-Reply-To: <1170981653.3049.133.camel@stevo-desktop>
References: <1170981653.3049.133.camel@stevo-desktop>
Message-ID: <20070209071915.GN6560@mellanox.co.il>

> Quoting Steve Wise :
> Subject: Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport-IWCM workaroundfor ip_dev_find() bug.
>
> Michael,
>
> From your email, it sounded like you would regression test this.

Not yet, we had lab restructuring - hopefully next week.

-- MST

From mst at mellanox.co.il Thu Feb 8 23:28:52 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 09:28:52 +0200
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To:
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
Message-ID: <20070209072852.GP6560@mellanox.co.il>

> And here's what I have pending in for-2.6.21 so far:

What about the mthca memory registration patches?
I thought they are on their way. Should I repost?

-- MST

From mst at mellanox.co.il Fri Feb 9 00:04:19 2007
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Fri, 9 Feb 2007 10:04:19 +0200
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support
In-Reply-To:
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com>
Message-ID: <20070209080418.GQ6560@mellanox.co.il>

> Quoting Roland Dreier :
> Subject: Re: please pull for 2.6.21: fix + add IB multicast support
>
> I merged the "increment port number" and "remove redundant '_wq'"
> patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland
>
> I plan to review to multicast stuff next week and I hope to merge it
> for 2.6.21. Or, have you or anyone else at Voltaire read over the
> code in addition to using it? Do you see anything that should be
> cleaned up?

I looked at the code briefly, don't have much time at the moment unfortunately.

+static void join_group(struct mcast_group *group, struct mcast_member *member,
+		       u8 join_state)
+{
+	member->state = MCAST_MEMBER;
+	adjust_membership(group, join_state, 1);
+	group->rec.join_state |= join_state;
+	member->multicast.rec = group->rec;
+	member->multicast.rec.join_state = join_state;
+	list_del(&member->list);
+	list_add(&member->list, &group->active_list);
+}

Can be just list_move.

Patch allocates everything with kzalloc, but then goes ahead and initialize
everything. So just kmalloc it - no reason to waste initialized memory if
non-initialized will do. List of places:

+	member = kzalloc(sizeof *member, gfp_mask);
+	if (!member)
+		return ERR_PTR(-ENOMEM);

Same here:

+	group = kzalloc(sizeof *group, gfp_mask);
+	if (!group)
+		return NULL;
+

and same here:

+	iter = kzalloc(sizeof *iter + attr_size, GFP_KERNEL);
+	if (!iter)
+		return ERR_PTR(-ENOMEM);
+

It seems same goes for

+	mc = kzalloc(sizeof(*mc), GFP_KERNEL);
+	if (!mc)
+		return NULL;

in ucma.c - everything gets initied by calling function - but a bit less sure,
needs checking.
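
The list_move remark on join_group() above can be illustrated outside the kernel. The block below is a simplified re-implementation that borrows the names from linux/list.h; it is a sketch for illustration, not the kernel header itself (the real list_del also poisons the removed entry's pointers, which this version skips).

```c
#include <assert.h>

/* Minimal circular doubly linked list, named after linux/list.h. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink 'entry' from whatever list it is on. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Unlink 'entry' and re-insert it after 'head': list_del + list_add. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	assert(entry != head);
	list_del(entry);
	list_add(entry, head);
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

With this in place the last two statements of join_group() would collapse to a single list_move(&member->list, &group->active_list) call, assuming the member is always on some list at that point.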
By the way, it seems the same goes for + bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL); + if (!bind_list) + return -ENOMEM; in cma_alloc_any_port in the port randomization patch that was merged and for cma_alloc_port in existing code. -- MST You seem to be careful to do list_del_init for member->list all over, From or.gerlitz at gmail.com Fri Feb 9 01:21:30 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 9 Feb 2007 11:21:30 +0200 Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support In-Reply-To: References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com> Message-ID: <15ddcffd0702090121t3314f577ue42584282991984a@mail.gmail.com> On 2/9/07, Roland Dreier wrote: > I plan to review the multicast stuff next week and I hope to merge it for 2.6.21 thanks, good news! > Or, have you or anyone else at Voltaire read over the > code in addition to using it? Do you see anything that should be > cleaned up? OK, most of the review I did (and the interaction with Sean to add changes) was on the "rdma_cm: add multicast communication support" patch, and I was less focused on the "ib_sa: track multicast join/leave requests" patch. However, I recall that there were some discussions between Sean and Michael and that they reached an agreement. I will look at the ib_sa patch on Sunday and let Sean/you know if I have any comments. Or. From mst at mellanox.co.il Fri Feb 9 01:29:21 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Fri, 9 Feb 2007 11:29:21 +0200 Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support In-Reply-To: <15ddcffd0702090121t3314f577ue42584282991984a@mail.gmail.com> References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com> <15ddcffd0702090121t3314f577ue42584282991984a@mail.gmail.com> Message-ID: <20070209092921.GU6560@mellanox.co.il> > > Or, have you or anyone else at Voltaire read over the > > code in addition to using it? Do you see anything that should be > > cleaned up? > > OK, I most the the review i did (and interaction with Sean to add changes) was > on the rdma_cm: add multicast communication support patch, and i was > less focused > on the ib_sa: track multicast join/leave requests patch, however i > recall that there were some discussions between Sean and Michael and > they reached an agreement. Yes, we reached an agreement. These patches have also seen some limited testing in the OFED tree. 
-- MST From vlad at lists.openfabrics.org Fri Feb 9 02:24:20 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Fri, 9 Feb 2007 02:24:20 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070209-0200 daily build status Message-ID: <20070209102420.B6FB6E60807@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.18 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.14 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.17 Failed: Build failed on ia64 
with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ /home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: /home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From halr at voltaire.com Fri Feb 9 04:12:55 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 07:12:55 -0500 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45CBB59C.4010709@ichips.intel.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> Message-ID: 
<1171023168.31538.153989.camel@hal.voltaire.com> On Thu, 2007-02-08 at 18:43, Sean Hefty wrote: > > Looking at the problem more, I think that the issue extends to the remote port > > LID as well. My expectation with a local path record query is that the SLID is > > the local port, and the DLID is the local router. This should be sufficient for > > one-way UD traffic, but for connected traffic we still need to discover the > > remote router and remote port LIDs. > > Given a path record query for: > > SGID - local > DGID - remote > > What would be the SLID and DLID? SLID corresponding to SGID and a DLID for some IB router on the subnet which can route to the remote DGID. > And if the query is reversed, such that: > > SGID - remote > DGID - local > > Are the SLID/DLID values simply reversed? An SM is free to choose SLID and DLID to supply to if there are multiple LIDs for the ports in question it can choose alternates. The key here is whether a reversible path has been requested or not. It is also not clear what reversible means in the context of an IB internetwork (multiple IB subnets interconnected by IB routers). > What if the DGID in the second case were a multicast GID? So you are asking about what an SA PR lookup for a remote SGID to a DGID which is an MGID would yield ? I think this too is beyond the spec. > What does the SLID become in this case? The SLID couldn't be valid (on a remote subnet) so I'm not sure what would be said for this case. 
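[Editorial sketch, not part of the original message.] The first answer above (SLID for the local port, DLID for a router when the DGID is remote) can be pictured with a small userspace C example. It assumes the usual GID layout of a 64-bit subnet prefix followed by a 64-bit interface ID; struct gid, gid_same_subnet and pick_dlid are hypothetical names, and the sketch deliberately ignores multicast GIDs and the reversibility questions that the thread leaves open.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 128-bit GID in network byte order:
 * bytes 0-7 are the subnet prefix, bytes 8-15 the interface ID. */
struct gid {
    uint8_t raw[16];
};

/* Two unicast GIDs are on the same subnet if their prefixes match. */
static int gid_same_subnet(const struct gid *a, const struct gid *b)
{
    return memcmp(a->raw, b->raw, 8) == 0;
}

/* Hypothetical illustration of the DLID choice described above:
 * a local destination is addressed by its own port LID, a remote one
 * by the LID of a router on the local subnet that can reach it. */
static uint16_t pick_dlid(const struct gid *sgid, const struct gid *dgid,
                          uint16_t dest_port_lid, uint16_t router_lid)
{
    return gid_same_subnet(sgid, dgid) ? dest_port_lid : router_lid;
}
```

In a real fabric this decision is made by the SM/SA when it answers the path record query, not by the client; the sketch only restates the rule in code.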
-- Hal > - Sean From halr at voltaire.com Fri Feb 9 04:15:31 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 07:15:31 -0500 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070209043727.GN11411@obsidianresearch.com> References: <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <20070209043727.GN11411@obsidianresearch.com> Message-ID: <1171023319.31538.154141.camel@hal.voltaire.com> On Thu, 2007-02-08 at 23:37, Jason Gunthorpe wrote: > On Thu, Feb 08, 2007 at 03:43:24PM -0800, Sean Hefty wrote: > > > Looking at the problem more, I think that the issue extends to the remote port > > > LID as well. My expectation with a local path record query is that the SLID is > > > the local port, and the DLID is the local router. This should be sufficient for > > > one-way UD traffic, but for connected traffic we still need to discover the > > > remote router and remote port LIDs. > > > > Given a path record query for: > > > > SGID - local > > DGID - remote > > > > What would be the SLID and DLID? > > > > And if the query is reversed, such that: > > > > SGID - remote > > DGID - local > > > > Are the SLID/DLID values simply reversed? > > I have a follow up question to this.. With CM how is the SL for each > side determined? I'm looking through the code here and it looks like > the SL of the active side is passed in the REQ to the passive side (ie > both sides are the same) But cma_query_ib_route does not set the > reversible bit when it asks for the path. If you don't set the > reversible bit isn't it necessary to make a 2nd path query to get the > reverse path's SL? 
[Path responses without the reversible bit set > are actually simplex paths and reversing them probably will run into > SL2VL mapping tables that cause the packets to be dropped ie o7-8] > > Infact, to get an optimal path aren't 3 path records required: > 1) A reversible path from active to passive from the CM GMPs > (required by C12-5.1.3) > 2) An optimal non-reversible path from active to passive > 3) An optimal non-reversible path from passive to active What you are saying seems correct to me although I am not sure about reversibility in the intersubnet case. It may be that the non reversible paths supplied (in a single subnet) happen to also be reversible so this all works. -- Hal > Jason > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ossrosch at linux.vnet.ibm.com Fri Feb 9 05:37:01 2007 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Fri, 9 Feb 2007 14:37:01 +0100 Subject: [openib-general] [PATCH ofed-1.2] ofa_user.spec: fix installation path for ehca.driver Message-ID: <200702091437.02142.ossrosch@linux.vnet.ibm.com> Hi Vladimir, we tested the newest ofed1.2 package and found out that ehca.driver file is not copied into /usr/local/ofed/etc/libibverbs.d/ This patch add the installation path for ehca.driver to ofa_user.spec. 
Please ensure you first apply the ofa_user.spec patch I sent yesterday: http://openib.org/pipermail/openib-general/2007-February/032736.html Signed-off-by: Stefan Roscher --- ofa_user.spec | 1 + 1 files changed, 1 insertion(+) diff -Nurp ofed_scripts_old/ofa_user.spec ofed_scripts_new/ofa_user.spec --- ofed_scripts_old/ofa_user.spec 2007-02-09 14:00:38.000000000 +0100 +++ ofed_scripts_new/ofa_user.spec 2007-02-09 14:02:45.000000000 +0100 @@ -1165,6 +1165,7 @@ fi %files -n libehca -f libehca-files %defattr(-,root,root,-) %{_libdir}/libehca*.so* +%config %{_prefix}/etc/libibverbs.d/ehca.driver # %doc AUTHORS COPYING ChangeLog README %endif From halr at voltaire.com Fri Feb 9 06:04:21 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 09:04:21 -0500 Subject: [openib-general] [PATCH TRIVIAL] osmtest: use more descriptive constant names In-Reply-To: <20070208231618.GB22807@sashak.voltaire.com> References: <20070208231618.GB22807@sashak.voltaire.com> Message-ID: <1171029859.31538.160613.camel@hal.voltaire.com> On Thu, 2007-02-08 at 18:16, Sasha Khapyorsky wrote: > Use more descriptive constant names for osmtest flows. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to master and ofed_1_2). -- Hal From tom at opengridcomputing.com Fri Feb 9 06:22:39 2007 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 09 Feb 2007 08:22:39 -0600 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: References: Message-ID: <1171030959.26453.6.camel@trinity.ogc.int> Kumar: I _LOVE_ the patch and the fact that you're making this code better. I just want to tweak it a little bit... * Please convince yourself (and me ;-)) that the iw_cm_destroy_id can never block where you've put it. I'll bet that it's fine, but convince yourself too. Your comment scared me a little -- that's all. * Please see if moving the call to reject can be moved to the destroy switch so that we don't have to call it everywhere else. 
* Please make sure that everywhere we call destroy_cm_id, the cleanup of the work queue is also done. Thanks, Tom On Fri, 2007-02-09 at 08:59 +0530, Krishna Kumar2 wrote: > Roland, > > Yes, we will do some "arm wrestling" today :) > > thanks, > > KK > > Roland Dreier wrote on 02/09/2007 05:20:42 AM: > > > Hmm, Steve likes it, Tom doesn't. Can you guys arm wrestle or > > something and tell me if this patch is correct or not? > > > > - R. > From swise at opengridcomputing.com Fri Feb 9 06:23:47 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 09 Feb 2007 08:23:47 -0600 Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes In-Reply-To: <20070209065149.GL6560@mellanox.co.il> References: <1170981550.3049.130.camel@stevo-desktop> <20070209065149.GL6560@mellanox.co.il> Message-ID: <1171031027.4896.1.camel@stevo-desktop> On Fri, 2007-02-09 at 08:51 +0200, Michael S. Tsirkin wrote: > > > Also I agree with MST, I would like to see the core/ subdirectory die > > > completely. > > > > > > > ok ok...I'll kill the subdir... > > It's not just the directory BTW. Stuff like building completions in > t3_cqe format and then reformatting to ib_wc seems to be much more confusing > (and some of it is actually on datapath). The t3_cqe format is built BY THE HW. > Same goes for t3_wq and I suspect everything else defined in cxio_wr.h - > please, use the native types from include/rdma/. > Ditto. t3_wq is the HW format. > Having to wade through 3 driver-specific layers of abstractions just because I want to > for example change API in ib_verbs.h and need to update all drivers will be > very taxing. I understand your design calls for 2 layers, but at least the API exposed > by code in drivers/net is fairly small, while cxio_wr.h declares 27 structures > which seem to just duplicate ib_verbs.h. cxio_wr.h is hw format. You want me to change ib_verbs.h to make WRs and CQEs align with Chelsio hardware? 
From swise at opengridcomputing.com Fri Feb 9 06:58:45 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 09 Feb 2007 08:58:45 -0600 Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes In-Reply-To: <1171031027.4896.1.camel@stevo-desktop> References: <1170981550.3049.130.camel@stevo-desktop> <20070209065149.GL6560@mellanox.co.il> <1171031027.4896.1.camel@stevo-desktop> Message-ID: <1171033125.4896.21.camel@stevo-desktop> On Fri, 2007-02-09 at 08:23 -0600, Steve Wise wrote: > On Fri, 2007-02-09 at 08:51 +0200, Michael S. Tsirkin wrote: > > > > Also I agree with MST, I would like to see the core/ subdirectory die > > > > completely. > > > > > > > > > > ok ok...I'll kill the subdir... > > > > It's not just the directory BTW. Stuff like building completions in > > t3_cqe format and then reformatting to ib_wc seems to be much more confusing > > (and some of it is actually on datapath). > > The t3_cqe format is built BY THE HW. > > > > Same goes for t3_wq and I suspect everything else defined in cxio_wr.h - > > please, use the native types from include/rdma/. > > > > Ditto. t3_wq is the HW format. > To be more precise: struct t3_wq is the struct used to describe the T3 HW WQ, which is both the SQ and RQ for the QP. struct t3_cq is the struct used to describe the T3 HW CQ -and- a SW CQ used to maintain proper completion ordering that isn't maintained by the HW for some operations. union t3_wr defines the union of all the HW-specific WR structs. struct t3_cqe defines the HW CQE format. All of this is very tightly integrated with the HW. These HW-specific structs are included in a high-level struct that defines the object and has all the stuff needed to integrate into the OFA stack. Example: struct iwch_qp defines the top-level QP structure that maintains both the T3 HW struct (struct t3_wq) and the OFA struct (struct ib_qp) as well as attributes, wait objects, locks, etc. to correctly implement the OFA QP object. 
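[Editorial sketch, not part of the original message.] The layering described above, a top-level driver object that embeds both the stack-facing struct and the hardware-facing struct, might look roughly like this in userspace C. All demo_* names and fields are invented for illustration and are not the actual iw_cxgb3 definitions; the container_of-style macro mirrors the common kernel idiom for getting from the embedded generic struct back to the driver struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the generic, stack-facing QP (struct ib_qp in the kernel). */
struct demo_ib_qp {
    uint32_t qp_num;
};

/* Stand-in for the hardware work queue description (struct t3_wq above). */
struct demo_hw_wq {
    uint32_t wptr;       /* next slot the driver will write */
    uint32_t size_log2;  /* queue depth as a power of two */
};

/* Top-level driver object: embeds both views of the same QP. */
struct demo_qp {
    struct demo_ib_qp ibqp;  /* handed out to the generic stack */
    struct demo_hw_wq wq;    /* private to the driver */
};

/* Userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the driver object from the generic pointer the stack passes in. */
static struct demo_qp *to_demo_qp(struct demo_ib_qp *ibqp)
{
    return demo_container_of(ibqp, struct demo_qp, ibqp);
}

/* A verb entry point only receives the generic pointer but can still
 * reach the hardware state; returns the new write pointer. */
static uint32_t demo_post(struct demo_ib_qp *ibqp)
{
    struct demo_qp *qp = to_demo_qp(ibqp);

    qp->wq.wptr++;
    return qp->wq.wptr;
}
```

With this shape the generic stack never sees the hardware layout, which is the point being argued in the message: the HW structs are private detail, not a second public API.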
This is similar to what other providers do: hw/mthca/mthca_provider.h: mthca_qp includes 2 mthca_wq structs for the SQ and RQ. hw/amso/c2_provider.h: c2_qp includes 2 c2_mq structs for the SQ and RQ message queues. hw/ipath/ipath_verbs.h: ipath_qp include ipath_swqe and ipath_rq for their work queues. WRT data path operations, consider iwch_poll_cq_one(). The CQE is in T3 FW format and must be converted into a OFA struct ib_wc. There's no way around this, right? mthca_poll_one() does the same thing. Ditto for c2_poll_one(). Hope this helps. Steve. From mst at mellanox.co.il Fri Feb 9 07:03:07 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 9 Feb 2007 17:03:07 +0200 Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes In-Reply-To: <1171031027.4896.1.camel@stevo-desktop> References: <1170981550.3049.130.camel@stevo-desktop> <20070209065149.GL6560@mellanox.co.il> <1171031027.4896.1.camel@stevo-desktop> Message-ID: <20070209150307.GW6560@mellanox.co.il> > Quoting r. Steve Wise : > Subject: Re: [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes > > On Fri, 2007-02-09 at 08:51 +0200, Michael S. Tsirkin wrote: > > > > Also I agree with MST, I would like to see the core/ subdirectory die > > > > completely. > > > > > > > > > > ok ok...I'll kill the subdir... > > > > It's not just the directory BTW. Stuff like building completions in > > t3_cqe format and then reformatting to ib_wc seems to be much more confusing > > (and some of it is actually on datapath). > > The t3_cqe format is built BY THE HW. I understand, I did not get that. But for example create_read_req_cqe builds it in software. It could build ib_wc instead. ... > > Having to wade through 3 driver-specific layers of abstractions just because I want to > > for example change API in ib_verbs.h and need to update all drivers will be > > very taxing. 
I understand your design calls for 2 layers, but at least the API exposed > > by code in drivers/net is fairly small, while cxio_wr.h declares 27 structures > > which seem to just duplicate ib_verbs.h. > > cxio_wr.h is hw format. You want me to change ib_verbs.h to make WRs > and CQEs align with Chelsio hardware? No, but it need not be part of interface. The reason I was confused is because you seem to create an extra copy e.g. for t3_cqe. cxio_poll_cq currently creates an intermediate copy of the completion on the stack, I think it could format ib_wc directly instead. -- MST From Arkady.Kanevsky at netapp.com Fri Feb 9 07:15:47 2007 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Fri, 9 Feb 2007 10:15:47 -0500 Subject: [openib-general] dapl broken for iWARP Message-ID: Steve, what is an issue of using max_qp_rd_atom and max_qp_init_rd_atom beside the bad name? Thanks, Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Thursday, February 08, 2007 6:11 PM > To: Arlin Davis > Cc: openib-general > Subject: Re: [openib-general] dapl broken for iWARP > > On Wed, 2007-02-07 at 15:57 -0600, Steve Wise wrote: > > On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: > > > Arlin, > > > > > > The OFED dapl code is assuming the responder_resources and > > > initiator_depth passed up on a connection request event > are from the > > > remote peer. This doesn't happen for iWARP. In the > current iWARP > > > specifications, its up to the application to exchange this > > > information somehow. So these are defaulting to 0 on the > server side > > > of any dapl connection over iWARP. > > > > > > This is a fairly recent change, I think. We need to come up with > > > some way to deal with this for OFED 1.2 IMO. 
> > > > > > > The IWCM could set these to the device max values for instance. > > > > Steve. > > > > There is a slight problem with all this. There are no device > attributes currently for ORD and IRD. The ammasso driver > maps these to max_qp_rd_atom (IRD) and > max_qp_init_rd_atom(ORD). But this is screwy. > We need new attribute for these. > > For OFED 1.2, I think I should just have the IWCM set them to > 8. The only RNIC in ofed is cxgb3 and it supports 8... > > > Steve. > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Fri Feb 9 07:25:45 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 09 Feb 2007 09:25:45 -0600 Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes In-Reply-To: <20070209150307.GW6560@mellanox.co.il> References: <1170981550.3049.130.camel@stevo-desktop> <20070209065149.GL6560@mellanox.co.il> <1171031027.4896.1.camel@stevo-desktop> <20070209150307.GW6560@mellanox.co.il> Message-ID: <1171034745.4896.37.camel@stevo-desktop> > I understand, I did not get that. > > But for example create_read_req_cqe builds it in software. > It could build ib_wc instead. > Reads are handled in a slightly different manner. This is due to the fact that the T3 HW can complete a read out of order. For example: POST READ POST WRITE Posting the read triggers the HW to send an RDMA_READ_REQUEST. Immediately after that the HW can (and will) send the RDMA_WRITE. Once the peer TCP ACKs the WRITE, the HW will post a CQE for the WRITE. That completion might happen before the peer sends back the READ_RESPONSE. Since the RDMAC verbs spec says WRs must be completed in order, the T3 driver has to deal with this (and it's painful :). In addition, I have to maintain other state about a read. 1) the consumer wr_id. 
For non-reads, the wr_id is actually reflected back by the HW from the WQE to the CQE. For reads, this doesn't happen. 2) the CQE for a read completion doesn't contain the original length. I need to pull that from the associated original WQE. So all this means the driver needs to construct a proper read CQE from several parts. That's why it creates it locally on the stack. But you're right: All WQEs get copied out of the HWCQ and into an on-stack variable in iwch_poll_cq_one(). Removing this, however, requires rethinking all the READ logic which assumes the WQE is copied out of the HWCQ. I cannot make this change right now because of stability concerns (it took me long enough to understand how to correctly handle the read case as it stands :-) > ... > > > > Having to wade through 3 driver-specific layers of abstractions just because I want to > > > for example change API in ib_verbs.h and need to update all drivers will be > > > very taxing. I understand your design calls for 2 layers, but at least the API exposed > > > by code in drivers/net is fairly small, while cxio_wr.h declares 27 structures > > > which seem to just duplicate ib_verbs.h. > > > > cxio_wr.h is hw format. You want me to change ib_verbs.h to make WRs > > and CQEs align with Chelsio hardware? > > No, but it need not be part of interface. The reason I was confused is because > you seem to create an extra copy e.g. for t3_cqe. cxio_poll_cq currently > creates an intermediate copy of the completion on the stack, I think it could > format ib_wc directly instead. > I'll log this as a performance optimization that we can do later. Thanks for helping review this stuff!! Steve. 
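[Editorial sketch, not part of the original message.] The ordering problem described in the message above (hardware may complete a WRITE before an earlier READ, while the verbs consumer must see completions in posting order) is commonly handled with a software queue that remembers posting order and the consumer wr_id. The userspace C below is a minimal sketch of that general technique under assumed names (sw_sq, sq_post, sq_hw_complete, sq_poll); it is not the iw_cxgb3 implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SQ_DEPTH 8u  /* illustrative depth */

/* One software-tracked work request. */
struct sw_sqe {
    uint64_t wr_id;  /* consumer cookie, kept in SW because the HW may
                        not reflect it back for every operation */
    int done;        /* set when the HW reports completion */
};

/* Software shadow of the send queue, in posting order. */
struct sw_sq {
    struct sw_sqe e[SQ_DEPTH];
    unsigned head;   /* oldest WR not yet reported to the consumer */
    unsigned tail;   /* next free position */
};

/* Record a posted WR; returns its slot, or -1 if the queue is full. */
static int sq_post(struct sw_sq *q, uint64_t wr_id)
{
    unsigned slot;

    if (q->tail - q->head == SQ_DEPTH)
        return -1;
    slot = q->tail % SQ_DEPTH;
    q->e[slot].wr_id = wr_id;
    q->e[slot].done = 0;
    q->tail++;
    return (int)slot;
}

/* The HW finished the WR in this slot, possibly out of order. */
static void sq_hw_complete(struct sw_sq *q, int slot)
{
    q->e[slot].done = 1;
}

/* Report the oldest WR only once it has completed, so the consumer
 * always sees completions in posting order. Returns 1 and fills
 * *wr_id on success, 0 if nothing can be reported yet. */
static int sq_poll(struct sw_sq *q, uint64_t *wr_id)
{
    struct sw_sqe *e;

    if (q->head == q->tail)
        return 0;
    e = &q->e[q->head % SQ_DEPTH];
    if (!e->done)
        return 0;
    *wr_id = e->wr_id;
    q->head++;
    return 1;
}
```

In the READ-then-WRITE case from the message, the WRITE's early completion is simply held back until the READ at the head of the queue has completed, at which point both are reported in order.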
From swise at opengridcomputing.com Fri Feb 9 07:26:57 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 09 Feb 2007 09:26:57 -0600 Subject: [openib-general] dapl broken for iWARP In-Reply-To: References: Message-ID: <1171034817.4896.39.camel@stevo-desktop> On Fri, 2007-02-09 at 10:15 -0500, Kanevsky, Arkady wrote: > Steve, > what is an issue of using > max_qp_rd_atom and max_qp_init_rd_atom > beside the bad name? its a hack. But Bob already asked to do this, so I guess I will. We still don't ensure interoperability with DAPL consumers. A global value would. Using device max's wont. > Thanks, > > Arkady Kanevsky email: arkady at netapp.com > Network Appliance Inc. phone: 781-768-5395 > 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 > Waltham, MA 02451 central phone: 781-768-5300 > > > > -----Original Message----- > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > Sent: Thursday, February 08, 2007 6:11 PM > > To: Arlin Davis > > Cc: openib-general > > Subject: Re: [openib-general] dapl broken for iWARP > > > > On Wed, 2007-02-07 at 15:57 -0600, Steve Wise wrote: > > > On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: > > > > Arlin, > > > > > > > > The OFED dapl code is assuming the responder_resources and > > > > initiator_depth passed up on a connection request event > > are from the > > > > remote peer. This doesn't happen for iWARP. In the > > current iWARP > > > > specifications, its up to the application to exchange this > > > > information somehow. So these are defaulting to 0 on the > > server side > > > > of any dapl connection over iWARP. > > > > > > > > This is a fairly recent change, I think. We need to come up with > > > > some way to deal with this for OFED 1.2 IMO. > > > > > > > > > > The IWCM could set these to the device max values for instance. > > > > > > Steve. > > > > > > > There is a slight problem with all this. There are no device > > attributes currently for ORD and IRD. 
The ammasso driver > > maps these to max_qp_rd_atom (IRD) and > > max_qp_init_rd_atom(ORD). But this is screwy. > > We need new attribute for these. > > > > For OFED 1.2, I think I should just have the IWCM set them to > > 8. The only RNIC in ofed is cxgb3 and it supports 8... > > > > > > Steve. > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > From Arkady.Kanevsky at netapp.com Fri Feb 9 07:29:57 2007 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Fri, 9 Feb 2007 10:29:57 -0500 Subject: [openib-general] dapl broken for iWARP Message-ID: Mike, this is not a DAPL issue. There are 2 ways to deal with it. One is for all ULPs to use private data to exchange CM info. yes, some ULPs, like SDP do that in hello world message. Another is to let CM handle it. This way ULP does not have to deal with it. This is analogous to the IBTA CM IP addressing Annex. It ensure backwards compatibility and does not break any existing apps which use MPA as specified by IETF. No need to bother IETF until we have it working. Thanks, Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 > -----Original Message----- > From: Michael Krause [mailto:krause at cup.hp.com] > Sent: Thursday, February 08, 2007 4:27 PM > To: Kanevsky, Arkady; Steve Wise; Arlin Davis > Cc: openib-general > Subject: Re: [openib-general] dapl broken for iWARP > > At 07:43 AM 2/8/2007, Kanevsky, Arkady wrote: > >That is correct. > >I am working with Krishna on it. > >Expect patches soon. > > > >By the way the problem is not DAPL specific and so is a proposed > >solution. > > > >There are 3 aspects of the solution. > >One is APIs. 
We suggest that we do not augment these. > >That is a connection requestor sets its QP RDMA ORD and IRD. > >When connection is established user can check the QP RDMA > ORD and IRD > >to see what he has now to use over the connection. > >We may consider to extend QP attributes to support transport > specific > >parameters passing in the future. > >For example, iWARP MPA CRC request. > > > >Second is the semantic that CM provides. > >The proposal is to match IBCM semantic. > >That is CM guarantee that local IRD is >= remote ORD. > >This guarantees that incoming RDMA Read requests will not > overwhelm the > >QP RDMA Read capabilities. > >Again there is not changes to IBCM only to IWCM. > >Notice that as part of this IWCM will pass down to driver > and extract > >from driver needed info. > > > >The final part is iWARP CM extension to exchange RDMA ORD, IRD. > >This is similar to IBTA Annex for IP Addressing. > >The harder part that this will eventually require IETF MPA spec > >extension, and the fact that MPA protocol is implemented in > RNIC HW by > >many vendors, and hence can not be done by IWCM itself. > > We looked at this quite a bit during the creation of the > specification. All of the targeted usage models exchange > this information > as part of their "hello" or login exchanges. As such, the > "hum" was to > not change MPA to communicate such information and leave it > to software to > exchange these values through existing mechanisms. I > seriously doubt > there will be much support for modifying the MPA > specification at this stage since the implementations are > largely complete and a modification would have to deal with > the legacy interoperability issue which likely would be > solved in software any way. It would be simpler to simply > modify the underlying DAPL implementation to exchange the > information and keep this hidden from both the application > and the RNIC providers. 
> > Mike > > > >Thanks, > > > >Arkady Kanevsky email: arkady at netapp.com > >Network Appliance Inc. phone: 781-768-5395 > >1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 > >Waltham, MA 02451 central phone: 781-768-5300 > > > > > > > -----Original Message----- > > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > > Sent: Wednesday, February 07, 2007 6:12 PM > > > To: Arlin Davis > > > Cc: openib-general > > > Subject: Re: [openib-general] dapl broken for iWARP > > > > > > On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote: > > > > Steve Wise wrote: > > > > > > > > >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: > > > > > > > > > > > > > > >>Arlin, > > > > >> > > > > >>The OFED dapl code is assuming the responder_resources and > > > > >>initiator_depth passed up on a connection request event > > > are from the > > > > >>remote peer. This doesn't happen for iWARP. In the > > > current iWARP > > > > >>specifications, its up to the application to exchange this > > > > >>information somehow. So these are defaulting to 0 on the > > > server side > > > > >>of any dapl connection over iWARP. > > > > >> > > > > >>This is a fairly recent change, I think. We need to > come up with > > > > >>some way to deal with this for OFED 1.2 IMO. > > > > >> > > > > >> > > > > Yes, this was changed recently to sync up with the > rdma_cm changes > > > > that exposed the values. > > > > > > > > >> > > > > >> > > > > > > > > > >The IWCM could set these to the device max values for instance. > > > > > > > > > > > > > > That would work fine as long as you know the remote > > > settings will be > > > > equal or better. The provider just sets the min of > local device max > > > > values and the remote values provided with the request. > > > > > > > > > > I know Krishna Kumar is working on a solution for exchanging > > > this info in private data so the IWCM can "do the right > > > thing". Stay tuned for a patch series to review for this. 
> > > But this functionality is definitely post OFED-1.2. > > > > > > > > > So for the OFED-1.2, I will set these to the device max > in the IWCM. > > > Assuming the other side is OFED 1.2 DAPL, then it will work fine. > > > > > > Steve. > > > > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > > >_______________________________________________ > >openib-general mailing list > >openib-general at openib.org > >http://openib.org/mailman/listinfo/openib-general > > > >To unsubscribe, please visit > >http://openib.org/mailman/listinfo/openib-general > > From Arkady.Kanevsky at netapp.com Fri Feb 9 07:32:56 2007 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Fri, 9 Feb 2007 10:32:56 -0500 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate forunicast packets Message-ID: Hal, unfortunately, IBTA punted on this issue. We considered it for IBTA CM IP address annex but at the end could not handle all the cases. Thanks, Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, February 07, 2007 7:20 PM > To: Sean Hefty > Cc: Jason Gunthorpe; Roland Dreier; > openib-general at openib.org; Kanevsky, Arkady > Subject: Re: [openib-general] [PATCH] IPOIB: Use a GRH when > appropriate forunicast packets > > On Wed, 2007-02-07 at 15:24, Sean Hefty wrote: > > > I didn't get too far on getting CMA to work. 
Beyond the > bad HopLimit > > > feild I was seeing Hal pointed out a number of problems > in IBA that > > > would prevent it from working as is :< > > > > I've started thinking about what it would take to get the > rdma cm to > > work across a router. I think the rdma cm may need to treat IPv6 > > addresses as a GID for this to work across subnets, versus > trying to > > map an ipoib IP address to a GID based on ARP. > > An IB GID is IPv6 like but not an IPv6 address so I don't > think this is a good idea and don't see how you get around > mapping IP addresses to GIDs in an IB routed network given > the way things are spec'd. I think that the RDMA CM assumes a > single IPoIB subnet. Does it work when the destination is on > another subnet ? I think there are some unaddressed gateway > issues here to make that work and these may have been punted > (during spec time). Arkady might be a good person to comment on this. > > -- Hal > > > - Sean > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > From rdreier at cisco.com Fri Feb 9 07:34:10 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 09 Feb 2007 07:34:10 -0800 Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes In-Reply-To: <20070209072852.GP6560@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 9 Feb 2007 09:28:52 +0200") References: <20070208202634.4382.15287.stgit@dell3.ogc.int> <20070209072852.GP6560@mellanox.co.il> Message-ID: Michael> What about the mthca memory registration patches? I Michael> thought they are on their way. Should I repost? Sorry, I forgot about that. Yes, please resend the latest state. 
From tom at opengridcomputing.com Fri Feb 9 07:41:08 2007
From: tom at opengridcomputing.com (Tom Tucker)
Date: Fri, 09 Feb 2007 09:41:08 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()
In-Reply-To:
References:
Message-ID: <1171035668.26453.11.camel@trinity.ogc.int>

Roland:

This looks bad. Lemme noodle...

On Thu, 2007-02-08 at 20:23 -0800, Roland Dreier wrote:
> BTW, while looking at iwcm.c, I noticed the following highly dubious
> code for the first time:
>
> static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
> {
>         int ret = 0;
>
>         BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
>         if (atomic_dec_and_test(&cm_id_priv->refcount)) {
>                 BUG_ON(!list_empty(&cm_id_priv->work_list));
>                 if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
>                         BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
>                         BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
>                                         &cm_id_priv->flags));
>                         ret = 1;
>                 }
>                 complete(&cm_id_priv->destroy_comp);
>         }
>
>         return ret;
> }
>
> The test of waitqueue_active on destroy_comp.wait looks really bad for
> two reasons: first, it is relying on an internal implementation detail
> of struct completion that really shouldn't be used by generic code.
> And second, it seems to me that this doesn't even work right, since
> there is a race something like the following:
>
>   iw_destroy_cm_id():
>     destroy_cm_id(cm_id); // still 1 ref left
>
>                                   cm_work_handler():
>                                     if (iwcm_deref_id()) // drop last ref
>                                             return;
>                                     // no one waiting yet, doesn't
>                                     // return, but destroy_comp is
>                                     // signaled
>
>     wait_for_completion(&cm_id_priv->destroy_comp);
>     // destroy_comp is signaled, proceed
>     kfree(cm_id_priv);
>
>                                     // continue using cm_id_priv
>                                     // OOPS
>
> I don't understand this code well enough for the fix to be obvious.
>
>  - R.
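
[Archive note] The interleaving Roland describes above can be replayed deterministically in a small userspace model. This is only a sketch under stated assumptions: `completion_sketch`, `cm_id_sketch`, `deref_id`, `try_wait` and `replay_teardown` are hypothetical stand-ins invented for illustration, not kernel APIs, and the model does not propose a fix. It only shows why testing for an active waiter before calling complete() lets the work handler keep running after the destroyer has freed the object.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct completion (not the kernel type). */
struct completion_sketch {
    int done;     /* complete() calls not yet consumed by a waiter */
    int waiters;  /* tasks currently parked in wait_for_completion() */
};

/* Hypothetical stand-in for struct iwcm_id_private. */
struct cm_id_sketch {
    int refcount;
    struct completion_sketch destroy_comp;
    bool freed;   /* set where the real code would call kfree() */
};

/* Models iwcm_deref_id(): returns 1 only if a waiter is already parked. */
static int deref_id(struct cm_id_sketch *id)
{
    int ret = 0;

    if (--id->refcount == 0) {
        if (id->destroy_comp.waiters > 0)   /* models waitqueue_active() */
            ret = 1;
        id->destroy_comp.done++;            /* models complete() */
    }
    return ret;
}

/* Models wait_for_completion(): true if it proceeds without blocking. */
static bool try_wait(struct completion_sketch *c)
{
    if (c->done > 0) {
        c->done--;
        return true;
    }
    c->waiters++;                           /* would block here */
    return false;
}

/*
 * Replays one teardown with a single reference left (held by the handler).
 * Returns true if the handler would go on using the object after it has
 * been freed, which is the OOPS case in the diagram above.
 */
static bool replay_teardown(bool destroyer_waits_first)
{
    struct cm_id_sketch id = { .refcount = 1 };
    int handler_stops;

    if (destroyer_waits_first)
        (void)try_wait(&id.destroy_comp);   /* destroyer parks first */

    handler_stops = deref_id(&id);          /* handler drops the last ref */

    if (destroyer_waits_first) {
        id.destroy_comp.waiters--;          /* destroyer woken by complete() */
        id.destroy_comp.done--;
        id.freed = true;
    } else if (try_wait(&id.destroy_comp)) {
        id.freed = true;                    /* already signaled, frees at once */
    }

    return !handler_stops && id.freed;
}
```

In this model, `replay_teardown(false)` reproduces the bad ordering (the handler drops the last reference before anyone is waiting), while `replay_teardown(true)` is the ordering the waitqueue_active() test appears to assume.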
From halr at voltaire.com Fri Feb 9 07:39:01 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 10:39:01 -0500 Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate forunicast packets In-Reply-To: References: Message-ID: <1171035534.31538.166197.camel@hal.voltaire.com> Arkady, On Fri, 2007-02-09 at 10:32, Kanevsky, Arkady wrote: > Hal, > unfortunately, IBTA punted on this issue. > We considered it for IBTA CM IP address annex but at the end > could not handle all the cases. Thanks. Any idea if this issue might be addressed (no pun intended) or whether it is left for implementors to decide if/how to try to handle this ? -- Hal > Thanks, > > Arkady Kanevsky email: arkady at netapp.com > Network Appliance Inc. phone: 781-768-5395 > 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 > Waltham, MA 02451 central phone: 781-768-5300 > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Wednesday, February 07, 2007 7:20 PM > > To: Sean Hefty > > Cc: Jason Gunthorpe; Roland Dreier; > > openib-general at openib.org; Kanevsky, Arkady > > Subject: Re: [openib-general] [PATCH] IPOIB: Use a GRH when > > appropriate forunicast packets > > > > On Wed, 2007-02-07 at 15:24, Sean Hefty wrote: > > > > I didn't get too far on getting CMA to work. Beyond the > > bad HopLimit > > > > feild I was seeing Hal pointed out a number of problems > > in IBA that > > > > would prevent it from working as is :< > > > > > > I've started thinking about what it would take to get the > > rdma cm to > > > work across a router. I think the rdma cm may need to treat IPv6 > > > addresses as a GID for this to work across subnets, versus > > trying to > > > map an ipoib IP address to a GID based on ARP. > > > > An IB GID is IPv6 like but not an IPv6 address so I don't > > think this is a good idea and don't see how you get around > > mapping IP addresses to GIDs in an IB routed network given > > the way things are spec'd. 
I think that the RDMA CM assumes a > > single IPoIB subnet. Does it work when the destination is on > > another subnet ? I think there are some unaddressed gateway > > issues here to make that work and these may have been punted > > (during spec time). Arkady might be a good person to comment on this. > > > > -- Hal > > > > > - Sean > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > From purdy at sgi.com Fri Feb 9 08:05:16 2007 From: purdy at sgi.com (Dale Purdy) Date: Fri, 9 Feb 2007 10:05:16 -0600 Subject: [openib-general] [PATCH] OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch In-Reply-To: <1170944383.31538.74632.camel@hal.voltaire.com> References: <1170944383.31538.74632.camel@hal.voltaire.com> Message-ID: We have successfully tested this bug fix and would like to see it pushed into the 1.2 branch. Dale On Thu, 8 Feb 2007, Hal Rosenstock wrote: > OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch This change resolves an issue with strange SL assignment when two HCAs communicate with other and are on the same switch. Since LASH is switch to switch routing, the get_lash_sl function was casting 9999 (the value assigned to the variable NONE) to be a uint8_t when asked for an SL assignment in this case. This change resolves this issue. 
Signed-off-by: Thomas Sødring
Signed-off-by: Hal Rosenstock

diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c
index 5dfe068..e5f751c 100644
--- a/osm/opensm/osm_ucast_lash.c
+++ b/osm/opensm/osm_ucast_lash.c
@@ -1468,6 +1468,7 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
                         osm_port_t *p_src_port, osm_port_t *p_dst_port)
 {
         unsigned dst_id;
+        unsigned src_id;
         osm_switch_t *p_sw;
 
         if (p_osm->routing_engine.ucast_build_fwd_tables != lash_process)
@@ -1482,6 +1483,10 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
         if (!p_sw || !p_sw->priv)
                 return OSM_DEFAULT_SL;
 
+        src_id = get_lash_id(p_sw);
+        if (src_id == dst_id)
+                return OSM_DEFAULT_SL;
+
         return (uint8_t)((switch_t *)p_sw->priv)->routing_table[dst_id].lane;
 }
 

From halr at voltaire.com Fri Feb 9 08:22:52 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 11:22:52 -0500
Subject: [openib-general] [PATCH] OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch
In-Reply-To:
References: <1170944383.31538.74632.camel@hal.voltaire.com>
Message-ID: <1171038146.31538.168863.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 11:05, Dale Purdy wrote:
> We have successfully tested this bug fix

Thanks.

> and would like to see it
> pushed into the 1.2 branch.

Already pushed for ofed_1_2. I will be sending a note to Vlad to pick
these up and it should be in alpha.

-- Hal

> Dale
>
> On Thu, 8 Feb 2007, Hal Rosenstock wrote:
>
> > OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch
>
> This change resolves an issue with strange SL assignment when
> two HCAs communicate with other and are on the same switch.
> Since LASH is switch to switch routing, the get_lash_sl
> function was casting 9999 (the value assigned to the
> variable NONE) to be a uint8_t when asked for an SL assignment
> in this case. This change resolves this issue.
>
> Signed-off-by: Thomas Sødring
> Signed-off-by: Hal Rosenstock
>
> diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c
> index 5dfe068..e5f751c 100644
> --- a/osm/opensm/osm_ucast_lash.c
> +++ b/osm/opensm/osm_ucast_lash.c
> @@ -1468,6 +1468,7 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
>                          osm_port_t *p_src_port, osm_port_t *p_dst_port)
>  {
>          unsigned dst_id;
> +        unsigned src_id;
>          osm_switch_t *p_sw;
>
>          if (p_osm->routing_engine.ucast_build_fwd_tables != lash_process)
> @@ -1482,6 +1483,10 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
>          if (!p_sw || !p_sw->priv)
>                  return OSM_DEFAULT_SL;
>
> +        src_id = get_lash_id(p_sw);
> +        if (src_id == dst_id)
> +                return OSM_DEFAULT_SL;
> +
>          return (uint8_t)((switch_t *)p_sw->priv)->routing_table[dst_id].lane;
>  }

From swise at opengridcomputing.com Fri Feb 9 08:49:58 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 09 Feb 2007 10:49:58 -0600
Subject: [openib-general] [PATCH] for-2.6.21 Declare iwch_ev_dispatch in iwch.h
Message-ID: <1171039798.4896.49.camel@stevo-desktop>

Declare iwch_ev_dispatch in iwch.h

Remove the extern declaration from iwch.c and put it in iwch.h

Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/iwch.c |    2 --
 drivers/infiniband/hw/cxgb3/iwch.h |    2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
index c353a9b..4611afa 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -162,8 +162,6 @@ static void close_rnic_dev(struct t3cdev
         mutex_unlock(&dev_mutex);
 }
 
-extern void iwch_ev_dispatch(struct cxio_rdev *rdev_p, struct sk_buff *skb);
-
 static int __init iwch_init_module(void)
 {
         int err;
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index 29cf2e8..6517ef8 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -172,4 +172,6 @@ static inline void remove_handle(struct
 
 extern struct cxgb3_client t3c_client;
 extern cxgb3_cpl_handler_func t3c_handlers[NUM_CPL_CMDS];
+extern void iwch_ev_dispatch(struct cxio_rdev *rdev_p, struct sk_buff *skb);
+
 #endif

From mshefty at ichips.intel.com Fri Feb 9 09:09:05 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Feb 2007 09:09:05 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070209043727.GN11411@obsidianresearch.com>
References: <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <20070209043727.GN11411@obsidianresearch.com>
Message-ID: <45CCAAB1.7000103@ichips.intel.com>

> I have a follow up question to this.. With CM how is the SL for each
> side determined? I'm looking through the code here and it looks like
> the SL of the active side is passed in the REQ to the passive side (ie
> both sides are the same) But cma_query_ib_route does not set the
> reversible bit when it asks for the path. If you don't set the
> reversible bit isn't it necessary to make a 2nd path query to get the
> reverse path's SL?
[Path responses without the reversible bit set > are actually simplex paths and reversing them probably will run into > SL2VL mapping tables that cause the packets to be dropped ie o7-8] Complete support for non-reversible paths is missing. It would take some additional work to add this in, and would likely require API changes. (Personally, I would like to keep ignoring this until it becomes an issue.) For now, the CMA should at least set the reversible bit for its query. I don't know that the reversible bit in a path record can really apply across subnets. - Sean From michael.arndt at informatik.tu-chemnitz.de Fri Feb 9 09:14:49 2007 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Fri, 9 Feb 2007 18:14:49 +0100 Subject: [openib-general] Unknown SMP Recv References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> Message-ID: <000401c74c6d$ce4875f0$21606d86@one7> Hi, > umad_send takes the timeout in msec. 100 msec is too short. Try > something on the order of seconds. Note also that negative 'timeout_ms' > value makes the kernel wait for the reply forever. I have tried many values, but sooner or later the umad_send broke down, which is bad because the SM thinks a port or node is unreachable if there didn't come an response. All works fine if I sleep after every send but that can't be the right track. What can I do or is there a known bug in the libibumad that I have slipped? I modified the __osm_state_mgr_sweep_hop_1 function so it send not just one packets with [0][1] but also [0][1][1], [0][1][1][1]. Any there it happens too that one packet is not be sent sometimes. 
I'm wondering because if the SM get all PortInfos from an switch there are be many sends too, but it seems be that it works. Is the packet size for the umad_send 256 max? Thanks Michael From mshefty at ichips.intel.com Fri Feb 9 09:22:15 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 09 Feb 2007 09:22:15 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <1171023168.31538.153989.camel@hal.voltaire.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> Message-ID: <45CCADC7.5000804@ichips.intel.com> > SLID corresponding to SGID and a DLID for some IB router on the subnet > which can route to the remote DGID. This was my assumption as well. > An SM is free to choose SLID and DLID to supply to if there are multiple > LIDs for the ports in question it can choose alternates. The key here is > whether a reversible path has been requested or not. It is also not > clear what reversible means in the context of an IB internetwork > (multiple IB subnets interconnected by IB routers). For simplicity, assume a single path. My assumption in this case was that the SLID/DLID values would be reversed. That is, the LIDs are relative to the local subnet, not the SGID. But if I set the SGID = DGID = remote GID, then the LIDs would be relative to the remote subnet. (Assuming that the local SA could support such a query at all.) It seems that in order to meet the requirements of the spec, we need a way to perform inter-subnet queries. (The alternative being to change the spec...) 
And if the local SA can return a path record to a remote DGID, then it also
seems like the local SA must be able to collect some sort of information about
the path to the remote subnet. (How it does this seems TBD.)

So... I'm thinking that the solution to these problems should rest within the
local SA...

- Sean

From mshefty at ichips.intel.com Fri Feb 9 10:01:03 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Feb 2007 10:01:03 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support
In-Reply-To: <20070209080418.GQ6560@mellanox.co.il>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com> <20070209080418.GQ6560@mellanox.co.il>
Message-ID: <45CCB6DF.3020602@ichips.intel.com>

> +       member = kzalloc(sizeof *member, gfp_mask);
> +       if (!member)
> +               return ERR_PTR(-ENOMEM);

This appears okay to replace with kmalloc.

> +       group = kzalloc(sizeof *group, gfp_mask);
> +       if (!group)
> +               return NULL;
> +

We would need additional initialize code to clear the members array, set the
state, and set last_join fields.

> and same here:
>
> +       iter = kzalloc(sizeof *iter + attr_size, GFP_KERNEL);
> +       if (!iter)
> +               return ERR_PTR(-ENOMEM);

I think this is coming from the local SA cache patch, which isn't part of this
pull request.

> +
>
> It seems same goes for
>
> +       mc = kzalloc(sizeof(*mc), GFP_KERNEL);
> +       if (!mc)
> +               return NULL;

We would need to set events_reported.

> +       bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
> +       if (!bind_list)
> +               return -ENOMEM;

This looks like it can be replaced with kmalloc.

Roland, let me know how you'd like to handle any changes.
- Sean From halr at voltaire.com Fri Feb 9 09:58:51 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 12:58:51 -0500 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45CCADC7.5000804@ichips.intel.com> References: <20070126000319.GA12386@obsidianresearch.com> <20070126180840.GD12386@obsidianresearch.com> <45CA2084.7090503@ichips.intel.com> <20070207191154.GC11411@obsidianresearch.com> <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> Message-ID: <1171043929.31538.174521.camel@hal.voltaire.com> On Fri, 2007-02-09 at 12:22, Sean Hefty wrote: > > SLID corresponding to SGID and a DLID for some IB router on the subnet > > which can route to the remote DGID. > > This was my assumption as well. > > > An SM is free to choose SLID and DLID to supply to if there are multiple > > LIDs for the ports in question it can choose alternates. The key here is > > whether a reversible path has been requested or not. It is also not > > clear what reversible means in the context of an IB internetwork > > (multiple IB subnets interconnected by IB routers). > > For simplicity, assume a single path. My assumption in this case was that the > SLID/DLID values would be reversed. That is, the LIDs are relative to the local > subnet, not the SGID. But if I set the SGID = DGID = remote GID, then the LIDs > would be relative to the remote subnet. (Assuming that the local SA could > support such a query at all.) > > It seems that in order to meet the requirements of the spec, we need a way to > perform inter-subnet queries. (The alternative being to change the spec...) 
> And if the local SA can return a path record to a remote DGID, then it also > seems like the local SA must be able to collect some sort of information about > the path to the remote subnet. (How it does this seems TBD.) > > So... I'm thinking that the solution to these problems should rest within the > local SA... Yes, this seems most consistent with what is there now although there are some issues to work out on how some of the fields are supported and which queries would work intersubnet (as well as how they would work). -- Hal > - Sean From halr at voltaire.com Fri Feb 9 10:12:56 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 13:12:56 -0500 Subject: [openib-general] Unknown SMP Recv In-Reply-To: <000401c74c6d$ce4875f0$21606d86@one7> References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> Message-ID: <1171044773.31538.175280.camel@hal.voltaire.com> On Fri, 2007-02-09 at 12:14, Michael Arndt wrote: > Hi, > > > umad_send takes the timeout in msec. 100 msec is too short. Try > > something on the order of seconds. Note also that negative 'timeout_ms' > > value makes the kernel wait for the reply forever. > > I have tried many values, but sooner or later the umad_send broke down, > which is bad because the SM thinks a port or node is unreachable if there > didn't come an response. All works fine if I sleep after every send but that > can't be the right track. What can I do or is there a known bug in the > libibumad that I have slipped? I have no clue; I don't really understand what you have changed so it is hard to know. 
> I modified the __osm_state_mgr_sweep_hop_1 function so it send not just one
> packets with [0][1] but also [0][1][1], [0][1][1][1].

I don't understand what you are trying to do and the scope of your
changes.

> Any there it happens too that one packet is not be sent sometimes.

I can't parse this sentence.

> I'm wondering because if the SM get all PortInfos from an switch
> there are be many sends too, but it seems be that it works.

Yes, the SM will poll for each port on a switch for its PortInfo and
each of these is a separate SubnGet.

> Is the packet size for the umad_send 256 max?

It depends on the MAD type. SMPs are limited to a single MAD (256 bytes)
whereas GMPs can be larger if the class supports RMPP (as SA does).

-- Hal

> Thanks Michael

From michael.arndt at informatik.tu-chemnitz.de Fri Feb 9 10:38:12 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Fri, 9 Feb 2007 19:38:12 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com>
Message-ID: <000401c74c79$74439b50$21606d86@one7>

Hi,

> I have no clue; I don't really understand what you have changed so it is
> hard to know.

For example: if I send ten SMPs like:

for (i=0;i<10;i++){
    umad_send(portid, agentid, msg, len, timeout, repeats);
}

timeout > 0!
than only the first one is sent and all other umad_send calls returning
with -5.
Thanks Michael From jsquyres at cisco.com Fri Feb 9 10:38:04 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Fri, 9 Feb 2007 13:38:04 -0500 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: <1170866522.6223.8.camel@vladsk-laptop> References: <1170866522.6223.8.camel@vladsk-laptop> Message-ID: <7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com> New SRPM on server that munges the %build section into the %install section. Yuck. :-) On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote: > Hi Jeff, > Please remove %build macro from the RPM spec file. > On SuSE distros it removes RPM_BUILD_ROOT. > > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343 > + umask 022 > + cd /var/tmp/OFEDRPM/BUILD > + /bin/rm -rf /var/tmp/OFED > ++ dirname /var/tmp/OFED > + /bin/mkdir -p /var/tmp > + /bin/mkdir /var/tmp/OFED > + cd openmpi-1.2b4ofedr13470 > + fortify_source=1 > + test '' '!=' '' > ... > > -- > Vladimir Sokolovsky > Mellanox Technologies Ltd. -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From sashak at voltaire.com Fri Feb 9 11:04:18 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 9 Feb 2007 21:04:18 +0200 Subject: [openib-general] Unknown SMP Recv In-Reply-To: <000401c74c79$74439b50$21606d86@one7> References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> Message-ID: <1171051141.2767.7.camel@localhost> Hi Michael, On Fri, 2007-02-09 at 19:38 +0100, Michael Arndt wrote: > Hi, > > > I have no clue; I don't really understand what you have changed so it is > > hard to know. 
> > For example: if I send ten SMPs like: > > for (i=0;i<10;i++){ > umad_send(portid, agentid, msg, len, timeout, repeats); > } > > timeout > 0! > than only the first one is sent and all other umad_send calls returning > with -5. It is strange, I did similar thing (you can see in management/diags/src/mcm_rereg_test.c) and it worked fine for me. Which libibumad version you are using? Also I understand you did some changes in the stack, is it related to user_mad? Could you publish this? Sasha From halr at voltaire.com Fri Feb 9 11:03:07 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 14:03:07 -0500 Subject: [openib-general] Unknown SMP Recv In-Reply-To: <000401c74c79$74439b50$21606d86@one7> References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> Message-ID: <1171047785.31538.178263.camel@hal.voltaire.com> On Fri, 2007-02-09 at 13:38, Michael Arndt wrote: > Hi, > > > I have no clue; I don't really understand what you have changed so it is > > hard to know. > > For example: if I send ten SMPs like: > > for (i=0;i<10;i++){ > umad_send(portid, agentid, msg, len, timeout, repeats); > } > > timeout > 0! > than only the first one is sent and all other umad_send calls returning > with -5. 
-5 is EIO

For some reason, umad_send is indicating this after the write into the
fd to pass the send to user_mad kernel module:

        n = write(port->dev_fd, mad, length + sizeof *mad);
        if (n == length + sizeof *mad)
                return 0;

        DEBUG("write returned %d != sizeof umad %zu + length %d (%m)",
              n, sizeof *mad, length);
        if (!errno)
                errno = EIO;
        return -EIO;

I have no clue as to why subsequent (non first) writes are failing to
write the proper amount of data.

Do you have or can you create a simple test program to demonstrate this ?

-- Hal

> Thanks Michael

From changquing.tang at hp.com Fri Feb 9 11:11:04 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Fri, 9 Feb 2007 19:11:04 -0000
Subject: [openib-general] Immediate data question
In-Reply-To: <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>

> >
> >Not for the receiver, but the sender will be severely slowed down by
> >having to wait for the RNR timeouts.
>
> RNR = Receiver Not Ready so by definition, the data flow isn't going
> to progress until the receiver is ready to receive data. If a receive
> QP enters RNR for a RC, then it is likely not progressing as desired.
> RNR was initially put in place to enable a receiver to create
> back pressure to the sender without causing a fatal error
> condition. It should rarely be entered and therefore should
> have negligible impact on overall performance however when a
> RNR occurs, no forward progress will occur so performance is
> essentially zero.
Mike: I still do not quite understand this issue. I have two situations that have RNR triggered. 1. process A and process B is connected with QP. A first post a send to B, B does not post receive. Then A and B are doing a long time RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE message. Finally B will post a receive. Does the first pending send in A block all the later RDMA_WRITE ? If not, since RNR is triggered periodically till B post receive, does it affect the RDMA_WRITE performance between A and B ? 2. extend above to three processes, A connect to B, B connect to C, so B has two QPs, but one CQ. A posts a send to B, B does not post receive, rather B and C are doing a long time RDMA_WRITE, or send/recv. But B must sends RNR periodically to A, right?. So does the pending message from A affects B's overall performance between B and C ? Thank you. --CQ > > Mike > > > From jgunthorpe at obsidianresearch.com Fri Feb 9 11:20:46 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Fri, 9 Feb 2007 12:20:46 -0700 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <1171043929.31538.174521.camel@hal.voltaire.com> References: <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> <1171043929.31538.174521.camel@hal.voltaire.com> Message-ID: <20070209192046.GP11411@obsidianresearch.com> On Fri, Feb 09, 2007 at 12:58:51PM -0500, Hal Rosenstock wrote: > > For simplicity, assume a single path. My assumption in this case was that the > > SLID/DLID values would be reversed. That is, the LIDs are relative to the local > > subnet, not the SGID. 
But if I set the SGID = DGID = remote GID, then the LIDs
> > would be relative to the remote subnet. (Assuming that the local SA could
> > support such a query at all.)
> >
> > It seems that in order to meet the requirements of the spec, we need a way to
> > perform inter-subnet queries. (The alternative being to change the spec...)
> > And if the local SA can return a path record to a remote DGID, then it also
> > seems like the local SA must be able to collect some sort of information about
> > the path to the remote subnet. (How it does this seems TBD.)
> >
> > So... I'm thinking that the solution to these problems should rest within the
> > local SA...
>
> Yes, this seems most consistent with what is there now although there
> are some issues to work out on how some of the fields are supported and
> which queries would work intersubnet (as well as how they would work).

I agree, some kind of inter-subnet query will have to be used to make this work consistently with the rest of IBA.

It looks to me like we overall need to have this look like:
- Routers need to be able to support inter-subnet reversible paths to meet the requirements for CM.
- Inter-subnet reversible paths are defined to mean that when the LRH is selected on the destination subnet by the router it is reversible.
- This can be signaled by using TClass and/or FlowLabel fields in the GRH.
- Routers need to be able to produce knowable SLIDs to meet the QP LID matching requirement
- The LID to use can be signaled by using TClass and/or FlowLabel
- A kind of inter-subnet path record query is needed that can return a local and remote GRH and LRH.
These four structures need to be *linked* so that:
  - Side A GRH.SGID = active side's Port GID
  - Side A GRH.DGID = passive side's Port GID
  - Side A LRH.SLID = any active side's port LID
  - Side A LRH.DLID = A subnet router
  - Side A LRH.SL = SL to A subnet router

  - Side B GRH.SGID = Side A GRH.DGID
  - Side B GRH.DGID = Side A GRH.SGID
  - Side B LRH.SLID = any passive side's port LID
  - Side B LRH.DLID = B subnet router
  - Side B LRH.SL = SL to B subnet router

  - When the A subnet router sees Side B GRH it produces
      LRH.SLID = Side A LRH.DLID
      LRH.DLID = Side A LRH.SLID
      LRH.SL = SL to Side A Active side (may be != to Side A LRH.SL)
  - Similarly for Side B.

This linkage requirement is necessary due to the QP LID matching rules. I'm imagining that, like SL, the GRH.TClass and GRH.FlowLabel could be different in each direction.

I'd think of this query as a generic duplex PathRecord query.

Offhand I don't see that the existing path record query structure has enough information to do this. In particular, in cases where each subnet has more than 1 router port there is no real guarantee that querying for the SGID -> DGID direction and then the DGID -> SGID direction uses the same router ports without providing both router LIDs as part of the query.

Whatever responds to this query must be interacting with the router(s) to ensure they recognize the GRHs and produce LRHs to meet all the above requirements.

** The hackish and simple thing to do right now is to just demand that routers *always* use reversible LRHs with a single SLID and have the passive side pick up the QP LIDs from the LRH if it is routed.
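The linkage rules listed above can be sketched as a small consistency check. This is only an illustration under stated assumptions: the `hyp_` types and functions are hypothetical and do not correspond to any existing SA attribute or kernel structure, and the SL choice made by the router is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration only; not an existing IB structure. */
struct hyp_gid { uint8_t raw[16]; };

struct hyp_side {
    struct hyp_gid sgid;  /* GRH.SGID used by this side */
    struct hyp_gid dgid;  /* GRH.DGID used by this side */
    uint16_t slid;        /* LRH.SLID: a LID of this side's end port */
    uint16_t dlid;        /* LRH.DLID: LID of this side's subnet router port */
    uint8_t sl;           /* LRH.SL toward the router */
};

struct hyp_duplex_path {
    struct hyp_side a;    /* active side */
    struct hyp_side b;    /* passive side */
};

struct hyp_lrh { uint16_t slid, dlid; };

static bool hyp_gid_eq(const struct hyp_gid *x, const struct hyp_gid *y)
{
    return memcmp(x->raw, y->raw, sizeof x->raw) == 0;
}

/* GRH linkage: Side B's SGID/DGID must be Side A's pair, swapped. */
static bool hyp_grh_linked(const struct hyp_duplex_path *p)
{
    return hyp_gid_eq(&p->b.sgid, &p->a.dgid) &&
           hyp_gid_eq(&p->b.dgid, &p->a.sgid);
}

/*
 * LRH the router on a side's subnet would have to produce when it delivers
 * the peer's packet: SLID = that side's DLID (the router port) and
 * DLID = that side's SLID (the end port), so the QP LID check can match.
 */
static struct hyp_lrh hyp_router_lrh_for(const struct hyp_side *s)
{
    struct hyp_lrh l = { .slid = s->dlid, .dlid = s->slid };
    return l;
}
```

A responder to such a duplex query could run a check like `hyp_grh_linked()` before returning the record; whether the routers actually emit the LRH computed by `hyp_router_lrh_for()` is exactly the part that still needs a mechanism.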
Jason From halr at voltaire.com Fri Feb 9 11:47:23 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 14:47:23 -0500 Subject: [openib-general] patches to 2.6.19.1 kernel for switch Operation In-Reply-To: <039701c7494b$6bd5d860$1914a8c0@surioffice> References: <000601c7419f$d4470c60$ff0da8c0@amr.corp.intel.com> <1170072757.4555.242192.camel@hal.voltaire.com> <039701c7494b$6bd5d860$1914a8c0@surioffice> Message-ID: <1171050441.31538.180858.camel@hal.voltaire.com> Suri, On Mon, 2007-02-05 at 12:31, Suresh Shelvapille wrote: > Hal: > > We are upgrading to 2.6.19.1 kernel and I finally ported the changes > required for Switch operation from my current kernel (2.6.12) version. > > I have tested these changes for a switch with different SM(s). But I need > the community's help to test the changes on different HCAs to make sure I > have not broken anything. > > Please see if the changes look OK. Here are my initial comments on these patches based only on code inspection: mad.c: @@ -1871,24 +1877,49 @@ ... 
 if (recv->mad.mad.mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
-        if (!smi_handle_dr_smp_recv(&recv->mad.smp,
-                                    port_priv->device->node_type,
-                                    port_priv->port_num,
-                                    port_priv->device->phys_port_cnt))
-                goto out;
-        if (!smi_check_forward_dr_smp(&recv->mad.smp))
-                goto local;
-        if (!smi_handle_dr_smp_send(&recv->mad.smp,
-                                    port_priv->device->node_type,
-                                    port_priv->port_num))
+
+        int retsmi;
+
+        retsmi = smi_handle_dr_smp_recv(&recv->mad.smp,
+                                        port_priv->device->node_type,
+                                        port_num,
+                                        port_priv->device->phys_port_cnt);
+        if (!retsmi)
                 goto out;
-        if (!smi_check_local_smp(&recv->mad.smp, port_priv->device))
+        else if (retsmi == 2) {
+                if (!response) {
+                        printk(KERN_ERR PFX "No memory for forwarded MAD\n");
+                        goto out;
+                }
+                memcpy(response, recv, sizeof(*response));
+                response->header.recv_wc.wc = &response->header.wc;
+                response->header.recv_wc.recv_buf.mad = &response->mad.mad;
+                response->header.recv_wc.recv_buf.grh = &response->grh;
+
+                /* in case of forward, output port should be the one
+                 * in either the Initial path(for outgoing) or return_path(return)
+                 */
+                if (!ib_get_smp_direction(&recv->mad.smp))
+                        port_num = recv->mad.smp.initial_path[recv->mad.smp.hop_ptr+1];
+                else
+                        port_num = recv->mad.smp.return_path[recv->mad.smp.hop_ptr-1];
+
+                if (!agent_send_response(&response->mad.mad,
+                                         &response->grh, wc,
+                                         port_priv->device,
+                                         port_num,
+                                         qp_info->qp->qp_num))
+                        response = NULL;

Per the above change, it appears that smi_check_forward_dr_smp and smi_handle_dr_smp_send are no longer used, at least here (smi_check_forward_dr_smp is not used at all with this change). Couldn't these be fixed to do the right thing for this case (as well as existing cases)? I'm not sure your changes work for end ports (CA and router ports).
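For reference, the output-port selection in the hunk above can be looked at in isolation. The sketch below mirrors the selection exactly as posted in the patch (it does not claim that this is what the SMI rules require), and adds a bounds check that the patch does not have. The `hyp_` names and the standalone struct are hypothetical; the path array size of 64 is an assumption based on the kernel's IB_SMP_MAX_PATH_HOPS.

```c
#include <stdint.h>

/* Assumed path array size (kernel IB_SMP_MAX_PATH_HOPS is 64). */
#define HYP_SMP_MAX_PATH_HOPS 64

/* Hypothetical, reduced view of a directed-route SMP. */
struct hyp_dr_smp {
    uint8_t direction;  /* 0 = outgoing, nonzero = returning */
    uint8_t hop_ptr;
    uint8_t hop_cnt;
    uint8_t initial_path[HYP_SMP_MAX_PATH_HOPS];
    uint8_t return_path[HYP_SMP_MAX_PATH_HOPS];
};

/*
 * Pick the port on which to forward the SMP, following the patch:
 * outgoing SMPs use initial_path[hop_ptr + 1], returning SMPs use
 * return_path[hop_ptr - 1]. Returns -1 if the index would fall outside
 * the path array (this check is an addition, not part of the patch).
 */
static int hyp_forward_port(const struct hyp_dr_smp *smp)
{
    int idx = smp->direction ? (int)smp->hop_ptr - 1
                             : (int)smp->hop_ptr + 1;

    if (idx < 0 || idx >= HYP_SMP_MAX_PATH_HOPS)
        return -1;

    return smp->direction ? smp->return_path[idx]
                          : smp->initial_path[idx];
}
```

Note that the indices depend on whether hop_ptr has already been advanced on receive, so this selection would change together with the hop_ptr handling in smi.c.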
Also, based on the smi comments below, there might also be changes to the following:

+                if (!ib_get_smp_direction(&recv->mad.smp))
+                        port_num = recv->mad.smp.initial_path[recv->mad.smp.hop_ptr+1];
+                else
+                        port_num = recv->mad.smp.return_path[recv->mad.smp.hop_ptr-1];
+

smi.c:
@@ -147,13 +147,18 @@
...
         /* C14-9:3 -- We're at the end of the DR segment of path */
         if (hop_ptr == hop_cnt) {
                 if (hop_cnt)
                         smp->return_path[hop_ptr] = port_num;
+                smp->hop_ptr++;
+                /* smp->hop_ptr updated when sending */

The comment indicates the hop_ptr should be updated when sending, not here. Can't this be done?

@@ -168,8 +173,8 @@
         /* C14-13:1 */
         if (hop_cnt && hop_ptr == hop_cnt + 1) {
-                smp->hop_ptr--;
-                return (smp->return_path[smp->hop_ptr] ==
+                /* smp->hop_ptr--;*/
+                return (smp->return_path[smp->hop_ptr-1] ==
                         port_num);
         }

This change affects more than switches, as now the hop_ptr is not correct per SMI. I think this also should be handled differently.

agent.c:
@@ -113,6 +119,11 @@
         memcpy(send_buf->mad, mad, sizeof *mad);
         send_buf->ah = ah;
+        mad_send_wr = container_of(send_buf,
+                                   struct ib_mad_send_wr_private,
+                                   send_buf);
+        mad_send_wr->send_wr.wr.ud.port_num = port_num;
+
         if ((ret = ib_post_send_mad(send_buf, NULL))) {

Shouldn't this only be for switches? Not sure it causes a problem for other than switches, but I think it would be more consistent with the current code. So I think this change should be surrounded by:

if (device->node_type == RDMA_NODE_IB_SWITCH) {
...
} -- Hal > Thanks, > Suri From halr at voltaire.com Fri Feb 9 12:01:59 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 15:01:59 -0500 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45CCD1E2.5050806@ichips.intel.com> References: <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> <1171043929.31538.174521.camel@hal.voltaire.com> <20070209192046.GP11411@obsidianresearch.com> <45CCD1E2.5050806@ichips.intel.com> Message-ID: <1171051315.31538.181667.camel@hal.voltaire.com> On Fri, 2007-02-09 at 14:56, Sean Hefty wrote: > I don't see a way to issue the SA query to the remote subnet though. Even though SA queries can go intersubnet as they are GMPs and can contain a GRH, the /missing part (right now) is locating the SA on that remote subnet if this is a needed function. In any case, as there needs to be some SA PathRecord forwarding from SM to SM on a per subnet basis from source to destination, this will need to be solved (at least the SMs will likely know and that could be exposed as well to SA clients). 
-- Hal From mshefty at ichips.intel.com Fri Feb 9 11:56:18 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 09 Feb 2007 11:56:18 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070209192046.GP11411@obsidianresearch.com> References: <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> <1171043929.31538.174521.camel@hal.voltaire.com> <20070209192046.GP11411@obsidianresearch.com> Message-ID: <45CCD1E2.5050806@ichips.intel.com> > - A kind of inter-subnet path record query is needed that can > return a local and remote GRH and LRH. These four structures need to > be *linked* so that: > - Side A GRH.SGID = active side's Port GID > - Side A GRH.DGID = passive side's Port GID > - Side A LRH.SLID = any active side's port LID > - Side A LRH.DLID = A subnet router > - Side A LRH.SL = SL to A subnet router > > - Side B GRH.SGID = Side A GRH.DGID > - Side B GRH.DGID = Side A GRH.SGID > - Side B LRH.SLID = any passive side's port LID > - Side B LRH.DLID = B subnet router > - Side B LRH.SL = SL to B subnet router Something along this line is what I was considering as well. > Off hand I don't see that the existing path record query structure > has enough information to do this.. Particularly, in cases > where each subnet has more than 1 router port there is no real > guarentee that querying for the SGID -> DGID direction and then the > DGID -> SGID direction uses the same router ports without providing > both router LIDs as part of the query. I'm trying to figure out a way to get this information, but I'm still at a loss. 
If there were a way to query both the local SA and remote SA using the same SGID/DGID pair, it's possible that the combined path records could be used to form this data. I.e. set SGID = local port, and DGID = remote port for both queries. I don't see a way to issue the SA query to the remote subnet though.

> ** The hackish and simple thing to do right now is to just demand that
> routers *always* use reversible LRHs with a single SLID and have the
> passive side pick up the QP lids from the LRH if it is routed..

Yep - we also need to hack the CM to set/replace the SLID/DLID carried in the CM REQ.

- Sean

From michael.arndt at informatik.tu-chemnitz.de Fri Feb 9 12:19:04 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Fri, 9 Feb 2007 21:19:04 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> <1171051141.2767.7.camel@localhost>
Message-ID: <001001c74c87$8b653470$21606d86@one7>

Hi,

> It is strange, I did similar thing (you can see in
> management/diags/src/mcm_rereg_test.c) and it worked fine for me.

What location is that?

>Which libibumad version you are using? Also I understand you did some
>changes in the stack, is it related to user_mad? Could you publish this?

I use OFED-1.1 and the libibumad version that comes with it. To rule this out, I ran the test on a stack that was not changed. It is a diploma thesis, and I will publish it as soon as possible ;) ... in German ... sorry.

The whole example code Hal was asking for is below. I have marked the position with /* here */.
Currently the retry parameter is zero, but I also tested 3.

Thanks
Michael

// ---- Includes --------------------------------
#include
#include
#include

#include "sender.h"

// ---- Defines and declarations ----------------

static const uint8_t CLASS_SUBN_DIRECTED_ROUTE = 0x81;
static const uint8_t CLASS_SUBN_LID_ROUTE = 0x1;

static int long drmad_tid = 0x123;

// Prototypes

void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod);
void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void *data);
char * drmad_status_str(struct drsmp *drsmp);
int str2DRPath(char *str, DRPath *path);
int set_bit(int nr, void *method_mask);

// ---- Main ------------------------------------

int main (void){

    int Port_ID = 0;
    int Agent_ID = 0;
    int ret;
    int i;
    int length, timeout_ms = 10000;

    void *umad;
    struct drsmp *smp;

    // ---- Settings --------------------------------
    int Portnummer = 1;
    char Devicename [2][UMAD_CA_NAME_LEN];
    DRPath Path;
    char Path_Str[64];

    uint16_t attribute = MAD_ATTR_PORT_INFO; // PortInfo
    int modifier = 1;

    struct _register_info{
        int Management_Class;
        int Management_Version;
        uint8_t RMPP_Version;
        uint32_t Method_Mask[4];
    } Register_Info;

    // ++ Value assignment ++

    Register_Info.Management_Class = CLASS_SUBN_DIRECTED_ROUTE;
    Register_Info.Management_Version = 1;
    Register_Info.RMPP_Version = 0;

    set_bit(0x01,&(Register_Info.Method_Mask));
    set_bit(0x02,&(Register_Info.Method_Mask));
    set_bit(0x81,&(Register_Info.Method_Mask));
    set_bit(0x03,&(Register_Info.Method_Mask));
    set_bit(0x05,&(Register_Info.Method_Mask));
    set_bit(0x06,&(Register_Info.Method_Mask));

    sprintf(Path_Str,"0,1,1,1");

    // ---- Init Phase ------------------------------
    printf("... Init Lib ...");
    umad_init();
    printf("done\n\n");

    // ++ Debug ++
    umad_debug(0);

    printf("...
Get CAs Names ...");
    ret = umad_get_cas_names(Devicename,2);
    if (!ret) {
        printf("Error: umad_get_cas_names: %i\n",ret);
        return -1;
    }
    else {
        printf("done\n\n");
        for (i = 0;i < ret;i++){
            printf("Devicename: %s\n",Devicename[i]);
        }
    }

    // ++ Open ++
    printf("... Open Port ...");
    if ((Port_ID = umad_open_port(Devicename[0],Portnummer)) < 0)
    {
        printf("Error: umad_open_port: %i\n",Port_ID);
        return -1;
    }
    else printf("done\n\n");

    // ++ Register ++
    printf("... Register User Mad ...");
    if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class,
                                  Register_Info.Management_Version,
                                  Register_Info.RMPP_Version,
                                  0)) < 0){
        printf("Error: umad_register : %i\n",Agent_ID);
        goto Exit;
    }
    else printf("done\n\n");

    // ---- Build packet ----------------------------

    printf("... Allocate packet ...");
    if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){
        printf("Error: umad_alloc\n");
        goto Exit;
    }
    printf("done\n\n");

    smp = umad_get_mad(umad);
    printf("... Smp Pointer ... done\n");

    if ((str2DRPath(Path_Str, &Path)) < 0) printf("Error: str2DRPath\n");

    printf("... Build SMP ...");
    drsmp_get_init(umad,&Path,attribute,modifier);
    printf("... done ...\n\n");

    //xdump(stderr, "before send:\n", smp, 256);
    dump_dr_smp(smp);

    length = IB_MAD_SIZE;

    /* here */
    for (i = 0; i < 10; i++){
        printf("... Send Mad ...");
        if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 200, 0)) < 0)
            printf("Error: umad_send : %i\n",ret);
        else printf("done\n\n");
    }

    /*
    for (i = 0; i < 10; i++){
        printf("... Recv Mad ...");
        if (umad_recv(Port_ID, umad, &length, timeout_ms) != Agent_ID)
            printf("Error umad_recv: %s\n", drmad_status_str(smp));
        else printf("done\n\n");
    }
    */

    dump_dr_smp(smp);
    switch (attribute){
        case MAD_ATTR_NODE_INFO : dump_node_info((const struct node_info*)&(smp->data[0])); break;
        case MAD_ATTR_PORT_INFO : dump_port_info(0,0,0,(const struct port_info*)&(smp->data[0])); break;
    }

    // ---- Down Phase ------------------------------
Exit:
    printf("...
Unregister User Mad ...");
    if (umad_unregister(Port_ID,Agent_ID) < 0)
        printf("Error: umad_unregister\n");
    else printf("done\n\n");

    printf("... Close Port ...");
    if (Port_ID != -1)
        if ((umad_close_port(Port_ID)) != 0){
            printf("Error: umad_close_port\n");
        }
        else printf("done\n\n");
    else printf("nothing to do\n\n");

}

// ---- SMP packet ------------------------------

void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod)
{
    struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));

    memset(smp, 0, sizeof (*smp));

    smp->base_version = 1;
    smp->mgmt_class = CLASS_SUBN_DIRECTED_ROUTE;
    smp->class_version = 1;

    smp->method = 0x01;
    smp->attr_id = (uint16_t)htons((uint16_t)attr);
    smp->attr_mod = htonl(mod);
    smp->tid = htonll(drmad_tid++);
    smp->dr_slid = 0xffff;
    smp->dr_dlid = 0xffff;

    umad_set_addr(umad, 0xffff, 0, 0, 0);

    if (path)
        memcpy(smp->initial_path, path->path, path->hop_cnt+1);

    smp->hop_cnt = path->hop_cnt;
}

void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void *data)
{
    struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));

    memset(smp, 0, sizeof (*smp));

    smp->method = 2; /* SET */
    smp->attr_id = (uint16_t)htons((uint16_t)attr);
    smp->attr_mod = htonl(mod);
    smp->tid = htonll(drmad_tid++);
    smp->dr_slid = 0xffff;
    smp->dr_dlid = 0xffff;

    umad_set_addr(umad, 0xffff, 0, 0, 0);

    if (path)
        memcpy(smp->initial_path, path->path, path->hop_cnt+1);

    if (data)
        memcpy(smp->data, data, sizeof smp->data);

    smp->hop_cnt = path->hop_cnt;
}

int str2DRPath(char *str, DRPath *path)
{
    char *s;

    path->hop_cnt = -1;

    //DEBUG("DR str: %s", str);
    while (str && *str) {
        if ((s = strchr(str, ',')))
            *s = 0;
        path->path[++path->hop_cnt] = atoi(str);
        if (!s)
            break;
        str = s+1;
    }

#if 0
    if (path->path[0] != 0 ||
        (path->hop_cnt > 0 && dev_port && path->path[1] != dev_port)) {
        DEBUG("hop 0 != 0 or hop 1 != dev_port");
        return -1;
    }
#endif

    return path->hop_cnt;
}

From mshefty at ichips.intel.com Fri Feb 9 12:34:40 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09
Feb 2007 12:34:40 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <1171051315.31538.181667.camel@hal.voltaire.com> References: <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> <1171043929.31538.174521.camel@hal.voltaire.com> <20070209192046.GP11411@obsidianresearch.com> <45CCD1E2.5050806@ichips.intel.com> <1171051315.31538.181667.camel@hal.voltaire.com> Message-ID: <45CCDAE0.1080102@ichips.intel.com> > the /missing part (right now) is locating the SA on that > remote subnet if this is a needed function. Maybe we can expose this to SA clients through a ServiceRecord? This doesn't solve how the two SAs find each other (or any of the other difficult stuff), but with this and the path record query ability that we mentioned, I think we may have a solution for the host stack. 
- Sean From sashak at voltaire.com Fri Feb 9 13:41:19 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 09 Feb 2007 23:41:19 +0200 Subject: [openib-general] Unknown SMP Recv In-Reply-To: <001001c74c87$8b653470$21606d86@one7> References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> <1171051141.2767.7.camel@localhost> <001001c74c87$8b653470$21606d86@one7> Message-ID: <1171057279.2767.20.camel@localhost> On Fri, 2007-02-09 at 21:19 +0100, Michael Arndt wrote: > Hi, > > > It is strange, I did similar thing (you can see in > > management/diags/src/mcm_rereg_test.c) and it worked fine for me. > > What location is that? Do git clone git://git.openfabrics.org/~halr/management and find this as management/diags/src/mcm_rereg_test.c . Or you can look at this via gitweb interface: http://git.openfabrics.org/git Sasha > >Which libibumad version you are using? Also I understand you did some > >changes in the stack, is it related to user_mad? Could you publish this? > > I use OFED-1.1 and attached libibumad version. The stack where I have tested > this context wasn't changed to exclude this. It is a diploma thesis and will > publish as soon as posible ;)...in german ...sorry. > > The hole example code Hal was asking for is below. I have marked the > position with /* here */. Currently is the retry parameter zero, but I also > tested 3. 
> > Thanks Michael > > // ---- Includes -------------------------------- > #include > #include > #include > > #include "sender.h" > > // ---- Defines und Deklarationen --------------- > > static const uint8_t CLASS_SUBN_DIRECTED_ROUTE = 0x81; > static const uint8_t CLASS_SUBN_LID_ROUTE = 0x1; > > static int long drmad_tid = 0x123; > > // Prototypes > > void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod); > void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void > *data); > char * drmad_status_str(struct drsmp *drsmp); > int str2DRPath(char *str, DRPath *path); > int set_bit(int nr, void *method_mask); > > > > // ---- Main ------------------------------------ > > int main (void){ > > int Port_ID = 0; > int Agent_ID = 0; > int ret; > int i; > int length, timeout_ms = 10000; > > > void *umad; > struct drsmp *smp; > > > // ---- Einstellungen --------------------------- > int Portnummer = 1; > char Devicename [2][UMAD_CA_NAME_LEN]; > DRPath Path; > char Path_Str[64]; > > uint16_t attribute = MAD_ATTR_PORT_INFO; // PortInfo > int modifier = 1; > > struct _register_info{ > int Management_Class; > int Management_Version; > uint8_t RMPP_Version; > uint32_t Method_Mask[4]; > } Register_Info; > > // ++ Wertzuweisung ++ > > Register_Info.Management_Class = CLASS_SUBN_DIRECTED_ROUTE; > Register_Info.Management_Version = 1; > Register_Info.RMPP_Version = 0; > > set_bit(0x01,&(Register_Info.Method_Mask)); > set_bit(0x02,&(Register_Info.Method_Mask)); > set_bit(0x81,&(Register_Info.Method_Mask)); > set_bit(0x03,&(Register_Info.Method_Mask)); > set_bit(0x05,&(Register_Info.Method_Mask)); > set_bit(0x06,&(Register_Info.Method_Mask)); > > sprintf(Path_Str,"0,1,1,1"); > > > // ---- Init Phase ------------------------------ > printf("... Init Lib ..."); > umad_init(); > printf("done\n\n"); > > // ++ Debug ++ > umad_debug(0); > > printf("... 
Get CAs Names ..."); > ret = umad_get_cas_names(Devicename,2); > if (!ret) { > printf("Fehler: umad_get_cas_names: %i\n",ret); > return -1; > } > else { > printf("done\n\n"); > for (i = 0;i < ret;i++){ > printf("Devicename: %s\n",Devicename[i]); > } > > } > // ++ Open ++ > printf("... Open Port ..."); > if ((Port_ID = umad_open_port(Devicename[0],Portnummer)) < 0) > { > printf("Fehler: umad_open_port: %i\n",Port_ID); > return -1; > } > else printf("done\n\n"); > // ++ Register ++ > printf("... Register User Mad ..."); > if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class, > Register_Info.Management_Version, > Register_Info.RMPP_Version, > 0)) < 0){ > printf("Fehler: umad_register : %i\n",Agent_ID); > goto Exit; > } > else printf("done\n\n"); > // ---- Paket bauen ----------------------------- > > printf("... Paket allokieren ..."); > if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){ > printf("Fehler: umad_alloc\n"); > goto Exit; > } > printf("done\n\n"); > > smp = umad_get_mad(umad); > printf("... Smp Pointer ... done\n"); > > if ((str2DRPath(Path_Str, &Path)) < 0) printf("Fehler: str2DRPath\n"); > > printf("... SMP bauen ..."); > drsmp_get_init(umad,&Path,attribute,modifier); > printf("... done ...\n\n"); > > > //xdump(stderr, "before send:\n", smp, 256); > dump_dr_smp(smp); > > length = IB_MAD_SIZE; > > /* here */ > for (i = 0; i < 10; i++){ > printf("... Send Mad ..."); > if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 200, 0)) < 0) > printf("Fehler: umad_send : %i\n",ret); > else printf("done\n\n"); > } > > /* > for (i = 0; i < 10; i++){ > printf("... 
Recv Mad ..."); > if (umad_recv(Port_ID, umad, &length, timeout_ms) != Agent_ID) > printf("Fehler umad_recv: %s\n", drmad_status_str(smp)); > else printf("done\n\n"); > } > */ > > dump_dr_smp(smp); > switch (attribute){ > case MAD_ATTR_NODE_INFO : dump_node_info((const struct > node_info*)&(smp->data[0])); break; > case MAD_ATTR_PORT_INFO : dump_port_info(0,0,0,(const struct > port_info*)&(smp->data[0])); break; > } > > > // ---- Down Phase ------------------------------ > Exit: > printf("... Unregister User Mad ..."); > if (umad_unregister(Port_ID,Agent_ID) < 0) > printf("Fehler: umad_unregister\n"); > else printf("done\n\n"); > > printf("... Close Port ..."); > if (Port_ID != -1) > if ((umad_close_port(Port_ID)) != 0){ > printf("Fehler: umad_close_port\n"); > } > else printf("done\n\n"); > else printf("nix zu tun\n\n"); > > } > > // ---- SMP Paket ------------------------------- > > > void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod) > { > struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad)); > > memset(smp, 0, sizeof (*smp)); > > smp->base_version = 1; > smp->mgmt_class = CLASS_SUBN_DIRECTED_ROUTE; > smp->class_version = 1; > > smp->method = 0x01; > smp->attr_id = (uint16_t)htons((uint16_t)attr); > smp->attr_mod = htonl(mod); > smp->tid = htonll(drmad_tid++); > smp->dr_slid = 0xffff; > smp->dr_dlid = 0xffff; > > umad_set_addr(umad, 0xffff, 0, 0, 0); > > if (path) > memcpy(smp->initial_path, path->path, path->hop_cnt+1); > > smp->hop_cnt = path->hop_cnt; > } > > void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void > *data) > { > struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad)); > > memset(smp, 0, sizeof (*smp)); > > smp->method = 2; /* SET */ > smp->attr_id = (uint16_t)htons((uint16_t)attr); > smp->attr_mod = htonl(mod); > smp->tid = htonll(drmad_tid++); > smp->dr_slid = 0xffff; > smp->dr_dlid = 0xffff; > > umad_set_addr(umad, 0xffff, 0, 0, 0); > > if (path) > memcpy(smp->initial_path, path->path, 
path->hop_cnt+1); > > if (data) > memcpy(smp->data, data, sizeof smp->data); > > smp->hop_cnt = path->hop_cnt; > } > > int str2DRPath(char *str, DRPath *path) > { > char *s; > > path->hop_cnt = -1; > > //DEBUG("DR str: %s", str); > while (str && *str) { > if ((s = strchr(str, ','))) > *s = 0; > path->path[++path->hop_cnt] = atoi(str); > if (!s) > break; > str = s+1; > } > > #if 0 > if (path->path[0] != 0 || > (path->hop_cnt > 0 && dev_port && path->path[1] != dev_port)) { > DEBUG("hop 0 != 0 or hop 1 != dev_port"); > return -1; > } > #endif > > return path->hop_cnt; > } > > From halr at voltaire.com Fri Feb 9 13:45:29 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 16:45:29 -0500 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070209192046.GP11411@obsidianresearch.com> References: <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> <1171043929.31538.174521.camel@hal.voltaire.com> <20070209192046.GP11411@obsidianresearch.com> Message-ID: <1171057501.31538.187596.camel@hal.voltaire.com> On Fri, 2007-02-09 at 14:20, Jason Gunthorpe wrote: > On Fri, Feb 09, 2007 at 12:58:51PM -0500, Hal Rosenstock wrote: > > > For simplicity, assume a single path. My assumption in this case was that the > > > SLID/DLID values would be reversed. That is, the LIDs are relative to the local > > > subnet, not the SGID. But if I set the SGID = DGID = remote GID, then the LIDs > > > would be relative to the remote subnet. (Assuming that the local SA could > > > support such a query at all.) > > > > > > It seems that in order to meet the requirements of the spec, we need a way to > > > perform inter-subnet queries. 
(The alternative being to change the spec...) > > > And if the local SA can return a path record to a remote DGID, then it also > > > seems like the local SA must be able to collect some sort of information about > > > the path to the remote subnet. (How it does this seems TBD.) > > > > > > So... I'm thinking that the solution to these problems should rest within the > > > local SA... > > > > Yes, this seems most consistent with what is there now although there > > are some issues to work out on how some of the fields are supported and > > which queries would work intersubnet (as well as how they would work). > > I agree, some kind of inter subnet query will have to be used to make > this work consistently with the rest of IBA. > > It looks to me like we overall need to have this look like: > - Routers need to be able to support inter-subnet reversible paths > to meed the requirements for CM. > - Inter-subnet reversible paths are defined to mean that when the LRH > is selected on the destination subnet by the router it is reversible. > - This can be signaled by using TClass and/or FlowLabel fields in the GRH. > - Routers need to be able to produce knowable SLIDs to meet the QP LID > matching requirement > - The LID to use can be signaled by using TClass and/or FlowLabel > - A kind of inter-subnet path record query is needed that can > return a local and remote GRH and LRH. 
These four structures need to > be *linked* so that: > - Side A GRH.SGID = active side's Port GID > - Side A GRH.DGID = passive side's Port GID > - Side A LRH.SLID = any active side's port LID > - Side A LRH.DLID = A subnet router > - Side A LRH.SL = SL to A subnet router > > - Side B GRH.SGID = Side A GRH.DGID > - Side B GRH.DGID = Side A GRH.SGID > - Side B LRH.SLID = any passive side's port LID > - Side B LRH.DLID = B subnet router > - Side B LRH.SL = SL to B subnet router > > - When the A subnet router sees Side B GRH it produces > LRH.SLID = Side A LRH.DLID > LRH.DLID = Side A LRH.SLID > LRH.SL = SL to Side A Active side (may be != to Side A LRH.SL) > - Similarly for Side B. > > This linkage requirement is necessary due to the QP LID matching > rules. I'm imagining that like SL the GRH.TClass and GRH.FlowLabel > could be different in each direction. > > I'd think of this query as a generic duplex PathRecord query. > > Off hand I don't see that the existing path record query structure > has enough information to do this.. Particularly, in cases > where each subnet has more than 1 router port there is no real > guarentee that querying for the SGID -> DGID direction and then the > DGID -> SGID direction uses the same router ports without providing > both router LIDs as part of the query. Router LIDs rather than GIDs (in the case of LMC > 0) ? The SA PathRecord may have room but the MultiPathRecord is pretty tightly packed now. -- Hal > Whatever responds to this query must be interacting with the router(s) > to ensure they recognize the GRHs and produce LRHs to meet all the > above requirements. > > ** The hackish and simple thing to do right now is to just demand that > routers *always* use reversible LRHs with a single SLID and have the > passive side pick up the QP lids from the LRH if it is routed.. 
> > Jason From halr at voltaire.com Fri Feb 9 13:54:48 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 16:54:48 -0500 Subject: [openib-general] Unknown SMP Recv In-Reply-To: <001001c74c87$8b653470$21606d86@one7> References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> <1171051141.2767.7.camel@localhost> <001001c74c87$8b653470$21606d86@one7> Message-ID: <1171058084.31538.188191.camel@hal.voltaire.com> On Fri, 2007-02-09 at 15:19, Michael Arndt wrote: > Hi, > > > It is strange, I did similar thing (you can see in > > management/diags/src/mcm_rereg_test.c) and it worked fine for me. > > What location is that? > > >Which libibumad version you are using? Also I understand you did some > >changes in the stack, is it related to user_mad? Could you publish this? > > I use OFED-1.1 and attached libibumad version. The stack where I have tested > this context wasn't changed to exclude this. It is a diploma thesis and will > publish as soon as posible ;)...in german ...sorry. > > The hole example code Hal was asking for is below. I have marked the > position with /* here */. Currently is the retry parameter zero, but I also > tested 3. > > Thanks Michael > > // ---- Includes -------------------------------- > #include > #include > #include > > #include "sender.h" Can you provide this as well ? 
-- Hal > > // ---- Defines und Deklarationen --------------- > > static const uint8_t CLASS_SUBN_DIRECTED_ROUTE = 0x81; > static const uint8_t CLASS_SUBN_LID_ROUTE = 0x1; > > static int long drmad_tid = 0x123; > > // Prototypes > > void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod); > void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void > *data); > char * drmad_status_str(struct drsmp *drsmp); > int str2DRPath(char *str, DRPath *path); > int set_bit(int nr, void *method_mask); > > > > // ---- Main ------------------------------------ > > int main (void){ > > int Port_ID = 0; > int Agent_ID = 0; > int ret; > int i; > int length, timeout_ms = 10000; > > > void *umad; > struct drsmp *smp; > > > // ---- Einstellungen --------------------------- > int Portnummer = 1; > char Devicename [2][UMAD_CA_NAME_LEN]; > DRPath Path; > char Path_Str[64]; > > uint16_t attribute = MAD_ATTR_PORT_INFO; // PortInfo > int modifier = 1; > > struct _register_info{ > int Management_Class; > int Management_Version; > uint8_t RMPP_Version; > uint32_t Method_Mask[4]; > } Register_Info; > > // ++ Wertzuweisung ++ > > Register_Info.Management_Class = CLASS_SUBN_DIRECTED_ROUTE; > Register_Info.Management_Version = 1; > Register_Info.RMPP_Version = 0; > > set_bit(0x01,&(Register_Info.Method_Mask)); > set_bit(0x02,&(Register_Info.Method_Mask)); > set_bit(0x81,&(Register_Info.Method_Mask)); > set_bit(0x03,&(Register_Info.Method_Mask)); > set_bit(0x05,&(Register_Info.Method_Mask)); > set_bit(0x06,&(Register_Info.Method_Mask)); > > sprintf(Path_Str,"0,1,1,1"); > > > // ---- Init Phase ------------------------------ > printf("... Init Lib ..."); > umad_init(); > printf("done\n\n"); > > // ++ Debug ++ > umad_debug(0); > > printf("... 
Get CAs Names ..."); > ret = umad_get_cas_names(Devicename,2); > if (!ret) { > printf("Fehler: umad_get_cas_names: %i\n",ret); > return -1; > } > else { > printf("done\n\n"); > for (i = 0;i < ret;i++){ > printf("Devicename: %s\n",Devicename[i]); > } > > } > // ++ Open ++ > printf("... Open Port ..."); > if ((Port_ID = umad_open_port(Devicename[0],Portnummer)) < 0) > { > printf("Fehler: umad_open_port: %i\n",Port_ID); > return -1; > } > else printf("done\n\n"); > // ++ Register ++ > printf("... Register User Mad ..."); > if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class, > Register_Info.Management_Version, > Register_Info.RMPP_Version, > 0)) < 0){ > printf("Fehler: umad_register : %i\n",Agent_ID); > goto Exit; > } > else printf("done\n\n"); > // ---- Paket bauen ----------------------------- > > printf("... Paket allokieren ..."); > if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){ > printf("Fehler: umad_alloc\n"); > goto Exit; > } > printf("done\n\n"); > > smp = umad_get_mad(umad); > printf("... Smp Pointer ... done\n"); > > if ((str2DRPath(Path_Str, &Path)) < 0) printf("Fehler: str2DRPath\n"); > > printf("... SMP bauen ..."); > drsmp_get_init(umad,&Path,attribute,modifier); > printf("... done ...\n\n"); > > > //xdump(stderr, "before send:\n", smp, 256); > dump_dr_smp(smp); > > length = IB_MAD_SIZE; > > /* here */ > for (i = 0; i < 10; i++){ > printf("... Send Mad ..."); > if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 200, 0)) < 0) > printf("Fehler: umad_send : %i\n",ret); > else printf("done\n\n"); > } > > /* > for (i = 0; i < 10; i++){ > printf("... 
Recv Mad ..."); > if (umad_recv(Port_ID, umad, &length, timeout_ms) != Agent_ID) > printf("Fehler umad_recv: %s\n", drmad_status_str(smp)); > else printf("done\n\n"); > } > */ > > dump_dr_smp(smp); > switch (attribute){ > case MAD_ATTR_NODE_INFO : dump_node_info((const struct > node_info*)&(smp->data[0])); break; > case MAD_ATTR_PORT_INFO : dump_port_info(0,0,0,(const struct > port_info*)&(smp->data[0])); break; > } > > > // ---- Down Phase ------------------------------ > Exit: > printf("... Unregister User Mad ..."); > if (umad_unregister(Port_ID,Agent_ID) < 0) > printf("Fehler: umad_unregister\n"); > else printf("done\n\n"); > > printf("... Close Port ..."); > if (Port_ID != -1) > if ((umad_close_port(Port_ID)) != 0){ > printf("Fehler: umad_close_port\n"); > } > else printf("done\n\n"); > else printf("nix zu tun\n\n"); > > } > > // ---- SMP Paket ------------------------------- > > > void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod) > { > struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad)); > > memset(smp, 0, sizeof (*smp)); > > smp->base_version = 1; > smp->mgmt_class = CLASS_SUBN_DIRECTED_ROUTE; > smp->class_version = 1; > > smp->method = 0x01; > smp->attr_id = (uint16_t)htons((uint16_t)attr); > smp->attr_mod = htonl(mod); > smp->tid = htonll(drmad_tid++); > smp->dr_slid = 0xffff; > smp->dr_dlid = 0xffff; > > umad_set_addr(umad, 0xffff, 0, 0, 0); > > if (path) > memcpy(smp->initial_path, path->path, path->hop_cnt+1); > > smp->hop_cnt = path->hop_cnt; > } > > void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void > *data) > { > struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad)); > > memset(smp, 0, sizeof (*smp)); > > smp->method = 2; /* SET */ > smp->attr_id = (uint16_t)htons((uint16_t)attr); > smp->attr_mod = htonl(mod); > smp->tid = htonll(drmad_tid++); > smp->dr_slid = 0xffff; > smp->dr_dlid = 0xffff; > > umad_set_addr(umad, 0xffff, 0, 0, 0); > > if (path) > memcpy(smp->initial_path, path->path, 
path->hop_cnt+1); > > if (data) > memcpy(smp->data, data, sizeof smp->data); > > smp->hop_cnt = path->hop_cnt; > } > > int str2DRPath(char *str, DRPath *path) > { > char *s; > > path->hop_cnt = -1; > > //DEBUG("DR str: %s", str); > while (str && *str) { > if ((s = strchr(str, ','))) > *s = 0; > path->path[++path->hop_cnt] = atoi(str); > if (!s) > break; > str = s+1; > } > > #if 0 > if (path->path[0] != 0 || > (path->hop_cnt > 0 && dev_port && path->path[1] != dev_port)) { > DEBUG("hop 0 != 0 or hop 1 != dev_port"); > return -1; > } > #endif > > return path->hop_cnt; > } > > From halr at voltaire.com Fri Feb 9 13:48:47 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Feb 2007 16:48:47 -0500 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45CCDAE0.1080102@ichips.intel.com> References: <45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> <1171043929.31538.174521.camel@hal.voltaire.com> <20070209192046.GP11411@obsidianresearch.com> <45CCD1E2.5050806@ichips.intel.com> <1171051315.31538.181667.camel@hal.voltaire.com> <45CCDAE0.1080102@ichips.intel.com> Message-ID: <1171057719.31538.187820.camel@hal.voltaire.com> On Fri, 2007-02-09 at 15:34, Sean Hefty wrote: > > the /missing part (right now) is locating the SA on that > > remote subnet if this is a needed function. > > Maybe we can expose this to SA clients through a ServiceRecord? That might be one way if there were a standardized service name for SA and there was some way to globally distribute those across SAs. The hard part is the global distribution of this information. 
-- Hal > This doesn't > solve how the two SAs find each other (or any of the other difficult stuff), but > with this and the path record query ability that we mentioned, I think we may > have a solution for the host stack. > > - Sean From panda at cse.ohio-state.edu Fri Feb 9 14:28:16 2007 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Fri, 9 Feb 2007 17:28:16 -0500 (EST) Subject: [openib-general] MVAPICH 0.9.9-beta release is available Message-ID: <200702092228.l19MSGEo006670@xi.cse.ohio-state.edu> The MVAPICH team is pleased to announce the availability of MVAPICH 0.9.9-beta with the following NEW features: - Message coalescing support to enable reduction of per Queue-pair send queues for reduction in memory requirement on large scale clusters. This design also increases the small message messaging rate significantly. - Designs for avoiding hot-spots in networks of large-scale clusters - Multi-pathing support leveraging LMC mechanism - Multi-port support for enabling user processes to bind to different IB ports for balanced communication performance on multi-core platforms - Multi-core optimized scalable shared memory design - Memory Hook support provided by integration with ptmalloc2 library. This provides safe release of memory to the Operating System and is expected to benefit the memory usage of applications that frequently use malloc and free operations. - Optimized, high-performance shared memory aware collective operations for multi-core platforms - Shared-Memory only channel (This interface support is useful for running MPI jobs on multi-processor systems without using any high-performance network. For example, multi-core servers, desktops, and laptops; and clusters with serial nodes.) A new "Multiple-pair Bandwidth and Message Rate" test is also available as a part of OSU_Benchmarks. 
For downloading MVAPICH 0.9.9-beta package and accessing the anonymous SVN, please visit the following URL: http://nowlab.cse.ohio-state.edu/projects/mpi-iba/ MVAPICH 0.9.9-beta is also available for OFED 1.2 testing. All feedbacks, including bug reports and hints for performance tuning, are welcome. Please post it to the mvapich-discuss mailing list. Thanks, MVAPICH Team From jgunthorpe at obsidianresearch.com Fri Feb 9 14:38:45 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Fri, 9 Feb 2007 15:38:45 -0700 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <1171057501.31538.187596.camel@hal.voltaire.com> References: <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> <1171043929.31538.174521.camel@hal.voltaire.com> <20070209192046.GP11411@obsidianresearch.com> <1171057501.31538.187596.camel@hal.voltaire.com> Message-ID: <20070209223845.GR11411@obsidianresearch.com> On Fri, Feb 09, 2007 at 04:45:29PM -0500, Hal Rosenstock wrote: > > Off hand I don't see that the existing path record query structure > > has enough information to do this.. Particularly, in cases > > where each subnet has more than 1 router port there is no real > > guarentee that querying for the SGID -> DGID direction and then the > > DGID -> SGID direction uses the same router ports without providing > > both router LIDs as part of the query. > > Router LIDs rather than GIDs (in the case of LMC > 0) ? Yes, it is the router LID that is matched by the QP, the router GID never makes it into any packets or PR responses. To elaborate on it.. Basically you need to specify the egress subnet *and* the egress router LID when constructing the path to handle the case of multiple fabric and router paths. 
The GID of the target and the LID of the target's router port are enough to disambiguate all the possible multipaths down to a set that will match the QP programming. This is all because of the LID matching rules. The ultimate router egress LID must be controlled when establishing the path. It must match the DLID in the QP, so it must be specified when the path is looked up so that the SA/Routers/etc can provide a PR that meets the egress LID requirement. This is not just to ensure that the router selects the right LRH.SLID in the case of LMC > 0 but to also ensure that the *right* router port is used in the case of multiple (redundant) routed paths. Basically the idea where each end of an RC QP could independently do a Path Record query for the remote GID cannot work due to the LID matching rule. Sean: Even if you can query both SA's there isn't enough information to force things to use the same router path in each direction. Jason From swise at opengridcomputing.com Fri Feb 9 14:46:57 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 09 Feb 2007 16:46:57 -0600 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: <20070207065650.24166.6979.sendpatchset@localhost.localdomain> References: <20070207065650.24166.6979.sendpatchset@localhost.localdomain> Message-ID: <1171061217.4525.15.camel@stevo-desktop> > All 4 above cases were tested by injecting random error in > iw_conn_req_handler() and running rdma_bw/krping, they were > confirmed. I added the BUG_ON() to confirm the earlier check > for id_priv->refcount==0 should always be true (and could be > removed). Can you post the test case you're using for this? Steve.
From mshefty at ichips.intel.com Fri Feb 9 15:08:12 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 09 Feb 2007 15:08:12 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070209223845.GR11411@obsidianresearch.com> References: <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> <1171043929.31538.174521.camel@hal.voltaire.com> <20070209192046.GP11411@obsidianresearch.com> <1171057501.31538.187596.camel@hal.voltaire.com> <20070209223845.GR11411@obsidianresearch.com> Message-ID: <45CCFEDC.3040700@ichips.intel.com> > Sean: Even if you can query both SA's there isn't enough information > to force things to use the same router path in each direction. My assumption is that the remote SA contains the necessary information about how a packet coming from the local SGID to the remote DGID would be routed on the remote subnet. The returned path record must specify the SLID that the remote router will send from, along with the DLID that the router will map the DGID to. Likewise for the local SA. As long as the path is reversible, then my expectation is that the local router will use the returned LIDs for packets coming from the remote DGID to the local SGID. The route itself is determined using the SGID, DGID, TClass, FlowLabel. So, as long as the two queries match on these fields, I would think that it would work. 
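The matching condition described above can be sketched as a comparison of the route-selecting fields of two path records. This is a hedged illustration, not the SA's actual record layout: path_rec_key and same_route_key are hypothetical names, and the struct holds only the four fields mentioned (SGID, DGID, TClass, FlowLabel). The only outside fact relied on is that the GRH FlowLabel is 20 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of a path record: only the fields discussed above. */
struct path_rec_key {
    uint8_t  sgid[16];
    uint8_t  dgid[16];
    uint8_t  tclass;
    uint32_t flow_label;   /* only the low 20 bits are significant */
};

#define FLOW_LABEL_MASK 0xFFFFFu

/*
 * Returns true when the record from the local SA and the record from the
 * remote SA describe the same route key, i.e. they agree on SGID, DGID,
 * TClass and the 20-bit FlowLabel.
 */
static bool same_route_key(const struct path_rec_key *local,
                           const struct path_rec_key *remote)
{
    assert(local && remote);
    return memcmp(local->sgid, remote->sgid, sizeof local->sgid) == 0 &&
           memcmp(local->dgid, remote->dgid, sizeof local->dgid) == 0 &&
           local->tclass == remote->tclass &&
           (local->flow_label & FLOW_LABEL_MASK) ==
           (remote->flow_label & FLOW_LABEL_MASK);
}
```

Note that this check says nothing about LIDs. Whether agreement on these four fields is sufficient to pin down the router ports is exactly the point under discussion in this thread.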
- Sean From michael.arndt at informatik.tu-chemnitz.de Fri Feb 9 15:14:35 2007 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Sat, 10 Feb 2007 00:14:35 +0100 Subject: [openib-general] Unknown SMP Recv References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> <1171051141.2767.7.camel@localhost> <001001c74c87$8b653470$21606d86@one7> <1171058084.31538.188191.camel@hal.voltaire.com> Message-ID: <000801c74ca0$108a83e0$21606d86@one7> Hi, below the two files missing, sender.h and helper.c. Thanks Michael ############################# Sender. h ############################################## // ---- Includes -------------------------------- #include #include #include #include #include #include // ---- Defines --------------------------------- #define IB_MAD_SIZE 256 #define UMAD_DEV_NAME_SZ 32 #define UMAD_DEV_FILE_SZ 256 #define DIRECTION (uint16_t)htons(0x8000) #define BUF_SIZE 4096 #define MCLASS_SUBN_DIR 0x81 #define MCLASS_SUBN_LID 0x01 #define MCLASS_SUBN_ADM 0x03 #define SMP_STATUS_MASK_HO 0x7FFF #define SMP_STATUS_MASK (uint16_t)htons(SMP_STATUS_MASK_HO) #define PRIx64 "lx" #define SM_METHOD_STR_UNKNOWN_VAL 0x21 #define NODE_INFO_PORT_NUM_MASK (ntohl(0xFF000000)) #define NODE_INFO_VEND_ID_MASK (ntohl(0x00FFFFFF)) #define MAD_ATTR_PORT_INFO 0x0015 #define MAD_ATTR_NODE_INFO 0x0011 #define MAD_ATTR_NODE_DESC 0x0010 #define IB_NOTICE_NODE_TYPE_ROUTER (ntohl(0x000003)) #define SM_ATTR_STR_UNKNOWN_VAL 0x21 #define PORT_LINK_SPEED_SHIFT 4 #define PORT_LINK_SPEED_SUPPORTED_MASK 0xF0 #define PORT_LINK_SPEED_ACTIVE_MASK 0xF0 #define 
PORT_LINK_SPEED_ENABLED_MASK 0x0F #define LINK_NO_CHANGE 0 #define LINK_DOWN 1 #define LINK_INIT 2 #define LINK_ARMED 3 #define LINK_ACTIVE 4 #define LINK_ACT_DEFER 5 #define PORT_STATE_MASK 0x0F #define PORT_LMC_MASK 0x07 #define PORT_LMC_MAX 0x07 #define PORT_MPB_MASK 0xC0 #define PORT_MPB_SHIFT 6 #define PORT_LINK_SPEED_SHIFT 4 #define PORT_LINK_SPEED_SUPPORTED_MASK 0xF0 #define PORT_LINK_SPEED_ACTIVE_MASK 0xF0 #define PORT_LINK_SPEED_ENABLED_MASK 0x0F #define PORT_PHYS_STATE_MASK 0xF0 #define PORT_PHYS_STATE_SHIFT 4 #define PORT_LNKDWNDFTSTATE_MASK 0x0F #ifndef __BYTE_ORDER #error "__BYTE_ORDER macro undefined. Missing in endian.h?" #endif #if __BYTE_ORDER == __LITTLE_ENDIAN #define CPU_LE 1 #define CPU_BE 0 #else #define CPU_LE 0 #define CPU_BE 1 #endif #if CPU_LE #define NODE_INFO_PORT_NUM_SHIFT 0 #else #define NODE_INFO_PORT_NUM_SHIFT 24 #endif #define own_ntoh64( x ) (uint64_t)( \ (((uint64_t)(x) & 0x00000000000000FFULL) << 56) | \ (((uint64_t)(x) & 0x000000000000FF00ULL) << 40) | \ (((uint64_t)(x) & 0x0000000000FF0000ULL) << 24) | \ (((uint64_t)(x) & 0x00000000FF000000ULL) << 8 ) | \ (((uint64_t)(x) & 0x000000FF00000000ULL) >> 8 ) | \ (((uint64_t)(x) & 0x0000FF0000000000ULL) >> 24) | \ (((uint64_t)(x) & 0x00FF000000000000ULL) >> 40) | \ (((uint64_t)(x) & 0xFF00000000000000ULL) >> 56) ) #define own_ntoh64_2( x ) (uint64_t)( \ (((uint64_t)(x) & 0x00000000000000FFULL) << 24) | \ (((uint64_t)(x) & 0x000000000000FF00ULL) << 8) | \ (((uint64_t)(x) & 0x0000000000FF0000ULL) >> 8) | \ (((uint64_t)(x) & 0x00000000FF000000ULL) >> 24 ) | \ (((uint64_t)(x) & 0x000000FF00000000ULL) << 24 ) | \ (((uint64_t)(x) & 0x0000FF0000000000ULL) << 8) | \ (((uint64_t)(x) & 0x00FF000000000000ULL) >> 8) | \ (((uint64_t)(x) & 0xFF00000000000000ULL) >> 24) ) // ---- Deklarationen --------------------------- struct Port { char dev_file[UMAD_DEV_FILE_SZ]; char dev_name[UMAD_DEV_NAME_SZ]; int dev_port; int dev_fd; int id; }; struct _register_info{ int Management_Class; int 
Management_Version; uint8_t RMPP_Version; uint32_t Method_Mask[4]; } Register_Info; typedef struct { char path[64]; int hop_cnt; } DRPath; struct drsmp { uint8_t base_version; uint8_t mgmt_class; uint8_t class_version; uint8_t method; uint16_t status; uint8_t hop_ptr; uint8_t hop_cnt; uint64_t tid; uint16_t attr_id; uint16_t resv; uint32_t attr_mod; uint64_t mkey; uint16_t dr_slid; uint16_t dr_dlid; uint32_t reserved[7]; uint8_t data[64]; uint8_t initial_path[64]; uint8_t return_path[64]; }; struct node_info { uint8_t base_version; uint8_t class_version; uint8_t node_type; uint8_t num_ports; uint64_t sys_guid; uint64_t node_guid; uint64_t port_guid; uint16_t partition_cap; uint16_t device_id; uint32_t revision; uint32_t port_num_vendor_id; }; struct port_info { uint64_t m_key; uint64_t subnet_prefix; uint16_t base_lid; uint16_t master_sm_base_lid; uint32_t capability_mask; uint16_t diag_code; uint16_t m_key_lease_period; uint8_t local_port_num; uint8_t link_width_enabled; uint8_t link_width_supported; uint8_t link_width_active; uint8_t state_info1; /* LinkSpeedSupported and PortState */ uint8_t state_info2; /* PortPhysState and LinkDownDefaultState */ uint8_t mkey_lmc; uint8_t link_speed; /* LinkSpeedEnabled and LinkSpeedActive */ uint8_t mtu_smsl; uint8_t vl_cap; /* VLCap and InitType */ uint8_t vl_high_limit; uint8_t vl_arb_high_cap; uint8_t vl_arb_low_cap; uint8_t mtu_cap; uint8_t vl_stall_life; uint8_t vl_enforce; uint16_t m_key_violations; uint16_t p_key_violations; uint16_t q_key_violations; uint8_t guid_cap; uint8_t subnet_timeout; /* cli_rereg(1b), resrv(2b), timeout(5b) */ uint8_t resp_time_value; uint8_t error_threshold; }; // ---- Prototypes int routing(struct drsmp* smp, struct umad_ca* Devices_Info , int Devices_cnt); int set_bit(int nr, void *method_mask); char *drmad_status_str(struct drsmp *drsmp); void dump_dr_smp(const struct drsmp* const p_smp); ############################################## helper.c 
########################################################## // ---- Include --------------------------------- #include "sender.h" // ---- Hilfe Funktionen ------------------------ const char* sm_method_str[] = { "RESERVED0", /* 0 */ "SubnGet", /* 1 */ "SubnSet", /* 2 */ "RESERVED3", /* 3 */ "RESERVED4", /* 4 */ "SubnTrap", /* 5 */ "RESERVED6", /* 6 */ "SubnTrapRepress", /* 7 */ "RESERVED8", /* 8 */ "RESERVED9", /* 9 */ "RESERVEDA", /* A */ "RESERVEDB", /* B */ "RESERVEDC", /* C */ "RESERVEDD", /* D */ "RESERVEDE", /* E */ "RESERVEDF", /* F */ "RESERVED10", /* 10 */ "SubnGetResp", /* 11 */ "RESERVED12", /* 12 */ "RESERVED13", /* 13 */ "RESERVED14", /* 14 */ "RESERVED15", /* 15 */ "RESERVED16", /* 16 */ "RESERVED17", /* 17 */ "RESERVED18", /* 18 */ "RESERVED19", /* 19 */ "RESERVED1A", /* 1A */ "RESERVED1B", /* 1B */ "RESERVED1C", /* 1C */ "RESERVED1D", /* 1D */ "RESERVED1E", /* 1E */ "RESERVED1F", /* 1F */ "UNKNOWN" /* 20 */ }; const char* node_type_str[] = { "UNKNOWN", "Channel Adapter", "Switch", "Router", "Subnet Management" }; const char* sm_attr_str[] = { "RESERVED", /* 0 */ "ClassPortInfo", /* 1 */ "Notice", /* 2 */ "InformInfo", /* 3 */ "RESERVED", /* 4 */ "RESERVED", /* 5 */ "RESERVED", /* 6 */ "RESERVED", /* 7 */ "RESERVED", /* 8 */ "RESERVED", /* 9 */ "RESERVED", /* A */ "RESERVED", /* B */ "RESERVED", /* C */ "RESERVED", /* D */ "RESERVED", /* E */ "RESERVED", /* F */ "NodeDescription", /* 10 */ "NodeInfo", /* 11 */ "SwitchInfo", /* 12 */ "UNKNOWN", /* 13 */ "GUIDInfo", /* 14 */ "PortInfo", /* 15 */ "P_KeyTable", /* 16 */ "SLtoVLMappingTable", /* 17 */ "VLArbitrationTable", /* 18 */ "LinearForwardingTable", /* 19 */ "RandomForwardingTable", /* 1A */ "MulticastForwardingTable", /* 1B */ "UNKNOWN", /* 1C */ "UNKNOWN", /* 1D */ "UNKNOWN", /* 1E */ "UNKNOWN", /* 1F */ "SMInfo", /* 20 */ "UNKNOWN" /* 21 - always highest value */ }; const char* sa_attr_str[] = { "RESERVED", /* 0 */ "ClassPortInfo", /* 1 */ "Notice", /* 2 */ "InformInfo", /* 3 */ "RESERVED", /* 4 
*/ "RESERVED", /* 5 */ "RESERVED", /* 6 */ "RESERVED", /* 7 */ "RESERVED", /* 8 */ "RESERVED", /* 9 */ "RESERVED", /* A */ "RESERVED", /* B */ "RESERVED", /* C */ "RESERVED", /* D */ "RESERVED", /* E */ "RESERVED", /* F */ "RESERVED", /* 10 */ "NodeRecord", /* 11 */ "PortInfoRecord", /* 12 */ "SLtoVLMappingTableRecord", /* 13 */ "SwitchInfoRecord", /* 14 */ "LinearForwardingTableRecord", /* 15 */ "RandomForwardingTableRecord", /* 16 */ "MulticastForwardingTableRecord", /* 17 */ "SMInfoRecord", /* 18 */ "RESERVED", /* 19 */ "RandomForwardingTable", /* 1A */ "MulticastForwardingTable", /* 1B */ "UNKNOWN", /* 1C */ "UNKNOWN", /* 1D */ "UNKNOWN", /* 1E */ "UNKNOWN", /* 1F */ "LinkRecord", /* 20 */ "UNKNOWN", /* 21 */ "UNKNOWN", /* 22 */ "UNKNOWN", /* 23 */ "UNKNOWN", /* 24 */ "UNKNOWN", /* 25 */ "UNKNOWN", /* 26 */ "UNKNOWN", /* 27 */ "UNKNOWN", /* 28 */ "UNKNOWN", /* 29 */ "UNKNOWN", /* 2A */ "UNKNOWN", /* 2B */ "UNKNOWN", /* 2C */ "UNKNOWN", /* 2D */ "UNKNOWN", /* 2E */ "UNKNOWN", /* 2F */ "GuidInfoRecord", /* 30 */ "ServiceRecord", /* 31 */ "UNKNOWN", /* 32 */ "P_KeyTableRecord", /* 33 */ "UNKNOWN", /* 34 */ "PathRecord", /* 35 */ "VLArbitrationTableRecord", /* 36 */ "UNKNOWN", /* 37 */ "MCMemberRecord", /* 38 */ "TraceRecord", /* 39 */ "MultiPathRecord", /* 3A */ "ServiceAssociationRecord", /* 3B */ "UNKNOWN", /* 3C */ "UNKNOWN", /* 3D */ "UNKNOWN", /* 3E */ "UNKNOWN", /* 3F */ "UNKNOWN", /* 40 */ "UNKNOWN", /* 41 */ "UNKNOWN", /* 42 */ "UNKNOWN", /* 43 */ "UNKNOWN", /* 44 */ "UNKNOWN", /* 45 */ "UNKNOWN", /* 46 */ "UNKNOWN", /* 47 */ "UNKNOWN", /* 48 */ "UNKNOWN", /* 49 */ "UNKNOWN", /* 4A */ "UNKNOWN", /* 4B */ "UNKNOWN", /* 4C */ "UNKNOWN", /* 4D */ "UNKNOWN", /* 4E */ "UNKNOWN", /* 4F */ "UNKNOWN", /* 50 */ "UNKNOWN", /* 51 */ "UNKNOWN", /* 52 */ "UNKNOWN", /* 53 */ "UNKNOWN", /* 54 */ "UNKNOWN", /* 55 */ "UNKNOWN", /* 56 */ "UNKNOWN", /* 57 */ "UNKNOWN", /* 58 */ "UNKNOWN", /* 59 */ "UNKNOWN", /* 5A */ "UNKNOWN", /* 5B */ "UNKNOWN", /* 5C */ "UNKNOWN", /* 5D 
*/ "UNKNOWN", /* 5E */ "UNKNOWN", /* 5F */ "UNKNOWN", /* 60 */ "UNKNOWN", /* 61 */ "UNKNOWN", /* 62 */ "UNKNOWN", /* 63 */ "UNKNOWN", /* 64 */ "UNKNOWN", /* 65 */ "UNKNOWN", /* 66 */ "UNKNOWN", /* 67 */ "UNKNOWN", /* 68 */ "UNKNOWN", /* 69 */ "UNKNOWN", /* 6A */ "UNKNOWN", /* 6B */ "UNKNOWN", /* 6C */ "UNKNOWN", /* 6D */ "UNKNOWN", /* 6E */ "UNKNOWN", /* 6F */ "UNKNOWN", /* 70 */ "UNKNOWN", /* 71 */ "UNKNOWN", /* 72 */ "UNKNOWN", /* 73 */ "UNKNOWN", /* 74 */ "UNKNOWN", /* 75 */ "UNKNOWN", /* 76 */ "UNKNOWN", /* 77 */ "UNKNOWN", /* 78 */ "UNKNOWN", /* 79 */ "UNKNOWN", /* 7A */ "UNKNOWN", /* 7B */ "UNKNOWN", /* 7C */ "UNKNOWN", /* 7D */ "UNKNOWN", /* 7E */ "UNKNOWN", /* 7F */ "UNKNOWN", /* 80 */ "UNKNOWN", /* 81 */ "UNKNOWN", /* 82 */ "UNKNOWN", /* 83 */ "UNKNOWN", /* 84 */ "UNKNOWN", /* 85 */ "UNKNOWN", /* 86 */ "UNKNOWN", /* 87 */ "UNKNOWN", /* 88 */ "UNKNOWN", /* 89 */ "UNKNOWN", /* 8A */ "UNKNOWN", /* 8B */ "UNKNOWN", /* 8C */ "UNKNOWN", /* 8D */ "UNKNOWN", /* 8E */ "UNKNOWN", /* 8F */ "UNKNOWN", /* 90 */ "UNKNOWN", /* 91 */ "UNKNOWN", /* 92 */ "UNKNOWN", /* 93 */ "UNKNOWN", /* 94 */ "UNKNOWN", /* 95 */ "UNKNOWN", /* 96 */ "UNKNOWN", /* 97 */ "UNKNOWN", /* 98 */ "UNKNOWN", /* 99 */ "UNKNOWN", /* 9A */ "UNKNOWN", /* 9B */ "UNKNOWN", /* 9C */ "UNKNOWN", /* 9D */ "UNKNOWN", /* 9E */ "UNKNOWN", /* 9F */ "UNKNOWN", /* A0 */ "UNKNOWN", /* A1 */ "UNKNOWN", /* A2 */ "UNKNOWN", /* A3 */ "UNKNOWN", /* A4 */ "UNKNOWN", /* A5 */ "UNKNOWN", /* A6 */ "UNKNOWN", /* A7 */ "UNKNOWN", /* A8 */ "UNKNOWN", /* A9 */ "UNKNOWN", /* AA */ "UNKNOWN", /* AB */ "UNKNOWN", /* AC */ "UNKNOWN", /* AD */ "UNKNOWN", /* AE */ "UNKNOWN", /* AF */ "UNKNOWN", /* B0 */ "UNKNOWN", /* B1 */ "UNKNOWN", /* B2 */ "UNKNOWN", /* B3 */ "UNKNOWN", /* B4 */ "UNKNOWN", /* B5 */ "UNKNOWN", /* B6 */ "UNKNOWN", /* B7 */ "UNKNOWN", /* B8 */ "UNKNOWN", /* B9 */ "UNKNOWN", /* BA */ "UNKNOWN", /* BB */ "UNKNOWN", /* BC */ "UNKNOWN", /* BD */ "UNKNOWN", /* BE */ "UNKNOWN", /* BF */ "UNKNOWN", /* C0 */ "UNKNOWN", /* C1 
*/ "UNKNOWN", /* C2 */ "UNKNOWN", /* C3 */ "UNKNOWN", /* C4 */ "UNKNOWN", /* C5 */ "UNKNOWN", /* C6 */ "UNKNOWN", /* C7 */ "UNKNOWN", /* C8 */ "UNKNOWN", /* C9 */ "UNKNOWN", /* CA */ "UNKNOWN", /* CB */ "UNKNOWN", /* CC */ "UNKNOWN", /* CD */ "UNKNOWN", /* CE */ "UNKNOWN", /* CF */ "UNKNOWN", /* D0 */ "UNKNOWN", /* D1 */ "UNKNOWN", /* D2 */ "UNKNOWN", /* D3 */ "UNKNOWN", /* D4 */ "UNKNOWN", /* D5 */ "UNKNOWN", /* D6 */ "UNKNOWN", /* D7 */ "UNKNOWN", /* D8 */ "UNKNOWN", /* D9 */ "UNKNOWN", /* DA */ "UNKNOWN", /* DB */ "UNKNOWN", /* DC */ "UNKNOWN", /* DD */ "UNKNOWN", /* DE */ "UNKNOWN", /* DF */ "UNKNOWN", /* E0 */ "UNKNOWN", /* E1 */ "UNKNOWN", /* E2 */ "UNKNOWN", /* E3 */ "UNKNOWN", /* E4 */ "UNKNOWN", /* E5 */ "UNKNOWN", /* E6 */ "UNKNOWN", /* E7 */ "UNKNOWN", /* E8 */ "UNKNOWN", /* E9 */ "UNKNOWN", /* EA */ "UNKNOWN", /* EB */ "UNKNOWN", /* EC */ "UNKNOWN", /* ED */ "UNKNOWN", /* EE */ "UNKNOWN", /* EF */ "UNKNOWN", /* F0 */ "UNKNOWN", /* F1 */ "UNKNOWN", /* F2 */ "InformInfoRecord", /* F3 */ "UNKNOWN" /* F4 - always highest value */ }; const char* port_state_str[] = { "No State Change (NOP)", "DOWN", "INIT", "ARMED", "ACTIVE", "ACTDEFER", "UNKNOWN" }; int set_bit(int nr, void *method_mask) { uint32_t mask; int retval; uint32_t *addr = method_mask; /* Method_Mask is uint32_t[4], so index 32-bit words; a long pointer would step 8 bytes on LP64 and run past the array. nr must be below 128. */ addr += nr >> 5; mask = (uint32_t)1 << (nr & 0x1f); retval = (mask & *addr) != 0; *addr |= mask; return retval; } char * drmad_status_str(struct drsmp *drsmp) { switch (drsmp->status) { case 0: return "success"; case ETIMEDOUT: return "timeout"; } return "unknown error"; } const char* get_sm_method_str(uint8_t method ) { if (method & 0x80) method = (method & 0x0F) | 0x10; if( method >= SM_METHOD_STR_UNKNOWN_VAL ) method = SM_METHOD_STR_UNKNOWN_VAL; return( sm_method_str[method] ); } uint16_t smp_get_status(uint16_t status ) { return( (uint16_t)(status & SMP_STATUS_MASK) ); } uint8_t node_info_get_local_port_num(const struct node_info* const p_ni) { return( (uint8_t)(( p_ni->port_num_vendor_id & NODE_INFO_PORT_NUM_MASK ) >>
NODE_INFO_PORT_NUM_SHIFT )); } const char* get_node_type_str(uint32_t node_type) { if( node_type >= IB_NOTICE_NODE_TYPE_ROUTER ) node_type = 0; return( node_type_str[node_type] ); } uint32_t node_info_get_vendor_id(const struct node_info* const p_ni ) { return( (uint32_t)( p_ni->port_num_vendor_id & NODE_INFO_VEND_ID_MASK ) ); } const char* get_sm_attr_str(uint16_t attr ) { uint16_t host_attr; host_attr = ntohs( attr ); if( host_attr >= SM_ATTR_STR_UNKNOWN_VAL ) host_attr = SM_ATTR_STR_UNKNOWN_VAL; return( sm_attr_str[host_attr] ); } uint8_t port_info_get_link_speed_sup(const struct port_info* const p_pi ) { return( (uint8_t)((p_pi->state_info1 & PORT_LINK_SPEED_SUPPORTED_MASK) >> PORT_LINK_SPEED_SHIFT) ); } const char* get_port_state_str(uint8_t port_state ) { if( port_state > LINK_ACTIVE ) port_state = LINK_ACTIVE + 1; return( port_state_str[port_state] ); } uint8_t port_info_get_mpb(const struct port_info* const p_pi ) { return( (uint8_t)((p_pi->mkey_lmc & PORT_MPB_MASK) >> PORT_MPB_SHIFT) ); } uint8_t port_info_get_lmc(const struct port_info* const p_pi ) { return( (uint8_t)(p_pi->mkey_lmc & PORT_LMC_MASK) ); } uint8_t port_info_get_client_rereg(struct port_info const* p_pi ) { return ( (p_pi->subnet_timeout & 0x80 ) >> 7); } uint8_t port_info_get_timeout(struct port_info const* p_pi ) { return(p_pi->subnet_timeout & 0x1F ); } uint8_t port_info_get_port_state(const struct port_info* const p_pi ) { return( (uint8_t)(p_pi->state_info1 & PORT_STATE_MASK) ); } void dump_dr_smp( const struct drsmp * const p_smp) { uint32_t i; char buf[BUF_SIZE]; char line[BUF_SIZE]; sprintf( buf, "SMP dump:\n" "\t\t\t\tbase_ver................0x%X\n" "\t\t\t\tmgmt_class..............0x%X\n" "\t\t\t\tclass_ver...............0x%X\n" "\t\t\t\tmethod..................0x%X (%s)\n", p_smp->base_version, p_smp->mgmt_class, p_smp->class_version, p_smp->method, get_sm_method_str(p_smp->method)); if (p_smp->mgmt_class == MCLASS_SUBN_DIR) { sprintf( line, "\t\t\t\tD 
bit...................0x%X\n" "\t\t\t\tstatus..................0x%X\n", (p_smp->status & DIRECTION) == DIRECTION, smp_get_status(p_smp->status) ); } else { sprintf( line,"\t\t\t\tstatus..................0x%X\n", ntohs(p_smp->status)); } strcat( buf, line ); sprintf( line, "\t\t\t\thop_ptr.................0x%X\n" "\t\t\t\thop_count...............0x%X\n" "\t\t\t\ttrans_id................0x%" PRIx64 "\n" "\t\t\t\tattr_id.................0x%X (%s)\n" "\t\t\t\tresv....................0x%X\n" "\t\t\t\tattr_mod................0x%X\n" "\t\t\t\tm_key...................0x%016" PRIx64 "\n", p_smp->hop_ptr, p_smp->hop_cnt, own_ntoh64(p_smp->tid), ntohs(p_smp->attr_id), get_sm_attr_str(p_smp->attr_id), ntohs(p_smp->resv), ntohl(p_smp->attr_mod), ntohl(p_smp->mkey) ); strcat( buf, line ); if (p_smp->mgmt_class == MCLASS_SUBN_DIR) { sprintf( line, "\t\t\t\tdr_slid.................0x%X\n" "\t\t\t\tdr_dlid.................0x%X\n", ntohs(p_smp->dr_slid), ntohs(p_smp->dr_dlid) ); strcat( buf, line ); strcat( buf, "\n\t\t\t\tInitial path: " ); for( i = 0; i <= p_smp->hop_cnt; i++ ) { sprintf( line, "[%X]", p_smp->initial_path[i] ); strcat( buf, line ); } strcat( buf, "\n\t\t\t\tReturn path: " ); for( i = 0; i <= p_smp->hop_cnt; i++ ) { sprintf( line, "[%X]", p_smp->return_path[i] ); strcat( buf, line ); } strcat( buf, "\n\t\t\t\tReserved: " ); for( i = 0; i < 7; i++ ) { sprintf( line, "[%0X]", p_smp->reserved[i] ); strcat( buf, line ); } strcat( buf, "\n" ); for( i = 0; i < 64; i += 16 ) { sprintf( line, "\n\t\t\t\t%02X %02X %02X %02X " "%02X %02X %02X %02X" " %02X %02X %02X %02X %02X %02X %02X %02X\n", p_smp->data[i], p_smp->data[i+1], p_smp->data[i+2], p_smp->data[i+3], p_smp->data[i+4], p_smp->data[i+5], p_smp->data[i+6], p_smp->data[i+7], p_smp->data[i+8], p_smp->data[i+9], p_smp->data[i+10], p_smp->data[i+11], p_smp->data[i+12], p_smp->data[i+13], p_smp->data[i+14], p_smp->data[i+15] ); strcat( buf, line ); } } else { // not a Direct Route so provide source and destination lids 
strcat(buf, "\t\t\t\tMAD IS LID ROUTED\n"); } printf("%s",buf); } void dump_node_info(const struct node_info* const p_ni) { printf( "NodeInfo dump:\n" "\t\t\t\tbase_version............0x%X\n" "\t\t\t\tclass_version...........0x%X\n" "\t\t\t\tnode_type...............%s\n" "\t\t\t\tnum_ports...............0x%X\n" "\t\t\t\tsys_guid................0x%016" PRIx64 "\n" "\t\t\t\tnode_guid...............0x%016" PRIx64 "\n" "\t\t\t\tport_guid...............0x%016" PRIx64 "\n" "\t\t\t\tpartition_cap...........0x%X\n" "\t\t\t\tdevice_id...............0x%X\n" "\t\t\t\trevision................0x%X\n" "\t\t\t\tport_num................0x%X\n" "\t\t\t\tvendor_id...............0x%X\n" "", p_ni->base_version, p_ni->class_version, get_node_type_str( p_ni->node_type ), p_ni->num_ports, own_ntoh64_2(p_ni->sys_guid), own_ntoh64_2( p_ni->node_guid ), own_ntoh64_2( p_ni->port_guid ), ntohs( p_ni->partition_cap ), ntohs( p_ni->device_id ), ntohl( p_ni->revision ), node_info_get_local_port_num( p_ni ), ntohl( node_info_get_vendor_id( p_ni ) ) ); } void dump_port_info(const uint64_t node_guid, const uint64_t port_guid, const uint8_t port_num, const struct port_info* const p_pi) { char buf[BUF_SIZE]; printf( "PortInfo dump:\n" "\t\t\t\tport number.............0x%X\n" "\t\t\t\tnode_guid...............0x%016" PRIx64 "\n" "\t\t\t\tport_guid...............0x%016" PRIx64 "\n" "\t\t\t\tm_key...................0x%016" PRIx64 "\n" "\t\t\t\tsubnet_prefix...........0x%016" PRIx64 "\n" "\t\t\t\tbase_lid................0x%X\n" "\t\t\t\tmaster_sm_base_lid......0x%X\n" "\t\t\t\tcapability_mask.........0x%X\n" "\t\t\t\tdiag_code...............0x%X\n" "\t\t\t\tm_key_lease_period......0x%X\n" "\t\t\t\tlocal_port_num..........0x%X\n" "\t\t\t\tlink_width_enabled......0x%X\n" "\t\t\t\tlink_width_supported....0x%X\n" "\t\t\t\tlink_width_active.......0x%X\n" "\t\t\t\tlink_speed_supported....0x%X\n" "\t\t\t\tport_state..............%s\n" "\t\t\t\tstate_info2.............0x%X\n" 
"\t\t\t\tm_key_protect_bits......0x%X\n" "\t\t\t\tlmc.....................0x%X\n" "\t\t\t\tlink_speed..............0x%X\n" "\t\t\t\tmtu_smsl................0x%X\n" "\t\t\t\tvl_cap_init_type........0x%X\n" "\t\t\t\tvl_high_limit...........0x%X\n" "\t\t\t\tvl_arb_high_cap.........0x%X\n" "\t\t\t\tvl_arb_low_cap..........0x%X\n" "\t\t\t\tinit_rep_mtu_cap........0x%X\n" "\t\t\t\tvl_stall_life...........0x%X\n" "\t\t\t\tvl_enforce..............0x%X\n" "\t\t\t\tm_key_violations........0x%X\n" "\t\t\t\tp_key_violations........0x%X\n" "\t\t\t\tq_key_violations........0x%X\n" "\t\t\t\tguid_cap................0x%X\n" "\t\t\t\tclient_reregister.......0x%X\n" "\t\t\t\tsubnet_timeout..........0x%X\n" "\t\t\t\tresp_time_value.........0x%X\n" "\t\t\t\terror_threshold.........0x%X\n" "", port_num, own_ntoh64( node_guid ), own_ntoh64( port_guid ), own_ntoh64( p_pi->m_key ), own_ntoh64( p_pi->subnet_prefix ), ntohs( p_pi->base_lid ), ntohs( p_pi->master_sm_base_lid ), ntohl( p_pi->capability_mask ), ntohs( p_pi->diag_code ), ntohs( p_pi->m_key_lease_period ), p_pi->local_port_num, p_pi->link_width_enabled, p_pi->link_width_supported, p_pi->link_width_active, port_info_get_link_speed_sup( p_pi ), get_port_state_str( port_info_get_port_state( p_pi ) ), p_pi->state_info2, port_info_get_mpb( p_pi ), port_info_get_lmc( p_pi ), p_pi->link_speed, p_pi->mtu_smsl, p_pi->vl_cap, p_pi->vl_high_limit, p_pi->vl_arb_high_cap, p_pi->vl_arb_low_cap, p_pi->mtu_cap, p_pi->vl_stall_life, p_pi->vl_enforce, ntohs( p_pi->m_key_violations ), ntohs( p_pi->p_key_violations ), ntohs( p_pi->q_key_violations ), p_pi->guid_cap, port_info_get_client_rereg( p_pi ), port_info_get_timeout( p_pi ), p_pi->resp_time_value, p_pi->error_threshold ); } From mshefty at ichips.intel.com Fri Feb 9 15:57:59 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 09 Feb 2007 15:57:59 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <1171057719.31538.187820.camel@hal.voltaire.com> References: 
<45CA3568.1000508@ichips.intel.com> <20070207213108.GD11411@obsidianresearch.com> <45CA5573.80802@ichips.intel.com> <20070207224928.GF11411@obsidianresearch.com> <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> <1171043929.31538.174521.camel@hal.voltaire.com> <20070209192046.GP11411@obsidianresearch.com> <45CCD1E2.5050806@ichips.intel.com> <1171051315.31538.181667.camel@hal.voltaire.com> <45CCDAE0.1080102@ichips.intel.com> <1171057719.31538.187820.camel@hal.voltaire.com> Message-ID: <45CD0A87.9010800@ichips.intel.com> > The hard part is the global distribution of this information. The best idea I can come up with for locating remote SAs is to have the SAs assign themselves a specific Unicast Global GID Assigned Value. So, each SA gives themselves a GID similar to: 64-bit subnet prefix :: 1. Hosts on remote subnets can then direct requests to the SAs on the remote subnet. SA failover would need to take this GID with them... - Sean From jgunthorpe at obsidianresearch.com Fri Feb 9 16:48:20 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Fri, 9 Feb 2007 17:48:20 -0700 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45CCFEDC.3040700@ichips.intel.com> References: <1170894459.31538.23768.camel@hal.voltaire.com> <45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com> <1171023168.31538.153989.camel@hal.voltaire.com> <45CCADC7.5000804@ichips.intel.com> <1171043929.31538.174521.camel@hal.voltaire.com> <20070209192046.GP11411@obsidianresearch.com> <1171057501.31538.187596.camel@hal.voltaire.com> <20070209223845.GR11411@obsidianresearch.com> <45CCFEDC.3040700@ichips.intel.com> Message-ID: <20070210004820.GS11411@obsidianresearch.com> On Fri, Feb 09, 2007 at 03:08:12PM -0800, Sean Hefty wrote: > The route itself is determined using the SGID, DGID, TClass, FlowLabel. 
> So, as long as the two queries match on these fields, I would think that it
> would work.

So basically what you are saying is that the TClass and FlowLabel act as some kind of global disambiguation that lets all SAs know that the tuple MUST be matched with on each side.

I can see how this can work, but I think it has big implications, like global SA database sharing, maybe larger router tables, or limited router multipath, etc. [1]

I've been thinking that the tuple would only reflect 2 of the 4 lids (ie the ones the router chooses on entry to the final subnet).

I personally can't see anything discussed so far as a slam dunk answer to this broader problem.

The very simple reversible paths only solution still seems best to me only because it involves the least work and only requires that IBA specify routed reversible paths.

The only missing bit is a way to signal that the target should have this behavior in the REQ message. Perhaps something like setting the DLID in the REQ to 0xFFFF?

Jason

[1] - Interestingly, with this scheme the first PR query must select all 4 LIDs (although it may not know what they are..). The PR itself would return the first two local LIDs and those would also have to be encoded in the GRH. The 2nd remote PR would recover those LIDs from the GRH to build the return GRH. Since routers route based on the GRH, every GRH encodes the destination LIDs too!

From sean.hefty at intel.com Fri Feb 9 18:08:34 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 9 Feb 2007 18:08:34 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070210004820.GS11411@obsidianresearch.com>
Message-ID: <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>

>So basically what you are saying is that the TClass and FlowLabel act
>as some kind of global disambiguation that lets all SAs know that the
>tuple MUST be matched with
>on each side.

Sort of...
My reasoning is that if you look at a packet traveling from the source QP to the destination QP, and examine the packet in some intermediate subnet (say between two routers), then the only information that it carries is the tuple. This information must be sufficient to direct the routing at the endpoints.

I don't see that all SAs need to know this information. An SA would:

1. Given local and remote GIDs, need to know which router the packet will arrive on.
2. Know the SLID/DLID mapping used by that router.

It shouldn't need information about the paths used by packets on the remote subnet. If a subnet has multiple routers into it, they can forward packets to the correct router if needed. (Could the routers just forward to the end node and insert the expected SLID?)

If the path is reversible (with reversible defined relative to SLID/DLID that is returned in the path record), then the active node would only need two SA queries - one to each subnet. For non-reversible paths, 4 queries may be needed.

>I've been thinking that the tuple would
>only reflect 2 of the 4 lids (ie the ones the router chooses on entry
>to the final subnet).

This was my thinking as well, which is why I think 2 path record queries are needed. Each path would specify 2 of the 4 LIDs that we need. The local path record gives us the local QP information, and the remote path record is used to fill in the SLID/DLID in the CM REQ.

>The very simple reversible paths only solution still seems best to me
>only because it involves the least work and only requires that IBA
>specify routed reversible paths.

I'm still trying to find a solution that doesn't violate the architecture as defined. I don't see why my idea wouldn't work yet. It just requires some unspecified coordination between the local SA and local routers.
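
To make the two-query idea above concrete, here is a minimal sketch in C. It is an illustration only, not code from the kernel or from any OFED library: the struct and function names (pr_lids, routed_lids, combine_path_records) are hypothetical, and it assumes each path record query simply hands back one SLID/DLID pair in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the SLID/DLID pair taken from one path record
 * (host byte order, everything else in the record ignored). */
struct pr_lids {
	uint16_t slid;
	uint16_t dlid;
};

/* Hypothetical: the four LIDs the active node ends up with. */
struct routed_lids {
	uint16_t local_slid; /* from the path record of the local subnet */
	uint16_t local_dlid; /* from the path record of the local subnet */
	uint16_t req_slid;   /* SLID carried in the CM REQ, from the remote path record */
	uint16_t req_dlid;   /* DLID carried in the CM REQ, from the remote path record */
};

/* Combine the results of the two SA queries: each one supplies
 * 2 of the 4 LIDs, as described in the message above. */
void combine_path_records(const struct pr_lids *local,
			  const struct pr_lids *remote,
			  struct routed_lids *out)
{
	assert(local && remote && out);

	out->local_slid = local->slid;
	out->local_dlid = local->dlid;
	out->req_slid = remote->slid;
	out->req_dlid = remote->dlid;
}
```

For a non-reversible path, the same structure would presumably be filled once per direction, which is where the 4 queries mentioned above would come from.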
- Sean

From vlad at lists.openfabrics.org Sat Feb 10 02:23:54 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sat, 10 Feb 2007 02:23:54 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070210-0200 daily build status
Message-ID: <20070210102354.64E09E60804@openfabrics.org>

This email was generated automatically, please do not reply

Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.14

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------

From halr at voltaire.com Sat Feb 10 07:49:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 10 Feb 2007 10:49:15 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <001001c74c87$8b653470$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> <1171051141.2767.7.camel@localhost>
 <001001c74c87$8b653470$21606d86@one7>
Message-ID: <1171122546.31538.251673.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 15:19, Michael Arndt wrote:
> Hi,
>
> > It is strange, I did similar thing (you can see in
> > management/diags/src/mcm_rereg_test.c) and it worked fine for me.
>
> What location is that?
>
> >Which libibumad version you are using? Also I understand you did some
> >changes in the stack, is it related to user_mad? Could you publish this?
>
> I use OFED-1.1 and attached libibumad version. The stack on which I tested
> this wasn't changed, to rule that out. It is a diploma thesis and I will
> publish it as soon as possible ;)...in German ...sorry.
>
> The whole example code Hal was asking for is below.

Some comments interspersed below with my modified version which sends
the 10 SMPs.

-- Hal

> I have marked the
> position with /* here */. Currently the retry parameter is zero, but I also
> tested 3.
>
> Thanks Michael
>
> // ---- Includes --------------------------------
> #include
> #include
> #include
>
> #include "sender.h"
>
> // ---- Defines and declarations ----------------
>
> static const uint8_t CLASS_SUBN_DIRECTED_ROUTE = 0x81;
> static const uint8_t CLASS_SUBN_LID_ROUTE = 0x1;
>
> static int long drmad_tid = 0x123;
>
> // Prototypes
>
> void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod);
> void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void
> *data);
> char * drmad_status_str(struct drsmp *drsmp);
> int str2DRPath(char *str, DRPath *path);
> int set_bit(int nr, void *method_mask);
>
> // ---- Main ------------------------------------
>
> int main (void){
>
>     int Port_ID = 0;
>     int Agent_ID = 0;
>     int ret;
>     int i;
>     int length, timeout_ms = 10000;
>
>     void *umad;
>     struct drsmp *smp;
>
>     // ---- Settings --------------------------------
>     int Portnummer = 1;
>     char Devicename [2][UMAD_CA_NAME_LEN];
>     DRPath Path;
>     char Path_Str[64];
>
>     uint16_t attribute = MAD_ATTR_PORT_INFO; // PortInfo
>     int modifier = 1;
>
>     struct _register_info{
>         int Management_Class;
>         int Management_Version;
>         uint8_t RMPP_Version;
>         uint32_t Method_Mask[4];
>     } Register_Info;
>
>     // ++ Value assignment ++
>
>     Register_Info.Management_Class = CLASS_SUBN_DIRECTED_ROUTE;
>     Register_Info.Management_Version = 1;
>     Register_Info.RMPP_Version = 0;
>
>     set_bit(0x01,&(Register_Info.Method_Mask));
>     set_bit(0x02,&(Register_Info.Method_Mask));
>     set_bit(0x81,&(Register_Info.Method_Mask));

This overwrites something past method mask: the mask is 4 x 32 = 128 bits
and 0x81 is bit 129.

>     set_bit(0x03,&(Register_Info.Method_Mask));
>     set_bit(0x05,&(Register_Info.Method_Mask));
>     set_bit(0x06,&(Register_Info.Method_Mask));

Several of these methods don't apply to SM class. Also, your
umad_register doesn't use this, so it is not needed if that is the case.
But are you trying to use solicited or unsolicited sending? It is unclear
to me what you really want.

>     sprintf(Path_Str,"0,1,1,1");
>
>     // ---- Init Phase ------------------------------
>     printf("... Init Lib ...");
>     umad_init();
>     printf("done\n\n");
>
>     // ++ Debug ++
>     umad_debug(0);
>
>     printf("... Get CAs Names ...");
>     ret = umad_get_cas_names(Devicename,2);
>     if (!ret) {
>         printf("Error: umad_get_cas_names: %i\n",ret);
>         return -1;
>     }
>     else {
>         printf("done\n\n");
>         for (i = 0;i < ret;i++){
>             printf("Devicename: %s\n",Devicename[i]);
>         }
>
>     }
>     // ++ Open ++
>     printf("... Open Port ...");
>     if ((Port_ID = umad_open_port(Devicename[0],Portnummer)) < 0)
>     {
>         printf("Error: umad_open_port: %i\n",Port_ID);
>         return -1;
>     }
>     else printf("done\n\n");
>     // ++ Register ++
>     printf("... Register User Mad ...");
>     if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class,
>                                   Register_Info.Management_Version,
>                                   Register_Info.RMPP_Version,
>                                   0)) < 0){

See previous comment on this.

>         printf("Error: umad_register : %i\n",Agent_ID);
>         goto Exit;
>     }
>     else printf("done\n\n");
>     // ---- Build packet ----------------------------
>
>     printf("... Allocate packet ...");
>     if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){
>         printf("Error: umad_alloc\n");
>         goto Exit;
>     }
>     printf("done\n\n");
>
>     smp = umad_get_mad(umad);
>     printf("... Smp Pointer ... done\n");
>
>     if ((str2DRPath(Path_Str, &Path)) < 0) printf("Error: str2DRPath\n");

I moved this up to where Path_Str was initially set. It wouldn't
actually send the packets multiple times without doing this. I didn't
investigate this further.

>     printf("... Build SMP ...");
>     drsmp_get_init(umad,&Path,attribute,modifier);
>     printf("... done ...\n\n");
>
>     //xdump(stderr, "before send:\n", smp, 256);
>     dump_dr_smp(smp);

I got a seg fault on this so I commented it out.

>     length = IB_MAD_SIZE;
>
>     /* here */
>     for (i = 0; i < 10; i++){
>         printf("... Send Mad ...");
>         if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 200, 0)) < 0)

The main problem is this: You cannot reuse the same umad allocation for
multiple umad_sends. That's why you get the error. So I changed this.
Also, since you are not using solicited sends there is no need for the
timeout to be specified, but that doesn't really matter.

>             printf("Error: umad_send : %i\n",ret);
>         else printf("done\n\n");
>     }
>
>     /*
>     for (i = 0; i < 10; i++){
>         printf("... Recv Mad ...");
>         if (umad_recv(Port_ID, umad, &length, timeout_ms) != Agent_ID)
>             printf("Error umad_recv: %s\n", drmad_status_str(smp));
>         else printf("done\n\n");
>     }
>     */
>
>     dump_dr_smp(smp);

Also got a seg fault on this, so also commented it out.

>     switch (attribute){
>     case MAD_ATTR_NODE_INFO : dump_node_info((const struct
> node_info*)&(smp->data[0])); break;
>     case MAD_ATTR_PORT_INFO : dump_port_info(0,0,0,(const struct
> port_info*)&(smp->data[0])); break;
>     }

Also got a seg fault on this, so also commented it out.

>
>     // ---- Down Phase ------------------------------
> Exit:
>     printf("... Unregister User Mad ...");
>     if (umad_unregister(Port_ID,Agent_ID) < 0)
>         printf("Error: umad_unregister\n");
>     else printf("done\n\n");
>
>     printf("... Close Port ...");
>     if (Port_ID != -1)
>         if ((umad_close_port(Port_ID)) != 0){
>             printf("Error: umad_close_port\n");
>         }
>         else printf("done\n\n");
>     else printf("nothing to do\n\n");
>
> }
>
> // ---- SMP packet ------------------------------
>
> void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod)
> {
>     struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));
>
>     memset(smp, 0, sizeof (*smp));
>
>     smp->base_version = 1;
>     smp->mgmt_class = CLASS_SUBN_DIRECTED_ROUTE;
>     smp->class_version = 1;
>
>     smp->method = 0x01;
>     smp->attr_id = (uint16_t)htons((uint16_t)attr);
>     smp->attr_mod = htonl(mod);
>     smp->tid = htonll(drmad_tid++);
>     smp->dr_slid = 0xffff;
>     smp->dr_dlid = 0xffff;
>
>     umad_set_addr(umad, 0xffff, 0, 0, 0);
>
>     if (path)
>         memcpy(smp->initial_path, path->path, path->hop_cnt+1);
>
>     smp->hop_cnt = path->hop_cnt;
> }
>
> void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void
> *data)
> {
>     struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));
>
>     memset(smp, 0, sizeof (*smp));
>
>     smp->method = 2; /* SET */
>     smp->attr_id = (uint16_t)htons((uint16_t)attr);
>     smp->attr_mod = htonl(mod);
>     smp->tid = htonll(drmad_tid++);
>     smp->dr_slid = 0xffff;
>     smp->dr_dlid = 0xffff;
>
>     umad_set_addr(umad, 0xffff, 0, 0, 0);
>
>     if (path)
>         memcpy(smp->initial_path, path->path, path->hop_cnt+1);
>
>     if (data)
>         memcpy(smp->data, data, sizeof smp->data);
>
>     smp->hop_cnt = path->hop_cnt;
> }
>
> int str2DRPath(char *str, DRPath *path)
> {
>     char *s;
>
>     path->hop_cnt = -1;
>
>     //DEBUG("DR str: %s", str);
>     while (str && *str) {
>         if ((s = strchr(str, ',')))
>             *s = 0;
>         path->path[++path->hop_cnt] = atoi(str);
>         if (!s)
>             break;
>         str = s+1;
>     }
>
> #if 0
>     if (path->path[0] != 0 ||
>         (path->hop_cnt > 0 && dev_port && path->path[1] != dev_port)) {
>         DEBUG("hop 0 != 0 or hop 1 != 
dev_port");
>         return -1;
>     }
> #endif
>
>     return path->hop_cnt;
> }
>

Here's my modified version.

---

// ---- Includes --------------------------------
#include
#include
#include

#include "sender.h"

// ---- Defines and declarations ----------------

static const uint8_t CLASS_SUBN_DIRECTED_ROUTE = 0x81;
static const uint8_t CLASS_SUBN_LID_ROUTE = 0x1;

static int long drmad_tid = 0x123;

// Prototypes

void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod);
void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void *data);
char * drmad_status_str(struct drsmp *drsmp);
int str2DRPath(char *str, DRPath *path);
int set_bit(int nr, void *method_mask);

// ---- Main ------------------------------------

int main (void){

    int Port_ID = 0;
    int Agent_ID = 0;
    int ret;
    int i;
    int length, timeout_ms = 10000;

    void *umad;
    struct drsmp *smp;

    // ---- Settings --------------------------------
    int Portnummer = 1;
    char Devicename [2][UMAD_CA_NAME_LEN];
    DRPath Path;
    char Path_Str[64];

    uint16_t attribute = MAD_ATTR_PORT_INFO; // PortInfo
    int modifier = 1;

    struct _register_info{
        int Management_Class;
        int Management_Version;
        uint8_t RMPP_Version;
        uint32_t Method_Mask[4];
    } Register_Info;

    // ++ Value assignment ++

    Register_Info.Management_Class = CLASS_SUBN_DIRECTED_ROUTE;
    Register_Info.Management_Version = 1;
    Register_Info.RMPP_Version = 0;

    set_bit(0x01,&(Register_Info.Method_Mask));
    set_bit(0x02,&(Register_Info.Method_Mask));
#if 0
    set_bit(0x81,&(Register_Info.Method_Mask));
#endif
    set_bit(0x03,&(Register_Info.Method_Mask));
    set_bit(0x05,&(Register_Info.Method_Mask));
    set_bit(0x06,&(Register_Info.Method_Mask));

    sprintf(Path_Str,"0,1,1,1");
#if 1
    if ((str2DRPath(Path_Str, &Path)) < 0) printf("Error: str2DRPath\n");
#endif

    // ---- Init Phase ------------------------------
    printf("... Init Lib ...");
    umad_init();
    printf("done\n\n");

    // ++ Debug ++
    umad_debug(0);

    printf("... Get CAs Names ...");
    ret = umad_get_cas_names(Devicename,2);
    if (!ret) {
        printf("Error: umad_get_cas_names: %i\n",ret);
        return -1;
    }
    else {
        printf("done\n\n");
        for (i = 0;i < ret;i++){
            printf("Devicename: %s\n",Devicename[i]);
        }
    }
    // ++ Open ++
    printf("... Open Port ...");
    if ((Port_ID = umad_open_port(Devicename[0],Portnummer)) < 0)
    {
        printf("Error: umad_open_port: %i\n",Port_ID);
        return -1;
    }
    else printf("done\n\n");
    // ++ Register ++
    printf("... Register User Mad ...");
#if 1
    if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class,
                                  Register_Info.Management_Version,
                                  Register_Info.RMPP_Version,
                                  0)) < 0){
#else
    if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class,
                                  Register_Info.Management_Version,
                                  Register_Info.RMPP_Version,
                                  &(Register_Info.Method_Mask[0]))) < 0){
#endif
        printf("Error: umad_register : %i\n",Agent_ID);
        goto Exit;
    }
    else printf("done\n\n");
    // ---- Build packet ----------------------------

#if 0
    printf("... Allocate packet ...");
    if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){
        printf("Error: umad_alloc\n");
        goto Exit;
    }
    printf("done\n\n");

    smp = umad_get_mad(umad);
    printf("... Smp Pointer ... done\n");

    if ((str2DRPath(Path_Str, &Path)) < 0) printf("Error: str2DRPath\n");

    printf("... Build SMP ...");
    drsmp_get_init(umad,&Path,attribute,modifier);
    printf("... done ...\n\n");
#endif

    //xdump(stderr, "before send:\n", smp, 256);
#if 0
    dump_dr_smp(smp);
#endif
    length = IB_MAD_SIZE;

    /* here */
    for (i = 0; i < 10; i++){
#if 1
        /* A fresh umad is allocated for every send; one allocation
           cannot be reused across umad_sends. */
        printf("... Allocate packet ...");
        if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){
            printf("Error: umad_alloc %p\n", umad);
            goto Exit;
        }
        printf("done\n\n");

        smp = umad_get_mad(umad);
        printf("... Smp Pointer ... done\n");

#if 0
        if ((str2DRPath(Path_Str, &Path)) < 0) printf("Error: str2DRPath\n");
#endif

        printf("... Build SMP ...");
        drsmp_get_init(umad,&Path,attribute,modifier);
        printf("... done ...\n\n");
#endif
        printf("... Send Mad ...");
#if 0
        if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 200, 0)) < 0)
#else
        if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 0, 0)) < 0)
#endif
            printf("Error: umad_send : %i\n",ret);
        else printf("done\n\n");
    }

    /*
    for (i = 0; i < 10; i++){
        printf("... Recv Mad ...");
        if (umad_recv(Port_ID, umad, &length, timeout_ms) != Agent_ID)
            printf("Error umad_recv: %s\n", drmad_status_str(smp));
        else printf("done\n\n");
    }
    */

#if 0
    dump_dr_smp(smp);
    switch (attribute){
    case MAD_ATTR_NODE_INFO : dump_node_info((const struct node_info*)&(smp->data[0])); break;
    case MAD_ATTR_PORT_INFO : dump_port_info(0,0,0,(const struct port_info*)&(smp->data[0])); break;
    }
#endif

    // ---- Down Phase ------------------------------
Exit:
    printf("... Unregister User Mad ...");
    if (umad_unregister(Port_ID,Agent_ID) < 0)
        printf("Error: umad_unregister\n");
    else printf("done\n\n");

    printf("... Close Port ...");
    if (Port_ID != -1)
        if ((umad_close_port(Port_ID)) != 0){
            printf("Error: umad_close_port\n");
        }
        else printf("done\n\n");
    else printf("nothing to do\n\n");

}

// ---- SMP packet ------------------------------

void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod)
{
    struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));

    memset(smp, 0, sizeof (*smp));

    smp->base_version = 1;
    smp->mgmt_class = CLASS_SUBN_DIRECTED_ROUTE;
    smp->class_version = 1;

    smp->method = 0x01;
    smp->attr_id = (uint16_t)htons((uint16_t)attr);
    smp->attr_mod = htonl(mod);
    smp->tid = htonll(drmad_tid++);
    smp->dr_slid = 0xffff;
    smp->dr_dlid = 0xffff;

    umad_set_addr(umad, 0xffff, 0, 0, 0);

    if (path)
        memcpy(smp->initial_path, path->path, path->hop_cnt+1);

    smp->hop_cnt = path->hop_cnt;
}

void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void *data)
{
    struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));

    memset(smp, 0, sizeof (*smp));

    smp->method = 2; /* SET */
    smp->attr_id = (uint16_t)htons((uint16_t)attr);
    smp->attr_mod = htonl(mod);
    smp->tid = htonll(drmad_tid++);
    smp->dr_slid = 0xffff;
    smp->dr_dlid = 0xffff;

    umad_set_addr(umad, 0xffff, 0, 0, 0);

    if (path)
        memcpy(smp->initial_path, path->path, path->hop_cnt+1);

    if (data)
        memcpy(smp->data, data, sizeof smp->data);

    smp->hop_cnt = path->hop_cnt;
}

int str2DRPath(char *str, DRPath *path)
{
    char *s;

    path->hop_cnt = -1;

    //DEBUG("DR str: %s", str);
    while (str && *str) {
        if ((s = strchr(str, ',')))
            *s = 0;
        path->path[++path->hop_cnt] = atoi(str);
        if (!s)
            break;
        str = s+1;
    }

#if 0
    if (path->path[0] != 0 ||
        (path->hop_cnt > 0 && dev_port && path->path[1] != dev_port)) {
        DEBUG("hop 0 != 0 or hop 1 != dev_port");
        return -1;
    }
#endif

    return path->hop_cnt;
}

From mst at mellanox.co.il Sat Feb 10 09:51:18 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 10 Feb 2007 19:51:18 +0200
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support
In-Reply-To: <45CCB6DF.3020602@ichips.intel.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com> <20070209080418.GQ6560@mellanox.co.il> <45CCB6DF.3020602@ichips.intel.com>
Message-ID: <20070210175118.GX6560@mellanox.co.il>

> > +
> >
> > It seems same goes for
> >
> > + mc = kzalloc(sizeof(*mc), GFP_KERNEL);
> > + if (!mc)
> > + return NULL;
>
> We would need to set events_reported.

IMO, probably worth it to init just this one field rather than use up
initialized memory - and I think it's clearer.

-- 
MST

From rdreier at cisco.com Sat Feb 10 10:32:08 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sat, 10 Feb 2007 10:32:08 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support
In-Reply-To: <20070210175118.GX6560@mellanox.co.il> (Michael S.
Tsirkin's message of "Sat, 10 Feb 2007 19:51:18 +0200") References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com> <20070209080418.GQ6560@mellanox.co.il> <45CCB6DF.3020602@ichips.intel.com> <20070210175118.GX6560@mellanox.co.il> Message-ID: > IMO, probably worth it to init just this one field rather than use up > initialized memory - and I think it's clearer. What do you mean by using up initialized memory? kzalloc() just does a memset(0), and it's not like there's a limit on the number of times we're allowed to call memset(). - R. From swise at opengridcomputing.com Sat Feb 10 10:52:53 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Sat, 10 Feb 2007 12:52:53 -0600 Subject: [openib-general] [PATCH] for-2.6.21 Remove hw/cxgb3/core subdirectory. Message-ID: <1171133573.11017.41.camel@stevo-desktop> From: Steve Wise Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/Makefile | 4 drivers/infiniband/hw/cxgb3/core/cxio_dbg.c | 205 ---- drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 1280 ---------------------- drivers/infiniband/hw/cxgb3/core/cxio_hal.h | 201 --- drivers/infiniband/hw/cxgb3/core/cxio_resource.c | 331 ------ drivers/infiniband/hw/cxgb3/core/cxio_resource.h | 70 - drivers/infiniband/hw/cxgb3/core/cxio_wr.h | 685 ------------ drivers/infiniband/hw/cxgb3/cxio_dbg.c | 205 ++++ drivers/infiniband/hw/cxgb3/cxio_hal.c | 1280 ++++++++++++++++++++++ drivers/infiniband/hw/cxgb3/cxio_hal.h | 201 +++ drivers/infiniband/hw/cxgb3/cxio_resource.c | 331 ++++++ drivers/infiniband/hw/cxgb3/cxio_resource.h | 70 + drivers/infiniband/hw/cxgb3/cxio_wr.h | 685 ++++++++++++ drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 14 files changed, 2775 insertions(+), 2775 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/Makefile b/drivers/infiniband/hw/cxgb3/Makefile index ae63195..0e110f3 100644 --- 
a/drivers/infiniband/hw/cxgb3/Makefile +++ b/drivers/infiniband/hw/cxgb3/Makefile @@ -4,9 +4,9 @@ EXTRA_CFLAGS += -I$(TOPDIR)/drivers/net/ obj-$(CONFIG_INFINIBAND_CXGB3) += iw_cxgb3.o iw_cxgb3-y := iwch_cm.o iwch_ev.o iwch_cq.o iwch_qp.o iwch_mem.o \ - iwch_provider.o iwch.o core/cxio_hal.o core/cxio_resource.o + iwch_provider.o iwch.o cxio_hal.o cxio_resource.o ifdef CONFIG_INFINIBAND_CXGB3_DEBUG EXTRA_CFLAGS += -DDEBUG -iw_cxgb3-y += core/cxio_dbg.o +iw_cxgb3-y += cxio_dbg.o endif diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c deleted file mode 100644 index dfaa704..0000000 --- a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c +++ /dev/null @@ -1,205 +0,0 @@ -/* - * Copyright (c) 2006 Chelsio, Inc. All rights reserved. - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - */ -#ifdef DEBUG -#include -#include "common.h" -#include "cxgb3_ioctl.h" -#include "cxio_hal.h" -#include "cxio_wr.h" - -void cxio_dump_tpt(struct cxio_rdev *rdev, u32 stag) -{ - struct ch_mem_range *m; - u64 *data; - int rc; - int size = 32; - - m = kmalloc(sizeof(*m) + size, GFP_ATOMIC); - if (!m) { - PDBG("%s couldn't allocate memory.\n", __FUNCTION__); - return; - } - m->mem_id = MEM_PMRX; - m->addr = (stag>>8) * 32 + rdev->rnic_info.tpt_base; - m->len = size; - PDBG("%s TPT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len); - rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m); - if (rc) { - PDBG("%s toectl returned error %d\n", __FUNCTION__, rc); - kfree(m); - return; - } - - data = (u64 *)m->buf; - while (size > 0) { - PDBG("TPT %08x: %016llx\n", m->addr, (u64)*data); - size -= 8; - data++; - m->addr += 8; - } - kfree(m); -} - -void cxio_dump_pbl(struct cxio_rdev *rdev, u32 pbl_addr, uint len, u8 shift) -{ - struct ch_mem_range *m; - u64 *data; - int rc; - int size, npages; - - shift += 12; - npages = (len + (1ULL << shift) - 1) >> shift; - size = npages * sizeof(u64); - - m = kmalloc(sizeof(*m) + size, GFP_ATOMIC); - if (!m) { - PDBG("%s couldn't allocate memory.\n", __FUNCTION__); - return; - } - m->mem_id = MEM_PMRX; - m->addr = pbl_addr; - m->len = size; - PDBG("%s PBL addr 0x%x len %d depth %d\n", - __FUNCTION__, m->addr, m->len, npages); - rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m); - if (rc) { - PDBG("%s toectl returned error %d\n", __FUNCTION__, rc); - kfree(m); - return; - } - - data = (u64 *)m->buf; - while (size > 0) { - PDBG("PBL %08x: %016llx\n", m->addr, (u64)*data); - size -= 8; - data++; - m->addr += 8; - } - kfree(m); -} - -void 
cxio_dump_wqe(union t3_wr *wqe) -{ - __be64 *data = (__be64 *)wqe; - uint size = (uint)(be64_to_cpu(*data) & 0xff); - - if (size == 0) - size = 8; - while (size > 0) { - PDBG("WQE %p: %016llx\n", data, be64_to_cpu(*data)); - size--; - data++; - } -} - -void cxio_dump_wce(struct t3_cqe *wce) -{ - __be64 *data = (__be64 *)wce; - int size = sizeof(*wce); - - while (size > 0) { - PDBG("WCE %p: %016llx\n", data, be64_to_cpu(*data)); - size -= 8; - data++; - } -} - -void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents) -{ - struct ch_mem_range *m; - int size = nents * 64; - u64 *data; - int rc; - - m = kmalloc(sizeof(*m) + size, GFP_ATOMIC); - if (!m) { - PDBG("%s couldn't allocate memory.\n", __FUNCTION__); - return; - } - m->mem_id = MEM_PMRX; - m->addr = ((hwtid)<<10) + rdev->rnic_info.rqt_base; - m->len = size; - PDBG("%s RQT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len); - rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m); - if (rc) { - PDBG("%s toectl returned error %d\n", __FUNCTION__, rc); - kfree(m); - return; - } - - data = (u64 *)m->buf; - while (size > 0) { - PDBG("RQT %08x: %016llx\n", m->addr, (u64)*data); - size -= 8; - data++; - m->addr += 8; - } - kfree(m); -} - -void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid) -{ - struct ch_mem_range *m; - int size = TCB_SIZE; - u32 *data; - int rc; - - m = kmalloc(sizeof(*m) + size, GFP_ATOMIC); - if (!m) { - PDBG("%s couldn't allocate memory.\n", __FUNCTION__); - return; - } - m->mem_id = MEM_CM; - m->addr = hwtid * size; - m->len = size; - PDBG("%s TCB %d len %d\n", __FUNCTION__, m->addr, m->len); - rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m); - if (rc) { - PDBG("%s toectl returned error %d\n", __FUNCTION__, rc); - kfree(m); - return; - } - - data = (u32 *)m->buf; - while (size > 0) { - printk("%2u: %08x %08x %08x %08x %08x %08x %08x %08x\n", - m->addr, - *(data+2), *(data+3), *(data),*(data+1), - *(data+6), *(data+7), *(data+4), *(data+5)); - size -= 32; - data += 8; - 
m->addr += 32; - } - kfree(m); -} -#endif diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c deleted file mode 100644 index 19553b3..0000000 --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +++ /dev/null @@ -1,1280 +0,0 @@ -/* - * Copyright (c) 2006 Chelsio, Inc. All rights reserved. - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. 
- */ -#include -#include - -#include -#include -#include -#include - -#include "cxio_resource.h" -#include "cxio_hal.h" -#include "cxgb3_offload.h" -#include "sge_defs.h" - -static LIST_HEAD(rdev_list); -static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL; - -static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) -{ - struct cxio_rdev *rdev; - - list_for_each_entry(rdev, &rdev_list, entry) - if (!strcmp(rdev->dev_name, dev_name)) - return rdev; - return NULL; -} - -static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev - *tdev) -{ - struct cxio_rdev *rdev; - - list_for_each_entry(rdev, &rdev_list, entry) - if (rdev->t3cdev_p == tdev) - return rdev; - return NULL; -} - -int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq, - enum t3_cq_opcode op, u32 credit) -{ - int ret; - struct t3_cqe *cqe; - u32 rptr; - - struct rdma_cq_op setup; - setup.id = cq->cqid; - setup.credits = (op == CQ_CREDIT_UPDATE) ? credit : 0; - setup.op = op; - ret = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_OP, &setup); - - if ((ret < 0) || (op == CQ_CREDIT_UPDATE)) - return ret; - - /* - * If the rearm returned an index other than our current index, - * then there might be CQE's in flight (being DMA'd). We must wait - * here for them to complete or the consumer can miss a notification. - */ - if (Q_PTR2IDX((cq->rptr), cq->size_log2) != ret) { - int i=0; - - rptr = cq->rptr; - - /* - * Keep the generation correct by bumping rptr until it - * matches the index returned by the rearm - 1. - */ - while (Q_PTR2IDX((rptr+1), cq->size_log2) != ret) - rptr++; - - /* - * Now rptr is the index for the (last) cqe that was - * in-flight at the time the HW rearmed the CQ. We - * spin until that CQE is valid. 
- */ - cqe = cq->queue + Q_PTR2IDX(rptr, cq->size_log2); - while (!CQ_VLD_ENTRY(rptr, cq->size_log2, cqe)) { - udelay(1); - if (i++ > 1000000) { - BUG_ON(1); - printk(KERN_ERR "%s: stalled rnic\n", - rdev_p->dev_name); - return -EIO; - } - } - } - return 0; -} - -static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) -{ - struct rdma_cq_setup setup; - setup.id = cqid; - setup.base_addr = 0; /* NULL address */ - setup.size = 0; /* disaable the CQ */ - setup.credits = 0; - setup.credit_thres = 0; - setup.ovfl_mode = 0; - return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); -} - -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) -{ - u64 sge_cmd; - struct t3_modify_qp_wr *wqe; - struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); - if (!skb) { - PDBG("%s alloc_skb failed\n", __FUNCTION__); - return -ENOMEM; - } - wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe)); - memset(wqe, 0, sizeof(*wqe)); - build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 1, qpid, 7); - wqe->flags = cpu_to_be32(MODQP_WRITE_EC); - sge_cmd = qpid << 8 | 3; - wqe->sge_cmd = cpu_to_be64(sge_cmd); - skb->priority = CPL_PRIORITY_CONTROL; - return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); -} - -int cxio_create_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) -{ - struct rdma_cq_setup setup; - int size = (1UL << (cq->size_log2)) * sizeof(struct t3_cqe); - - cq->cqid = cxio_hal_get_cqid(rdev_p->rscp); - if (!cq->cqid) - return -ENOMEM; - cq->sw_queue = kzalloc(size, GFP_KERNEL); - if (!cq->sw_queue) - return -ENOMEM; - cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), - (1UL << (cq->size_log2)) * - sizeof(struct t3_cqe), - &(cq->dma_addr), GFP_KERNEL); - if (!cq->queue) { - kfree(cq->sw_queue); - return -ENOMEM; - } - pci_unmap_addr_set(cq, mapping, cq->dma_addr); - memset(cq->queue, 0, size); - setup.id = cq->cqid; - setup.base_addr = (u64) (cq->dma_addr); - setup.size = 1UL << cq->size_log2; - setup.credits = 
65535; - setup.credit_thres = 1; - if (rdev_p->t3cdev_p->type == T3B) - setup.ovfl_mode = 0; - else - setup.ovfl_mode = 1; - return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); -} - -int cxio_resize_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) -{ - struct rdma_cq_setup setup; - setup.id = cq->cqid; - setup.base_addr = (u64) (cq->dma_addr); - setup.size = 1UL << cq->size_log2; - setup.credits = setup.size; - setup.credit_thres = setup.size; /* TBD: overflow recovery */ - setup.ovfl_mode = 1; - return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); -} - -static u32 get_qpid(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx) -{ - struct cxio_qpid_list *entry; - u32 qpid; - int i; - - mutex_lock(&uctx->lock); - if (!list_empty(&uctx->qpids)) { - entry = list_entry(uctx->qpids.next, struct cxio_qpid_list, - entry); - list_del(&entry->entry); - qpid = entry->qpid; - kfree(entry); - } else { - qpid = cxio_hal_get_qpid(rdev_p->rscp); - if (!qpid) - goto out; - for (i = qpid+1; i & rdev_p->qpmask; i++) { - entry = kmalloc(sizeof *entry, GFP_KERNEL); - if (!entry) - break; - entry->qpid = i; - list_add_tail(&entry->entry, &uctx->qpids); - } - } -out: - mutex_unlock(&uctx->lock); - PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid); - return qpid; -} - -static void put_qpid(struct cxio_rdev *rdev_p, u32 qpid, - struct cxio_ucontext *uctx) -{ - struct cxio_qpid_list *entry; - - entry = kmalloc(sizeof *entry, GFP_KERNEL); - if (!entry) - return; - PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid); - entry->qpid = qpid; - mutex_lock(&uctx->lock); - list_add_tail(&entry->entry, &uctx->qpids); - mutex_unlock(&uctx->lock); -} - -void cxio_release_ucontext(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx) -{ - struct list_head *pos, *nxt; - struct cxio_qpid_list *entry; - - mutex_lock(&uctx->lock); - list_for_each_safe(pos, nxt, &uctx->qpids) { - entry = list_entry(pos, struct cxio_qpid_list, entry); - list_del_init(&entry->entry); - if 
(!(entry->qpid & rdev_p->qpmask)) - cxio_hal_put_qpid(rdev_p->rscp, entry->qpid); - kfree(entry); - } - mutex_unlock(&uctx->lock); -} - -void cxio_init_ucontext(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx) -{ - INIT_LIST_HEAD(&uctx->qpids); - mutex_init(&uctx->lock); -} - -int cxio_create_qp(struct cxio_rdev *rdev_p, u32 kernel_domain, - struct t3_wq *wq, struct cxio_ucontext *uctx) -{ - int depth = 1UL << wq->size_log2; - int rqsize = 1UL << wq->rq_size_log2; - - wq->qpid = get_qpid(rdev_p, uctx); - if (!wq->qpid) - return -ENOMEM; - - wq->rq = kzalloc(depth * sizeof(u64), GFP_KERNEL); - if (!wq->rq) - goto err1; - - wq->rq_addr = cxio_hal_rqtpool_alloc(rdev_p, rqsize); - if (!wq->rq_addr) - goto err2; - - wq->sq = kzalloc(depth * sizeof(struct t3_swsq), GFP_KERNEL); - if (!wq->sq) - goto err3; - - wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), - depth * sizeof(union t3_wr), - &(wq->dma_addr), GFP_KERNEL); - if (!wq->queue) - goto err4; - - memset(wq->queue, 0, depth * sizeof(union t3_wr)); - pci_unmap_addr_set(wq, mapping, wq->dma_addr); - wq->doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr; - if (!kernel_domain) - wq->udb = (u64)rdev_p->rnic_info.udbell_physbase + - (wq->qpid << rdev_p->qpshift); - PDBG("%s qpid 0x%x doorbell 0x%p udb 0x%llx\n", __FUNCTION__, - wq->qpid, wq->doorbell, wq->udb); - return 0; -err4: - kfree(wq->sq); -err3: - cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, rqsize); -err2: - kfree(wq->rq); -err1: - put_qpid(rdev_p, wq->qpid, uctx); - return -ENOMEM; -} - -int cxio_destroy_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) -{ - int err; - err = cxio_hal_clear_cq_ctx(rdev_p, cq->cqid); - kfree(cq->sw_queue); - dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), - (1UL << (cq->size_log2)) - * sizeof(struct t3_cqe), cq->queue, - pci_unmap_addr(cq, mapping)); - cxio_hal_put_cqid(rdev_p->rscp, cq->cqid); - return err; -} - -int cxio_destroy_qp(struct cxio_rdev *rdev_p, struct t3_wq *wq, - struct cxio_ucontext *uctx) -{ 
- dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), - (1UL << (wq->size_log2)) - * sizeof(union t3_wr), wq->queue, - pci_unmap_addr(wq, mapping)); - kfree(wq->sq); - cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, (1UL << wq->rq_size_log2)); - kfree(wq->rq); - put_qpid(rdev_p, wq->qpid, uctx); - return 0; -} - -static void insert_recv_cqe(struct t3_wq *wq, struct t3_cq *cq) -{ - struct t3_cqe cqe; - - PDBG("%s wq %p cq %p sw_rptr 0x%x sw_wptr 0x%x\n", __FUNCTION__, - wq, cq, cq->sw_rptr, cq->sw_wptr); - memset(&cqe, 0, sizeof(cqe)); - cqe.header = cpu_to_be32(V_CQE_STATUS(TPT_ERR_SWFLUSH) | - V_CQE_OPCODE(T3_SEND) | - V_CQE_TYPE(0) | - V_CQE_SWCQE(1) | - V_CQE_QPID(wq->qpid) | - V_CQE_GENBIT(Q_GENBIT(cq->sw_wptr, - cq->size_log2))); - *(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) = cqe; - cq->sw_wptr++; -} - -void cxio_flush_rq(struct t3_wq *wq, struct t3_cq *cq, int count) -{ - u32 ptr; - - PDBG("%s wq %p cq %p\n", __FUNCTION__, wq, cq); - - /* flush RQ */ - PDBG("%s rq_rptr %u rq_wptr %u skip count %u\n", __FUNCTION__, - wq->rq_rptr, wq->rq_wptr, count); - ptr = wq->rq_rptr + count; - while (ptr++ != wq->rq_wptr) - insert_recv_cqe(wq, cq); -} - -static void insert_sq_cqe(struct t3_wq *wq, struct t3_cq *cq, - struct t3_swsq *sqp) -{ - struct t3_cqe cqe; - - PDBG("%s wq %p cq %p sw_rptr 0x%x sw_wptr 0x%x\n", __FUNCTION__, - wq, cq, cq->sw_rptr, cq->sw_wptr); - memset(&cqe, 0, sizeof(cqe)); - cqe.header = cpu_to_be32(V_CQE_STATUS(TPT_ERR_SWFLUSH) | - V_CQE_OPCODE(sqp->opcode) | - V_CQE_TYPE(1) | - V_CQE_SWCQE(1) | - V_CQE_QPID(wq->qpid) | - V_CQE_GENBIT(Q_GENBIT(cq->sw_wptr, - cq->size_log2))); - cqe.u.scqe.wrid_hi = sqp->sq_wptr; - - *(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) = cqe; - cq->sw_wptr++; -} - -void cxio_flush_sq(struct t3_wq *wq, struct t3_cq *cq, int count) -{ - __u32 ptr; - struct t3_swsq *sqp = wq->sq + Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2); - - ptr = wq->sq_rptr + count; - sqp += count; - while (ptr != wq->sq_wptr) { - 
insert_sq_cqe(wq, cq, sqp); - sqp++; - ptr++; - } -} - -/* - * Move all CQEs from the HWCQ into the SWCQ. - */ -void cxio_flush_hw_cq(struct t3_cq *cq) -{ - struct t3_cqe *cqe, *swcqe; - - PDBG("%s cq %p cqid 0x%x\n", __FUNCTION__, cq, cq->cqid); - cqe = cxio_next_hw_cqe(cq); - while (cqe) { - PDBG("%s flushing hwcq rptr 0x%x to swcq wptr 0x%x\n", - __FUNCTION__, cq->rptr, cq->sw_wptr); - swcqe = cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2); - *swcqe = *cqe; - swcqe->header |= cpu_to_be32(V_CQE_SWCQE(1)); - cq->sw_wptr++; - cq->rptr++; - cqe = cxio_next_hw_cqe(cq); - } -} - -static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) -{ - if (CQE_OPCODE(*cqe) == T3_TERMINATE) - return 0; - - if ((CQE_OPCODE(*cqe) == T3_RDMA_WRITE) && RQ_TYPE(*cqe)) - return 0; - - if ((CQE_OPCODE(*cqe) == T3_READ_RESP) && SQ_TYPE(*cqe)) - return 0; - - if ((CQE_OPCODE(*cqe) == T3_SEND) && RQ_TYPE(*cqe) && - Q_EMPTY(wq->rq_rptr, wq->rq_wptr)) - return 0; - - return 1; -} - -void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count) -{ - struct t3_cqe *cqe; - u32 ptr; - - *count = 0; - ptr = cq->sw_rptr; - while (!Q_EMPTY(ptr, cq->sw_wptr)) { - cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2)); - if ((SQ_TYPE(*cqe) || (CQE_OPCODE(*cqe) == T3_READ_RESP)) && - (CQE_QPID(*cqe) == wq->qpid)) - (*count)++; - ptr++; - } - PDBG("%s cq %p count %d\n", __FUNCTION__, cq, *count); -} - -void cxio_count_rcqes(struct t3_cq *cq, struct t3_wq *wq, int *count) -{ - struct t3_cqe *cqe; - u32 ptr; - - *count = 0; - PDBG("%s count zero %d\n", __FUNCTION__, *count); - ptr = cq->sw_rptr; - while (!Q_EMPTY(ptr, cq->sw_wptr)) { - cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2)); - if (RQ_TYPE(*cqe) && (CQE_OPCODE(*cqe) != T3_READ_RESP) && - (CQE_QPID(*cqe) == wq->qpid) && cqe_completes_wr(cqe, wq)) - (*count)++; - ptr++; - } - PDBG("%s cq %p count %d\n", __FUNCTION__, cq, *count); -} - -static int cxio_hal_init_ctrl_cq(struct cxio_rdev *rdev_p) -{ - struct 
rdma_cq_setup setup; - setup.id = 0; - setup.base_addr = 0; /* NULL address */ - setup.size = 1; /* enable the CQ */ - setup.credits = 0; - - /* force SGE to redirect to RspQ and interrupt */ - setup.credit_thres = 0; - setup.ovfl_mode = 1; - return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); -} - -static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p) -{ - int err; - u64 sge_cmd, ctx0, ctx1; - u64 base_addr; - struct t3_modify_qp_wr *wqe; - struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); - - - if (!skb) { - PDBG("%s alloc_skb failed\n", __FUNCTION__); - return -ENOMEM; - } - err = cxio_hal_init_ctrl_cq(rdev_p); - if (err) { - PDBG("%s err %d initializing ctrl_cq\n", __FUNCTION__, err); - return err; - } - rdev_p->ctrl_qp.workq = dma_alloc_coherent( - &(rdev_p->rnic_info.pdev->dev), - (1 << T3_CTRL_QP_SIZE_LOG2) * - sizeof(union t3_wr), - &(rdev_p->ctrl_qp.dma_addr), - GFP_KERNEL); - if (!rdev_p->ctrl_qp.workq) { - PDBG("%s dma_alloc_coherent failed\n", __FUNCTION__); - return -ENOMEM; - } - pci_unmap_addr_set(&rdev_p->ctrl_qp, mapping, - rdev_p->ctrl_qp.dma_addr); - rdev_p->ctrl_qp.doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr; - memset(rdev_p->ctrl_qp.workq, 0, - (1 << T3_CTRL_QP_SIZE_LOG2) * sizeof(union t3_wr)); - - init_MUTEX(&rdev_p->ctrl_qp.sem); - init_waitqueue_head(&rdev_p->ctrl_qp.waitq); - - /* update HW Ctrl QP context */ - base_addr = rdev_p->ctrl_qp.dma_addr; - base_addr >>= 12; - ctx0 = (V_EC_SIZE((1 << T3_CTRL_QP_SIZE_LOG2)) | - V_EC_BASE_LO((u32) base_addr & 0xffff)); - ctx0 <<= 32; - ctx0 |= V_EC_CREDITS(FW_WR_NUM); - base_addr >>= 16; - ctx1 = (u32) base_addr; - base_addr >>= 32; - ctx1 |= ((u64) (V_EC_BASE_HI((u32) base_addr & 0xf) | V_EC_RESPQ(0) | - V_EC_TYPE(0) | V_EC_GEN(1) | - V_EC_UP_TOKEN(T3_CTL_QP_TID) | F_EC_VALID)) << 32; - wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe)); - memset(wqe, 0, sizeof(*wqe)); - build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 1, - 
T3_CTL_QP_TID, 7); - wqe->flags = cpu_to_be32(MODQP_WRITE_EC); - sge_cmd = (3ULL << 56) | FW_RI_SGEEC_START << 8 | 3; - wqe->sge_cmd = cpu_to_be64(sge_cmd); - wqe->ctx1 = cpu_to_be64(ctx1); - wqe->ctx0 = cpu_to_be64(ctx0); - PDBG("CtrlQP dma_addr 0x%llx workq %p size %d\n", - (u64) rdev_p->ctrl_qp.dma_addr, rdev_p->ctrl_qp.workq, - 1 << T3_CTRL_QP_SIZE_LOG2); - skb->priority = CPL_PRIORITY_CONTROL; - return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); -} - -static int cxio_hal_destroy_ctrl_qp(struct cxio_rdev *rdev_p) -{ - dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), - (1UL << T3_CTRL_QP_SIZE_LOG2) - * sizeof(union t3_wr), rdev_p->ctrl_qp.workq, - pci_unmap_addr(&rdev_p->ctrl_qp, mapping)); - return cxio_hal_clear_qp_ctx(rdev_p, T3_CTRL_QP_ID); -} - -/* write len bytes of data into addr (32B aligned address) - * If data is NULL, clear len byte of memory to zero. - * caller aquires the sem before the call - */ -static int cxio_hal_ctrl_qp_write_mem(struct cxio_rdev *rdev_p, u32 addr, - u32 len, void *data, int completion) -{ - u32 i, nr_wqe, copy_len; - u8 *copy_data; - u8 wr_len, utx_len; /* lenght in 8 byte flit */ - enum t3_wr_flags flag; - __be64 *wqe; - u64 utx_cmd; - addr &= 0x7FFFFFF; - nr_wqe = len % 96 ? 
len / 96 + 1 : len / 96; /* 96B max per WQE */ - PDBG("%s wptr 0x%x rptr 0x%x len %d, nr_wqe %d data %p addr 0x%0x\n", - __FUNCTION__, rdev_p->ctrl_qp.wptr, rdev_p->ctrl_qp.rptr, len, - nr_wqe, data, addr); - utx_len = 3; /* in 32B unit */ - for (i = 0; i < nr_wqe; i++) { - if (Q_FULL(rdev_p->ctrl_qp.rptr, rdev_p->ctrl_qp.wptr, - T3_CTRL_QP_SIZE_LOG2)) { - PDBG("%s ctrl_qp full wtpr 0x%0x rptr 0x%0x, " - "wait for more space i %d\n", __FUNCTION__, - rdev_p->ctrl_qp.wptr, rdev_p->ctrl_qp.rptr, i); - if (wait_event_interruptible(rdev_p->ctrl_qp.waitq, - !Q_FULL(rdev_p->ctrl_qp.rptr, - rdev_p->ctrl_qp.wptr, - T3_CTRL_QP_SIZE_LOG2))) { - PDBG("%s ctrl_qp workq interrupted\n", - __FUNCTION__); - return -ERESTARTSYS; - } - PDBG("%s ctrl_qp wakeup, continue posting work request " - "i %d\n", __FUNCTION__, i); - } - wqe = (__be64 *)(rdev_p->ctrl_qp.workq + (rdev_p->ctrl_qp.wptr % - (1 << T3_CTRL_QP_SIZE_LOG2))); - flag = 0; - if (i == (nr_wqe - 1)) { - /* last WQE */ - flag = completion ? T3_COMPLETION_FLAG : 0; - if (len % 32) - utx_len = len / 32 + 1; - else - utx_len = len / 32; - } - - /* - * Force a CQE to return the credit to the workq in case - * we posted more than half the max QP size of WRs - */ - if ((i != 0) && - (i % (((1 << T3_CTRL_QP_SIZE_LOG2)) >> 1) == 0)) { - flag = T3_COMPLETION_FLAG; - PDBG("%s force completion at i %d\n", __FUNCTION__, i); - } - - /* build the utx mem command */ - wqe += (sizeof(struct t3_bypass_wr) >> 3); - utx_cmd = (T3_UTX_MEM_WRITE << 28) | (addr + i * 3); - utx_cmd <<= 32; - utx_cmd |= (utx_len << 28) | ((utx_len << 2) + 1); - *wqe = cpu_to_be64(utx_cmd); - wqe++; - copy_data = (u8 *) data + i * 96; - copy_len = len > 96 ? 
96 : len; - - /* clear memory content if data is NULL */ - if (data) - memcpy(wqe, copy_data, copy_len); - else - memset(wqe, 0, copy_len); - if (copy_len % 32) - memset(((u8 *) wqe) + copy_len, 0, - 32 - (copy_len % 32)); - wr_len = ((sizeof(struct t3_bypass_wr)) >> 3) + 1 + - (utx_len << 2); - wqe = (__be64 *)(rdev_p->ctrl_qp.workq + (rdev_p->ctrl_qp.wptr % - (1 << T3_CTRL_QP_SIZE_LOG2))); - - /* wptr in the WRID[31:0] */ - ((union t3_wrid *)(wqe+1))->id0.low = rdev_p->ctrl_qp.wptr; - - /* - * This must be the last write with a memory barrier - * for the genbit - */ - build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_BP, flag, - Q_GENBIT(rdev_p->ctrl_qp.wptr, - T3_CTRL_QP_SIZE_LOG2), T3_CTRL_QP_ID, - wr_len); - if (flag == T3_COMPLETION_FLAG) - ring_doorbell(rdev_p->ctrl_qp.doorbell, T3_CTRL_QP_ID); - len -= 96; - rdev_p->ctrl_qp.wptr++; - } - return 0; -} - -/* IN: stag key, pdid, perm, zbva, to, len, page_size, pbl, and pbl_size - * OUT: stag index, actual pbl_size, pbl_addr allocated. - * TBD: shared memory region support - */ -static int __cxio_tpt_op(struct cxio_rdev *rdev_p, u32 reset_tpt_entry, - u32 *stag, u8 stag_state, u32 pdid, - enum tpt_mem_type type, enum tpt_mem_perm perm, - u32 zbva, u64 to, u32 len, u8 page_size, __be64 *pbl, - u32 *pbl_size, u32 *pbl_addr) -{ - int err; - struct tpt_entry tpt; - u32 stag_idx; - u32 wptr; - int rereg = (*stag != T3_STAG_UNSET); - - stag_state = stag_state > 0; - stag_idx = (*stag) >> 8; - - if ((!reset_tpt_entry) && !(*stag != T3_STAG_UNSET)) { - stag_idx = cxio_hal_get_stag(rdev_p->rscp); - if (!stag_idx) - return -ENOMEM; - *stag = (stag_idx << 8) | ((*stag) & 0xFF); - } - PDBG("%s stag_state 0x%0x type 0x%0x pdid 0x%0x, stag_idx 0x%x\n", - __FUNCTION__, stag_state, type, pdid, stag_idx); - - if (reset_tpt_entry) - cxio_hal_pblpool_free(rdev_p, *pbl_addr, *pbl_size << 3); - else if (!rereg) { - *pbl_addr = cxio_hal_pblpool_alloc(rdev_p, *pbl_size << 3); - if (!*pbl_addr) { - return -ENOMEM; - } - } - - 
down_interruptible(&rdev_p->ctrl_qp.sem); - - /* write PBL first if any - update pbl only if pbl list exist */ - if (pbl) { - - PDBG("%s *pdb_addr 0x%x, pbl_base 0x%x, pbl_size %d\n", - __FUNCTION__, *pbl_addr, rdev_p->rnic_info.pbl_base, - *pbl_size); - err = cxio_hal_ctrl_qp_write_mem(rdev_p, - (*pbl_addr >> 5), - (*pbl_size << 3), pbl, 0); - if (err) - goto ret; - } - - /* write TPT entry */ - if (reset_tpt_entry) - memset(&tpt, 0, sizeof(tpt)); - else { - tpt.valid_stag_pdid = cpu_to_be32(F_TPT_VALID | - V_TPT_STAG_KEY((*stag) & M_TPT_STAG_KEY) | - V_TPT_STAG_STATE(stag_state) | - V_TPT_STAG_TYPE(type) | V_TPT_PDID(pdid)); - BUG_ON(page_size >= 28); - tpt.flags_pagesize_qpid = cpu_to_be32(V_TPT_PERM(perm) | - F_TPT_MW_BIND_ENABLE | - V_TPT_ADDR_TYPE((zbva ? TPT_ZBTO : TPT_VATO)) | - V_TPT_PAGE_SIZE(page_size)); - tpt.rsvd_pbl_addr = reset_tpt_entry ? 0 : - cpu_to_be32(V_TPT_PBL_ADDR(PBL_OFF(rdev_p, *pbl_addr)>>3)); - tpt.len = cpu_to_be32(len); - tpt.va_hi = cpu_to_be32((u32) (to >> 32)); - tpt.va_low_or_fbo = cpu_to_be32((u32) (to & 0xFFFFFFFFULL)); - tpt.rsvd_bind_cnt_or_pstag = 0; - tpt.rsvd_pbl_size = reset_tpt_entry ? 0 : - cpu_to_be32(V_TPT_PBL_SIZE((*pbl_size) >> 2)); - } - err = cxio_hal_ctrl_qp_write_mem(rdev_p, - stag_idx + - (rdev_p->rnic_info.tpt_base >> 5), - sizeof(tpt), &tpt, 1); - - /* release the stag index to free pool */ - if (reset_tpt_entry) - cxio_hal_put_stag(rdev_p->rscp, stag_idx); -ret: - wptr = rdev_p->ctrl_qp.wptr; - up(&rdev_p->ctrl_qp.sem); - if (!err) - if (wait_event_interruptible(rdev_p->ctrl_qp.waitq, - SEQ32_GE(rdev_p->ctrl_qp.rptr, - wptr))) - return -ERESTARTSYS; - return err; -} - -/* IN : stag key, pdid, pbl_size - * Out: stag index, actaul pbl_size, and pbl_addr allocated. 
- */ -int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid, - enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr) -{ - *stag = T3_STAG_UNSET; - return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR, - perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr)); -} - -int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid, - enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, - u8 page_size, __be64 *pbl, u32 *pbl_size, - u32 *pbl_addr) -{ - *stag = T3_STAG_UNSET; - return __cxio_tpt_op(rdev_p, 0, stag, 1, pdid, TPT_NON_SHARED_MR, perm, - zbva, to, len, page_size, pbl, pbl_size, pbl_addr); -} - -int cxio_reregister_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid, - enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, - u8 page_size, __be64 *pbl, u32 *pbl_size, - u32 *pbl_addr) -{ - return __cxio_tpt_op(rdev_p, 0, stag, 1, pdid, TPT_NON_SHARED_MR, perm, - zbva, to, len, page_size, pbl, pbl_size, pbl_addr); -} - -int cxio_dereg_mem(struct cxio_rdev *rdev_p, u32 stag, u32 pbl_size, - u32 pbl_addr) -{ - return __cxio_tpt_op(rdev_p, 1, &stag, 0, 0, 0, 0, 0, 0ULL, 0, 0, NULL, - &pbl_size, &pbl_addr); -} - -int cxio_allocate_window(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid) -{ - u32 pbl_size = 0; - *stag = T3_STAG_UNSET; - return __cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_MW, 0, 0, 0ULL, 0, 0, - NULL, &pbl_size, NULL); -} - -int cxio_deallocate_window(struct cxio_rdev *rdev_p, u32 stag) -{ - return __cxio_tpt_op(rdev_p, 1, &stag, 0, 0, 0, 0, 0, 0ULL, 0, 0, NULL, - NULL, NULL); -} - -int cxio_rdma_init(struct cxio_rdev *rdev_p, struct t3_rdma_init_attr *attr) -{ - struct t3_rdma_init_wr *wqe; - struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_ATOMIC); - if (!skb) - return -ENOMEM; - PDBG("%s rdev_p %p\n", __FUNCTION__, rdev_p); - wqe = (struct t3_rdma_init_wr *) __skb_put(skb, sizeof(*wqe)); - wqe->wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_INIT)); - wqe->wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(attr->tid) | - 
V_FW_RIWR_LEN(sizeof(*wqe) >> 3)); - wqe->wrid.id1 = 0; - wqe->qpid = cpu_to_be32(attr->qpid); - wqe->pdid = cpu_to_be32(attr->pdid); - wqe->scqid = cpu_to_be32(attr->scqid); - wqe->rcqid = cpu_to_be32(attr->rcqid); - wqe->rq_addr = cpu_to_be32(attr->rq_addr - rdev_p->rnic_info.rqt_base); - wqe->rq_size = cpu_to_be32(attr->rq_size); - wqe->mpaattrs = attr->mpaattrs; - wqe->qpcaps = attr->qpcaps; - wqe->ulpdu_size = cpu_to_be16(attr->tcp_emss); - wqe->flags = cpu_to_be32(attr->flags); - wqe->ord = cpu_to_be32(attr->ord); - wqe->ird = cpu_to_be32(attr->ird); - wqe->qp_dma_addr = cpu_to_be64(attr->qp_dma_addr); - wqe->qp_dma_size = cpu_to_be32(attr->qp_dma_size); - wqe->rsvd = 0; - skb->priority = 0; /* 0=>ToeQ; 1=>CtrlQ */ - return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); -} - -void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb) -{ - cxio_ev_cb = ev_cb; -} - -void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb) -{ - cxio_ev_cb = NULL; -} - -static int cxio_hal_ev_handler(struct t3cdev *t3cdev_p, struct sk_buff *skb) -{ - static int cnt; - struct cxio_rdev *rdev_p = NULL; - struct respQ_msg_t *rsp_msg = (struct respQ_msg_t *) skb->data; - PDBG("%d: %s cq_id 0x%x cq_ptr 0x%x genbit %0x overflow %0x an %0x" - " se %0x notify %0x cqbranch %0x creditth %0x\n", - cnt, __FUNCTION__, RSPQ_CQID(rsp_msg), RSPQ_CQPTR(rsp_msg), - RSPQ_GENBIT(rsp_msg), RSPQ_OVERFLOW(rsp_msg), RSPQ_AN(rsp_msg), - RSPQ_SE(rsp_msg), RSPQ_NOTIFY(rsp_msg), RSPQ_CQBRANCH(rsp_msg), - RSPQ_CREDIT_THRESH(rsp_msg)); - PDBG("CQE: QPID 0x%0x genbit %0x type 0x%0x status 0x%0x opcode %d " - "len 0x%0x wrid_hi_stag 0x%x wrid_low_msn 0x%x\n", - CQE_QPID(rsp_msg->cqe), CQE_GENBIT(rsp_msg->cqe), - CQE_TYPE(rsp_msg->cqe), CQE_STATUS(rsp_msg->cqe), - CQE_OPCODE(rsp_msg->cqe), CQE_LEN(rsp_msg->cqe), - CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe)); - rdev_p = (struct cxio_rdev *)t3cdev_p->ulp; - if (!rdev_p) { - PDBG("%s called by t3cdev %p with null ulp\n", __FUNCTION__, - t3cdev_p); - 
return 0; - } - if (CQE_QPID(rsp_msg->cqe) == T3_CTRL_QP_ID) { - rdev_p->ctrl_qp.rptr = CQE_WRID_LOW(rsp_msg->cqe) + 1; - wake_up_interruptible(&rdev_p->ctrl_qp.waitq); - dev_kfree_skb_irq(skb); - } else if (CQE_QPID(rsp_msg->cqe) == 0xfff8) - dev_kfree_skb_irq(skb); - else if (cxio_ev_cb) - (*cxio_ev_cb) (rdev_p, skb); - else - dev_kfree_skb_irq(skb); - cnt++; - return 0; -} - -/* Caller takes care of locking if needed */ -int cxio_rdev_open(struct cxio_rdev *rdev_p) -{ - struct net_device *netdev_p = NULL; - int err = 0; - if (strlen(rdev_p->dev_name)) { - if (cxio_hal_find_rdev_by_name(rdev_p->dev_name)) { - return -EBUSY; - } - netdev_p = dev_get_by_name(rdev_p->dev_name); - if (!netdev_p) { - return -EINVAL; - } - dev_put(netdev_p); - } else if (rdev_p->t3cdev_p) { - if (cxio_hal_find_rdev_by_t3cdev(rdev_p->t3cdev_p)) { - return -EBUSY; - } - netdev_p = rdev_p->t3cdev_p->lldev; - strncpy(rdev_p->dev_name, rdev_p->t3cdev_p->name, - T3_MAX_DEV_NAME_LEN); - } else { - PDBG("%s t3cdev_p or dev_name must be set\n", __FUNCTION__); - return -EINVAL; - } - - list_add_tail(&rdev_p->entry, &rdev_list); - - PDBG("%s opening rnic dev %s\n", __FUNCTION__, rdev_p->dev_name); - memset(&rdev_p->ctrl_qp, 0, sizeof(rdev_p->ctrl_qp)); - if (!rdev_p->t3cdev_p) - rdev_p->t3cdev_p = T3CDEV(netdev_p); - rdev_p->t3cdev_p->ulp = (void *) rdev_p; - err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_GET_PARAMS, - &(rdev_p->rnic_info)); - if (err) { - printk(KERN_ERR "%s t3cdev_p(%p)->ctl returned error %d.\n", - __FUNCTION__, rdev_p->t3cdev_p, err); - goto err1; - } - err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, GET_PORTS, - &(rdev_p->port_info)); - if (err) { - printk(KERN_ERR "%s t3cdev_p(%p)->ctl returned error %d.\n", - __FUNCTION__, rdev_p->t3cdev_p, err); - goto err1; - } - - /* - * qpshift is the number of bits to shift the qpid left in order - * to get the correct address of the doorbell for that qp. 
- */ - cxio_init_ucontext(rdev_p, &rdev_p->uctx); - rdev_p->qpshift = PAGE_SHIFT - - ilog2(65536 >> - ilog2(rdev_p->rnic_info.udbell_len >> - PAGE_SHIFT)); - rdev_p->qpnr = rdev_p->rnic_info.udbell_len >> PAGE_SHIFT; - rdev_p->qpmask = (65536 >> ilog2(rdev_p->qpnr)) - 1; - PDBG("%s rnic %s info: tpt_base 0x%0x tpt_top 0x%0x num stags %d " - "pbl_base 0x%0x pbl_top 0x%0x rqt_base 0x%0x, rqt_top 0x%0x\n", - __FUNCTION__, rdev_p->dev_name, rdev_p->rnic_info.tpt_base, - rdev_p->rnic_info.tpt_top, cxio_num_stags(rdev_p), - rdev_p->rnic_info.pbl_base, - rdev_p->rnic_info.pbl_top, rdev_p->rnic_info.rqt_base, - rdev_p->rnic_info.rqt_top); - PDBG("udbell_len 0x%0x udbell_physbase 0x%lx kdb_addr %p qpshift %lu " - "qpnr %d qpmask 0x%x\n", - rdev_p->rnic_info.udbell_len, - rdev_p->rnic_info.udbell_physbase, rdev_p->rnic_info.kdb_addr, - rdev_p->qpshift, rdev_p->qpnr, rdev_p->qpmask); - - err = cxio_hal_init_ctrl_qp(rdev_p); - if (err) { - printk(KERN_ERR "%s error %d initializing ctrl_qp.\n", - __FUNCTION__, err); - goto err1; - } - err = cxio_hal_init_resource(rdev_p, cxio_num_stags(rdev_p), 0, - 0, T3_MAX_NUM_QP, T3_MAX_NUM_CQ, - T3_MAX_NUM_PD); - if (err) { - printk(KERN_ERR "%s error %d initializing hal resources.\n", - __FUNCTION__, err); - goto err2; - } - err = cxio_hal_pblpool_create(rdev_p); - if (err) { - printk(KERN_ERR "%s error %d initializing pbl mem pool.\n", - __FUNCTION__, err); - goto err3; - } - err = cxio_hal_rqtpool_create(rdev_p); - if (err) { - printk(KERN_ERR "%s error %d initializing rqt mem pool.\n", - __FUNCTION__, err); - goto err4; - } - return 0; -err4: - cxio_hal_pblpool_destroy(rdev_p); -err3: - cxio_hal_destroy_resource(rdev_p->rscp); -err2: - cxio_hal_destroy_ctrl_qp(rdev_p); -err1: - list_del(&rdev_p->entry); - return err; -} - -void cxio_rdev_close(struct cxio_rdev *rdev_p) -{ - if (rdev_p) { - cxio_hal_pblpool_destroy(rdev_p); - cxio_hal_rqtpool_destroy(rdev_p); - list_del(&rdev_p->entry); - rdev_p->t3cdev_p->ulp = NULL; - 
cxio_hal_destroy_ctrl_qp(rdev_p); - cxio_hal_destroy_resource(rdev_p->rscp); - } -} - -int __init cxio_hal_init(void) -{ - if (cxio_hal_init_rhdl_resource(T3_MAX_NUM_RI)) - return -ENOMEM; - t3_register_cpl_handler(CPL_ASYNC_NOTIF, cxio_hal_ev_handler); - return 0; -} - -void __exit cxio_hal_exit(void) -{ - struct cxio_rdev *rdev, *tmp; - - t3_register_cpl_handler(CPL_ASYNC_NOTIF, NULL); - list_for_each_entry_safe(rdev, tmp, &rdev_list, entry) - cxio_rdev_close(rdev); - cxio_hal_destroy_rhdl_resource(); -} - -static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) -{ - struct t3_swsq *sqp; - __u32 ptr = wq->sq_rptr; - int count = Q_COUNT(wq->sq_rptr, wq->sq_wptr); - - sqp = wq->sq + Q_PTR2IDX(ptr, wq->sq_size_log2); - while (count--) - if (!sqp->signaled) { - ptr++; - sqp = wq->sq + Q_PTR2IDX(ptr, wq->sq_size_log2); - } else if (sqp->complete) { - - /* - * Insert this completed cqe into the swcq. - */ - PDBG("%s moving cqe into swcq sq idx %ld cq idx %ld\n", - __FUNCTION__, Q_PTR2IDX(ptr, wq->sq_size_log2), - Q_PTR2IDX(cq->sw_wptr, cq->size_log2)); - sqp->cqe.header |= htonl(V_CQE_SWCQE(1)); - *(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) - = sqp->cqe; - cq->sw_wptr++; - sqp->signaled = 0; - break; - } else - break; -} - -static inline void create_read_req_cqe(struct t3_wq *wq, - struct t3_cqe *hw_cqe, - struct t3_cqe *read_cqe) -{ - read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr; - read_cqe->len = wq->oldest_read->read_len; - read_cqe->header = htonl(V_CQE_QPID(CQE_QPID(*hw_cqe)) | - V_CQE_SWCQE(SW_CQE(*hw_cqe)) | - V_CQE_OPCODE(T3_READ_REQ) | - V_CQE_TYPE(1)); -} - -/* - * Return a ptr to the next read wr in the SWSQ or NULL. 
- */ -static inline void advance_oldest_read(struct t3_wq *wq) -{ - - u32 rptr = wq->oldest_read - wq->sq + 1; - u32 wptr = Q_PTR2IDX(wq->sq_wptr, wq->sq_size_log2); - - while (Q_PTR2IDX(rptr, wq->sq_size_log2) != wptr) { - wq->oldest_read = wq->sq + Q_PTR2IDX(rptr, wq->sq_size_log2); - - if (wq->oldest_read->opcode == T3_READ_REQ) - return; - rptr++; - } - wq->oldest_read = NULL; -} - -/* - * cxio_poll_cq - * - * Caller must: - * check the validity of the first CQE, - * supply the wq assicated with the qpid. - * - * credit: cq credit to return to sge. - * cqe_flushed: 1 iff the CQE is flushed. - * cqe: copy of the polled CQE. - * - * return value: - * 0 CQE returned, - * -1 CQE skipped, try again. - */ -int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe, - u8 *cqe_flushed, u64 *cookie, u32 *credit) -{ - int ret = 0; - struct t3_cqe *hw_cqe, read_cqe; - - *cqe_flushed = 0; - *credit = 0; - hw_cqe = cxio_next_cqe(cq); - - PDBG("%s CQE OOO %d qpid 0x%0x genbit %d type %d status 0x%0x" - " opcode 0x%0x len 0x%0x wrid_hi_stag 0x%x wrid_low_msn 0x%x\n", - __FUNCTION__, CQE_OOO(*hw_cqe), CQE_QPID(*hw_cqe), - CQE_GENBIT(*hw_cqe), CQE_TYPE(*hw_cqe), CQE_STATUS(*hw_cqe), - CQE_OPCODE(*hw_cqe), CQE_LEN(*hw_cqe), CQE_WRID_HI(*hw_cqe), - CQE_WRID_LOW(*hw_cqe)); - - /* - * skip cqe's not affiliated with a QP. - */ - if (wq == NULL) { - ret = -1; - goto skip_cqe; - } - - /* - * Gotta tweak READ completions: - * 1) the cqe doesn't contain the sq_wptr from the wr. - * 2) opcode not reflected from the wr. - * 3) read_len not reflected from the wr. - * 4) cq_type is RQ_TYPE not SQ_TYPE. - */ - if (RQ_TYPE(*hw_cqe) && (CQE_OPCODE(*hw_cqe) == T3_READ_RESP)) { - - /* - * Don't write to the HWCQ, so create a new read req CQE - * in local memory. - */ - create_read_req_cqe(wq, hw_cqe, &read_cqe); - hw_cqe = &read_cqe; - advance_oldest_read(wq); - } - - /* - * T3A: Discard TERMINATE CQEs. 
- */ - if (CQE_OPCODE(*hw_cqe) == T3_TERMINATE) { - ret = -1; - wq->error = 1; - goto skip_cqe; - } - - if (CQE_STATUS(*hw_cqe) || wq->error) { - *cqe_flushed = wq->error; - wq->error = 1; - - /* - * T3A inserts errors into the CQE. We cannot return - * these as work completions. - */ - /* incoming write failures */ - if ((CQE_OPCODE(*hw_cqe) == T3_RDMA_WRITE) - && RQ_TYPE(*hw_cqe)) { - ret = -1; - goto skip_cqe; - } - /* incoming read request failures */ - if ((CQE_OPCODE(*hw_cqe) == T3_READ_RESP) && SQ_TYPE(*hw_cqe)) { - ret = -1; - goto skip_cqe; - } - - /* incoming SEND with no receive posted failures */ - if ((CQE_OPCODE(*hw_cqe) == T3_SEND) && RQ_TYPE(*hw_cqe) && - Q_EMPTY(wq->rq_rptr, wq->rq_wptr)) { - ret = -1; - goto skip_cqe; - } - goto proc_cqe; - } - - /* - * RECV completion. - */ - if (RQ_TYPE(*hw_cqe)) { - - /* - * HW only validates 4 bits of MSN. So we must validate that - * the MSN in the SEND is the next expected MSN. If its not, - * then we complete this with TPT_ERR_MSN and mark the wq in - * error. - */ - if (unlikely((CQE_WRID_MSN(*hw_cqe) != (wq->rq_rptr + 1)))) { - wq->error = 1; - hw_cqe->header |= htonl(V_CQE_STATUS(TPT_ERR_MSN)); - goto proc_cqe; - } - goto proc_cqe; - } - - /* - * If we get here its a send completion. - * - * Handle out of order completion. These get stuffed - * in the SW SQ. Then the SW SQ is walked to move any - * now in-order completions into the SW CQ. This handles - * 2 cases: - * 1) reaping unsignaled WRs when the first subsequent - * signaled WR is completed. - * 2) out of order read completions. 
- */ - if (!SW_CQE(*hw_cqe) && (CQE_WRID_SQ_WPTR(*hw_cqe) != wq->sq_rptr)) { - struct t3_swsq *sqp; - - PDBG("%s out of order completion going in swsq at idx %ld\n", - __FUNCTION__, - Q_PTR2IDX(CQE_WRID_SQ_WPTR(*hw_cqe), wq->sq_size_log2)); - sqp = wq->sq + - Q_PTR2IDX(CQE_WRID_SQ_WPTR(*hw_cqe), wq->sq_size_log2); - sqp->cqe = *hw_cqe; - sqp->complete = 1; - ret = -1; - goto flush_wq; - } - -proc_cqe: - *cqe = *hw_cqe; - - /* - * Reap the associated WR(s) that are freed up with this - * completion. - */ - if (SQ_TYPE(*hw_cqe)) { - wq->sq_rptr = CQE_WRID_SQ_WPTR(*hw_cqe); - PDBG("%s completing sq idx %ld\n", __FUNCTION__, - Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2)); - *cookie = (wq->sq + - Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2))->wr_id; - wq->sq_rptr++; - } else { - PDBG("%s completing rq idx %ld\n", __FUNCTION__, - Q_PTR2IDX(wq->rq_rptr, wq->rq_size_log2)); - *cookie = *(wq->rq + Q_PTR2IDX(wq->rq_rptr, wq->rq_size_log2)); - wq->rq_rptr++; - } - -flush_wq: - /* - * Flush any completed cqes that are now in-order. - */ - flush_completed_wrs(wq, cq); - -skip_cqe: - if (SW_CQE(*hw_cqe)) { - PDBG("%s cq %p cqid 0x%x skip sw cqe sw_rptr 0x%x\n", - __FUNCTION__, cq, cq->cqid, cq->sw_rptr); - ++cq->sw_rptr; - } else { - PDBG("%s cq %p cqid 0x%x skip hw cqe rptr 0x%x\n", - __FUNCTION__, cq, cq->cqid, cq->rptr); - ++cq->rptr; - - /* - * T3A: compute credits. - */ - if (((cq->rptr - cq->wptr) > (1 << (cq->size_log2 - 1))) - || ((cq->rptr - cq->wptr) >= 128)) { - *credit = cq->rptr - cq->wptr; - cq->wptr = cq->rptr; - } - } - return ret; -} diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h deleted file mode 100644 index 8fb2999..0000000 --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h +++ /dev/null @@ -1,201 +0,0 @@ -/* - * Copyright (c) 2006 Chelsio, Inc. All rights reserved. - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. 
- * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. 
- */ -#ifndef __CXIO_HAL_H__ -#define __CXIO_HAL_H__ - -#include -#include - -#include "t3_cpl.h" -#include "t3cdev.h" -#include "cxgb3_ctl_defs.h" -#include "cxio_wr.h" - -#define T3_CTRL_QP_ID FW_RI_SGEEC_START -#define T3_CTL_QP_TID FW_RI_TID_START -#define T3_CTRL_QP_SIZE_LOG2 8 -#define T3_CTRL_CQ_ID 0 - -/* TBD */ -#define T3_MAX_NUM_RI (1<<15) -#define T3_MAX_NUM_QP (1<<15) -#define T3_MAX_NUM_CQ (1<<15) -#define T3_MAX_NUM_PD (1<<15) -#define T3_MAX_PBL_SIZE 256 -#define T3_MAX_RQ_SIZE 1024 -#define T3_MAX_NUM_STAG (1<<15) - -#define T3_STAG_UNSET 0xffffffff - -#define T3_MAX_DEV_NAME_LEN 32 - -struct cxio_hal_ctrl_qp { - u32 wptr; - u32 rptr; - struct semaphore sem; /* for the wtpr, can sleep */ - wait_queue_head_t waitq; /* wait for RspQ/CQE msg */ - union t3_wr *workq; /* the work request queue */ - dma_addr_t dma_addr; /* pci bus address of the workq */ - DECLARE_PCI_UNMAP_ADDR(mapping) - void __iomem *doorbell; -}; - -struct cxio_hal_resource { - struct kfifo *tpt_fifo; - spinlock_t tpt_fifo_lock; - struct kfifo *qpid_fifo; - spinlock_t qpid_fifo_lock; - struct kfifo *cqid_fifo; - spinlock_t cqid_fifo_lock; - struct kfifo *pdid_fifo; - spinlock_t pdid_fifo_lock; -}; - -struct cxio_qpid_list { - struct list_head entry; - u32 qpid; -}; - -struct cxio_ucontext { - struct list_head qpids; - struct mutex lock; -}; - -struct cxio_rdev { - char dev_name[T3_MAX_DEV_NAME_LEN]; - struct t3cdev *t3cdev_p; - struct rdma_info rnic_info; - struct adap_ports port_info; - struct cxio_hal_resource *rscp; - struct cxio_hal_ctrl_qp ctrl_qp; - void *ulp; - unsigned long qpshift; - u32 qpnr; - u32 qpmask; - struct cxio_ucontext uctx; - struct gen_pool *pbl_pool; - struct gen_pool *rqt_pool; - struct list_head entry; -}; - -static inline int cxio_num_stags(struct cxio_rdev *rdev_p) -{ - return min((int)T3_MAX_NUM_STAG, (int)((rdev_p->rnic_info.tpt_top - rdev_p->rnic_info.tpt_base) >> 5)); -} - -typedef void (*cxio_hal_ev_callback_func_t) (struct cxio_rdev * rdev_p, - struct 
sk_buff * skb); - -#define RSPQ_CQID(rsp) (be32_to_cpu(rsp->cq_ptrid) & 0xffff) -#define RSPQ_CQPTR(rsp) ((be32_to_cpu(rsp->cq_ptrid) >> 16) & 0xffff) -#define RSPQ_GENBIT(rsp) ((be32_to_cpu(rsp->flags) >> 16) & 1) -#define RSPQ_OVERFLOW(rsp) ((be32_to_cpu(rsp->flags) >> 17) & 1) -#define RSPQ_AN(rsp) ((be32_to_cpu(rsp->flags) >> 18) & 1) -#define RSPQ_SE(rsp) ((be32_to_cpu(rsp->flags) >> 19) & 1) -#define RSPQ_NOTIFY(rsp) ((be32_to_cpu(rsp->flags) >> 20) & 1) -#define RSPQ_CQBRANCH(rsp) ((be32_to_cpu(rsp->flags) >> 21) & 1) -#define RSPQ_CREDIT_THRESH(rsp) ((be32_to_cpu(rsp->flags) >> 22) & 1) - -struct respQ_msg_t { - __be32 flags; /* flit 0 */ - __be32 cq_ptrid; - __be64 rsvd; /* flit 1 */ - struct t3_cqe cqe; /* flits 2-3 */ -}; - -enum t3_cq_opcode { - CQ_ARM_AN = 0x2, - CQ_ARM_SE = 0x6, - CQ_FORCE_AN = 0x3, - CQ_CREDIT_UPDATE = 0x7 -}; - -int cxio_rdev_open(struct cxio_rdev *rdev); -void cxio_rdev_close(struct cxio_rdev *rdev); -int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq, - enum t3_cq_opcode op, u32 credit); -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid); -int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq); -int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq); -int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq); -void cxio_release_ucontext(struct cxio_rdev *rdev, struct cxio_ucontext *uctx); -void cxio_init_ucontext(struct cxio_rdev *rdev, struct cxio_ucontext *uctx); -int cxio_create_qp(struct cxio_rdev *rdev, u32 kernel_domain, struct t3_wq *wq, - struct cxio_ucontext *uctx); -int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq, - struct cxio_ucontext *uctx); -int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode); -int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid, - enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr); -int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid, - enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, - u8 
page_size, __be64 *pbl, u32 *pbl_size, - u32 *pbl_addr); -int cxio_reregister_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid, - enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, - u8 page_size, __be64 *pbl, u32 *pbl_size, - u32 *pbl_addr); -int cxio_dereg_mem(struct cxio_rdev *rdev, u32 stag, u32 pbl_size, - u32 pbl_addr); -int cxio_allocate_window(struct cxio_rdev *rdev, u32 * stag, u32 pdid); -int cxio_deallocate_window(struct cxio_rdev *rdev, u32 stag); -int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr); -void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb); -void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb); -u32 cxio_hal_get_rhdl(void); -void cxio_hal_put_rhdl(u32 rhdl); -u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp); -void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid); -int __init cxio_hal_init(void); -void __exit cxio_hal_exit(void); -void cxio_flush_rq(struct t3_wq *wq, struct t3_cq *cq, int count); -void cxio_flush_sq(struct t3_wq *wq, struct t3_cq *cq, int count); -void cxio_count_rcqes(struct t3_cq *cq, struct t3_wq *wq, int *count); -void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count); -void cxio_flush_hw_cq(struct t3_cq *cq); -int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe, - u8 *cqe_flushed, u64 *cookie, u32 *credit); - -#define MOD "iw_cxgb3: " -#define PDBG(fmt, args...) 
pr_debug(MOD fmt, ## args) - -#ifdef DEBUG -void cxio_dump_tpt(struct cxio_rdev *rev, u32 stag); -void cxio_dump_pbl(struct cxio_rdev *rev, u32 pbl_addr, uint len, u8 shift); -void cxio_dump_wqe(union t3_wr *wqe); -void cxio_dump_wce(struct t3_cqe *wce); -void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents); -void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid); -#endif - -#endif diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c deleted file mode 100644 index 997aa32..0000000 --- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c +++ /dev/null @@ -1,331 +0,0 @@ -/* - * Copyright (c) 2006 Chelsio, Inc. All rights reserved. - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - */ -/* Crude resource management */ -#include -#include -#include -#include -#include -#include -#include "cxio_resource.h" -#include "cxio_hal.h" - -static struct kfifo *rhdl_fifo; -static spinlock_t rhdl_fifo_lock; - -#define RANDOM_SIZE 16 - -static int __cxio_init_resource_fifo(struct kfifo **fifo, - spinlock_t *fifo_lock, - u32 nr, u32 skip_low, - u32 skip_high, - int random) -{ - u32 i, j, entry = 0, idx; - u32 random_bytes; - u32 rarray[16]; - spin_lock_init(fifo_lock); - - *fifo = kfifo_alloc(nr * sizeof(u32), GFP_KERNEL, fifo_lock); - if (IS_ERR(*fifo)) - return -ENOMEM; - - for (i = 0; i < skip_low + skip_high; i++) - __kfifo_put(*fifo, (unsigned char *) &entry, sizeof(u32)); - if (random) { - j = 0; - random_bytes = random32(); - for (i = 0; i < RANDOM_SIZE; i++) - rarray[i] = i + skip_low; - for (i = skip_low + RANDOM_SIZE; i < nr - skip_high; i++) { - if (j >= RANDOM_SIZE) { - j = 0; - random_bytes = random32(); - } - idx = (random_bytes >> (j * 2)) & 0xF; - __kfifo_put(*fifo, - (unsigned char *) &rarray[idx], - sizeof(u32)); - rarray[idx] = i; - j++; - } - for (i = 0; i < RANDOM_SIZE; i++) - __kfifo_put(*fifo, - (unsigned char *) &rarray[i], - sizeof(u32)); - } else - for (i = skip_low; i < nr - skip_high; i++) - __kfifo_put(*fifo, (unsigned char *) &i, sizeof(u32)); - - for (i = 0; i < skip_low + skip_high; i++) - kfifo_get(*fifo, (unsigned char *) &entry, sizeof(u32)); - return 0; -} - -static int cxio_init_resource_fifo(struct kfifo **fifo, spinlock_t * fifo_lock, - u32 nr, u32 skip_low, u32 skip_high) -{ - return (__cxio_init_resource_fifo(fifo, fifo_lock, nr, skip_low, - skip_high, 0)); -} - -static int cxio_init_resource_fifo_random(struct kfifo **fifo, - spinlock_t * 
fifo_lock, - u32 nr, u32 skip_low, u32 skip_high) -{ - - return (__cxio_init_resource_fifo(fifo, fifo_lock, nr, skip_low, - skip_high, 1)); -} - -static int cxio_init_qpid_fifo(struct cxio_rdev *rdev_p) -{ - u32 i; - - spin_lock_init(&rdev_p->rscp->qpid_fifo_lock); - - rdev_p->rscp->qpid_fifo = kfifo_alloc(T3_MAX_NUM_QP * sizeof(u32), - GFP_KERNEL, - &rdev_p->rscp->qpid_fifo_lock); - if (IS_ERR(rdev_p->rscp->qpid_fifo)) - return -ENOMEM; - - for (i = 16; i < T3_MAX_NUM_QP; i++) - if (!(i & rdev_p->qpmask)) - __kfifo_put(rdev_p->rscp->qpid_fifo, - (unsigned char *) &i, sizeof(u32)); - return 0; -} - -int cxio_hal_init_rhdl_resource(u32 nr_rhdl) -{ - return cxio_init_resource_fifo(&rhdl_fifo, &rhdl_fifo_lock, nr_rhdl, 1, - 0); -} - -void cxio_hal_destroy_rhdl_resource(void) -{ - kfifo_free(rhdl_fifo); -} - -/* nr_* must be power of 2 */ -int cxio_hal_init_resource(struct cxio_rdev *rdev_p, - u32 nr_tpt, u32 nr_pbl, - u32 nr_rqt, u32 nr_qpid, u32 nr_cqid, u32 nr_pdid) -{ - int err = 0; - struct cxio_hal_resource *rscp; - - rscp = kmalloc(sizeof(*rscp), GFP_KERNEL); - if (!rscp) - return -ENOMEM; - rdev_p->rscp = rscp; - err = cxio_init_resource_fifo_random(&rscp->tpt_fifo, - &rscp->tpt_fifo_lock, - nr_tpt, 1, 0); - if (err) - goto tpt_err; - err = cxio_init_qpid_fifo(rdev_p); - if (err) - goto qpid_err; - err = cxio_init_resource_fifo(&rscp->cqid_fifo, &rscp->cqid_fifo_lock, - nr_cqid, 1, 0); - if (err) - goto cqid_err; - err = cxio_init_resource_fifo(&rscp->pdid_fifo, &rscp->pdid_fifo_lock, - nr_pdid, 1, 0); - if (err) - goto pdid_err; - return 0; -pdid_err: - kfifo_free(rscp->cqid_fifo); -cqid_err: - kfifo_free(rscp->qpid_fifo); -qpid_err: - kfifo_free(rscp->tpt_fifo); -tpt_err: - return -ENOMEM; -} - -/* - * returns 0 if no resource available - */ -static inline u32 cxio_hal_get_resource(struct kfifo *fifo) -{ - u32 entry; - if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32))) - return entry; - else - return 0; /* fifo emptry */ -} - -static inline void 
cxio_hal_put_resource(struct kfifo *fifo, u32 entry) -{ - BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0); -} - -u32 cxio_hal_get_rhdl(void) -{ - return cxio_hal_get_resource(rhdl_fifo); -} - -void cxio_hal_put_rhdl(u32 rhdl) -{ - cxio_hal_put_resource(rhdl_fifo, rhdl); -} - -u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp) -{ - return cxio_hal_get_resource(rscp->tpt_fifo); -} - -void cxio_hal_put_stag(struct cxio_hal_resource *rscp, u32 stag) -{ - cxio_hal_put_resource(rscp->tpt_fifo, stag); -} - -u32 cxio_hal_get_qpid(struct cxio_hal_resource *rscp) -{ - u32 qpid = cxio_hal_get_resource(rscp->qpid_fifo); - PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid); - return qpid; -} - -void cxio_hal_put_qpid(struct cxio_hal_resource *rscp, u32 qpid) -{ - PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid); - cxio_hal_put_resource(rscp->qpid_fifo, qpid); -} - -u32 cxio_hal_get_cqid(struct cxio_hal_resource *rscp) -{ - return cxio_hal_get_resource(rscp->cqid_fifo); -} - -void cxio_hal_put_cqid(struct cxio_hal_resource *rscp, u32 cqid) -{ - cxio_hal_put_resource(rscp->cqid_fifo, cqid); -} - -u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp) -{ - return cxio_hal_get_resource(rscp->pdid_fifo); -} - -void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid) -{ - cxio_hal_put_resource(rscp->pdid_fifo, pdid); -} - -void cxio_hal_destroy_resource(struct cxio_hal_resource *rscp) -{ - kfifo_free(rscp->tpt_fifo); - kfifo_free(rscp->cqid_fifo); - kfifo_free(rscp->qpid_fifo); - kfifo_free(rscp->pdid_fifo); - kfree(rscp); -} - -/* - * PBL Memory Manager. Uses Linux generic allocator. 
- */ - -#define MIN_PBL_SHIFT 8 /* 256B == min PBL size (32 entries) */ -#define PBL_CHUNK 2*1024*1024 - -u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size) -{ - unsigned long addr = gen_pool_alloc(rdev_p->pbl_pool, size); - PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size); - return (u32)addr; -} - -void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size) -{ - PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size); - gen_pool_free(rdev_p->pbl_pool, (unsigned long)addr, size); -} - -int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p) -{ - unsigned long i; - rdev_p->pbl_pool = gen_pool_create(MIN_PBL_SHIFT, -1); - if (rdev_p->pbl_pool) - for (i = rdev_p->rnic_info.pbl_base; - i <= rdev_p->rnic_info.pbl_top - PBL_CHUNK + 1; - i += PBL_CHUNK) - gen_pool_add(rdev_p->pbl_pool, i, PBL_CHUNK, -1); - return rdev_p->pbl_pool ? 0 : -ENOMEM; -} - -void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p) -{ - gen_pool_destroy(rdev_p->pbl_pool); -} - -/* - * RQT Memory Manager. Uses Linux generic allocator. - */ - -#define MIN_RQT_SHIFT 10 /* 1KB == mini RQT size (16 entries) */ -#define RQT_CHUNK 2*1024*1024 - -u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size) -{ - unsigned long addr = gen_pool_alloc(rdev_p->rqt_pool, size << 6); - PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size << 6); - return (u32)addr; -} - -void cxio_hal_rqtpool_free(struct cxio_rdev *rdev_p, u32 addr, int size) -{ - PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size << 6); - gen_pool_free(rdev_p->rqt_pool, (unsigned long)addr, size << 6); -} - -int cxio_hal_rqtpool_create(struct cxio_rdev *rdev_p) -{ - unsigned long i; - rdev_p->rqt_pool = gen_pool_create(MIN_RQT_SHIFT, -1); - if (rdev_p->rqt_pool) - for (i = rdev_p->rnic_info.rqt_base; - i <= rdev_p->rnic_info.rqt_top - RQT_CHUNK + 1; - i += RQT_CHUNK) - gen_pool_add(rdev_p->rqt_pool, i, RQT_CHUNK, -1); - return rdev_p->rqt_pool ? 
0 : -ENOMEM; -} - -void cxio_hal_rqtpool_destroy(struct cxio_rdev *rdev_p) -{ - gen_pool_destroy(rdev_p->rqt_pool); -} diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h deleted file mode 100644 index a6bbe83..0000000 --- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h +++ /dev/null @@ -1,70 +0,0 @@ -/* - * Copyright (c) 2006 Chelsio, Inc. All rights reserved. - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. 
- */ -#ifndef __CXIO_RESOURCE_H__ -#define __CXIO_RESOURCE_H__ - -#include -#include -#include -#include -#include -#include -#include -#include "cxio_hal.h" - -extern int cxio_hal_init_rhdl_resource(u32 nr_rhdl); -extern void cxio_hal_destroy_rhdl_resource(void); -extern int cxio_hal_init_resource(struct cxio_rdev *rdev_p, - u32 nr_tpt, u32 nr_pbl, - u32 nr_rqt, u32 nr_qpid, u32 nr_cqid, - u32 nr_pdid); -extern u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp); -extern void cxio_hal_put_stag(struct cxio_hal_resource *rscp, u32 stag); -extern u32 cxio_hal_get_qpid(struct cxio_hal_resource *rscp); -extern void cxio_hal_put_qpid(struct cxio_hal_resource *rscp, u32 qpid); -extern u32 cxio_hal_get_cqid(struct cxio_hal_resource *rscp); -extern void cxio_hal_put_cqid(struct cxio_hal_resource *rscp, u32 cqid); -extern void cxio_hal_destroy_resource(struct cxio_hal_resource *rscp); - -#define PBL_OFF(rdev_p, a) ( (a) - (rdev_p)->rnic_info.pbl_base ) -extern int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p); -extern void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p); -extern u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size); -extern void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size); - -#define RQT_OFF(rdev_p, a) ( (a) - (rdev_p)->rnic_info.rqt_base ) -extern int cxio_hal_rqtpool_create(struct cxio_rdev *rdev_p); -extern void cxio_hal_rqtpool_destroy(struct cxio_rdev *rdev_p); -extern u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size); -extern void cxio_hal_rqtpool_free(struct cxio_rdev *rdev_p, u32 addr, int size); -#endif diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h deleted file mode 100644 index 103fc42..0000000 --- a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h +++ /dev/null @@ -1,685 +0,0 @@ -/* - * Copyright (c) 2006 Chelsio, Inc. All rights reserved. - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. 
- * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. 
- */ -#ifndef __CXIO_WR_H__ -#define __CXIO_WR_H__ - -#include -#include -#include -#include "firmware_exports.h" - -#define T3_MAX_SGE 4 - -#define Q_EMPTY(rptr,wptr) ((rptr)==(wptr)) -#define Q_FULL(rptr,wptr,size_log2) ( (((wptr)-(rptr))>>(size_log2)) && \ - ((rptr)!=(wptr)) ) -#define Q_GENBIT(ptr,size_log2) (!(((ptr)>>size_log2)&0x1)) -#define Q_FREECNT(rptr,wptr,size_log2) ((1UL<> S_FW_RIWR_OP)) & M_FW_RIWR_OP) - -#define S_FW_RIWR_SOPEOP 22 -#define M_FW_RIWR_SOPEOP 0x3 -#define V_FW_RIWR_SOPEOP(x) ((x) << S_FW_RIWR_SOPEOP) - -#define S_FW_RIWR_FLAGS 8 -#define M_FW_RIWR_FLAGS 0x3fffff -#define V_FW_RIWR_FLAGS(x) ((x) << S_FW_RIWR_FLAGS) -#define G_FW_RIWR_FLAGS(x) ((((x) >> S_FW_RIWR_FLAGS)) & M_FW_RIWR_FLAGS) - -#define S_FW_RIWR_TID 8 -#define V_FW_RIWR_TID(x) ((x) << S_FW_RIWR_TID) - -#define S_FW_RIWR_LEN 0 -#define V_FW_RIWR_LEN(x) ((x) << S_FW_RIWR_LEN) - -#define S_FW_RIWR_GEN 31 -#define V_FW_RIWR_GEN(x) ((x) << S_FW_RIWR_GEN) - -struct t3_sge { - __be32 stag; - __be32 len; - __be64 to; -}; - -/* If num_sgle is zero, flit 5+ contains immediate data.*/ -struct t3_send_wr { - struct fw_riwrh wrh; /* 0 */ - union t3_wrid wrid; /* 1 */ - - u8 rdmaop; /* 2 */ - u8 reserved[3]; - __be32 rem_stag; - __be32 plen; /* 3 */ - __be32 num_sgle; - struct t3_sge sgl[T3_MAX_SGE]; /* 4+ */ -}; - -struct t3_local_inv_wr { - struct fw_riwrh wrh; /* 0 */ - union t3_wrid wrid; /* 1 */ - __be32 stag; /* 2 */ - __be32 reserved3; -}; - -struct t3_rdma_write_wr { - struct fw_riwrh wrh; /* 0 */ - union t3_wrid wrid; /* 1 */ - u8 rdmaop; /* 2 */ - u8 reserved[3]; - __be32 stag_sink; - __be64 to_sink; /* 3 */ - __be32 plen; /* 4 */ - __be32 num_sgle; - struct t3_sge sgl[T3_MAX_SGE]; /* 5+ */ -}; - -struct t3_rdma_read_wr { - struct fw_riwrh wrh; /* 0 */ - union t3_wrid wrid; /* 1 */ - u8 rdmaop; /* 2 */ - u8 reserved[3]; - __be32 rem_stag; - __be64 rem_to; /* 3 */ - __be32 local_stag; /* 4 */ - __be32 local_len; - __be64 local_to; /* 5 */ -}; - -enum t3_addr_type { - 
T3_VA_BASED_TO = 0x0, - T3_ZERO_BASED_TO = 0x1 -} __attribute__ ((packed)); - -enum t3_mem_perms { - T3_MEM_ACCESS_LOCAL_READ = 0x1, - T3_MEM_ACCESS_LOCAL_WRITE = 0x2, - T3_MEM_ACCESS_REM_READ = 0x4, - T3_MEM_ACCESS_REM_WRITE = 0x8 -} __attribute__ ((packed)); - -struct t3_bind_mw_wr { - struct fw_riwrh wrh; /* 0 */ - union t3_wrid wrid; /* 1 */ - u16 reserved; /* 2 */ - u8 type; - u8 perms; - __be32 mr_stag; - __be32 mw_stag; /* 3 */ - __be32 mw_len; - __be64 mw_va; /* 4 */ - __be32 mr_pbl_addr; /* 5 */ - u8 reserved2[3]; - u8 mr_pagesz; -}; - -struct t3_receive_wr { - struct fw_riwrh wrh; /* 0 */ - union t3_wrid wrid; /* 1 */ - u8 pagesz[T3_MAX_SGE]; - __be32 num_sgle; /* 2 */ - struct t3_sge sgl[T3_MAX_SGE]; /* 3+ */ - __be32 pbl_addr[T3_MAX_SGE]; -}; - -struct t3_bypass_wr { - struct fw_riwrh wrh; - union t3_wrid wrid; /* 1 */ -}; - -struct t3_modify_qp_wr { - struct fw_riwrh wrh; /* 0 */ - union t3_wrid wrid; /* 1 */ - __be32 flags; /* 2 */ - __be32 quiesce; /* 2 */ - __be32 max_ird; /* 3 */ - __be32 max_ord; /* 3 */ - __be64 sge_cmd; /* 4 */ - __be64 ctx1; /* 5 */ - __be64 ctx0; /* 6 */ -}; - -enum t3_modify_qp_flags { - MODQP_QUIESCE = 0x01, - MODQP_MAX_IRD = 0x02, - MODQP_MAX_ORD = 0x04, - MODQP_WRITE_EC = 0x08, - MODQP_READ_EC = 0x10, -}; - - -enum t3_mpa_attrs { - uP_RI_MPA_RX_MARKER_ENABLE = 0x1, - uP_RI_MPA_TX_MARKER_ENABLE = 0x2, - uP_RI_MPA_CRC_ENABLE = 0x4, - uP_RI_MPA_IETF_ENABLE = 0x8 -} __attribute__ ((packed)); - -enum t3_qp_caps { - uP_RI_QP_RDMA_READ_ENABLE = 0x01, - uP_RI_QP_RDMA_WRITE_ENABLE = 0x02, - uP_RI_QP_BIND_ENABLE = 0x04, - uP_RI_QP_FAST_REGISTER_ENABLE = 0x08, - uP_RI_QP_STAG0_ENABLE = 0x10 -} __attribute__ ((packed)); - -struct t3_rdma_init_attr { - u32 tid; - u32 qpid; - u32 pdid; - u32 scqid; - u32 rcqid; - u32 rq_addr; - u32 rq_size; - enum t3_mpa_attrs mpaattrs; - enum t3_qp_caps qpcaps; - u16 tcp_emss; - u32 ord; - u32 ird; - u64 qp_dma_addr; - u32 qp_dma_size; - u32 flags; -}; - -struct t3_rdma_init_wr { - struct fw_riwrh wrh; 
/* 0 */ - union t3_wrid wrid; /* 1 */ - __be32 qpid; /* 2 */ - __be32 pdid; - __be32 scqid; /* 3 */ - __be32 rcqid; - __be32 rq_addr; /* 4 */ - __be32 rq_size; - u8 mpaattrs; /* 5 */ - u8 qpcaps; - __be16 ulpdu_size; - __be32 flags; /* bits 31-1 - reserved */ - /* bit 0 - set if RECV posted */ - __be32 ord; /* 6 */ - __be32 ird; - __be64 qp_dma_addr; /* 7 */ - __be32 qp_dma_size; /* 8 */ - u32 rsvd; -}; - -struct t3_genbit { - u64 flit[15]; - __be64 genbit; -}; - -enum rdma_init_wr_flags { - RECVS_POSTED = 1, -}; - -union t3_wr { - struct t3_send_wr send; - struct t3_rdma_write_wr write; - struct t3_rdma_read_wr read; - struct t3_receive_wr recv; - struct t3_local_inv_wr local_inv; - struct t3_bind_mw_wr bind; - struct t3_bypass_wr bypass; - struct t3_rdma_init_wr init; - struct t3_modify_qp_wr qp_mod; - struct t3_genbit genbit; - u64 flit[16]; -}; - -#define T3_SQ_CQE_FLIT 13 -#define T3_SQ_COOKIE_FLIT 14 - -#define T3_RQ_COOKIE_FLIT 13 -#define T3_RQ_CQE_FLIT 14 - -static inline enum t3_wr_opcode fw_riwrh_opcode(struct fw_riwrh *wqe) -{ - return G_FW_RIWR_OP(be32_to_cpu(wqe->op_seop_flags)); -} - -static inline void build_fw_riwrh(struct fw_riwrh *wqe, enum t3_wr_opcode op, - enum t3_wr_flags flags, u8 genbit, u32 tid, - u8 len) -{ - wqe->op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(op) | - V_FW_RIWR_SOPEOP(M_FW_RIWR_SOPEOP) | - V_FW_RIWR_FLAGS(flags)); - wmb(); - wqe->gen_tid_len = cpu_to_be32(V_FW_RIWR_GEN(genbit) | - V_FW_RIWR_TID(tid) | - V_FW_RIWR_LEN(len)); - /* 2nd gen bit... 
*/ - ((union t3_wr *)wqe)->genbit.genbit = cpu_to_be64(genbit); -} - -/* - * T3 ULP2_TX commands - */ -enum t3_utx_mem_op { - T3_UTX_MEM_READ = 2, - T3_UTX_MEM_WRITE = 3 -}; - -/* T3 MC7 RDMA TPT entry format */ - -enum tpt_mem_type { - TPT_NON_SHARED_MR = 0x0, - TPT_SHARED_MR = 0x1, - TPT_MW = 0x2, - TPT_MW_RELAXED_PROTECTION = 0x3 -}; - -enum tpt_addr_type { - TPT_ZBTO = 0, - TPT_VATO = 1 -}; - -enum tpt_mem_perm { - TPT_LOCAL_READ = 0x8, - TPT_LOCAL_WRITE = 0x4, - TPT_REMOTE_READ = 0x2, - TPT_REMOTE_WRITE = 0x1 -}; - -struct tpt_entry { - __be32 valid_stag_pdid; - __be32 flags_pagesize_qpid; - - __be32 rsvd_pbl_addr; - __be32 len; - __be32 va_hi; - __be32 va_low_or_fbo; - - __be32 rsvd_bind_cnt_or_pstag; - __be32 rsvd_pbl_size; -}; - -#define S_TPT_VALID 31 -#define V_TPT_VALID(x) ((x) << S_TPT_VALID) -#define F_TPT_VALID V_TPT_VALID(1U) - -#define S_TPT_STAG_KEY 23 -#define M_TPT_STAG_KEY 0xFF -#define V_TPT_STAG_KEY(x) ((x) << S_TPT_STAG_KEY) -#define G_TPT_STAG_KEY(x) (((x) >> S_TPT_STAG_KEY) & M_TPT_STAG_KEY) - -#define S_TPT_STAG_STATE 22 -#define V_TPT_STAG_STATE(x) ((x) << S_TPT_STAG_STATE) -#define F_TPT_STAG_STATE V_TPT_STAG_STATE(1U) - -#define S_TPT_STAG_TYPE 20 -#define M_TPT_STAG_TYPE 0x3 -#define V_TPT_STAG_TYPE(x) ((x) << S_TPT_STAG_TYPE) -#define G_TPT_STAG_TYPE(x) (((x) >> S_TPT_STAG_TYPE) & M_TPT_STAG_TYPE) - -#define S_TPT_PDID 0 -#define M_TPT_PDID 0xFFFFF -#define V_TPT_PDID(x) ((x) << S_TPT_PDID) -#define G_TPT_PDID(x) (((x) >> S_TPT_PDID) & M_TPT_PDID) - -#define S_TPT_PERM 28 -#define M_TPT_PERM 0xF -#define V_TPT_PERM(x) ((x) << S_TPT_PERM) -#define G_TPT_PERM(x) (((x) >> S_TPT_PERM) & M_TPT_PERM) - -#define S_TPT_REM_INV_DIS 27 -#define V_TPT_REM_INV_DIS(x) ((x) << S_TPT_REM_INV_DIS) -#define F_TPT_REM_INV_DIS V_TPT_REM_INV_DIS(1U) - -#define S_TPT_ADDR_TYPE 26 -#define V_TPT_ADDR_TYPE(x) ((x) << S_TPT_ADDR_TYPE) -#define F_TPT_ADDR_TYPE V_TPT_ADDR_TYPE(1U) - -#define S_TPT_MW_BIND_ENABLE 25 -#define V_TPT_MW_BIND_ENABLE(x) ((x) << 
S_TPT_MW_BIND_ENABLE) -#define F_TPT_MW_BIND_ENABLE V_TPT_MW_BIND_ENABLE(1U) - -#define S_TPT_PAGE_SIZE 20 -#define M_TPT_PAGE_SIZE 0x1F -#define V_TPT_PAGE_SIZE(x) ((x) << S_TPT_PAGE_SIZE) -#define G_TPT_PAGE_SIZE(x) (((x) >> S_TPT_PAGE_SIZE) & M_TPT_PAGE_SIZE) - -#define S_TPT_PBL_ADDR 0 -#define M_TPT_PBL_ADDR 0x1FFFFFFF -#define V_TPT_PBL_ADDR(x) ((x) << S_TPT_PBL_ADDR) -#define G_TPT_PBL_ADDR(x) (((x) >> S_TPT_PBL_ADDR) & M_TPT_PBL_ADDR) - -#define S_TPT_QPID 0 -#define M_TPT_QPID 0xFFFFF -#define V_TPT_QPID(x) ((x) << S_TPT_QPID) -#define G_TPT_QPID(x) (((x) >> S_TPT_QPID) & M_TPT_QPID) - -#define S_TPT_PSTAG 0 -#define M_TPT_PSTAG 0xFFFFFF -#define V_TPT_PSTAG(x) ((x) << S_TPT_PSTAG) -#define G_TPT_PSTAG(x) (((x) >> S_TPT_PSTAG) & M_TPT_PSTAG) - -#define S_TPT_PBL_SIZE 0 -#define M_TPT_PBL_SIZE 0xFFFFF -#define V_TPT_PBL_SIZE(x) ((x) << S_TPT_PBL_SIZE) -#define G_TPT_PBL_SIZE(x) (((x) >> S_TPT_PBL_SIZE) & M_TPT_PBL_SIZE) - -/* - * CQE defs - */ -struct t3_cqe { - __be32 header; - __be32 len; - union { - struct { - __be32 stag; - __be32 msn; - } rcqe; - struct { - u32 wrid_hi; - u32 wrid_low; - } scqe; - } u; -}; - -#define S_CQE_OOO 31 -#define M_CQE_OOO 0x1 -#define G_CQE_OOO(x) ((((x) >> S_CQE_OOO)) & M_CQE_OOO) -#define V_CEQ_OOO(x) ((x)<> S_CQE_QPID)) & M_CQE_QPID) -#define V_CQE_QPID(x) ((x)<> S_CQE_SWCQE)) & M_CQE_SWCQE) -#define V_CQE_SWCQE(x) ((x)<> S_CQE_GENBIT) & M_CQE_GENBIT) -#define V_CQE_GENBIT(x) ((x)<> S_CQE_STATUS)) & M_CQE_STATUS) -#define V_CQE_STATUS(x) ((x)<> S_CQE_TYPE)) & M_CQE_TYPE) -#define V_CQE_TYPE(x) ((x)<> S_CQE_OPCODE)) & M_CQE_OPCODE) -#define V_CQE_OPCODE(x) ((x)<queue->flit[13] = 1; -} - -static inline struct t3_cqe *cxio_next_hw_cqe(struct t3_cq *cq) -{ - struct t3_cqe *cqe; - - cqe = cq->queue + (Q_PTR2IDX(cq->rptr, cq->size_log2)); - if (CQ_VLD_ENTRY(cq->rptr, cq->size_log2, cqe)) - return cqe; - return NULL; -} - -static inline struct t3_cqe *cxio_next_sw_cqe(struct t3_cq *cq) -{ - struct t3_cqe *cqe; - - if 
(!Q_EMPTY(cq->sw_rptr, cq->sw_wptr)) { - cqe = cq->sw_queue + (Q_PTR2IDX(cq->sw_rptr, cq->size_log2)); - return cqe; - } - return NULL; -} - -static inline struct t3_cqe *cxio_next_cqe(struct t3_cq *cq) -{ - struct t3_cqe *cqe; - - if (!Q_EMPTY(cq->sw_rptr, cq->sw_wptr)) { - cqe = cq->sw_queue + (Q_PTR2IDX(cq->sw_rptr, cq->size_log2)); - return cqe; - } - cqe = cq->queue + (Q_PTR2IDX(cq->rptr, cq->size_log2)); - if (CQ_VLD_ENTRY(cq->rptr, cq->size_log2, cqe)) - return cqe; - return NULL; -} - -#endif diff --git a/drivers/infiniband/hw/cxgb3/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/cxio_dbg.c new file mode 100644 index 0000000..dfaa704 --- /dev/null +++ b/drivers/infiniband/hw/cxgb3/cxio_dbg.c @@ -0,0 +1,205 @@ +/* + * Copyright (c) 2006 Chelsio, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#ifdef DEBUG +#include +#include "common.h" +#include "cxgb3_ioctl.h" +#include "cxio_hal.h" +#include "cxio_wr.h" + +void cxio_dump_tpt(struct cxio_rdev *rdev, u32 stag) +{ + struct ch_mem_range *m; + u64 *data; + int rc; + int size = 32; + + m = kmalloc(sizeof(*m) + size, GFP_ATOMIC); + if (!m) { + PDBG("%s couldn't allocate memory.\n", __FUNCTION__); + return; + } + m->mem_id = MEM_PMRX; + m->addr = (stag>>8) * 32 + rdev->rnic_info.tpt_base; + m->len = size; + PDBG("%s TPT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len); + rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m); + if (rc) { + PDBG("%s toectl returned error %d\n", __FUNCTION__, rc); + kfree(m); + return; + } + + data = (u64 *)m->buf; + while (size > 0) { + PDBG("TPT %08x: %016llx\n", m->addr, (u64)*data); + size -= 8; + data++; + m->addr += 8; + } + kfree(m); +} + +void cxio_dump_pbl(struct cxio_rdev *rdev, u32 pbl_addr, uint len, u8 shift) +{ + struct ch_mem_range *m; + u64 *data; + int rc; + int size, npages; + + shift += 12; + npages = (len + (1ULL << shift) - 1) >> shift; + size = npages * sizeof(u64); + + m = kmalloc(sizeof(*m) + size, GFP_ATOMIC); + if (!m) { + PDBG("%s couldn't allocate memory.\n", __FUNCTION__); + return; + } + m->mem_id = MEM_PMRX; + m->addr = pbl_addr; + m->len = size; + PDBG("%s PBL addr 0x%x len %d depth %d\n", + __FUNCTION__, m->addr, m->len, npages); + rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m); + if (rc) { + PDBG("%s toectl returned error %d\n", __FUNCTION__, rc); + kfree(m); + return; + } + + data = (u64 *)m->buf; + while (size > 0) { + PDBG("PBL %08x: %016llx\n", m->addr, (u64)*data); + size -= 8; + data++; + m->addr += 8; + } + kfree(m); +} + +void 
cxio_dump_wqe(union t3_wr *wqe) +{ + __be64 *data = (__be64 *)wqe; + uint size = (uint)(be64_to_cpu(*data) & 0xff); + + if (size == 0) + size = 8; + while (size > 0) { + PDBG("WQE %p: %016llx\n", data, be64_to_cpu(*data)); + size--; + data++; + } +} + +void cxio_dump_wce(struct t3_cqe *wce) +{ + __be64 *data = (__be64 *)wce; + int size = sizeof(*wce); + + while (size > 0) { + PDBG("WCE %p: %016llx\n", data, be64_to_cpu(*data)); + size -= 8; + data++; + } +} + +void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents) +{ + struct ch_mem_range *m; + int size = nents * 64; + u64 *data; + int rc; + + m = kmalloc(sizeof(*m) + size, GFP_ATOMIC); + if (!m) { + PDBG("%s couldn't allocate memory.\n", __FUNCTION__); + return; + } + m->mem_id = MEM_PMRX; + m->addr = ((hwtid)<<10) + rdev->rnic_info.rqt_base; + m->len = size; + PDBG("%s RQT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len); + rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m); + if (rc) { + PDBG("%s toectl returned error %d\n", __FUNCTION__, rc); + kfree(m); + return; + } + + data = (u64 *)m->buf; + while (size > 0) { + PDBG("RQT %08x: %016llx\n", m->addr, (u64)*data); + size -= 8; + data++; + m->addr += 8; + } + kfree(m); +} + +void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid) +{ + struct ch_mem_range *m; + int size = TCB_SIZE; + u32 *data; + int rc; + + m = kmalloc(sizeof(*m) + size, GFP_ATOMIC); + if (!m) { + PDBG("%s couldn't allocate memory.\n", __FUNCTION__); + return; + } + m->mem_id = MEM_CM; + m->addr = hwtid * size; + m->len = size; + PDBG("%s TCB %d len %d\n", __FUNCTION__, m->addr, m->len); + rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m); + if (rc) { + PDBG("%s toectl returned error %d\n", __FUNCTION__, rc); + kfree(m); + return; + } + + data = (u32 *)m->buf; + while (size > 0) { + printk("%2u: %08x %08x %08x %08x %08x %08x %08x %08x\n", + m->addr, + *(data+2), *(data+3), *(data),*(data+1), + *(data+6), *(data+7), *(data+4), *(data+5)); + size -= 32; + data += 8; + 
m->addr += 32; + } + kfree(m); +} +#endif diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c new file mode 100644 index 0000000..19553b3 --- /dev/null +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -0,0 +1,1280 @@ +/* + * Copyright (c) 2006 Chelsio, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#include +#include + +#include +#include +#include +#include + +#include "cxio_resource.h" +#include "cxio_hal.h" +#include "cxgb3_offload.h" +#include "sge_defs.h" + +static LIST_HEAD(rdev_list); +static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL; + +static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) +{ + struct cxio_rdev *rdev; + + list_for_each_entry(rdev, &rdev_list, entry) + if (!strcmp(rdev->dev_name, dev_name)) + return rdev; + return NULL; +} + +static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev + *tdev) +{ + struct cxio_rdev *rdev; + + list_for_each_entry(rdev, &rdev_list, entry) + if (rdev->t3cdev_p == tdev) + return rdev; + return NULL; +} + +int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq, + enum t3_cq_opcode op, u32 credit) +{ + int ret; + struct t3_cqe *cqe; + u32 rptr; + + struct rdma_cq_op setup; + setup.id = cq->cqid; + setup.credits = (op == CQ_CREDIT_UPDATE) ? credit : 0; + setup.op = op; + ret = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_OP, &setup); + + if ((ret < 0) || (op == CQ_CREDIT_UPDATE)) + return ret; + + /* + * If the rearm returned an index other than our current index, + * then there might be CQE's in flight (being DMA'd). We must wait + * here for them to complete or the consumer can miss a notification. + */ + if (Q_PTR2IDX((cq->rptr), cq->size_log2) != ret) { + int i=0; + + rptr = cq->rptr; + + /* + * Keep the generation correct by bumping rptr until it + * matches the index returned by the rearm - 1. + */ + while (Q_PTR2IDX((rptr+1), cq->size_log2) != ret) + rptr++; + + /* + * Now rptr is the index for the (last) cqe that was + * in-flight at the time the HW rearmed the CQ. We + * spin until that CQE is valid. 
+ */ + cqe = cq->queue + Q_PTR2IDX(rptr, cq->size_log2); + while (!CQ_VLD_ENTRY(rptr, cq->size_log2, cqe)) { + udelay(1); + if (i++ > 1000000) { + BUG_ON(1); + printk(KERN_ERR "%s: stalled rnic\n", + rdev_p->dev_name); + return -EIO; + } + } + } + return 0; +} + +static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) +{ + struct rdma_cq_setup setup; + setup.id = cqid; + setup.base_addr = 0; /* NULL address */ + setup.size = 0; /* disable the CQ */ + setup.credits = 0; + setup.credit_thres = 0; + setup.ovfl_mode = 0; + return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); +} + +int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) +{ + u64 sge_cmd; + struct t3_modify_qp_wr *wqe; + struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); + if (!skb) { + PDBG("%s alloc_skb failed\n", __FUNCTION__); + return -ENOMEM; + } + wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe)); + memset(wqe, 0, sizeof(*wqe)); + build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 1, qpid, 7); + wqe->flags = cpu_to_be32(MODQP_WRITE_EC); + sge_cmd = qpid << 8 | 3; + wqe->sge_cmd = cpu_to_be64(sge_cmd); + skb->priority = CPL_PRIORITY_CONTROL; + return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); +} + +int cxio_create_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) +{ + struct rdma_cq_setup setup; + int size = (1UL << (cq->size_log2)) * sizeof(struct t3_cqe); + + cq->cqid = cxio_hal_get_cqid(rdev_p->rscp); + if (!cq->cqid) + return -ENOMEM; + cq->sw_queue = kzalloc(size, GFP_KERNEL); + if (!cq->sw_queue) + return -ENOMEM; + cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), + (1UL << (cq->size_log2)) * + sizeof(struct t3_cqe), + &(cq->dma_addr), GFP_KERNEL); + if (!cq->queue) { + kfree(cq->sw_queue); + return -ENOMEM; + } + pci_unmap_addr_set(cq, mapping, cq->dma_addr); + memset(cq->queue, 0, size); + setup.id = cq->cqid; + setup.base_addr = (u64) (cq->dma_addr); + setup.size = 1UL << cq->size_log2; + setup.credits = 
65535; + setup.credit_thres = 1; + if (rdev_p->t3cdev_p->type == T3B) + setup.ovfl_mode = 0; + else + setup.ovfl_mode = 1; + return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); +} + +int cxio_resize_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) +{ + struct rdma_cq_setup setup; + setup.id = cq->cqid; + setup.base_addr = (u64) (cq->dma_addr); + setup.size = 1UL << cq->size_log2; + setup.credits = setup.size; + setup.credit_thres = setup.size; /* TBD: overflow recovery */ + setup.ovfl_mode = 1; + return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); +} + +static u32 get_qpid(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx) +{ + struct cxio_qpid_list *entry; + u32 qpid; + int i; + + mutex_lock(&uctx->lock); + if (!list_empty(&uctx->qpids)) { + entry = list_entry(uctx->qpids.next, struct cxio_qpid_list, + entry); + list_del(&entry->entry); + qpid = entry->qpid; + kfree(entry); + } else { + qpid = cxio_hal_get_qpid(rdev_p->rscp); + if (!qpid) + goto out; + for (i = qpid+1; i & rdev_p->qpmask; i++) { + entry = kmalloc(sizeof *entry, GFP_KERNEL); + if (!entry) + break; + entry->qpid = i; + list_add_tail(&entry->entry, &uctx->qpids); + } + } +out: + mutex_unlock(&uctx->lock); + PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid); + return qpid; +} + +static void put_qpid(struct cxio_rdev *rdev_p, u32 qpid, + struct cxio_ucontext *uctx) +{ + struct cxio_qpid_list *entry; + + entry = kmalloc(sizeof *entry, GFP_KERNEL); + if (!entry) + return; + PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid); + entry->qpid = qpid; + mutex_lock(&uctx->lock); + list_add_tail(&entry->entry, &uctx->qpids); + mutex_unlock(&uctx->lock); +} + +void cxio_release_ucontext(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx) +{ + struct list_head *pos, *nxt; + struct cxio_qpid_list *entry; + + mutex_lock(&uctx->lock); + list_for_each_safe(pos, nxt, &uctx->qpids) { + entry = list_entry(pos, struct cxio_qpid_list, entry); + list_del_init(&entry->entry); + if 
(!(entry->qpid & rdev_p->qpmask)) + cxio_hal_put_qpid(rdev_p->rscp, entry->qpid); + kfree(entry); + } + mutex_unlock(&uctx->lock); +} + +void cxio_init_ucontext(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx) +{ + INIT_LIST_HEAD(&uctx->qpids); + mutex_init(&uctx->lock); +} + +int cxio_create_qp(struct cxio_rdev *rdev_p, u32 kernel_domain, + struct t3_wq *wq, struct cxio_ucontext *uctx) +{ + int depth = 1UL << wq->size_log2; + int rqsize = 1UL << wq->rq_size_log2; + + wq->qpid = get_qpid(rdev_p, uctx); + if (!wq->qpid) + return -ENOMEM; + + wq->rq = kzalloc(depth * sizeof(u64), GFP_KERNEL); + if (!wq->rq) + goto err1; + + wq->rq_addr = cxio_hal_rqtpool_alloc(rdev_p, rqsize); + if (!wq->rq_addr) + goto err2; + + wq->sq = kzalloc(depth * sizeof(struct t3_swsq), GFP_KERNEL); + if (!wq->sq) + goto err3; + + wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), + depth * sizeof(union t3_wr), + &(wq->dma_addr), GFP_KERNEL); + if (!wq->queue) + goto err4; + + memset(wq->queue, 0, depth * sizeof(union t3_wr)); + pci_unmap_addr_set(wq, mapping, wq->dma_addr); + wq->doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr; + if (!kernel_domain) + wq->udb = (u64)rdev_p->rnic_info.udbell_physbase + + (wq->qpid << rdev_p->qpshift); + PDBG("%s qpid 0x%x doorbell 0x%p udb 0x%llx\n", __FUNCTION__, + wq->qpid, wq->doorbell, wq->udb); + return 0; +err4: + kfree(wq->sq); +err3: + cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, rqsize); +err2: + kfree(wq->rq); +err1: + put_qpid(rdev_p, wq->qpid, uctx); + return -ENOMEM; +} + +int cxio_destroy_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) +{ + int err; + err = cxio_hal_clear_cq_ctx(rdev_p, cq->cqid); + kfree(cq->sw_queue); + dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), + (1UL << (cq->size_log2)) + * sizeof(struct t3_cqe), cq->queue, + pci_unmap_addr(cq, mapping)); + cxio_hal_put_cqid(rdev_p->rscp, cq->cqid); + return err; +} + +int cxio_destroy_qp(struct cxio_rdev *rdev_p, struct t3_wq *wq, + struct cxio_ucontext *uctx) +{ 
+ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), + (1UL << (wq->size_log2)) + * sizeof(union t3_wr), wq->queue, + pci_unmap_addr(wq, mapping)); + kfree(wq->sq); + cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, (1UL << wq->rq_size_log2)); + kfree(wq->rq); + put_qpid(rdev_p, wq->qpid, uctx); + return 0; +} + +static void insert_recv_cqe(struct t3_wq *wq, struct t3_cq *cq) +{ + struct t3_cqe cqe; + + PDBG("%s wq %p cq %p sw_rptr 0x%x sw_wptr 0x%x\n", __FUNCTION__, + wq, cq, cq->sw_rptr, cq->sw_wptr); + memset(&cqe, 0, sizeof(cqe)); + cqe.header = cpu_to_be32(V_CQE_STATUS(TPT_ERR_SWFLUSH) | + V_CQE_OPCODE(T3_SEND) | + V_CQE_TYPE(0) | + V_CQE_SWCQE(1) | + V_CQE_QPID(wq->qpid) | + V_CQE_GENBIT(Q_GENBIT(cq->sw_wptr, + cq->size_log2))); + *(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) = cqe; + cq->sw_wptr++; +} + +void cxio_flush_rq(struct t3_wq *wq, struct t3_cq *cq, int count) +{ + u32 ptr; + + PDBG("%s wq %p cq %p\n", __FUNCTION__, wq, cq); + + /* flush RQ */ + PDBG("%s rq_rptr %u rq_wptr %u skip count %u\n", __FUNCTION__, + wq->rq_rptr, wq->rq_wptr, count); + ptr = wq->rq_rptr + count; + while (ptr++ != wq->rq_wptr) + insert_recv_cqe(wq, cq); +} + +static void insert_sq_cqe(struct t3_wq *wq, struct t3_cq *cq, + struct t3_swsq *sqp) +{ + struct t3_cqe cqe; + + PDBG("%s wq %p cq %p sw_rptr 0x%x sw_wptr 0x%x\n", __FUNCTION__, + wq, cq, cq->sw_rptr, cq->sw_wptr); + memset(&cqe, 0, sizeof(cqe)); + cqe.header = cpu_to_be32(V_CQE_STATUS(TPT_ERR_SWFLUSH) | + V_CQE_OPCODE(sqp->opcode) | + V_CQE_TYPE(1) | + V_CQE_SWCQE(1) | + V_CQE_QPID(wq->qpid) | + V_CQE_GENBIT(Q_GENBIT(cq->sw_wptr, + cq->size_log2))); + cqe.u.scqe.wrid_hi = sqp->sq_wptr; + + *(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) = cqe; + cq->sw_wptr++; +} + +void cxio_flush_sq(struct t3_wq *wq, struct t3_cq *cq, int count) +{ + __u32 ptr; + struct t3_swsq *sqp = wq->sq + Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2); + + ptr = wq->sq_rptr + count; + sqp += count; + while (ptr != wq->sq_wptr) { + 
insert_sq_cqe(wq, cq, sqp); + sqp++; + ptr++; + } +} + +/* + * Move all CQEs from the HWCQ into the SWCQ. + */ +void cxio_flush_hw_cq(struct t3_cq *cq) +{ + struct t3_cqe *cqe, *swcqe; + + PDBG("%s cq %p cqid 0x%x\n", __FUNCTION__, cq, cq->cqid); + cqe = cxio_next_hw_cqe(cq); + while (cqe) { + PDBG("%s flushing hwcq rptr 0x%x to swcq wptr 0x%x\n", + __FUNCTION__, cq->rptr, cq->sw_wptr); + swcqe = cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2); + *swcqe = *cqe; + swcqe->header |= cpu_to_be32(V_CQE_SWCQE(1)); + cq->sw_wptr++; + cq->rptr++; + cqe = cxio_next_hw_cqe(cq); + } +} + +static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) +{ + if (CQE_OPCODE(*cqe) == T3_TERMINATE) + return 0; + + if ((CQE_OPCODE(*cqe) == T3_RDMA_WRITE) && RQ_TYPE(*cqe)) + return 0; + + if ((CQE_OPCODE(*cqe) == T3_READ_RESP) && SQ_TYPE(*cqe)) + return 0; + + if ((CQE_OPCODE(*cqe) == T3_SEND) && RQ_TYPE(*cqe) && + Q_EMPTY(wq->rq_rptr, wq->rq_wptr)) + return 0; + + return 1; +} + +void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count) +{ + struct t3_cqe *cqe; + u32 ptr; + + *count = 0; + ptr = cq->sw_rptr; + while (!Q_EMPTY(ptr, cq->sw_wptr)) { + cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2)); + if ((SQ_TYPE(*cqe) || (CQE_OPCODE(*cqe) == T3_READ_RESP)) && + (CQE_QPID(*cqe) == wq->qpid)) + (*count)++; + ptr++; + } + PDBG("%s cq %p count %d\n", __FUNCTION__, cq, *count); +} + +void cxio_count_rcqes(struct t3_cq *cq, struct t3_wq *wq, int *count) +{ + struct t3_cqe *cqe; + u32 ptr; + + *count = 0; + PDBG("%s count zero %d\n", __FUNCTION__, *count); + ptr = cq->sw_rptr; + while (!Q_EMPTY(ptr, cq->sw_wptr)) { + cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2)); + if (RQ_TYPE(*cqe) && (CQE_OPCODE(*cqe) != T3_READ_RESP) && + (CQE_QPID(*cqe) == wq->qpid) && cqe_completes_wr(cqe, wq)) + (*count)++; + ptr++; + } + PDBG("%s cq %p count %d\n", __FUNCTION__, cq, *count); +} + +static int cxio_hal_init_ctrl_cq(struct cxio_rdev *rdev_p) +{ + struct 
rdma_cq_setup setup; + setup.id = 0; + setup.base_addr = 0; /* NULL address */ + setup.size = 1; /* enable the CQ */ + setup.credits = 0; + + /* force SGE to redirect to RspQ and interrupt */ + setup.credit_thres = 0; + setup.ovfl_mode = 1; + return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); +} + +static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p) +{ + int err; + u64 sge_cmd, ctx0, ctx1; + u64 base_addr; + struct t3_modify_qp_wr *wqe; + struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); + + + if (!skb) { + PDBG("%s alloc_skb failed\n", __FUNCTION__); + return -ENOMEM; + } + err = cxio_hal_init_ctrl_cq(rdev_p); + if (err) { + PDBG("%s err %d initializing ctrl_cq\n", __FUNCTION__, err); + return err; + } + rdev_p->ctrl_qp.workq = dma_alloc_coherent( + &(rdev_p->rnic_info.pdev->dev), + (1 << T3_CTRL_QP_SIZE_LOG2) * + sizeof(union t3_wr), + &(rdev_p->ctrl_qp.dma_addr), + GFP_KERNEL); + if (!rdev_p->ctrl_qp.workq) { + PDBG("%s dma_alloc_coherent failed\n", __FUNCTION__); + return -ENOMEM; + } + pci_unmap_addr_set(&rdev_p->ctrl_qp, mapping, + rdev_p->ctrl_qp.dma_addr); + rdev_p->ctrl_qp.doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr; + memset(rdev_p->ctrl_qp.workq, 0, + (1 << T3_CTRL_QP_SIZE_LOG2) * sizeof(union t3_wr)); + + init_MUTEX(&rdev_p->ctrl_qp.sem); + init_waitqueue_head(&rdev_p->ctrl_qp.waitq); + + /* update HW Ctrl QP context */ + base_addr = rdev_p->ctrl_qp.dma_addr; + base_addr >>= 12; + ctx0 = (V_EC_SIZE((1 << T3_CTRL_QP_SIZE_LOG2)) | + V_EC_BASE_LO((u32) base_addr & 0xffff)); + ctx0 <<= 32; + ctx0 |= V_EC_CREDITS(FW_WR_NUM); + base_addr >>= 16; + ctx1 = (u32) base_addr; + base_addr >>= 32; + ctx1 |= ((u64) (V_EC_BASE_HI((u32) base_addr & 0xf) | V_EC_RESPQ(0) | + V_EC_TYPE(0) | V_EC_GEN(1) | + V_EC_UP_TOKEN(T3_CTL_QP_TID) | F_EC_VALID)) << 32; + wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe)); + memset(wqe, 0, sizeof(*wqe)); + build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 1, + 
T3_CTL_QP_TID, 7); + wqe->flags = cpu_to_be32(MODQP_WRITE_EC); + sge_cmd = (3ULL << 56) | FW_RI_SGEEC_START << 8 | 3; + wqe->sge_cmd = cpu_to_be64(sge_cmd); + wqe->ctx1 = cpu_to_be64(ctx1); + wqe->ctx0 = cpu_to_be64(ctx0); + PDBG("CtrlQP dma_addr 0x%llx workq %p size %d\n", + (u64) rdev_p->ctrl_qp.dma_addr, rdev_p->ctrl_qp.workq, + 1 << T3_CTRL_QP_SIZE_LOG2); + skb->priority = CPL_PRIORITY_CONTROL; + return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); +} + +static int cxio_hal_destroy_ctrl_qp(struct cxio_rdev *rdev_p) +{ + dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), + (1UL << T3_CTRL_QP_SIZE_LOG2) + * sizeof(union t3_wr), rdev_p->ctrl_qp.workq, + pci_unmap_addr(&rdev_p->ctrl_qp, mapping)); + return cxio_hal_clear_qp_ctx(rdev_p, T3_CTRL_QP_ID); +} + +/* write len bytes of data into addr (32B aligned address) + * If data is NULL, clear len bytes of memory to zero. + * caller acquires the sem before the call + */ +static int cxio_hal_ctrl_qp_write_mem(struct cxio_rdev *rdev_p, u32 addr, + u32 len, void *data, int completion) +{ + u32 i, nr_wqe, copy_len; + u8 *copy_data; + u8 wr_len, utx_len; /* length in 8 byte flit */ + enum t3_wr_flags flag; + __be64 *wqe; + u64 utx_cmd; + addr &= 0x7FFFFFF; + nr_wqe = len % 96 ? 
len / 96 + 1 : len / 96; /* 96B max per WQE */ + PDBG("%s wptr 0x%x rptr 0x%x len %d, nr_wqe %d data %p addr 0x%0x\n", + __FUNCTION__, rdev_p->ctrl_qp.wptr, rdev_p->ctrl_qp.rptr, len, + nr_wqe, data, addr); + utx_len = 3; /* in 32B unit */ + for (i = 0; i < nr_wqe; i++) { + if (Q_FULL(rdev_p->ctrl_qp.rptr, rdev_p->ctrl_qp.wptr, + T3_CTRL_QP_SIZE_LOG2)) { + PDBG("%s ctrl_qp full wtpr 0x%0x rptr 0x%0x, " + "wait for more space i %d\n", __FUNCTION__, + rdev_p->ctrl_qp.wptr, rdev_p->ctrl_qp.rptr, i); + if (wait_event_interruptible(rdev_p->ctrl_qp.waitq, + !Q_FULL(rdev_p->ctrl_qp.rptr, + rdev_p->ctrl_qp.wptr, + T3_CTRL_QP_SIZE_LOG2))) { + PDBG("%s ctrl_qp workq interrupted\n", + __FUNCTION__); + return -ERESTARTSYS; + } + PDBG("%s ctrl_qp wakeup, continue posting work request " + "i %d\n", __FUNCTION__, i); + } + wqe = (__be64 *)(rdev_p->ctrl_qp.workq + (rdev_p->ctrl_qp.wptr % + (1 << T3_CTRL_QP_SIZE_LOG2))); + flag = 0; + if (i == (nr_wqe - 1)) { + /* last WQE */ + flag = completion ? T3_COMPLETION_FLAG : 0; + if (len % 32) + utx_len = len / 32 + 1; + else + utx_len = len / 32; + } + + /* + * Force a CQE to return the credit to the workq in case + * we posted more than half the max QP size of WRs + */ + if ((i != 0) && + (i % (((1 << T3_CTRL_QP_SIZE_LOG2)) >> 1) == 0)) { + flag = T3_COMPLETION_FLAG; + PDBG("%s force completion at i %d\n", __FUNCTION__, i); + } + + /* build the utx mem command */ + wqe += (sizeof(struct t3_bypass_wr) >> 3); + utx_cmd = (T3_UTX_MEM_WRITE << 28) | (addr + i * 3); + utx_cmd <<= 32; + utx_cmd |= (utx_len << 28) | ((utx_len << 2) + 1); + *wqe = cpu_to_be64(utx_cmd); + wqe++; + copy_data = (u8 *) data + i * 96; + copy_len = len > 96 ? 
96 : len; + + /* clear memory content if data is NULL */ + if (data) + memcpy(wqe, copy_data, copy_len); + else + memset(wqe, 0, copy_len); + if (copy_len % 32) + memset(((u8 *) wqe) + copy_len, 0, + 32 - (copy_len % 32)); + wr_len = ((sizeof(struct t3_bypass_wr)) >> 3) + 1 + + (utx_len << 2); + wqe = (__be64 *)(rdev_p->ctrl_qp.workq + (rdev_p->ctrl_qp.wptr % + (1 << T3_CTRL_QP_SIZE_LOG2))); + + /* wptr in the WRID[31:0] */ + ((union t3_wrid *)(wqe+1))->id0.low = rdev_p->ctrl_qp.wptr; + + /* + * This must be the last write with a memory barrier + * for the genbit + */ + build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_BP, flag, + Q_GENBIT(rdev_p->ctrl_qp.wptr, + T3_CTRL_QP_SIZE_LOG2), T3_CTRL_QP_ID, + wr_len); + if (flag == T3_COMPLETION_FLAG) + ring_doorbell(rdev_p->ctrl_qp.doorbell, T3_CTRL_QP_ID); + len -= 96; + rdev_p->ctrl_qp.wptr++; + } + return 0; +} + +/* IN: stag key, pdid, perm, zbva, to, len, page_size, pbl, and pbl_size + * OUT: stag index, actual pbl_size, pbl_addr allocated. + * TBD: shared memory region support + */ +static int __cxio_tpt_op(struct cxio_rdev *rdev_p, u32 reset_tpt_entry, + u32 *stag, u8 stag_state, u32 pdid, + enum tpt_mem_type type, enum tpt_mem_perm perm, + u32 zbva, u64 to, u32 len, u8 page_size, __be64 *pbl, + u32 *pbl_size, u32 *pbl_addr) +{ + int err; + struct tpt_entry tpt; + u32 stag_idx; + u32 wptr; + int rereg = (*stag != T3_STAG_UNSET); + + stag_state = stag_state > 0; + stag_idx = (*stag) >> 8; + + if ((!reset_tpt_entry) && !(*stag != T3_STAG_UNSET)) { + stag_idx = cxio_hal_get_stag(rdev_p->rscp); + if (!stag_idx) + return -ENOMEM; + *stag = (stag_idx << 8) | ((*stag) & 0xFF); + } + PDBG("%s stag_state 0x%0x type 0x%0x pdid 0x%0x, stag_idx 0x%x\n", + __FUNCTION__, stag_state, type, pdid, stag_idx); + + if (reset_tpt_entry) + cxio_hal_pblpool_free(rdev_p, *pbl_addr, *pbl_size << 3); + else if (!rereg) { + *pbl_addr = cxio_hal_pblpool_alloc(rdev_p, *pbl_size << 3); + if (!*pbl_addr) { + return -ENOMEM; + } + } + + 
down_interruptible(&rdev_p->ctrl_qp.sem); + + /* write PBL first if any - update pbl only if pbl list exists */ + if (pbl) { + + PDBG("%s *pdb_addr 0x%x, pbl_base 0x%x, pbl_size %d\n", + __FUNCTION__, *pbl_addr, rdev_p->rnic_info.pbl_base, + *pbl_size); + err = cxio_hal_ctrl_qp_write_mem(rdev_p, + (*pbl_addr >> 5), + (*pbl_size << 3), pbl, 0); + if (err) + goto ret; + } + + /* write TPT entry */ + if (reset_tpt_entry) + memset(&tpt, 0, sizeof(tpt)); + else { + tpt.valid_stag_pdid = cpu_to_be32(F_TPT_VALID | + V_TPT_STAG_KEY((*stag) & M_TPT_STAG_KEY) | + V_TPT_STAG_STATE(stag_state) | + V_TPT_STAG_TYPE(type) | V_TPT_PDID(pdid)); + BUG_ON(page_size >= 28); + tpt.flags_pagesize_qpid = cpu_to_be32(V_TPT_PERM(perm) | + F_TPT_MW_BIND_ENABLE | + V_TPT_ADDR_TYPE((zbva ? TPT_ZBTO : TPT_VATO)) | + V_TPT_PAGE_SIZE(page_size)); + tpt.rsvd_pbl_addr = reset_tpt_entry ? 0 : + cpu_to_be32(V_TPT_PBL_ADDR(PBL_OFF(rdev_p, *pbl_addr)>>3)); + tpt.len = cpu_to_be32(len); + tpt.va_hi = cpu_to_be32((u32) (to >> 32)); + tpt.va_low_or_fbo = cpu_to_be32((u32) (to & 0xFFFFFFFFULL)); + tpt.rsvd_bind_cnt_or_pstag = 0; + tpt.rsvd_pbl_size = reset_tpt_entry ? 0 : + cpu_to_be32(V_TPT_PBL_SIZE((*pbl_size) >> 2)); + } + err = cxio_hal_ctrl_qp_write_mem(rdev_p, + stag_idx + + (rdev_p->rnic_info.tpt_base >> 5), + sizeof(tpt), &tpt, 1); + + /* release the stag index to free pool */ + if (reset_tpt_entry) + cxio_hal_put_stag(rdev_p->rscp, stag_idx); +ret: + wptr = rdev_p->ctrl_qp.wptr; + up(&rdev_p->ctrl_qp.sem); + if (!err) + if (wait_event_interruptible(rdev_p->ctrl_qp.waitq, + SEQ32_GE(rdev_p->ctrl_qp.rptr, + wptr))) + return -ERESTARTSYS; + return err; +} + +/* IN : stag key, pdid, pbl_size + * Out: stag index, actual pbl_size, and pbl_addr allocated. 
+ */ +int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid, + enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr) +{ + *stag = T3_STAG_UNSET; + return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR, + perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr)); +} + +int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid, + enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, + u8 page_size, __be64 *pbl, u32 *pbl_size, + u32 *pbl_addr) +{ + *stag = T3_STAG_UNSET; + return __cxio_tpt_op(rdev_p, 0, stag, 1, pdid, TPT_NON_SHARED_MR, perm, + zbva, to, len, page_size, pbl, pbl_size, pbl_addr); +} + +int cxio_reregister_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid, + enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, + u8 page_size, __be64 *pbl, u32 *pbl_size, + u32 *pbl_addr) +{ + return __cxio_tpt_op(rdev_p, 0, stag, 1, pdid, TPT_NON_SHARED_MR, perm, + zbva, to, len, page_size, pbl, pbl_size, pbl_addr); +} + +int cxio_dereg_mem(struct cxio_rdev *rdev_p, u32 stag, u32 pbl_size, + u32 pbl_addr) +{ + return __cxio_tpt_op(rdev_p, 1, &stag, 0, 0, 0, 0, 0, 0ULL, 0, 0, NULL, + &pbl_size, &pbl_addr); +} + +int cxio_allocate_window(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid) +{ + u32 pbl_size = 0; + *stag = T3_STAG_UNSET; + return __cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_MW, 0, 0, 0ULL, 0, 0, + NULL, &pbl_size, NULL); +} + +int cxio_deallocate_window(struct cxio_rdev *rdev_p, u32 stag) +{ + return __cxio_tpt_op(rdev_p, 1, &stag, 0, 0, 0, 0, 0, 0ULL, 0, 0, NULL, + NULL, NULL); +} + +int cxio_rdma_init(struct cxio_rdev *rdev_p, struct t3_rdma_init_attr *attr) +{ + struct t3_rdma_init_wr *wqe; + struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_ATOMIC); + if (!skb) + return -ENOMEM; + PDBG("%s rdev_p %p\n", __FUNCTION__, rdev_p); + wqe = (struct t3_rdma_init_wr *) __skb_put(skb, sizeof(*wqe)); + wqe->wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_INIT)); + wqe->wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(attr->tid) | + 
V_FW_RIWR_LEN(sizeof(*wqe) >> 3)); + wqe->wrid.id1 = 0; + wqe->qpid = cpu_to_be32(attr->qpid); + wqe->pdid = cpu_to_be32(attr->pdid); + wqe->scqid = cpu_to_be32(attr->scqid); + wqe->rcqid = cpu_to_be32(attr->rcqid); + wqe->rq_addr = cpu_to_be32(attr->rq_addr - rdev_p->rnic_info.rqt_base); + wqe->rq_size = cpu_to_be32(attr->rq_size); + wqe->mpaattrs = attr->mpaattrs; + wqe->qpcaps = attr->qpcaps; + wqe->ulpdu_size = cpu_to_be16(attr->tcp_emss); + wqe->flags = cpu_to_be32(attr->flags); + wqe->ord = cpu_to_be32(attr->ord); + wqe->ird = cpu_to_be32(attr->ird); + wqe->qp_dma_addr = cpu_to_be64(attr->qp_dma_addr); + wqe->qp_dma_size = cpu_to_be32(attr->qp_dma_size); + wqe->rsvd = 0; + skb->priority = 0; /* 0=>ToeQ; 1=>CtrlQ */ + return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); +} + +void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb) +{ + cxio_ev_cb = ev_cb; +} + +void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb) +{ + cxio_ev_cb = NULL; +} + +static int cxio_hal_ev_handler(struct t3cdev *t3cdev_p, struct sk_buff *skb) +{ + static int cnt; + struct cxio_rdev *rdev_p = NULL; + struct respQ_msg_t *rsp_msg = (struct respQ_msg_t *) skb->data; + PDBG("%d: %s cq_id 0x%x cq_ptr 0x%x genbit %0x overflow %0x an %0x" + " se %0x notify %0x cqbranch %0x creditth %0x\n", + cnt, __FUNCTION__, RSPQ_CQID(rsp_msg), RSPQ_CQPTR(rsp_msg), + RSPQ_GENBIT(rsp_msg), RSPQ_OVERFLOW(rsp_msg), RSPQ_AN(rsp_msg), + RSPQ_SE(rsp_msg), RSPQ_NOTIFY(rsp_msg), RSPQ_CQBRANCH(rsp_msg), + RSPQ_CREDIT_THRESH(rsp_msg)); + PDBG("CQE: QPID 0x%0x genbit %0x type 0x%0x status 0x%0x opcode %d " + "len 0x%0x wrid_hi_stag 0x%x wrid_low_msn 0x%x\n", + CQE_QPID(rsp_msg->cqe), CQE_GENBIT(rsp_msg->cqe), + CQE_TYPE(rsp_msg->cqe), CQE_STATUS(rsp_msg->cqe), + CQE_OPCODE(rsp_msg->cqe), CQE_LEN(rsp_msg->cqe), + CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe)); + rdev_p = (struct cxio_rdev *)t3cdev_p->ulp; + if (!rdev_p) { + PDBG("%s called by t3cdev %p with null ulp\n", __FUNCTION__, + t3cdev_p); + 
return 0; + } + if (CQE_QPID(rsp_msg->cqe) == T3_CTRL_QP_ID) { + rdev_p->ctrl_qp.rptr = CQE_WRID_LOW(rsp_msg->cqe) + 1; + wake_up_interruptible(&rdev_p->ctrl_qp.waitq); + dev_kfree_skb_irq(skb); + } else if (CQE_QPID(rsp_msg->cqe) == 0xfff8) + dev_kfree_skb_irq(skb); + else if (cxio_ev_cb) + (*cxio_ev_cb) (rdev_p, skb); + else + dev_kfree_skb_irq(skb); + cnt++; + return 0; +} + +/* Caller takes care of locking if needed */ +int cxio_rdev_open(struct cxio_rdev *rdev_p) +{ + struct net_device *netdev_p = NULL; + int err = 0; + if (strlen(rdev_p->dev_name)) { + if (cxio_hal_find_rdev_by_name(rdev_p->dev_name)) { + return -EBUSY; + } + netdev_p = dev_get_by_name(rdev_p->dev_name); + if (!netdev_p) { + return -EINVAL; + } + dev_put(netdev_p); + } else if (rdev_p->t3cdev_p) { + if (cxio_hal_find_rdev_by_t3cdev(rdev_p->t3cdev_p)) { + return -EBUSY; + } + netdev_p = rdev_p->t3cdev_p->lldev; + strncpy(rdev_p->dev_name, rdev_p->t3cdev_p->name, + T3_MAX_DEV_NAME_LEN); + } else { + PDBG("%s t3cdev_p or dev_name must be set\n", __FUNCTION__); + return -EINVAL; + } + + list_add_tail(&rdev_p->entry, &rdev_list); + + PDBG("%s opening rnic dev %s\n", __FUNCTION__, rdev_p->dev_name); + memset(&rdev_p->ctrl_qp, 0, sizeof(rdev_p->ctrl_qp)); + if (!rdev_p->t3cdev_p) + rdev_p->t3cdev_p = T3CDEV(netdev_p); + rdev_p->t3cdev_p->ulp = (void *) rdev_p; + err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_GET_PARAMS, + &(rdev_p->rnic_info)); + if (err) { + printk(KERN_ERR "%s t3cdev_p(%p)->ctl returned error %d.\n", + __FUNCTION__, rdev_p->t3cdev_p, err); + goto err1; + } + err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, GET_PORTS, + &(rdev_p->port_info)); + if (err) { + printk(KERN_ERR "%s t3cdev_p(%p)->ctl returned error %d.\n", + __FUNCTION__, rdev_p->t3cdev_p, err); + goto err1; + } + + /* + * qpshift is the number of bits to shift the qpid left in order + * to get the correct address of the doorbell for that qp. 
+ */ + cxio_init_ucontext(rdev_p, &rdev_p->uctx); + rdev_p->qpshift = PAGE_SHIFT - + ilog2(65536 >> + ilog2(rdev_p->rnic_info.udbell_len >> + PAGE_SHIFT)); + rdev_p->qpnr = rdev_p->rnic_info.udbell_len >> PAGE_SHIFT; + rdev_p->qpmask = (65536 >> ilog2(rdev_p->qpnr)) - 1; + PDBG("%s rnic %s info: tpt_base 0x%0x tpt_top 0x%0x num stags %d " + "pbl_base 0x%0x pbl_top 0x%0x rqt_base 0x%0x, rqt_top 0x%0x\n", + __FUNCTION__, rdev_p->dev_name, rdev_p->rnic_info.tpt_base, + rdev_p->rnic_info.tpt_top, cxio_num_stags(rdev_p), + rdev_p->rnic_info.pbl_base, + rdev_p->rnic_info.pbl_top, rdev_p->rnic_info.rqt_base, + rdev_p->rnic_info.rqt_top); + PDBG("udbell_len 0x%0x udbell_physbase 0x%lx kdb_addr %p qpshift %lu " + "qpnr %d qpmask 0x%x\n", + rdev_p->rnic_info.udbell_len, + rdev_p->rnic_info.udbell_physbase, rdev_p->rnic_info.kdb_addr, + rdev_p->qpshift, rdev_p->qpnr, rdev_p->qpmask); + + err = cxio_hal_init_ctrl_qp(rdev_p); + if (err) { + printk(KERN_ERR "%s error %d initializing ctrl_qp.\n", + __FUNCTION__, err); + goto err1; + } + err = cxio_hal_init_resource(rdev_p, cxio_num_stags(rdev_p), 0, + 0, T3_MAX_NUM_QP, T3_MAX_NUM_CQ, + T3_MAX_NUM_PD); + if (err) { + printk(KERN_ERR "%s error %d initializing hal resources.\n", + __FUNCTION__, err); + goto err2; + } + err = cxio_hal_pblpool_create(rdev_p); + if (err) { + printk(KERN_ERR "%s error %d initializing pbl mem pool.\n", + __FUNCTION__, err); + goto err3; + } + err = cxio_hal_rqtpool_create(rdev_p); + if (err) { + printk(KERN_ERR "%s error %d initializing rqt mem pool.\n", + __FUNCTION__, err); + goto err4; + } + return 0; +err4: + cxio_hal_pblpool_destroy(rdev_p); +err3: + cxio_hal_destroy_resource(rdev_p->rscp); +err2: + cxio_hal_destroy_ctrl_qp(rdev_p); +err1: + list_del(&rdev_p->entry); + return err; +} + +void cxio_rdev_close(struct cxio_rdev *rdev_p) +{ + if (rdev_p) { + cxio_hal_pblpool_destroy(rdev_p); + cxio_hal_rqtpool_destroy(rdev_p); + list_del(&rdev_p->entry); + rdev_p->t3cdev_p->ulp = NULL; + 
cxio_hal_destroy_ctrl_qp(rdev_p); + cxio_hal_destroy_resource(rdev_p->rscp); + } +} + +int __init cxio_hal_init(void) +{ + if (cxio_hal_init_rhdl_resource(T3_MAX_NUM_RI)) + return -ENOMEM; + t3_register_cpl_handler(CPL_ASYNC_NOTIF, cxio_hal_ev_handler); + return 0; +} + +void __exit cxio_hal_exit(void) +{ + struct cxio_rdev *rdev, *tmp; + + t3_register_cpl_handler(CPL_ASYNC_NOTIF, NULL); + list_for_each_entry_safe(rdev, tmp, &rdev_list, entry) + cxio_rdev_close(rdev); + cxio_hal_destroy_rhdl_resource(); +} + +static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) +{ + struct t3_swsq *sqp; + __u32 ptr = wq->sq_rptr; + int count = Q_COUNT(wq->sq_rptr, wq->sq_wptr); + + sqp = wq->sq + Q_PTR2IDX(ptr, wq->sq_size_log2); + while (count--) + if (!sqp->signaled) { + ptr++; + sqp = wq->sq + Q_PTR2IDX(ptr, wq->sq_size_log2); + } else if (sqp->complete) { + + /* + * Insert this completed cqe into the swcq. + */ + PDBG("%s moving cqe into swcq sq idx %ld cq idx %ld\n", + __FUNCTION__, Q_PTR2IDX(ptr, wq->sq_size_log2), + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)); + sqp->cqe.header |= htonl(V_CQE_SWCQE(1)); + *(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) + = sqp->cqe; + cq->sw_wptr++; + sqp->signaled = 0; + break; + } else + break; +} + +static inline void create_read_req_cqe(struct t3_wq *wq, + struct t3_cqe *hw_cqe, + struct t3_cqe *read_cqe) +{ + read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr; + read_cqe->len = wq->oldest_read->read_len; + read_cqe->header = htonl(V_CQE_QPID(CQE_QPID(*hw_cqe)) | + V_CQE_SWCQE(SW_CQE(*hw_cqe)) | + V_CQE_OPCODE(T3_READ_REQ) | + V_CQE_TYPE(1)); +} + +/* + * Return a ptr to the next read wr in the SWSQ or NULL. 
+ */ +static inline void advance_oldest_read(struct t3_wq *wq) +{ + + u32 rptr = wq->oldest_read - wq->sq + 1; + u32 wptr = Q_PTR2IDX(wq->sq_wptr, wq->sq_size_log2); + + while (Q_PTR2IDX(rptr, wq->sq_size_log2) != wptr) { + wq->oldest_read = wq->sq + Q_PTR2IDX(rptr, wq->sq_size_log2); + + if (wq->oldest_read->opcode == T3_READ_REQ) + return; + rptr++; + } + wq->oldest_read = NULL; +} + +/* + * cxio_poll_cq + * + * Caller must: + * check the validity of the first CQE, + * supply the wq associated with the qpid. + * + * credit: cq credit to return to sge. + * cqe_flushed: 1 iff the CQE is flushed. + * cqe: copy of the polled CQE. + * + * return value: + * 0 CQE returned, + * -1 CQE skipped, try again. + */ +int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe, + u8 *cqe_flushed, u64 *cookie, u32 *credit) +{ + int ret = 0; + struct t3_cqe *hw_cqe, read_cqe; + + *cqe_flushed = 0; + *credit = 0; + hw_cqe = cxio_next_cqe(cq); + + PDBG("%s CQE OOO %d qpid 0x%0x genbit %d type %d status 0x%0x" + " opcode 0x%0x len 0x%0x wrid_hi_stag 0x%x wrid_low_msn 0x%x\n", + __FUNCTION__, CQE_OOO(*hw_cqe), CQE_QPID(*hw_cqe), + CQE_GENBIT(*hw_cqe), CQE_TYPE(*hw_cqe), CQE_STATUS(*hw_cqe), + CQE_OPCODE(*hw_cqe), CQE_LEN(*hw_cqe), CQE_WRID_HI(*hw_cqe), + CQE_WRID_LOW(*hw_cqe)); + + /* + * skip cqe's not affiliated with a QP. + */ + if (wq == NULL) { + ret = -1; + goto skip_cqe; + } + + /* + * Gotta tweak READ completions: + * 1) the cqe doesn't contain the sq_wptr from the wr. + * 2) opcode not reflected from the wr. + * 3) read_len not reflected from the wr. + * 4) cq_type is RQ_TYPE not SQ_TYPE. + */ + if (RQ_TYPE(*hw_cqe) && (CQE_OPCODE(*hw_cqe) == T3_READ_RESP)) { + + /* + * Don't write to the HWCQ, so create a new read req CQE + * in local memory. + */ + create_read_req_cqe(wq, hw_cqe, &read_cqe); + hw_cqe = &read_cqe; + advance_oldest_read(wq); + } + + /* + * T3A: Discard TERMINATE CQEs. 
+ */ + if (CQE_OPCODE(*hw_cqe) == T3_TERMINATE) { + ret = -1; + wq->error = 1; + goto skip_cqe; + } + + if (CQE_STATUS(*hw_cqe) || wq->error) { + *cqe_flushed = wq->error; + wq->error = 1; + + /* + * T3A inserts errors into the CQE. We cannot return + * these as work completions. + */ + /* incoming write failures */ + if ((CQE_OPCODE(*hw_cqe) == T3_RDMA_WRITE) + && RQ_TYPE(*hw_cqe)) { + ret = -1; + goto skip_cqe; + } + /* incoming read request failures */ + if ((CQE_OPCODE(*hw_cqe) == T3_READ_RESP) && SQ_TYPE(*hw_cqe)) { + ret = -1; + goto skip_cqe; + } + + /* incoming SEND with no receive posted failures */ + if ((CQE_OPCODE(*hw_cqe) == T3_SEND) && RQ_TYPE(*hw_cqe) && + Q_EMPTY(wq->rq_rptr, wq->rq_wptr)) { + ret = -1; + goto skip_cqe; + } + goto proc_cqe; + } + + /* + * RECV completion. + */ + if (RQ_TYPE(*hw_cqe)) { + + /* + * HW only validates 4 bits of MSN. So we must validate that + * the MSN in the SEND is the next expected MSN. If its not, + * then we complete this with TPT_ERR_MSN and mark the wq in + * error. + */ + if (unlikely((CQE_WRID_MSN(*hw_cqe) != (wq->rq_rptr + 1)))) { + wq->error = 1; + hw_cqe->header |= htonl(V_CQE_STATUS(TPT_ERR_MSN)); + goto proc_cqe; + } + goto proc_cqe; + } + + /* + * If we get here its a send completion. + * + * Handle out of order completion. These get stuffed + * in the SW SQ. Then the SW SQ is walked to move any + * now in-order completions into the SW CQ. This handles + * 2 cases: + * 1) reaping unsignaled WRs when the first subsequent + * signaled WR is completed. + * 2) out of order read completions. 
+ */ + if (!SW_CQE(*hw_cqe) && (CQE_WRID_SQ_WPTR(*hw_cqe) != wq->sq_rptr)) { + struct t3_swsq *sqp; + + PDBG("%s out of order completion going in swsq at idx %ld\n", + __FUNCTION__, + Q_PTR2IDX(CQE_WRID_SQ_WPTR(*hw_cqe), wq->sq_size_log2)); + sqp = wq->sq + + Q_PTR2IDX(CQE_WRID_SQ_WPTR(*hw_cqe), wq->sq_size_log2); + sqp->cqe = *hw_cqe; + sqp->complete = 1; + ret = -1; + goto flush_wq; + } + +proc_cqe: + *cqe = *hw_cqe; + + /* + * Reap the associated WR(s) that are freed up with this + * completion. + */ + if (SQ_TYPE(*hw_cqe)) { + wq->sq_rptr = CQE_WRID_SQ_WPTR(*hw_cqe); + PDBG("%s completing sq idx %ld\n", __FUNCTION__, + Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2)); + *cookie = (wq->sq + + Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2))->wr_id; + wq->sq_rptr++; + } else { + PDBG("%s completing rq idx %ld\n", __FUNCTION__, + Q_PTR2IDX(wq->rq_rptr, wq->rq_size_log2)); + *cookie = *(wq->rq + Q_PTR2IDX(wq->rq_rptr, wq->rq_size_log2)); + wq->rq_rptr++; + } + +flush_wq: + /* + * Flush any completed cqes that are now in-order. + */ + flush_completed_wrs(wq, cq); + +skip_cqe: + if (SW_CQE(*hw_cqe)) { + PDBG("%s cq %p cqid 0x%x skip sw cqe sw_rptr 0x%x\n", + __FUNCTION__, cq, cq->cqid, cq->sw_rptr); + ++cq->sw_rptr; + } else { + PDBG("%s cq %p cqid 0x%x skip hw cqe rptr 0x%x\n", + __FUNCTION__, cq, cq->cqid, cq->rptr); + ++cq->rptr; + + /* + * T3A: compute credits. + */ + if (((cq->rptr - cq->wptr) > (1 << (cq->size_log2 - 1))) + || ((cq->rptr - cq->wptr) >= 128)) { + *credit = cq->rptr - cq->wptr; + cq->wptr = cq->rptr; + } + } + return ret; +} diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h new file mode 100644 index 0000000..8fb2999 --- /dev/null +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h @@ -0,0 +1,201 @@ +/* + * Copyright (c) 2006 Chelsio, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#ifndef __CXIO_HAL_H__ +#define __CXIO_HAL_H__ + +#include +#include + +#include "t3_cpl.h" +#include "t3cdev.h" +#include "cxgb3_ctl_defs.h" +#include "cxio_wr.h" + +#define T3_CTRL_QP_ID FW_RI_SGEEC_START +#define T3_CTL_QP_TID FW_RI_TID_START +#define T3_CTRL_QP_SIZE_LOG2 8 +#define T3_CTRL_CQ_ID 0 + +/* TBD */ +#define T3_MAX_NUM_RI (1<<15) +#define T3_MAX_NUM_QP (1<<15) +#define T3_MAX_NUM_CQ (1<<15) +#define T3_MAX_NUM_PD (1<<15) +#define T3_MAX_PBL_SIZE 256 +#define T3_MAX_RQ_SIZE 1024 +#define T3_MAX_NUM_STAG (1<<15) + +#define T3_STAG_UNSET 0xffffffff + +#define T3_MAX_DEV_NAME_LEN 32 + +struct cxio_hal_ctrl_qp { + u32 wptr; + u32 rptr; + struct semaphore sem; /* for the wptr, can sleep */ + wait_queue_head_t waitq; /* wait for RspQ/CQE msg */ + union t3_wr *workq; /* the work request queue */ + dma_addr_t dma_addr; /* pci bus address of the workq */ + DECLARE_PCI_UNMAP_ADDR(mapping) + void __iomem *doorbell; +}; + +struct cxio_hal_resource { + struct kfifo *tpt_fifo; + spinlock_t tpt_fifo_lock; + struct kfifo *qpid_fifo; + spinlock_t qpid_fifo_lock; + struct kfifo *cqid_fifo; + spinlock_t cqid_fifo_lock; + struct kfifo *pdid_fifo; + spinlock_t pdid_fifo_lock; +}; + +struct cxio_qpid_list { + struct list_head entry; + u32 qpid; +}; + +struct cxio_ucontext { + struct list_head qpids; + struct mutex lock; +}; + +struct cxio_rdev { + char dev_name[T3_MAX_DEV_NAME_LEN]; + struct t3cdev *t3cdev_p; + struct rdma_info rnic_info; + struct adap_ports port_info; + struct cxio_hal_resource *rscp; + struct cxio_hal_ctrl_qp ctrl_qp; + void *ulp; + unsigned long qpshift; + u32 qpnr; + u32 qpmask; + struct cxio_ucontext uctx; + struct gen_pool *pbl_pool; + struct gen_pool *rqt_pool; + struct list_head entry; +}; + +static inline int cxio_num_stags(struct cxio_rdev *rdev_p) +{ + return min((int)T3_MAX_NUM_STAG, (int)((rdev_p->rnic_info.tpt_top - rdev_p->rnic_info.tpt_base) >> 5)); +} + +typedef void (*cxio_hal_ev_callback_func_t) (struct cxio_rdev * rdev_p, + struct 
sk_buff * skb); + +#define RSPQ_CQID(rsp) (be32_to_cpu(rsp->cq_ptrid) & 0xffff) +#define RSPQ_CQPTR(rsp) ((be32_to_cpu(rsp->cq_ptrid) >> 16) & 0xffff) +#define RSPQ_GENBIT(rsp) ((be32_to_cpu(rsp->flags) >> 16) & 1) +#define RSPQ_OVERFLOW(rsp) ((be32_to_cpu(rsp->flags) >> 17) & 1) +#define RSPQ_AN(rsp) ((be32_to_cpu(rsp->flags) >> 18) & 1) +#define RSPQ_SE(rsp) ((be32_to_cpu(rsp->flags) >> 19) & 1) +#define RSPQ_NOTIFY(rsp) ((be32_to_cpu(rsp->flags) >> 20) & 1) +#define RSPQ_CQBRANCH(rsp) ((be32_to_cpu(rsp->flags) >> 21) & 1) +#define RSPQ_CREDIT_THRESH(rsp) ((be32_to_cpu(rsp->flags) >> 22) & 1) + +struct respQ_msg_t { + __be32 flags; /* flit 0 */ + __be32 cq_ptrid; + __be64 rsvd; /* flit 1 */ + struct t3_cqe cqe; /* flits 2-3 */ +}; + +enum t3_cq_opcode { + CQ_ARM_AN = 0x2, + CQ_ARM_SE = 0x6, + CQ_FORCE_AN = 0x3, + CQ_CREDIT_UPDATE = 0x7 +}; + +int cxio_rdev_open(struct cxio_rdev *rdev); +void cxio_rdev_close(struct cxio_rdev *rdev); +int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq, + enum t3_cq_opcode op, u32 credit); +int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid); +int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq); +int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq); +int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq); +void cxio_release_ucontext(struct cxio_rdev *rdev, struct cxio_ucontext *uctx); +void cxio_init_ucontext(struct cxio_rdev *rdev, struct cxio_ucontext *uctx); +int cxio_create_qp(struct cxio_rdev *rdev, u32 kernel_domain, struct t3_wq *wq, + struct cxio_ucontext *uctx); +int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq, + struct cxio_ucontext *uctx); +int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode); +int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid, + enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr); +int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid, + enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, + u8 
page_size, __be64 *pbl, u32 *pbl_size, + u32 *pbl_addr); +int cxio_reregister_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid, + enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, + u8 page_size, __be64 *pbl, u32 *pbl_size, + u32 *pbl_addr); +int cxio_dereg_mem(struct cxio_rdev *rdev, u32 stag, u32 pbl_size, + u32 pbl_addr); +int cxio_allocate_window(struct cxio_rdev *rdev, u32 * stag, u32 pdid); +int cxio_deallocate_window(struct cxio_rdev *rdev, u32 stag); +int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr); +void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb); +void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb); +u32 cxio_hal_get_rhdl(void); +void cxio_hal_put_rhdl(u32 rhdl); +u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp); +void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid); +int __init cxio_hal_init(void); +void __exit cxio_hal_exit(void); +void cxio_flush_rq(struct t3_wq *wq, struct t3_cq *cq, int count); +void cxio_flush_sq(struct t3_wq *wq, struct t3_cq *cq, int count); +void cxio_count_rcqes(struct t3_cq *cq, struct t3_wq *wq, int *count); +void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count); +void cxio_flush_hw_cq(struct t3_cq *cq); +int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe, + u8 *cqe_flushed, u64 *cookie, u32 *credit); + +#define MOD "iw_cxgb3: " +#define PDBG(fmt, args...) 
pr_debug(MOD fmt, ## args) + +#ifdef DEBUG +void cxio_dump_tpt(struct cxio_rdev *rev, u32 stag); +void cxio_dump_pbl(struct cxio_rdev *rev, u32 pbl_addr, uint len, u8 shift); +void cxio_dump_wqe(union t3_wr *wqe); +void cxio_dump_wce(struct t3_cqe *wce); +void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents); +void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid); +#endif + +#endif diff --git a/drivers/infiniband/hw/cxgb3/cxio_resource.c b/drivers/infiniband/hw/cxgb3/cxio_resource.c new file mode 100644 index 0000000..997aa32 --- /dev/null +++ b/drivers/infiniband/hw/cxgb3/cxio_resource.c @@ -0,0 +1,331 @@ +/* + * Copyright (c) 2006 Chelsio, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +/* Crude resource management */ +#include +#include +#include +#include +#include +#include +#include "cxio_resource.h" +#include "cxio_hal.h" + +static struct kfifo *rhdl_fifo; +static spinlock_t rhdl_fifo_lock; + +#define RANDOM_SIZE 16 + +static int __cxio_init_resource_fifo(struct kfifo **fifo, + spinlock_t *fifo_lock, + u32 nr, u32 skip_low, + u32 skip_high, + int random) +{ + u32 i, j, entry = 0, idx; + u32 random_bytes; + u32 rarray[16]; + spin_lock_init(fifo_lock); + + *fifo = kfifo_alloc(nr * sizeof(u32), GFP_KERNEL, fifo_lock); + if (IS_ERR(*fifo)) + return -ENOMEM; + + for (i = 0; i < skip_low + skip_high; i++) + __kfifo_put(*fifo, (unsigned char *) &entry, sizeof(u32)); + if (random) { + j = 0; + random_bytes = random32(); + for (i = 0; i < RANDOM_SIZE; i++) + rarray[i] = i + skip_low; + for (i = skip_low + RANDOM_SIZE; i < nr - skip_high; i++) { + if (j >= RANDOM_SIZE) { + j = 0; + random_bytes = random32(); + } + idx = (random_bytes >> (j * 2)) & 0xF; + __kfifo_put(*fifo, + (unsigned char *) &rarray[idx], + sizeof(u32)); + rarray[idx] = i; + j++; + } + for (i = 0; i < RANDOM_SIZE; i++) + __kfifo_put(*fifo, + (unsigned char *) &rarray[i], + sizeof(u32)); + } else + for (i = skip_low; i < nr - skip_high; i++) + __kfifo_put(*fifo, (unsigned char *) &i, sizeof(u32)); + + for (i = 0; i < skip_low + skip_high; i++) + kfifo_get(*fifo, (unsigned char *) &entry, sizeof(u32)); + return 0; +} + +static int cxio_init_resource_fifo(struct kfifo **fifo, spinlock_t * fifo_lock, + u32 nr, u32 skip_low, u32 skip_high) +{ + return (__cxio_init_resource_fifo(fifo, fifo_lock, nr, skip_low, + skip_high, 0)); +} + +static int cxio_init_resource_fifo_random(struct kfifo **fifo, + spinlock_t * 
fifo_lock, + u32 nr, u32 skip_low, u32 skip_high) +{ + + return (__cxio_init_resource_fifo(fifo, fifo_lock, nr, skip_low, + skip_high, 1)); +} + +static int cxio_init_qpid_fifo(struct cxio_rdev *rdev_p) +{ + u32 i; + + spin_lock_init(&rdev_p->rscp->qpid_fifo_lock); + + rdev_p->rscp->qpid_fifo = kfifo_alloc(T3_MAX_NUM_QP * sizeof(u32), + GFP_KERNEL, + &rdev_p->rscp->qpid_fifo_lock); + if (IS_ERR(rdev_p->rscp->qpid_fifo)) + return -ENOMEM; + + for (i = 16; i < T3_MAX_NUM_QP; i++) + if (!(i & rdev_p->qpmask)) + __kfifo_put(rdev_p->rscp->qpid_fifo, + (unsigned char *) &i, sizeof(u32)); + return 0; +} + +int cxio_hal_init_rhdl_resource(u32 nr_rhdl) +{ + return cxio_init_resource_fifo(&rhdl_fifo, &rhdl_fifo_lock, nr_rhdl, 1, + 0); +} + +void cxio_hal_destroy_rhdl_resource(void) +{ + kfifo_free(rhdl_fifo); +} + +/* nr_* must be power of 2 */ +int cxio_hal_init_resource(struct cxio_rdev *rdev_p, + u32 nr_tpt, u32 nr_pbl, + u32 nr_rqt, u32 nr_qpid, u32 nr_cqid, u32 nr_pdid) +{ + int err = 0; + struct cxio_hal_resource *rscp; + + rscp = kmalloc(sizeof(*rscp), GFP_KERNEL); + if (!rscp) + return -ENOMEM; + rdev_p->rscp = rscp; + err = cxio_init_resource_fifo_random(&rscp->tpt_fifo, + &rscp->tpt_fifo_lock, + nr_tpt, 1, 0); + if (err) + goto tpt_err; + err = cxio_init_qpid_fifo(rdev_p); + if (err) + goto qpid_err; + err = cxio_init_resource_fifo(&rscp->cqid_fifo, &rscp->cqid_fifo_lock, + nr_cqid, 1, 0); + if (err) + goto cqid_err; + err = cxio_init_resource_fifo(&rscp->pdid_fifo, &rscp->pdid_fifo_lock, + nr_pdid, 1, 0); + if (err) + goto pdid_err; + return 0; +pdid_err: + kfifo_free(rscp->cqid_fifo); +cqid_err: + kfifo_free(rscp->qpid_fifo); +qpid_err: + kfifo_free(rscp->tpt_fifo); +tpt_err: + return -ENOMEM; +} + +/* + * returns 0 if no resource available + */ +static inline u32 cxio_hal_get_resource(struct kfifo *fifo) +{ + u32 entry; + if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32))) + return entry; + else + return 0; /* fifo empty */ +} + +static inline void 
cxio_hal_put_resource(struct kfifo *fifo, u32 entry) +{ + BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0); +} + +u32 cxio_hal_get_rhdl(void) +{ + return cxio_hal_get_resource(rhdl_fifo); +} + +void cxio_hal_put_rhdl(u32 rhdl) +{ + cxio_hal_put_resource(rhdl_fifo, rhdl); +} + +u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp) +{ + return cxio_hal_get_resource(rscp->tpt_fifo); +} + +void cxio_hal_put_stag(struct cxio_hal_resource *rscp, u32 stag) +{ + cxio_hal_put_resource(rscp->tpt_fifo, stag); +} + +u32 cxio_hal_get_qpid(struct cxio_hal_resource *rscp) +{ + u32 qpid = cxio_hal_get_resource(rscp->qpid_fifo); + PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid); + return qpid; +} + +void cxio_hal_put_qpid(struct cxio_hal_resource *rscp, u32 qpid) +{ + PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid); + cxio_hal_put_resource(rscp->qpid_fifo, qpid); +} + +u32 cxio_hal_get_cqid(struct cxio_hal_resource *rscp) +{ + return cxio_hal_get_resource(rscp->cqid_fifo); +} + +void cxio_hal_put_cqid(struct cxio_hal_resource *rscp, u32 cqid) +{ + cxio_hal_put_resource(rscp->cqid_fifo, cqid); +} + +u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp) +{ + return cxio_hal_get_resource(rscp->pdid_fifo); +} + +void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid) +{ + cxio_hal_put_resource(rscp->pdid_fifo, pdid); +} + +void cxio_hal_destroy_resource(struct cxio_hal_resource *rscp) +{ + kfifo_free(rscp->tpt_fifo); + kfifo_free(rscp->cqid_fifo); + kfifo_free(rscp->qpid_fifo); + kfifo_free(rscp->pdid_fifo); + kfree(rscp); +} + +/* + * PBL Memory Manager. Uses Linux generic allocator. 
+ */ + +#define MIN_PBL_SHIFT 8 /* 256B == min PBL size (32 entries) */ +#define PBL_CHUNK 2*1024*1024 + +u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size) +{ + unsigned long addr = gen_pool_alloc(rdev_p->pbl_pool, size); + PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size); + return (u32)addr; +} + +void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size) +{ + PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size); + gen_pool_free(rdev_p->pbl_pool, (unsigned long)addr, size); +} + +int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p) +{ + unsigned long i; + rdev_p->pbl_pool = gen_pool_create(MIN_PBL_SHIFT, -1); + if (rdev_p->pbl_pool) + for (i = rdev_p->rnic_info.pbl_base; + i <= rdev_p->rnic_info.pbl_top - PBL_CHUNK + 1; + i += PBL_CHUNK) + gen_pool_add(rdev_p->pbl_pool, i, PBL_CHUNK, -1); + return rdev_p->pbl_pool ? 0 : -ENOMEM; +} + +void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p) +{ + gen_pool_destroy(rdev_p->pbl_pool); +} + +/* + * RQT Memory Manager. Uses Linux generic allocator. + */ + +#define MIN_RQT_SHIFT 10 /* 1KB == min RQT size (16 entries) */ +#define RQT_CHUNK 2*1024*1024 + +u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size) +{ + unsigned long addr = gen_pool_alloc(rdev_p->rqt_pool, size << 6); + PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size << 6); + return (u32)addr; +} + +void cxio_hal_rqtpool_free(struct cxio_rdev *rdev_p, u32 addr, int size) +{ + PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size << 6); + gen_pool_free(rdev_p->rqt_pool, (unsigned long)addr, size << 6); +} + +int cxio_hal_rqtpool_create(struct cxio_rdev *rdev_p) +{ + unsigned long i; + rdev_p->rqt_pool = gen_pool_create(MIN_RQT_SHIFT, -1); + if (rdev_p->rqt_pool) + for (i = rdev_p->rnic_info.rqt_base; + i <= rdev_p->rnic_info.rqt_top - RQT_CHUNK + 1; + i += RQT_CHUNK) + gen_pool_add(rdev_p->rqt_pool, i, RQT_CHUNK, -1); + return rdev_p->rqt_pool ? 
0 : -ENOMEM; +} + +void cxio_hal_rqtpool_destroy(struct cxio_rdev *rdev_p) +{ + gen_pool_destroy(rdev_p->rqt_pool); +} diff --git a/drivers/infiniband/hw/cxgb3/cxio_resource.h b/drivers/infiniband/hw/cxgb3/cxio_resource.h new file mode 100644 index 0000000..a6bbe83 --- /dev/null +++ b/drivers/infiniband/hw/cxgb3/cxio_resource.h @@ -0,0 +1,70 @@ +/* + * Copyright (c) 2006 Chelsio, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#ifndef __CXIO_RESOURCE_H__ +#define __CXIO_RESOURCE_H__ + +#include +#include +#include +#include +#include +#include +#include +#include "cxio_hal.h" + +extern int cxio_hal_init_rhdl_resource(u32 nr_rhdl); +extern void cxio_hal_destroy_rhdl_resource(void); +extern int cxio_hal_init_resource(struct cxio_rdev *rdev_p, + u32 nr_tpt, u32 nr_pbl, + u32 nr_rqt, u32 nr_qpid, u32 nr_cqid, + u32 nr_pdid); +extern u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp); +extern void cxio_hal_put_stag(struct cxio_hal_resource *rscp, u32 stag); +extern u32 cxio_hal_get_qpid(struct cxio_hal_resource *rscp); +extern void cxio_hal_put_qpid(struct cxio_hal_resource *rscp, u32 qpid); +extern u32 cxio_hal_get_cqid(struct cxio_hal_resource *rscp); +extern void cxio_hal_put_cqid(struct cxio_hal_resource *rscp, u32 cqid); +extern void cxio_hal_destroy_resource(struct cxio_hal_resource *rscp); + +#define PBL_OFF(rdev_p, a) ( (a) - (rdev_p)->rnic_info.pbl_base ) +extern int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p); +extern void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p); +extern u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size); +extern void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size); + +#define RQT_OFF(rdev_p, a) ( (a) - (rdev_p)->rnic_info.rqt_base ) +extern int cxio_hal_rqtpool_create(struct cxio_rdev *rdev_p); +extern void cxio_hal_rqtpool_destroy(struct cxio_rdev *rdev_p); +extern u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size); +extern void cxio_hal_rqtpool_free(struct cxio_rdev *rdev_p, u32 addr, int size); +#endif diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h new file mode 100644 index 0000000..103fc42 --- /dev/null +++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h @@ -0,0 +1,685 @@ +/* + * Copyright (c) 2006 Chelsio, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#ifndef __CXIO_WR_H__ +#define __CXIO_WR_H__ + +#include +#include +#include +#include "firmware_exports.h" + +#define T3_MAX_SGE 4 + +#define Q_EMPTY(rptr,wptr) ((rptr)==(wptr)) +#define Q_FULL(rptr,wptr,size_log2) ( (((wptr)-(rptr))>>(size_log2)) && \ + ((rptr)!=(wptr)) ) +#define Q_GENBIT(ptr,size_log2) (!(((ptr)>>size_log2)&0x1)) +#define Q_FREECNT(rptr,wptr,size_log2) ((1UL<> S_FW_RIWR_OP)) & M_FW_RIWR_OP) + +#define S_FW_RIWR_SOPEOP 22 +#define M_FW_RIWR_SOPEOP 0x3 +#define V_FW_RIWR_SOPEOP(x) ((x) << S_FW_RIWR_SOPEOP) + +#define S_FW_RIWR_FLAGS 8 +#define M_FW_RIWR_FLAGS 0x3fffff +#define V_FW_RIWR_FLAGS(x) ((x) << S_FW_RIWR_FLAGS) +#define G_FW_RIWR_FLAGS(x) ((((x) >> S_FW_RIWR_FLAGS)) & M_FW_RIWR_FLAGS) + +#define S_FW_RIWR_TID 8 +#define V_FW_RIWR_TID(x) ((x) << S_FW_RIWR_TID) + +#define S_FW_RIWR_LEN 0 +#define V_FW_RIWR_LEN(x) ((x) << S_FW_RIWR_LEN) + +#define S_FW_RIWR_GEN 31 +#define V_FW_RIWR_GEN(x) ((x) << S_FW_RIWR_GEN) + +struct t3_sge { + __be32 stag; + __be32 len; + __be64 to; +}; + +/* If num_sgle is zero, flit 5+ contains immediate data.*/ +struct t3_send_wr { + struct fw_riwrh wrh; /* 0 */ + union t3_wrid wrid; /* 1 */ + + u8 rdmaop; /* 2 */ + u8 reserved[3]; + __be32 rem_stag; + __be32 plen; /* 3 */ + __be32 num_sgle; + struct t3_sge sgl[T3_MAX_SGE]; /* 4+ */ +}; + +struct t3_local_inv_wr { + struct fw_riwrh wrh; /* 0 */ + union t3_wrid wrid; /* 1 */ + __be32 stag; /* 2 */ + __be32 reserved3; +}; + +struct t3_rdma_write_wr { + struct fw_riwrh wrh; /* 0 */ + union t3_wrid wrid; /* 1 */ + u8 rdmaop; /* 2 */ + u8 reserved[3]; + __be32 stag_sink; + __be64 to_sink; /* 3 */ + __be32 plen; /* 4 */ + __be32 num_sgle; + struct t3_sge sgl[T3_MAX_SGE]; /* 5+ */ +}; + +struct t3_rdma_read_wr { + struct fw_riwrh wrh; /* 0 */ + union t3_wrid wrid; /* 1 */ + u8 rdmaop; /* 2 */ + u8 reserved[3]; + __be32 rem_stag; + __be64 rem_to; /* 3 */ + __be32 local_stag; /* 4 */ + __be32 local_len; + __be64 local_to; /* 5 */ +}; + +enum t3_addr_type { + 
T3_VA_BASED_TO = 0x0, + T3_ZERO_BASED_TO = 0x1 +} __attribute__ ((packed)); + +enum t3_mem_perms { + T3_MEM_ACCESS_LOCAL_READ = 0x1, + T3_MEM_ACCESS_LOCAL_WRITE = 0x2, + T3_MEM_ACCESS_REM_READ = 0x4, + T3_MEM_ACCESS_REM_WRITE = 0x8 +} __attribute__ ((packed)); + +struct t3_bind_mw_wr { + struct fw_riwrh wrh; /* 0 */ + union t3_wrid wrid; /* 1 */ + u16 reserved; /* 2 */ + u8 type; + u8 perms; + __be32 mr_stag; + __be32 mw_stag; /* 3 */ + __be32 mw_len; + __be64 mw_va; /* 4 */ + __be32 mr_pbl_addr; /* 5 */ + u8 reserved2[3]; + u8 mr_pagesz; +}; + +struct t3_receive_wr { + struct fw_riwrh wrh; /* 0 */ + union t3_wrid wrid; /* 1 */ + u8 pagesz[T3_MAX_SGE]; + __be32 num_sgle; /* 2 */ + struct t3_sge sgl[T3_MAX_SGE]; /* 3+ */ + __be32 pbl_addr[T3_MAX_SGE]; +}; + +struct t3_bypass_wr { + struct fw_riwrh wrh; + union t3_wrid wrid; /* 1 */ +}; + +struct t3_modify_qp_wr { + struct fw_riwrh wrh; /* 0 */ + union t3_wrid wrid; /* 1 */ + __be32 flags; /* 2 */ + __be32 quiesce; /* 2 */ + __be32 max_ird; /* 3 */ + __be32 max_ord; /* 3 */ + __be64 sge_cmd; /* 4 */ + __be64 ctx1; /* 5 */ + __be64 ctx0; /* 6 */ +}; + +enum t3_modify_qp_flags { + MODQP_QUIESCE = 0x01, + MODQP_MAX_IRD = 0x02, + MODQP_MAX_ORD = 0x04, + MODQP_WRITE_EC = 0x08, + MODQP_READ_EC = 0x10, +}; + + +enum t3_mpa_attrs { + uP_RI_MPA_RX_MARKER_ENABLE = 0x1, + uP_RI_MPA_TX_MARKER_ENABLE = 0x2, + uP_RI_MPA_CRC_ENABLE = 0x4, + uP_RI_MPA_IETF_ENABLE = 0x8 +} __attribute__ ((packed)); + +enum t3_qp_caps { + uP_RI_QP_RDMA_READ_ENABLE = 0x01, + uP_RI_QP_RDMA_WRITE_ENABLE = 0x02, + uP_RI_QP_BIND_ENABLE = 0x04, + uP_RI_QP_FAST_REGISTER_ENABLE = 0x08, + uP_RI_QP_STAG0_ENABLE = 0x10 +} __attribute__ ((packed)); + +struct t3_rdma_init_attr { + u32 tid; + u32 qpid; + u32 pdid; + u32 scqid; + u32 rcqid; + u32 rq_addr; + u32 rq_size; + enum t3_mpa_attrs mpaattrs; + enum t3_qp_caps qpcaps; + u16 tcp_emss; + u32 ord; + u32 ird; + u64 qp_dma_addr; + u32 qp_dma_size; + u32 flags; +}; + +struct t3_rdma_init_wr { + struct fw_riwrh wrh; 
/* 0 */ + union t3_wrid wrid; /* 1 */ + __be32 qpid; /* 2 */ + __be32 pdid; + __be32 scqid; /* 3 */ + __be32 rcqid; + __be32 rq_addr; /* 4 */ + __be32 rq_size; + u8 mpaattrs; /* 5 */ + u8 qpcaps; + __be16 ulpdu_size; + __be32 flags; /* bits 31-1 - reservered */ + /* bit 0 - set if RECV posted */ + __be32 ord; /* 6 */ + __be32 ird; + __be64 qp_dma_addr; /* 7 */ + __be32 qp_dma_size; /* 8 */ + u32 rsvd; +}; + +struct t3_genbit { + u64 flit[15]; + __be64 genbit; +}; + +enum rdma_init_wr_flags { + RECVS_POSTED = 1, +}; + +union t3_wr { + struct t3_send_wr send; + struct t3_rdma_write_wr write; + struct t3_rdma_read_wr read; + struct t3_receive_wr recv; + struct t3_local_inv_wr local_inv; + struct t3_bind_mw_wr bind; + struct t3_bypass_wr bypass; + struct t3_rdma_init_wr init; + struct t3_modify_qp_wr qp_mod; + struct t3_genbit genbit; + u64 flit[16]; +}; + +#define T3_SQ_CQE_FLIT 13 +#define T3_SQ_COOKIE_FLIT 14 + +#define T3_RQ_COOKIE_FLIT 13 +#define T3_RQ_CQE_FLIT 14 + +static inline enum t3_wr_opcode fw_riwrh_opcode(struct fw_riwrh *wqe) +{ + return G_FW_RIWR_OP(be32_to_cpu(wqe->op_seop_flags)); +} + +static inline void build_fw_riwrh(struct fw_riwrh *wqe, enum t3_wr_opcode op, + enum t3_wr_flags flags, u8 genbit, u32 tid, + u8 len) +{ + wqe->op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(op) | + V_FW_RIWR_SOPEOP(M_FW_RIWR_SOPEOP) | + V_FW_RIWR_FLAGS(flags)); + wmb(); + wqe->gen_tid_len = cpu_to_be32(V_FW_RIWR_GEN(genbit) | + V_FW_RIWR_TID(tid) | + V_FW_RIWR_LEN(len)); + /* 2nd gen bit... 
*/ + ((union t3_wr *)wqe)->genbit.genbit = cpu_to_be64(genbit); +} + +/* + * T3 ULP2_TX commands + */ +enum t3_utx_mem_op { + T3_UTX_MEM_READ = 2, + T3_UTX_MEM_WRITE = 3 +}; + +/* T3 MC7 RDMA TPT entry format */ + +enum tpt_mem_type { + TPT_NON_SHARED_MR = 0x0, + TPT_SHARED_MR = 0x1, + TPT_MW = 0x2, + TPT_MW_RELAXED_PROTECTION = 0x3 +}; + +enum tpt_addr_type { + TPT_ZBTO = 0, + TPT_VATO = 1 +}; + +enum tpt_mem_perm { + TPT_LOCAL_READ = 0x8, + TPT_LOCAL_WRITE = 0x4, + TPT_REMOTE_READ = 0x2, + TPT_REMOTE_WRITE = 0x1 +}; + +struct tpt_entry { + __be32 valid_stag_pdid; + __be32 flags_pagesize_qpid; + + __be32 rsvd_pbl_addr; + __be32 len; + __be32 va_hi; + __be32 va_low_or_fbo; + + __be32 rsvd_bind_cnt_or_pstag; + __be32 rsvd_pbl_size; +}; + +#define S_TPT_VALID 31 +#define V_TPT_VALID(x) ((x) << S_TPT_VALID) +#define F_TPT_VALID V_TPT_VALID(1U) + +#define S_TPT_STAG_KEY 23 +#define M_TPT_STAG_KEY 0xFF +#define V_TPT_STAG_KEY(x) ((x) << S_TPT_STAG_KEY) +#define G_TPT_STAG_KEY(x) (((x) >> S_TPT_STAG_KEY) & M_TPT_STAG_KEY) + +#define S_TPT_STAG_STATE 22 +#define V_TPT_STAG_STATE(x) ((x) << S_TPT_STAG_STATE) +#define F_TPT_STAG_STATE V_TPT_STAG_STATE(1U) + +#define S_TPT_STAG_TYPE 20 +#define M_TPT_STAG_TYPE 0x3 +#define V_TPT_STAG_TYPE(x) ((x) << S_TPT_STAG_TYPE) +#define G_TPT_STAG_TYPE(x) (((x) >> S_TPT_STAG_TYPE) & M_TPT_STAG_TYPE) + +#define S_TPT_PDID 0 +#define M_TPT_PDID 0xFFFFF +#define V_TPT_PDID(x) ((x) << S_TPT_PDID) +#define G_TPT_PDID(x) (((x) >> S_TPT_PDID) & M_TPT_PDID) + +#define S_TPT_PERM 28 +#define M_TPT_PERM 0xF +#define V_TPT_PERM(x) ((x) << S_TPT_PERM) +#define G_TPT_PERM(x) (((x) >> S_TPT_PERM) & M_TPT_PERM) + +#define S_TPT_REM_INV_DIS 27 +#define V_TPT_REM_INV_DIS(x) ((x) << S_TPT_REM_INV_DIS) +#define F_TPT_REM_INV_DIS V_TPT_REM_INV_DIS(1U) + +#define S_TPT_ADDR_TYPE 26 +#define V_TPT_ADDR_TYPE(x) ((x) << S_TPT_ADDR_TYPE) +#define F_TPT_ADDR_TYPE V_TPT_ADDR_TYPE(1U) + +#define S_TPT_MW_BIND_ENABLE 25 +#define V_TPT_MW_BIND_ENABLE(x) ((x) << 
S_TPT_MW_BIND_ENABLE) +#define F_TPT_MW_BIND_ENABLE V_TPT_MW_BIND_ENABLE(1U) + +#define S_TPT_PAGE_SIZE 20 +#define M_TPT_PAGE_SIZE 0x1F +#define V_TPT_PAGE_SIZE(x) ((x) << S_TPT_PAGE_SIZE) +#define G_TPT_PAGE_SIZE(x) (((x) >> S_TPT_PAGE_SIZE) & M_TPT_PAGE_SIZE) + +#define S_TPT_PBL_ADDR 0 +#define M_TPT_PBL_ADDR 0x1FFFFFFF +#define V_TPT_PBL_ADDR(x) ((x) << S_TPT_PBL_ADDR) +#define G_TPT_PBL_ADDR(x) (((x) >> S_TPT_PBL_ADDR) & M_TPT_PBL_ADDR) + +#define S_TPT_QPID 0 +#define M_TPT_QPID 0xFFFFF +#define V_TPT_QPID(x) ((x) << S_TPT_QPID) +#define G_TPT_QPID(x) (((x) >> S_TPT_QPID) & M_TPT_QPID) + +#define S_TPT_PSTAG 0 +#define M_TPT_PSTAG 0xFFFFFF +#define V_TPT_PSTAG(x) ((x) << S_TPT_PSTAG) +#define G_TPT_PSTAG(x) (((x) >> S_TPT_PSTAG) & M_TPT_PSTAG) + +#define S_TPT_PBL_SIZE 0 +#define M_TPT_PBL_SIZE 0xFFFFF +#define V_TPT_PBL_SIZE(x) ((x) << S_TPT_PBL_SIZE) +#define G_TPT_PBL_SIZE(x) (((x) >> S_TPT_PBL_SIZE) & M_TPT_PBL_SIZE) + +/* + * CQE defs + */ +struct t3_cqe { + __be32 header; + __be32 len; + union { + struct { + __be32 stag; + __be32 msn; + } rcqe; + struct { + u32 wrid_hi; + u32 wrid_low; + } scqe; + } u; +}; + +#define S_CQE_OOO 31 +#define M_CQE_OOO 0x1 +#define G_CQE_OOO(x) ((((x) >> S_CQE_OOO)) & M_CQE_OOO) +#define V_CEQ_OOO(x) ((x)<> S_CQE_QPID)) & M_CQE_QPID) +#define V_CQE_QPID(x) ((x)<> S_CQE_SWCQE)) & M_CQE_SWCQE) +#define V_CQE_SWCQE(x) ((x)<> S_CQE_GENBIT) & M_CQE_GENBIT) +#define V_CQE_GENBIT(x) ((x)<> S_CQE_STATUS)) & M_CQE_STATUS) +#define V_CQE_STATUS(x) ((x)<> S_CQE_TYPE)) & M_CQE_TYPE) +#define V_CQE_TYPE(x) ((x)<> S_CQE_OPCODE)) & M_CQE_OPCODE) +#define V_CQE_OPCODE(x) ((x)<queue->flit[13] = 1; +} + +static inline struct t3_cqe *cxio_next_hw_cqe(struct t3_cq *cq) +{ + struct t3_cqe *cqe; + + cqe = cq->queue + (Q_PTR2IDX(cq->rptr, cq->size_log2)); + if (CQ_VLD_ENTRY(cq->rptr, cq->size_log2, cqe)) + return cqe; + return NULL; +} + +static inline struct t3_cqe *cxio_next_sw_cqe(struct t3_cq *cq) +{ + struct t3_cqe *cqe; + + if 
(!Q_EMPTY(cq->sw_rptr, cq->sw_wptr)) { + cqe = cq->sw_queue + (Q_PTR2IDX(cq->sw_rptr, cq->size_log2)); + return cqe; + } + return NULL; +} + +static inline struct t3_cqe *cxio_next_cqe(struct t3_cq *cq) +{ + struct t3_cqe *cqe; + + if (!Q_EMPTY(cq->sw_rptr, cq->sw_wptr)) { + cqe = cq->sw_queue + (Q_PTR2IDX(cq->sw_rptr, cq->size_log2)); + return cqe; + } + cqe = cq->queue + (Q_PTR2IDX(cq->rptr, cq->size_log2)); + if (CQ_VLD_ENTRY(cq->rptr, cq->size_log2, cqe)) + return cqe; + return NULL; +} + +#endif diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 8e519f2..d02cd72 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -50,7 +50,7 @@ #include #include #include -#include +#include "cxio_hal.h" #include "iwch.h" #include "iwch_provider.h" #include "iwch_cm.h" From swise at opengridcomputing.com Sat Feb 10 11:23:43 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Sat, 10 Feb 2007 13:23:43 -0600 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: <1171035668.26453.11.camel@trinity.ogc.int> References: <1171035668.26453.11.camel@trinity.ogc.int> Message-ID: <1171135423.11017.61.camel@stevo-desktop> Here is a patch that Tom and I think fixes the race condition Roland discovered, plus cleans up the issues Krishna attempted to fix in his first patch. I'm testing it now with a series of rping tests looping with random sizes and counts and it seems to work, but the patch needs more testing and review. Krishna, can you review this carefully and also test it and let us know if you think its good? The patch is against for-2.6.21 from Roland's tree. Roland, can you review this too and verify that it fixes the race condition? Krishna: here are comments to your original patch description: cm_conn_req_handler() : > 1. Calling destroy_cm_id leaks 3 work 'free' list entries. 
I don't think your original patch fixed all places this memory was
leaked.

This has been addressed in the patch below by creating a function
free_cm_id() that frees the list entries -and- frees the cm_id memory.
It is then called from the 3 places where the cm_id can be freed.

> 2. cm_id is freed up wrongly and not cm_id_priv (though the
> effect is the same since cm_id is the first element of
> cm_id_priv, but still a bug if the top level cm_id
> changes).
>

This is addressed in the patch below since we have a prototyped function
for freeing the cm_id_priv.

> 3. Reject message has to be sent on failure. Tested this
> without the fix and found the client hangs, waited for
> about
> 20 mins and then did Ctrl-C but the process is unkillable.
>

The call to iw_cm_reject() is now in destroy_cm_id() and is called based
on the cm_id state. So whenever a connection is destroyed, if it needs
a rejection sent, it will be sent.

> 4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
> doesn't achieve anything, since checking for
> IWCM_F_CALLBACK_DESTROY in the parent's flag (in
> cm_work_handler) means that this will never be true.
>

It does achieve something. I think you are failing to consider the fact
that the iWARP provider can have a reference on the cm_id even in the
case where the callback function returns an error thus giving
destruction ownership to the IWCM. Perhaps the ammasso provider never
does this, but cxgb3 can. And we haven't put any restrictions on
exactly when the provider _must_ release its reference. If the provider
_does_ have a reference at this point in the code, then the cm_id will
not be freed, and must be freed when the refcnt finally reaches zero
when the provider removes its reference.

I wish to clarify this for everyone (and we need this text in
Documentation/infiniband/iwcm.txt IMO):

This design is based on the RDMA_CM and IB_CM behavior. If the app
issues the destroy via rdma_destroy_cm_id, then we block that thread
until all references are gone. If the app returns non-zero in a
callback for a given cm_id, then the CM owns destroying the cm_id and
the application is done with it. That's the short of it. Here's the
long of it:

There are 2 paths for freeing iw_cm memory.

1) the application issues a rdma_destroy_cm_id() which calls
iw_destroy_cm_id(). In this case (and this case only), the thread is
blocked until the refcnt reaches 0, then the thread continues and frees
the memory.

2) the application returns non zero from a callback function. In this
case, the IWCM is responsible for destroying the cm_id. However, the IWCM
_cannot_ block in its event handler thread because this can cause a
deadlock. A deadlock can occur if the provider has a reference to the
cm_id and needs to post some event before removing the reference. If
the IWCM were to block awaiting the refcnt to go to zero, it would
deadlock with the provider trying to post the last event before derefing
the cm_id. So the IWCM_F_CALLBACK_DESTROY bit is used to indicate that
the IWCM owns destroying this. If, in cm_work_handler(), the refcnt
goes to zero -and- the DESTROY bit is set, then the cm_id can be freed.
If the refcnt doesn't go to zero in that function, then either the
provider still has a reference, or subsequent queued work items have
additional references. In either case, the cm_id is not freed and
cm_work_handler() keeps chunking through the events and processing them.
Since the cm_id is marked DESTROYING, the events get dropped and the
references released on the cm_id. Eventually the cm_id will get freed
either in cm_work_handler() -or- in rem_ref() called by the provider if
the provider has the last reference.

So based on the above design, there are 3 places in the code where the
cm_id can be freed:

A) in case 1 above the memory will always be freed in iw_destroy_cm_id()
after the thread is awakened with a refcnt of zero.

B) in case 2 above if the last reference is due to a queued work event
for the iwcm. In this case the memory is freed in cm_work_handler().

C) in case 2 above if the provider has the last reference, then the
cm_id is freed in rem_ref().


I hope this clarifies things.

Here's the proposed patch:

iw_cm_id destruction race condition fixes.

From: Steve Wise

Several changes:

- iwcm_deref_id() always wakes up if there's another reference.

- move iw_cm_reject() into destroy_cm_id() to reduce code replication.

- clean up race condition in cm_work_handler().

- create static void free_cm_id() which deallocs the work entries and then
  kfrees the cm_id memory. This reduces code replication.

Signed-off-by: Steve Wise
Signed-off-by: Tom Tucker
---

 drivers/infiniband/core/iwcm.c | 48 +++++++++++++++++++++-------------------
 1 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 1039ad5..403daed 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c
 	return 0;
 }
 
+static void free_cm_id(struct iwcm_id_private *cm_id_priv)
+{
+	dealloc_work_entries(cm_id_priv);
+	kfree(cm_id_priv);
+}
+
 /*
  * Release a reference on cm_id. If the last reference is being
  * released, enable the waiting thread (in iw_destroy_cm_id) to
@@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c
  */
 static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
 {
-	int ret = 0;
-
 	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
 	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
 		BUG_ON(!list_empty(&cm_id_priv->work_list));
-		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
-			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
-			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
-					&cm_id_priv->flags));
-			ret = 1;
-		}
 		complete(&cm_id_priv->destroy_comp);
+		return 1;
 	}
 
-	return ret;
+	return 0;
 }
 
 static void add_ref(struct iw_cm_id *cm_id)
@@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_
 {
 	struct iwcm_id_private *cm_id_priv;
 	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
-	iwcm_deref_id(cm_id_priv);
+	if (iwcm_deref_id(cm_id_priv) &&
+	    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
+		BUG_ON(!list_empty(&cm_id_priv->work_list));
+		free_cm_id(cm_id_priv);
+	}
 }
 
 static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
@@ -355,8 +358,11 @@ static void destroy_cm_id(struct iw_cm_i
 	case IW_CM_STATE_CONN_RECV:
 		/*
 		 * App called destroy before/without calling accept after
-		 * receiving connection request event notification.
+		 * receiving connection request event notification or
+		 * returned non zero from the event callback function.
+		 * In either case, must tell the provider to reject.
 		 */
+		iw_cm_reject(cm_id, NULL, 0);
 		cm_id_priv->state = IW_CM_STATE_DESTROYING;
 		break;
 	case IW_CM_STATE_CONN_SENT:
@@ -391,9 +397,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c
 
 	wait_for_completion(&cm_id_priv->destroy_comp);
 
-	dealloc_work_entries(cm_id_priv);
-
-	kfree(cm_id_priv);
+	free_cm_id(cm_id_priv);
 }
 EXPORT_SYMBOL(iw_destroy_cm_id);
 
@@ -639,7 +643,6 @@ static void cm_conn_req_handler(struct i
 
 	ret = alloc_work_entries(cm_id_priv, 3);
 	if (ret) {
-		iw_cm_reject(cm_id, NULL, 0);
 		iw_destroy_cm_id(cm_id);
 		goto out;
 	}
@@ -650,7 +653,7 @@ static void cm_conn_req_handler(struct i
 		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
 		destroy_cm_id(cm_id);
 		if (atomic_read(&cm_id_priv->refcount)==0)
-			kfree(cm_id);
+			free_cm_id(cm_id_priv);
 	}
 
 out:
@@ -854,13 +857,12 @@ static void cm_work_handler(struct work_
 			destroy_cm_id(&cm_id_priv->id);
 		}
 		BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
-		if (iwcm_deref_id(cm_id_priv))
-			return;
-
-		if (atomic_read(&cm_id_priv->refcount)==0 &&
-		    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
-			dealloc_work_entries(cm_id_priv);
-			kfree(cm_id_priv);
+		if (iwcm_deref_id(cm_id_priv)) {
+			if (test_bit(IWCM_F_CALLBACK_DESTROY,
+				     &cm_id_priv->flags)) {
+				BUG_ON(!list_empty(&cm_id_priv->work_list));
+				free_cm_id(cm_id_priv);
+			}
 			return;
 		}
 		spin_lock_irqsave(&cm_id_priv->lock, flags);

From rowland at cse.ohio-state.edu Sat Feb 10 11:25:16 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Sat, 10 Feb 2007 14:25:16 -0500
Subject: [openib-general] MVAPICH2 SRPM update and install files patch
Message-ID: <45CE1C1C.70406@cse.ohio-state.edu>

I updated the latest MVAPICH2 SRPM:

https://www.openfabrics.org/~rowland/ofed_1_2/

I am including a patch to the latest ofed_1_2_scripts git files. Since
these files are the same as those used in the
OFED-1.2-20070208-1508.tgz package, this patch can also be applied
there. This patch is required to use the new MVAPICH2 SRPM file and
should not be used with the older versions.
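
A rough sketch of applying such a patch to an unpacked tree follows. The helper name apply_scripts_patch, the -p1 strip level and both paths are assumptions for illustration only, since the attachment itself is not reproduced in this archive; --dry-run is the GNU patch option that checks the hunks without modifying any file.

```shell
# Hypothetical helper: verify that a patch applies cleanly, then apply it.
# Assumption: file names in the patch carry one leading path component
# (git-style a/ and b/), hence -p1. Adjust if the patch was made differently.
apply_scripts_patch() {
    tree=$1       # e.g. the directory unpacked from OFED-1.2-20070208-1508.tgz
    patchfile=$2  # absolute path to the saved ofed_1_2_scripts.patch
    (
        cd "$tree" || exit 1
        # First pass: check every hunk without modifying any file.
        patch -p1 --dry-run < "$patchfile" >/dev/null || exit 1
        # Second pass: apply for real.
        patch -p1 < "$patchfile"
    )
}
```

Running the dry run first means a tree that does not match the patch is left untouched instead of being half patched.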

I've done the following:

- Updated some of the dependencies when mvapich2 is selected.
- Added new mvapich2 configuration prompts if mvapich2 is selected. This
  is all contained within the mvapich2_config shell function. These
  values are stored in the configuration file, etc. and prefixed with
  MVAPICH2_CONF_.

There are two implementation choices for the MVAPICH2 build: OFA and
uDAPL. The OFA build should allow IB, IB + RDMA-CM, and iWARP to be
used. The mode is controlled by the following runtime environment
variables:

IB
--
No additional environment variable required (default case).

IB + RDMA-CM
------------
MV2_USE_RDMA_CM=1

iWARP
-----
MV2_ENABLE_IWARP_MODE=1

-- 
Shaun Rowland rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ofed_1_2_scripts.patch
URL:

From swise at opengridcomputing.com Sat Feb 10 12:36:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Sat, 10 Feb 2007 14:36:03 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()
In-Reply-To: <1171135423.11017.61.camel@stevo-desktop>
References: <1171035668.26453.11.camel@trinity.ogc.int> <1171135423.11017.61.camel@stevo-desktop>
Message-ID: <1171139763.11017.68.camel@stevo-desktop>

ugh. There is at least one bug in this patch. I cannot call
iw_cm_reject() inside destroy_cm_id() because both functions grab the
iw_cm lock...

On Sat, 2007-02-10 at 13:23 -0600, Steve Wise wrote:
> Here is a patch that Tom and I think fixes the race condition Roland
> discovered, plus cleans up the issues Krishna attempted to fix in his
> first patch. I'm testing it now with a series of rping tests looping
> with random sizes and counts and it seems to work, but the patch needs
> more testing and review.
>
> Krishna, can you review this carefully and also test it and let us know
> if you think its good? The patch is against for-2.6.21 from Roland's
> tree.
> > Roland, can you review this too and verify that it fixes the race > condition? > > > Krishna: here are comments to your original patch description: > > > cm_conn_req_handler() : > > 1. Calling destroy_cm_id leaks 3 work 'free' list entries. > > I don't think your original patch fixed all places this memory was > leaked. > > This has been address in the patch below by creating a function > free_cm_id() that frees the list entries -and- frees the cm_id memory. > It is then called from the 3 places where the cm_id can be freed. > > > 2. cm_id is freed up wrongly and not cm_id_priv (though the > > effect is the same since cm_id is the first element of > > cm_id_priv, but still a bug if the top level cm_id > > changes). > > > > This is addressed in the patch below since we have a prototyped function > for freeing the cm_id_priv. > > > 3. Reject message has to be sent on failure. Tested this > > without the fix and found the client hangs, waited for > > about > > 20 mins and then did Ctrl-C but the process is unkillable. > > > > The call to iw_cm_reject() is now in destroy_cm_id() and is called based > on the cm_id state. So whenever a connection is destroyed, if it needs > a rejection sent, it will be sent. > > > 4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle) > > doesn't achieve anything, since checking for > > IWCM_F_CALLBACK_DESTROY in the parent's flag (in > > cm_work_handler) means that this will never be true. > > > > It does achieve something. I think you are failing to consider the fact > that the iWARP provider can have a reference on the cm_id even in the > case where the callback function returns an error thus giving > destruction ownership to the IWCM. Perhaps the ammasso provider never > does this, but cxgb3 can. And we haven't put any restrictions on > exactly when the provider _must_ release its reference. 
If the provider > _does_ have a reference at this point in the code, then the cm_id will > not be freed, and must be freed when the refcnt finally reaches zero > when the provider removes its reference. > > I wish to clarify this for everyone (and we need this text in > Documentation/infiniband/iwcm.txt IMO): > > This design is based on the RDMA_CM and IB_CM behavior. If the app > issues the destroy via rdma_destroy_cm_id, then we block that thread > until all references are gone. If the app returns non-zero in a > callback for a given cm_id, then the CM owns destroying the cm_id and > the application is done with it. That's the short of it. Here's the > long of it: > > There are 2 paths for freeing iw_cm memory. > > 1) the application issues a rdma_destroy_cm_id() which calls > iw_destroy_cm_id(). In this case (and this case only), the thread is > blocked until the refcnt reaches 0, then the thread continues and frees > the memory. > > 2) the application returns non zero from a callback function. In this > case, the IWCM is responsible to destroy the cm_id. However, the IWCM > _cannot_ block in its event handler thread because this can cause a > deadlock. A deadlock can occur if the provider has a reference to the > cm_id and needs to post some event before removing the reference. If > the IWCM were to block awaiting the refcnt to go to zero, it would > deadlock with the provider trying to post the last event before derefing > the cm_id. So the IWCM_F_CALLBACK_DESTROY bit is used to indicate that > the IWCM owns destroying this. If, in cm_work_handler(), the refcnt > goes to zero -and- the DESTROY bit is set, then the cm_id can be freed. > If the refcnt doesn't go to zero in that function, then either the > provider still has a reference, or subsequent queued work items have > additional references. In either case, the cm_id is not freed and > cm_work_handler() keeps chunking through the events and processing them. 
> Since the cm_id is marked DESTROYING, the events get dropped and the > references released on the cm_id. Eventually the cm_id will get freed > either in cm_work_handler() -or- in rem_ref() called by the provider it > the provider has the last reference. > > So based on the above design, there are 3 places in the code where the > cm_id can be freed: > > A) in case 1 above the memory will always be freed in iw_destroy_cm_id() > after the thread is awakened with a refcnt of zero. > > B) in case 2 above if the last reference is due to a queued work event > for the iwm. In this case the memory if freed in cm_work_handler(). > > C) in case 2 above if the provider has the last reference, then the > cm_id is freed in rem_ref(). > > > I hope this clarifies things. > > Here's the proposed patch: > > iw_cm_id destruction race condition fixes. > > From: Steve Wise > > Several changes: > > - iwcm_deref_id() always wakes up if there's another reference. > > - move iw_cm_reject() into destroy_cm_id() to reduce code replication. > > - clean up race condition in cm_work_handler(). > > - create static void free_cm_id() which deallocs the work entries and then > kfrees the cm_id memory. This reduces code replication. > > Signed-off-by: Steve Wise > Signed-off-by: Tom Tucker > --- > > drivers/infiniband/core/iwcm.c | 48 +++++++++++++++++++++------------------- > 1 files changed, 25 insertions(+), 23 deletions(-) > > diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c > index 1039ad5..403daed 100644 > --- a/drivers/infiniband/core/iwcm.c > +++ b/drivers/infiniband/core/iwcm.c > @@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c > return 0; > } > > +static void free_cm_id(struct iwcm_id_private *cm_id_priv) > +{ > + dealloc_work_entries(cm_id_priv); > + kfree(cm_id_priv); > +} > + > /* > * Release a reference on cm_id. 
If the last reference is being > * released, enable the waiting thread (in iw_destroy_cm_id) to > @@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c > */ > static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv) > { > - int ret = 0; > - > BUG_ON(atomic_read(&cm_id_priv->refcount)==0); > if (atomic_dec_and_test(&cm_id_priv->refcount)) { > BUG_ON(!list_empty(&cm_id_priv->work_list)); > - if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) { > - BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING); > - BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, > - &cm_id_priv->flags)); > - ret = 1; > - } > complete(&cm_id_priv->destroy_comp); > + return 1; > } > > - return ret; > + return 0; > } > > static void add_ref(struct iw_cm_id *cm_id) > @@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_ > { > struct iwcm_id_private *cm_id_priv; > cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); > - iwcm_deref_id(cm_id_priv); > + if (iwcm_deref_id(cm_id_priv) && > + test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { > + BUG_ON(!list_empty(&cm_id_priv->work_list)); > + free_cm_id(cm_id_priv); > + } > } > > static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event); > @@ -355,8 +358,11 @@ static void destroy_cm_id(struct iw_cm_i > case IW_CM_STATE_CONN_RECV: > /* > * App called destroy before/without calling accept after > - * receiving connection request event notification. > + * receiving connection request event notification or > + * returned non zero from the event callback function. > + * In either case, must tell the provider to reject. 
> */ > + iw_cm_reject(cm_id, NULL, 0); > cm_id_priv->state = IW_CM_STATE_DESTROYING; > break; > case IW_CM_STATE_CONN_SENT: > @@ -391,9 +397,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c > > wait_for_completion(&cm_id_priv->destroy_comp); > > - dealloc_work_entries(cm_id_priv); > - > - kfree(cm_id_priv); > + free_cm_id(cm_id_priv); > } > EXPORT_SYMBOL(iw_destroy_cm_id); > > @@ -639,7 +643,6 @@ static void cm_conn_req_handler(struct i > > ret = alloc_work_entries(cm_id_priv, 3); > if (ret) { > - iw_cm_reject(cm_id, NULL, 0); > iw_destroy_cm_id(cm_id); > goto out; > } > @@ -650,7 +653,7 @@ static void cm_conn_req_handler(struct i > set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); > destroy_cm_id(cm_id); > if (atomic_read(&cm_id_priv->refcount)==0) > - kfree(cm_id); > + free_cm_id(cm_id_priv); > } > > out: > @@ -854,13 +857,12 @@ static void cm_work_handler(struct work_ > destroy_cm_id(&cm_id_priv->id); > } > BUG_ON(atomic_read(&cm_id_priv->refcount)==0); > - if (iwcm_deref_id(cm_id_priv)) > - return; > - > - if (atomic_read(&cm_id_priv->refcount)==0 && > - test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { > - dealloc_work_entries(cm_id_priv); > - kfree(cm_id_priv); > + if (iwcm_deref_id(cm_id_priv)) { > + if (test_bit(IWCM_F_CALLBACK_DESTROY, > + &cm_id_priv->flags)) { > + BUG_ON(!list_empty(&cm_id_priv->work_list)); > + free_cm_id(cm_id_priv); > + } > return; > } > spin_lock_irqsave(&cm_id_priv->lock, flags); From mst at mellanox.co.il Sat Feb 10 13:10:21 2007 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Sat, 10 Feb 2007 23:10:21 +0200 Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes In-Reply-To: References: Message-ID: <20070210211021.GA14903@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes > > Michael> What about the mthca memory registration patches? I > Michael> thought they are on their way. Should I repost? > > Sorry, I forgot about that. Yes, please resend the latest state. OK, coming up. -- MST From mst at mellanox.co.il Sat Feb 10 13:13:12 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 10 Feb 2007 23:13:12 +0200 Subject: [openib-general] [PATCH 1 of 4] IB/mthca: merge MR and FMR space on 64 bit Message-ID: <20070210211312.GB14903@mellanox.co.il> For Tavor, we currently reserve separate MPT and MTT space for FMRs to avoid abusing the vmalloc space on 32 bit kernels. No such problem exists on 64 bit kernels so let's not do it there. This way we have a shared pool for MR and FMR resources, used on demand. This will also make it possible to write MTTs for regular regions directly from driver. Signed-off-by: Michael S. 
Tsirkin --- Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c =================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_mr.c +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c @@ -761,7 +761,7 @@ void mthca_arbel_fmr_unmap(struct mthca_ int mthca_init_mr_table(struct mthca_dev *dev) { unsigned long addr; - int err, i; + int mpts, mtts, err, i; err = mthca_alloc_init(&dev->mr_table.mpt_alloc, dev->limits.num_mpts, @@ -795,13 +795,21 @@ int mthca_init_mr_table(struct mthca_dev err = -EINVAL; goto err_fmr_mpt; } + mpts = mtts = 1 << i; + } else { + mpts = dev->limits.num_mtt_segs; + mtts = dev->limits.num_mpts; + } + + if (!mthca_is_memfree(dev) && + (dev->mthca_flags & MTHCA_FLAG_FMR)) { addr = pci_resource_start(dev->pdev, 4) + ((pci_resource_len(dev->pdev, 4) - 1) & dev->mr_table.mpt_base); dev->mr_table.tavor_fmr.mpt_base = - ioremap(addr, (1 << i) * sizeof(struct mthca_mpt_entry)); + ioremap(addr, mpts * sizeof(struct mthca_mpt_entry)); if (!dev->mr_table.tavor_fmr.mpt_base) { mthca_warn(dev, "MPT ioremap for FMR failed.\n"); @@ -814,19 +822,21 @@ int mthca_init_mr_table(struct mthca_dev dev->mr_table.mtt_base); dev->mr_table.tavor_fmr.mtt_base = - ioremap(addr, (1 << i) * MTHCA_MTT_SEG_SIZE); + ioremap(addr, mtts * MTHCA_MTT_SEG_SIZE); if (!dev->mr_table.tavor_fmr.mtt_base) { mthca_warn(dev, "MTT ioremap for FMR failed.\n"); err = -ENOMEM; goto err_fmr_mtt; } + } - err = mthca_buddy_init(&dev->mr_table.tavor_fmr.mtt_buddy, i); + if (dev->limits.fmr_reserved_mtts) { + err = mthca_buddy_init(&dev->mr_table.tavor_fmr.mtt_buddy, fls(mtts - 1)); if (err) goto err_fmr_mtt_buddy; /* Prevent regular MRs from using FMR keys */ - err = mthca_buddy_alloc(&dev->mr_table.mtt_buddy, i); + err = mthca_buddy_alloc(&dev->mr_table.mtt_buddy, fls(mtts - 1)); if (err) goto err_reserve_fmr; Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_profile.c 
=================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_profile.c +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_profile.c @@ -277,7 +277,7 @@ u64 mthca_make_profile(struct mthca_dev * out of the MR pool. They don't use additional memory, but * we assign them as part of the HCA profile anyway. */ - if (mthca_is_memfree(dev)) + if (mthca_is_memfree(dev) || BITS_PER_LONG == 64) dev->limits.fmr_reserved_mtts = 0; else dev->limits.fmr_reserved_mtts = request->fmr_reserved_mtts; -- MST From mst at mellanox.co.il Sat Feb 10 13:14:25 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 10 Feb 2007 23:14:25 +0200 Subject: [openib-general] [PATCH 2 of 4] IB/mthca: always fill MTTs from CPU Message-ID: <20070210211425.GC14903@mellanox.co.il> Speed up memory registration by filling in MTTs directly. This reduces the number of FW commands needed to register an MR by at least a factor of 2. This applies to all memfree cards, and to tavor mode on 64 bit systems with the patch I posted earlier. Signed-off-by: Michael S. 
Tsirkin --- Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_dev.h =================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_dev.h +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_dev.h @@ -464,6 +464,8 @@ void mthca_uar_free(struct mthca_dev *de int mthca_pd_alloc(struct mthca_dev *dev, int privileged, struct mthca_pd *pd); void mthca_pd_free(struct mthca_dev *dev, struct mthca_pd *pd); +int mthca_write_mtt_size(struct mthca_dev *dev); + struct mthca_mtt *mthca_alloc_mtt(struct mthca_dev *dev, int size); void mthca_free_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt); int mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt, Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c =================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_mr.c +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c @@ -243,8 +243,8 @@ void mthca_free_mtt(struct mthca_dev *de kfree(mtt); } -int mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt, - int start_index, u64 *buffer_list, int list_len) +static int __mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt, + int start_index, u64 *buffer_list, int list_len) { struct mthca_mailbox *mailbox; __be64 *mtt_entry; @@ -295,6 +295,84 @@ out: return err; } +void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt, + int start_index, u64 *buffer_list, int list_len) +{ + u64 __iomem *mtts; + u32 mtt_seg; + int i; + + mtt_seg = mtt->first_seg * MTHCA_MTT_SEG_SIZE; + mtts = dev->mr_table.tavor_fmr.mtt_base + mtt_seg + start_index * sizeof (u64); + for (i = 0; i < list_len; ++i) { + __be64 mtt_entry = cpu_to_be64(buffer_list[i] | + MTHCA_MTT_FLAG_PRESENT); + mthca_write64_raw(mtt_entry, mtts + i); + } +} + +void mthca_arbel_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt, + int start_index, u64 *buffer_list, int list_len) +{ + __be64 *mtts; + int i; + int s = 
start_index * sizeof (u64); + + /* For Arbel, all MTTs must fit in the same page. */ + BUG_ON(s / PAGE_SIZE != (s + list_len * sizeof(u64) - 1) / PAGE_SIZE); + /* Require full segments */ + BUG_ON(s % MTHCA_MTT_SEG_SIZE); + + mtts = mthca_table_find(dev->mr_table.mtt_table, mtt->first_seg + + s / MTHCA_MTT_SEG_SIZE); + + BUG_ON(!mtts); + + for (i = 0; i < list_len; ++i) + mtts[i] = cpu_to_be64(buffer_list[i] | MTHCA_MTT_FLAG_PRESENT); +} + +int mthca_write_mtt_size(struct mthca_dev *dev) +{ + if (dev->mr_table.fmr_mtt_buddy != &dev->mr_table.mtt_buddy) + /* + * Be friendly to WRITE_MTT command + * and leave two empty slots for the + * index and reserved fields of the + * mailbox. + */ + return PAGE_SIZE / sizeof (u64) - 2; + + /* For Arbel, all MTTs must fit in the same page. */ + return mthca_is_memfree(dev) ? (PAGE_SIZE / sizeof (u64)) : 0x7ffffff; +} + +int mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt, + int start_index, u64 *buffer_list, int list_len) +{ + int size = mthca_write_mtt_size(dev); + int chunk; + + if (dev->mr_table.fmr_mtt_buddy != &dev->mr_table.mtt_buddy) + return __mthca_write_mtt(dev, mtt, start_index, buffer_list, list_len); + + while (list_len > 0) { + chunk = min(size, list_len); + if (mthca_is_memfree(dev)) + mthca_arbel_write_mtt_seg(dev, mtt, start_index, + buffer_list, chunk); + else + mthca_tavor_write_mtt_seg(dev, mtt, start_index, + buffer_list, chunk); + + list_len -= chunk; + start_index += chunk; + buffer_list += chunk; + } + + return 0; +} + static inline u32 tavor_hw_index_to_key(u32 ind) { return ind; Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_provider.c =================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_provider.c +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_provider.c @@ -1015,6 +1015,7 @@ static struct ib_mr *mthca_reg_user_mr(s int shift, n, len; int i, j, k; int err = 0; + int write_mtt_size; shift = ffs(region->page_size) - 1; 
@@ -1040,6 +1041,8 @@ static struct ib_mr *mthca_reg_user_mr(s i = n = 0; + write_mtt_size = min(mthca_write_mtt_size(dev), (int)(PAGE_SIZE / sizeof *pages)); + list_for_each_entry(chunk, &region->chunk_list, list) for (j = 0; j < chunk->nmap; ++j) { len = sg_dma_len(&chunk->page_list[j]) >> shift; @@ -1047,14 +1050,11 @@ static struct ib_mr *mthca_reg_user_mr(s pages[i++] = sg_dma_address(&chunk->page_list[j]) + region->page_size * k; /* - * Be friendly to WRITE_MTT command - * and leave two empty slots for the - * index and reserved fields of the - * mailbox. + * Be friendly to write_mtt and pass it chunks + * of appropriate size. */ - if (i == PAGE_SIZE / sizeof (u64) - 2) { - err = mthca_write_mtt(dev, mr->mtt, - n, pages, i); + if (i == write_mtt_size) { + err = mthca_write_mtt(dev, mr->mtt, n, pages, i); if (err) goto mtt_done; n += i; -- MST From mst at mellanox.co.il Sat Feb 10 13:15:08 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 10 Feb 2007 23:15:08 +0200 Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree Message-ID: <20070210211508.GD14903@mellanox.co.il> Fix non-cache-coherent CPUs with memfree HCAs. We allocate the MTT table with alloc_pages() and then do pci_map_sg(), so we must call pci_dma_sync_sg after the CPU writes to the MTT table (this works since device never writes the MTTs on memfree). For MPTs, both the device and CPU might write there, so we must allocate dma coherent memory for these. Signed-off-by: Michael S.
Tsirkin --- Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_memfree.c =================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_memfree.c +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -35,6 +35,8 @@ */ #include +#include +#include #include "mthca_memfree.h" #include "mthca_dev.h" @@ -58,22 +60,31 @@ struct mthca_user_db_table { } page[0]; }; -void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm) +void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm, int coherent) { struct mthca_icm_chunk *chunk, *tmp; + void *buf; int i; if (!icm) return; list_for_each_entry_safe(chunk, tmp, &icm->chunk_list, list) { - if (chunk->nsg > 0) - pci_unmap_sg(dev->pdev, chunk->mem, chunk->npages, - PCI_DMA_BIDIRECTIONAL); - - for (i = 0; i < chunk->npages; ++i) - __free_pages(chunk->mem[i].page, - get_order(chunk->mem[i].length)); + if (coherent) + for (i = 0; i < chunk->npages; ++i) { + buf = lowmem_page_address(chunk->mem[i].page); + dma_free_coherent(&dev->pdev->dev, chunk->mem[i].length, + buf, sg_dma_address(&chunk->mem[i])); + } + else { + if (chunk->nsg > 0) + pci_unmap_sg(dev->pdev, chunk->mem, chunk->npages, + PCI_DMA_BIDIRECTIONAL); + + for (i = 0; i < chunk->npages; ++i) + __free_pages(chunk->mem[i].page, + get_order(chunk->mem[i].length)); + } kfree(chunk); } @@ -81,12 +92,41 @@ void mthca_free_icm(struct mthca_dev *de kfree(icm); } +static int mthca_alloc_icm_pages(struct scatterlist *mem, int order, gfp_t gfp_mask) +{ + mem->page = alloc_pages(gfp_mask, order); + if (!mem->page) + return -ENOMEM; + + mem->length = PAGE_SIZE << order; + mem->offset = 0; + return 0; +} + +static int mthca_alloc_icm_coherent(struct device *dev, struct scatterlist *mem, + int order, gfp_t gfp_mask) +{ + void *buf = dma_alloc_coherent(dev, PAGE_SIZE << order, &sg_dma_address(mem), + gfp_mask); + if (!buf) + return -ENOMEM; + + sg_set_buf(mem, buf, PAGE_SIZE << order); + BUG_ON(mem->offset); + 
sg_dma_len(mem) = PAGE_SIZE << order; + return 0; +} + struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages, - gfp_t gfp_mask) + gfp_t gfp_mask, int coherent) { struct mthca_icm *icm; struct mthca_icm_chunk *chunk = NULL; int cur_order; + int ret; + + /* We use sg_set_buf for coherent allocs, which assumes low memory */ + BUG_ON(coherent && (gfp_mask & __GFP_HIGHMEM)); icm = kmalloc(sizeof *icm, gfp_mask & ~(__GFP_HIGHMEM | __GFP_NOWARN)); if (!icm) @@ -112,21 +152,28 @@ struct mthca_icm *mthca_alloc_icm(struct while (1 << cur_order > npages) --cur_order; - chunk->mem[chunk->npages].page = alloc_pages(gfp_mask, cur_order); - if (chunk->mem[chunk->npages].page) { - chunk->mem[chunk->npages].length = PAGE_SIZE << cur_order; - chunk->mem[chunk->npages].offset = 0; + if (coherent) + ret = mthca_alloc_icm_coherent(&dev->pdev->dev, + &chunk->mem[chunk->npages], + cur_order, gfp_mask); + else + ret = mthca_alloc_icm_pages(&chunk->mem[chunk->npages], + cur_order, gfp_mask); - if (++chunk->npages == MTHCA_ICM_CHUNK_LEN) { + if (!ret) { + ++chunk->npages; + + if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) { chunk->nsg = pci_map_sg(dev->pdev, chunk->mem, chunk->npages, PCI_DMA_BIDIRECTIONAL); if (chunk->nsg <= 0) goto fail; + } + if (chunk->npages == MTHCA_ICM_CHUNK_LEN) chunk = NULL; - } npages -= 1 << cur_order; } else { @@ -136,7 +183,7 @@ struct mthca_icm *mthca_alloc_icm(struct } } - if (chunk) { + if (!coherent && chunk) { chunk->nsg = pci_map_sg(dev->pdev, chunk->mem, chunk->npages, PCI_DMA_BIDIRECTIONAL); @@ -148,7 +195,7 @@ struct mthca_icm *mthca_alloc_icm(struct return icm; fail: - mthca_free_icm(dev, icm); + mthca_free_icm(dev, icm, coherent); return NULL; } @@ -167,7 +214,7 @@ int mthca_table_get(struct mthca_dev *de table->icm[i] = mthca_alloc_icm(dev, MTHCA_TABLE_CHUNK_SIZE >> PAGE_SHIFT, (table->lowmem ? 
GFP_KERNEL : GFP_HIGHUSER) | - __GFP_NOWARN); + __GFP_NOWARN, table->coherent); if (!table->icm[i]) { ret = -ENOMEM; goto out; @@ -175,7 +222,7 @@ int mthca_table_get(struct mthca_dev *de if (mthca_MAP_ICM(dev, table->icm[i], table->virt + i * MTHCA_TABLE_CHUNK_SIZE, &status) || status) { - mthca_free_icm(dev, table->icm[i]); + mthca_free_icm(dev, table->icm[i], table->coherent); table->icm[i] = NULL; ret = -ENOMEM; goto out; @@ -204,16 +251,16 @@ void mthca_table_put(struct mthca_dev *d mthca_UNMAP_ICM(dev, table->virt + i * MTHCA_TABLE_CHUNK_SIZE, MTHCA_TABLE_CHUNK_SIZE / MTHCA_ICM_PAGE_SIZE, &status); - mthca_free_icm(dev, table->icm[i]); + mthca_free_icm(dev, table->icm[i], table->coherent); table->icm[i] = NULL; } mutex_unlock(&table->mutex); } -void *mthca_table_find(struct mthca_icm_table *table, int obj) +void *mthca_table_find(struct mthca_icm_table *table, int obj, dma_addr_t *dma_handle) { - int idx, offset, i; + int idx, offset, dma_offset, i; struct mthca_icm_chunk *chunk; struct mthca_icm *icm; struct page *page = NULL; @@ -225,13 +272,22 @@ void *mthca_table_find(struct mthca_icm_ idx = (obj & (table->num_obj - 1)) * table->obj_size; icm = table->icm[idx / MTHCA_TABLE_CHUNK_SIZE]; - offset = idx % MTHCA_TABLE_CHUNK_SIZE; + dma_offset = offset = idx % MTHCA_TABLE_CHUNK_SIZE; if (!icm) goto out; list_for_each_entry(chunk, &icm->chunk_list, list) { for (i = 0; i < chunk->npages; ++i) { + if (dma_handle && dma_offset >= 0) { + if (sg_dma_len(&chunk->mem[i]) > dma_offset) + *dma_handle = sg_dma_address(&chunk->mem[i]) + + dma_offset; + dma_offset -= sg_dma_len(&chunk->mem[i]); + } + /* DMA mapping can merge pages but not split them, + * so if we found the page, dma_handle has already + * been assigned to. 
*/ if (chunk->mem[i].length > offset) { page = chunk->mem[i].page; goto out; @@ -283,7 +339,7 @@ void mthca_table_put_range(struct mthca_ struct mthca_icm_table *mthca_alloc_icm_table(struct mthca_dev *dev, u64 virt, int obj_size, int nobj, int reserved, - int use_lowmem) + int use_lowmem, int use_coherent) { struct mthca_icm_table *table; int num_icm; @@ -302,6 +358,7 @@ struct mthca_icm_table *mthca_alloc_icm_ table->num_obj = nobj; table->obj_size = obj_size; table->lowmem = use_lowmem; + table->coherent = use_coherent; mutex_init(&table->mutex); for (i = 0; i < num_icm; ++i) @@ -314,12 +371,12 @@ struct mthca_icm_table *mthca_alloc_icm_ table->icm[i] = mthca_alloc_icm(dev, chunk_size >> PAGE_SHIFT, (use_lowmem ? GFP_KERNEL : GFP_HIGHUSER) | - __GFP_NOWARN); + __GFP_NOWARN, use_coherent); if (!table->icm[i]) goto err; if (mthca_MAP_ICM(dev, table->icm[i], virt + i * MTHCA_TABLE_CHUNK_SIZE, &status) || status) { - mthca_free_icm(dev, table->icm[i]); + mthca_free_icm(dev, table->icm[i], table->coherent); table->icm[i] = NULL; goto err; } @@ -339,7 +396,7 @@ err: mthca_UNMAP_ICM(dev, virt + i * MTHCA_TABLE_CHUNK_SIZE, MTHCA_TABLE_CHUNK_SIZE / MTHCA_ICM_PAGE_SIZE, &status); - mthca_free_icm(dev, table->icm[i]); + mthca_free_icm(dev, table->icm[i], table->coherent); } kfree(table); @@ -357,7 +414,7 @@ void mthca_free_icm_table(struct mthca_d mthca_UNMAP_ICM(dev, table->virt + i * MTHCA_TABLE_CHUNK_SIZE, MTHCA_TABLE_CHUNK_SIZE / MTHCA_ICM_PAGE_SIZE, &status); - mthca_free_icm(dev, table->icm[i]); + mthca_free_icm(dev, table->icm[i], table->coherent); } kfree(table); Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_main.c =================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_main.c +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_main.c @@ -379,7 +379,7 @@ static int mthca_load_fw(struct mthca_de mdev->fw.arbel.fw_icm = mthca_alloc_icm(mdev, mdev->fw.arbel.fw_pages, - GFP_HIGHUSER | __GFP_NOWARN); + 
GFP_HIGHUSER | __GFP_NOWARN, 0); if (!mdev->fw.arbel.fw_icm) { mthca_err(mdev, "Couldn't allocate FW area, aborting.\n"); return -ENOMEM; @@ -412,7 +412,7 @@ err_unmap_fa: mthca_UNMAP_FA(mdev, &status); err_free: - mthca_free_icm(mdev, mdev->fw.arbel.fw_icm); + mthca_free_icm(mdev, mdev->fw.arbel.fw_icm, 0); return err; } @@ -441,7 +441,7 @@ static int mthca_init_icm(struct mthca_d (unsigned long long) aux_pages << 2); mdev->fw.arbel.aux_icm = mthca_alloc_icm(mdev, aux_pages, - GFP_HIGHUSER | __GFP_NOWARN); + GFP_HIGHUSER | __GFP_NOWARN, 0); if (!mdev->fw.arbel.aux_icm) { mthca_err(mdev, "Couldn't allocate aux memory, aborting.\n"); return -ENOMEM; @@ -467,7 +467,8 @@ static int mthca_init_icm(struct mthca_d mdev->mr_table.mtt_table = mthca_alloc_icm_table(mdev, init_hca->mtt_base, MTHCA_MTT_SEG_SIZE, mdev->limits.num_mtt_segs, - mdev->limits.reserved_mtts, 1); + mdev->limits.reserved_mtts, + 1, 0); if (!mdev->mr_table.mtt_table) { mthca_err(mdev, "Failed to map MTT context memory, aborting.\n"); err = -ENOMEM; @@ -477,7 +478,8 @@ static int mthca_init_icm(struct mthca_d mdev->mr_table.mpt_table = mthca_alloc_icm_table(mdev, init_hca->mpt_base, dev_lim->mpt_entry_sz, mdev->limits.num_mpts, - mdev->limits.reserved_mrws, 1); + mdev->limits.reserved_mrws, + 1, 1); if (!mdev->mr_table.mpt_table) { mthca_err(mdev, "Failed to map MPT context memory, aborting.\n"); err = -ENOMEM; @@ -487,7 +489,8 @@ static int mthca_init_icm(struct mthca_d mdev->qp_table.qp_table = mthca_alloc_icm_table(mdev, init_hca->qpc_base, dev_lim->qpc_entry_sz, mdev->limits.num_qps, - mdev->limits.reserved_qps, 0); + mdev->limits.reserved_qps, + 0, 0); if (!mdev->qp_table.qp_table) { mthca_err(mdev, "Failed to map QP context memory, aborting.\n"); err = -ENOMEM; @@ -497,7 +500,8 @@ static int mthca_init_icm(struct mthca_d mdev->qp_table.eqp_table = mthca_alloc_icm_table(mdev, init_hca->eqpc_base, dev_lim->eqpc_entry_sz, mdev->limits.num_qps, - mdev->limits.reserved_qps, 0); + 
mdev->limits.reserved_qps, + 0, 0); if (!mdev->qp_table.eqp_table) { mthca_err(mdev, "Failed to map EQP context memory, aborting.\n"); err = -ENOMEM; @@ -507,7 +511,7 @@ static int mthca_init_icm(struct mthca_d mdev->qp_table.rdb_table = mthca_alloc_icm_table(mdev, init_hca->rdb_base, MTHCA_RDB_ENTRY_SIZE, mdev->limits.num_qps << - mdev->qp_table.rdb_shift, + mdev->qp_table.rdb_shift, 0, 0, 0); if (!mdev->qp_table.rdb_table) { mthca_err(mdev, "Failed to map RDB context memory, aborting\n"); @@ -518,7 +522,8 @@ static int mthca_init_icm(struct mthca_d mdev->cq_table.table = mthca_alloc_icm_table(mdev, init_hca->cqc_base, dev_lim->cqc_entry_sz, mdev->limits.num_cqs, - mdev->limits.reserved_cqs, 0); + mdev->limits.reserved_cqs, + 0, 0); if (!mdev->cq_table.table) { mthca_err(mdev, "Failed to map CQ context memory, aborting.\n"); err = -ENOMEM; @@ -530,7 +535,8 @@ static int mthca_init_icm(struct mthca_d mthca_alloc_icm_table(mdev, init_hca->srqc_base, dev_lim->srq_entry_sz, mdev->limits.num_srqs, - mdev->limits.reserved_srqs, 0); + mdev->limits.reserved_srqs, + 0, 0); if (!mdev->srq_table.table) { mthca_err(mdev, "Failed to map SRQ context memory, " "aborting.\n"); @@ -550,7 +556,7 @@ static int mthca_init_icm(struct mthca_d mdev->limits.num_amgms, mdev->limits.num_mgms + mdev->limits.num_amgms, - 0); + 0, 0); if (!mdev->mcg_table.table) { mthca_err(mdev, "Failed to map MCG context memory, aborting.\n"); err = -ENOMEM; @@ -588,7 +594,7 @@ err_unmap_aux: mthca_UNMAP_ICM_AUX(mdev, &status); err_free_aux: - mthca_free_icm(mdev, mdev->fw.arbel.aux_icm); + mthca_free_icm(mdev, mdev->fw.arbel.aux_icm, 0); return err; } @@ -609,7 +615,7 @@ static void mthca_free_icms(struct mthca mthca_unmap_eq_icm(mdev); mthca_UNMAP_ICM_AUX(mdev, &status); - mthca_free_icm(mdev, mdev->fw.arbel.aux_icm); + mthca_free_icm(mdev, mdev->fw.arbel.aux_icm, 0); } static int mthca_init_arbel(struct mthca_dev *mdev) @@ -693,7 +699,7 @@ err_free_icm: err_stop_fw: mthca_UNMAP_FA(mdev, &status); - 
mthca_free_icm(mdev, mdev->fw.arbel.fw_icm); + mthca_free_icm(mdev, mdev->fw.arbel.fw_icm, 0); err_disable: if (!(mdev->mthca_flags & MTHCA_FLAG_NO_LAM)) @@ -712,7 +718,7 @@ static void mthca_close_hca(struct mthca mthca_free_icms(mdev); mthca_UNMAP_FA(mdev, &status); - mthca_free_icm(mdev, mdev->fw.arbel.fw_icm); + mthca_free_icm(mdev, mdev->fw.arbel.fw_icm, 0); if (!(mdev->mthca_flags & MTHCA_FLAG_NO_LAM)) mthca_DISABLE_LAM(mdev, &status); Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_memfree.h =================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_memfree.h +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_memfree.h @@ -69,6 +69,7 @@ struct mthca_icm_table { int num_obj; int obj_size; int lowmem; + int coherent; struct mutex mutex; struct mthca_icm *icm[0]; }; @@ -82,17 +83,17 @@ struct mthca_icm_iter { struct mthca_dev; struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages, - gfp_t gfp_mask); -void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm); + gfp_t gfp_mask, int coherent); +void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm, int coherent); struct mthca_icm_table *mthca_alloc_icm_table(struct mthca_dev *dev, u64 virt, int obj_size, int nobj, int reserved, - int use_lowmem); + int use_lowmem, int use_coherent); void mthca_free_icm_table(struct mthca_dev *dev, struct mthca_icm_table *table); int mthca_table_get(struct mthca_dev *dev, struct mthca_icm_table *table, int obj); void mthca_table_put(struct mthca_dev *dev, struct mthca_icm_table *table, int obj); -void *mthca_table_find(struct mthca_icm_table *table, int obj); +void *mthca_table_find(struct mthca_icm_table *table, int obj, dma_addr_t *dma_handle); int mthca_table_get_range(struct mthca_dev *dev, struct mthca_icm_table *table, int start, int end); void mthca_table_put_range(struct mthca_dev *dev, struct mthca_icm_table *table, Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c 
=================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_mr.c +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c @@ -315,6 +315,7 @@ void mthca_arbel_write_mtt_seg(struct mt int start_index, u64 *buffer_list, int list_len) { __be64 *mtts; + dma_addr_t dma_handle; int i; int s = start_index * sizeof (u64); @@ -324,12 +325,14 @@ void mthca_arbel_write_mtt_seg(struct mt BUG_ON(s % MTHCA_MTT_SEG_SIZE); mtts = mthca_table_find(dev->mr_table.mtt_table, mtt->first_seg + - s / MTHCA_MTT_SEG_SIZE); + s / MTHCA_MTT_SEG_SIZE, &dma_handle); BUG_ON(!mtts); for (i = 0; i < list_len; ++i) mtts[i] = cpu_to_be64(buffer_list[i] | MTHCA_MTT_FLAG_PRESENT); + + dma_sync_single(&dev->pdev->dev, dma_handle, list_len * sizeof(u64), DMA_TO_DEVICE); } int mthca_write_mtt_size(struct mthca_dev *dev) @@ -602,7 +605,7 @@ int mthca_fmr_alloc(struct mthca_dev *de if (err) goto err_out_mpt_free; - mr->mem.arbel.mpt = mthca_table_find(dev->mr_table.mpt_table, key); + mr->mem.arbel.mpt = mthca_table_find(dev->mr_table.mpt_table, key, NULL); BUG_ON(!mr->mem.arbel.mpt); } else mr->mem.tavor.mpt = dev->mr_table.tavor_fmr.mpt_base + @@ -616,7 +619,8 @@ int mthca_fmr_alloc(struct mthca_dev *de if (mthca_is_memfree(dev)) { mr->mem.arbel.mtts = mthca_table_find(dev->mr_table.mtt_table, - mr->mtt->first_seg); + mr->mtt->first_seg, + &mr->mem.arbel.dma_handle); BUG_ON(!mr->mem.arbel.mtts); } else mr->mem.tavor.mtts = dev->mr_table.tavor_fmr.mtt_base + mtt_seg; @@ -790,6 +794,9 @@ int mthca_arbel_map_phys_fmr(struct ib_f fmr->mem.arbel.mtts[i] = cpu_to_be64(page_list[i] | MTHCA_MTT_FLAG_PRESENT); + dma_sync_single(&dev->pdev->dev, fmr->mem.arbel.dma_handle, + list_len * sizeof(u64), DMA_TO_DEVICE); + fmr->mem.arbel.mpt->key = cpu_to_be32(key); fmr->mem.arbel.mpt->lkey = cpu_to_be32(key); fmr->mem.arbel.mpt->length = cpu_to_be64(list_len * (1ull << fmr->attr.page_shift)); Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_provider.h 
=================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_provider.h +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_provider.h @@ -89,6 +89,7 @@ struct mthca_fmr { struct { struct mthca_mpt_entry *mpt; __be64 *mtts; + dma_addr_t dma_handle; } arbel; } mem; }; -- MST From mst at mellanox.co.il Sat Feb 10 13:17:26 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 10 Feb 2007 23:17:26 +0200 Subject: [openib-general] [PATCH 4 of 4] IB/mthca: give reserved MTTs a separate cache line Message-ID: <20070210211726.GE14903@mellanox.co.il> This fixes several issues related to reserved MTTs and memory alignment. 1. MTTs are allocated in non-cache-coherent memory, so we must give reserved MTTs their own cache line, to prevent both device and CPU from writing into the same cache line at the same time. 2. reserved_mtts field has different meaning in Tavor and Arbel, so we are wasting mtt entries on memfree. Fix the Arbel case to match Tavor semantics. Signed-off-by: Michael S. 
Tsirkin --- Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_main.c =================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_main.c +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_main.c @@ -464,6 +464,10 @@ static int mthca_init_icm(struct mthca_d goto err_unmap_aux; } + /* CPU writes to non-reserved MTTs, while HCA might DMA to reserved mtts */ + mdev->limits.reserved_mtts = ALIGN(mdev->limits.reserved_mtts * MTHCA_MTT_SEG_SIZE, + dma_get_cache_alignment()) / MTHCA_MTT_SEG_SIZE; + mdev->mr_table.mtt_table = mthca_alloc_icm_table(mdev, init_hca->mtt_base, MTHCA_MTT_SEG_SIZE, mdev->limits.num_mtt_segs, Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_cmd.c =================================================================== --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_cmd.c +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -1051,7 +1051,11 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_EQ_OFFSET); dev_lim->max_eqs = 1 << (field & 0x7); MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_MTT_OFFSET); - dev_lim->reserved_mtts = 1 << (field >> 4); + if (mthca_is_memfree(dev)) + dev_lim->reserved_mtts = ALIGN((1 << (field >> 4)) * sizeof(u64), + MTHCA_MTT_SEG_SIZE) / MTHCA_MTT_SEG_SIZE; + else + dev_lim->reserved_mtts = 1 << (field >> 4); MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_MRW_SZ_OFFSET); dev_lim->max_mrw_sz = 1 << field; MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_MRW_OFFSET); -- MST From swise at opengridcomputing.com Sat Feb 10 13:26:35 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Sat, 10 Feb 2007 15:26:35 -0600 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: <1171139763.11017.68.camel@stevo-desktop> References: <1171035668.26453.11.camel@trinity.ogc.int> <1171135423.11017.61.camel@stevo-desktop> <1171139763.11017.68.camel@stevo-desktop> Message-ID: <1171142795.11017.71.camel@stevo-desktop> 
On Sat, 2007-02-10 at 14:36 -0600, Steve Wise wrote: > ugh. > > There is at least one bug in this patch. I cannot call iw_cm_reject() > inside destroy_cm_id() because both functions grab the iw_cm lock... > > This patch puts the iw_cm_reject() calls back in cm_conn_req_handler()... --- iw_cm_id destruction race condition fixes. From: Steve Wise Several changes: - iwcm_deref_id() always wakes up if there's another reference. - clean up race condition in cm_work_handler(). - create static void free_cm_id() which deallocs the work entries and then kfrees the cm_id memory. This reduces code replication. - rem_ref() if this is the last reference -and- the IWCM owns freeing the cm_id, then free it. Signed-off-by: Steve Wise Signed-off-by: Tom Tucker --- drivers/infiniband/core/iwcm.c | 47 +++++++++++++++++++++------------------- 1 files changed, 25 insertions(+), 22 deletions(-) diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c index 1039ad5..891d1fa 100644 --- a/drivers/infiniband/core/iwcm.c +++ b/drivers/infiniband/core/iwcm.c @@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c return 0; } +static void free_cm_id(struct iwcm_id_private *cm_id_priv) +{ + dealloc_work_entries(cm_id_priv); + kfree(cm_id_priv); +} + /* * Release a reference on cm_id. 
If the last reference is being * released, enable the waiting thread (in iw_destroy_cm_id) to @@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c */ static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv) { - int ret = 0; - BUG_ON(atomic_read(&cm_id_priv->refcount)==0); if (atomic_dec_and_test(&cm_id_priv->refcount)) { BUG_ON(!list_empty(&cm_id_priv->work_list)); - if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) { - BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING); - BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, - &cm_id_priv->flags)); - ret = 1; - } complete(&cm_id_priv->destroy_comp); + return 1; } - return ret; + return 0; } static void add_ref(struct iw_cm_id *cm_id) @@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_ { struct iwcm_id_private *cm_id_priv; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); - iwcm_deref_id(cm_id_priv); + if (iwcm_deref_id(cm_id_priv) && + test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { + BUG_ON(!list_empty(&cm_id_priv->work_list)); + free_cm_id(cm_id_priv); + } } static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event); @@ -355,7 +358,9 @@ static void destroy_cm_id(struct iw_cm_i case IW_CM_STATE_CONN_RECV: /* * App called destroy before/without calling accept after - * receiving connection request event notification. + * receiving connection request event notification or + * returned non zero from the event callback function. + * In either case, must tell the provider to reject. 
*/ cm_id_priv->state = IW_CM_STATE_DESTROYING; break; @@ -391,9 +396,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c wait_for_completion(&cm_id_priv->destroy_comp); - dealloc_work_entries(cm_id_priv); - - kfree(cm_id_priv); + free_cm_id(cm_id_priv); } EXPORT_SYMBOL(iw_destroy_cm_id); @@ -647,10 +650,11 @@ static void cm_conn_req_handler(struct i /* Call the client CM handler */ ret = cm_id->cm_handler(cm_id, iw_event); if (ret) { + iw_cm_reject(cm_id, NULL, 0); set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); destroy_cm_id(cm_id); if (atomic_read(&cm_id_priv->refcount)==0) - kfree(cm_id); + free_cm_id(cm_id_priv); } out: @@ -854,13 +858,12 @@ static void cm_work_handler(struct work_ destroy_cm_id(&cm_id_priv->id); } BUG_ON(atomic_read(&cm_id_priv->refcount)==0); - if (iwcm_deref_id(cm_id_priv)) - return; - - if (atomic_read(&cm_id_priv->refcount)==0 && - test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { - dealloc_work_entries(cm_id_priv); - kfree(cm_id_priv); + if (iwcm_deref_id(cm_id_priv)) { + if (test_bit(IWCM_F_CALLBACK_DESTROY, + &cm_id_priv->flags)) { + BUG_ON(!list_empty(&cm_id_priv->work_list)); + free_cm_id(cm_id_priv); + } return; } spin_lock_irqsave(&cm_id_priv->lock, flags); From rdreier at cisco.com Sat Feb 10 15:12:04 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 10 Feb 2007 15:12:04 -0800 Subject: [openib-general] [PATCH] for-2.6.21 Remove hw/cxgb3/core subdirectory. In-Reply-To: <1171133573.11017.41.camel@stevo-desktop> (Steve Wise's message of "Sat, 10 Feb 2007 12:52:53 -0600") References: <1171133573.11017.41.camel@stevo-desktop> Message-ID: Thanks, applied this and the previous patch, and pushed out my for-2.6.21 branch. I also rebased so the cxgb3 net driver builds now. From mst at mellanox.co.il Sat Feb 10 15:32:29 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 11 Feb 2007 01:32:29 +0200 Subject: [openib-general] integer overflow Message-ID: <20070210233229.GE32216@mellanox.co.il> Roland, the following code in ipoib: while ((int) priv->tx_tail - (int) priv->tx_head < 0) { seems to rely on integer overflow which seems to be undefined behaviour. Should we care? -- MST From rdreier at cisco.com Sat Feb 10 15:52:45 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 10 Feb 2007 15:52:45 -0800 Subject: [openib-general] integer overflow In-Reply-To: <20070210233229.GE32216@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 11 Feb 2007 01:32:29 +0200") References: <20070210233229.GE32216@mellanox.co.il> Message-ID: > while ((int) priv->tx_tail - (int) priv->tx_head < 0) { > > seems to rely on integer overflow which seems to be > undefined behaviour. tx_tail and tx_head are unsigned, and overflow is defined for unsigned integers. - R. From mst at mellanox.co.il Sat Feb 10 15:59:35 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 11 Feb 2007 01:59:35 +0200 Subject: [openib-general] integer overflow In-Reply-To: References: Message-ID: <20070210235935.GF32216@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: integer overflow > > > while ((int) priv->tx_tail - (int) priv->tx_head < 0) { > > > > seems to rely on integer overflow which seems to be > > undefined behaviour. > > tx_tail and tx_head are unsigned, and overflow is defined for unsigned > integers. Yes but we cast them to signed int here - no? -- MST From rdreier at cisco.com Sat Feb 10 16:01:01 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 10 Feb 2007 16:01:01 -0800 Subject: [openib-general] integer overflow In-Reply-To: <20070210235935.GF32216@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 11 Feb 2007 01:59:35 +0200") References: <20070210235935.GF32216@mellanox.co.il> Message-ID: > Yes but we cast them to signed int here - no? That's true, I guess it is technically undefined. 
But time_after() is relying on the same thing working, so I would say we don't care. - R. From mst at mellanox.co.il Sat Feb 10 16:31:40 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 11 Feb 2007 02:31:40 +0200 Subject: [openib-general] [PATCH RFC] use common cq for ipoib cm send side Message-ID: <20070211003140.GH32216@mellanox.co.il> The following untested patch moves all TX processing in IPoIB CM to common CQ. This should help reduce the number of interrupts for bi-directional traffic (such as TCP). Is this a good idea? What do others think? Signed-off-by: Michael S. Tsirkin --- diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index eb885ee..ef703c7 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -99,9 +99,9 @@ enum { #define IPOIB_OP_RECV (1ul << 31) #ifdef CONFIG_INFINIBAND_IPOIB_CM -#define IPOIB_CM_OP_SRQ (1ul << 30) +#define IPOIB_OP_CM (1ul << 30) #else -#define IPOIB_CM_OP_SRQ (0) +#define IPOIB_OP_CM (0) #endif /* structs */ @@ -144,7 +144,6 @@ struct ipoib_cm_rx { struct ipoib_cm_tx { struct ib_cm_id *id; - struct ib_cq *cq; struct ib_qp *qp; struct list_head list; struct net_device *dev; @@ -233,6 +232,7 @@ struct ipoib_dev_priv { unsigned tx_tail; struct ib_sge tx_sge; struct ib_send_wr tx_wr; + unsigned tx_outstanding; struct ib_wc ibwc[IPOIB_NUM_WC]; @@ -439,6 +439,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx); void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, unsigned int mtu); void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc); +void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc); #else struct ipoib_cm_tx; @@ -527,6 +528,9 @@ static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *w { } +static inline void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) +{ +} #endif #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG diff --git 
a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 8ee6f06..47c868c 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -85,7 +85,7 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id) struct ib_recv_wr *bad_wr; int i, ret; - priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ; + priv->cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV; for (i = 0; i < IPOIB_CM_RX_SG; ++i) priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i]; @@ -346,7 +346,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space, void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ; + unsigned int wr_id = wc->wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV); struct sk_buff *skb; struct ipoib_cm_rx *p; unsigned long flags; @@ -433,7 +433,7 @@ static inline int post_send(struct ipoib_dev_priv *priv, priv->tx_sge.addr = addr; priv->tx_sge.length = len; - priv->tx_wr.wr_id = wr_id; + priv->tx_wr.wr_id = wr_id | IPOIB_OP_CM; return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr); } @@ -484,20 +484,19 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ dev->trans_start = jiffies; ++tx->tx_head; - if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) { + if (++priv->tx_outstanding == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n", tx->qp->qp_num); netif_stop_queue(dev); - set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags); } } } -static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx, - struct ib_wc *wc) +void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned int wr_id = wc->wr_id; + struct ipoib_cm_tx *tx = wc->qp->qp_context; + unsigned int wr_id = wc->wr_id & ~IPOIB_OP_CM; struct ipoib_tx_buf *tx_req; unsigned long flags; 
@@ -522,11 +521,10 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx spin_lock_irqsave(&priv->tx_lock, flags); ++tx->tx_tail; - if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) && - tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) { - clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags); + if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && + netif_queue_stopped(dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) netif_wake_queue(dev); - } if (wc->status != IB_WC_SUCCESS && wc->status != IB_WC_WR_FLUSH_ERR) { @@ -551,8 +549,17 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx /* queue would be re-started anyway when TX is destroyed, * but it makes sense to do it ASAP here. */ - if (test_and_clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) - netif_wake_queue(dev); + while ((int) tx->tx_tail - (int) tx->tx_head < 0) { + tx_req = &tx->tx_ring[tx->tx_tail & (ipoib_sendq_size - 1)]; + ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, + DMA_TO_DEVICE); + dev_kfree_skb_any(tx_req->skb); + ++tx->tx_tail; + if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && + netif_queue_stopped(tx->dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) + netif_wake_queue(tx->dev); + } if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { list_move(&tx->list, &priv->cm.reap_list); @@ -567,19 +574,6 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx spin_unlock_irqrestore(&priv->tx_lock, flags); } -static void ipoib_cm_tx_completion(struct ib_cq *cq, void *tx_ptr) -{ - struct ipoib_cm_tx *tx = tx_ptr; - int n, i; - - ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); - do { - n = ib_poll_cq(cq, IPOIB_NUM_WC, tx->ibwc); - for (i = 0; i < n; ++i) - ipoib_cm_handle_tx_wc(tx->dev, tx, tx->ibwc + i); - } while (n == IPOIB_NUM_WC); -} - int ipoib_cm_dev_open(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -699,17 
+693,18 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even return 0; } -static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ib_cq *cq) +static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_cm_tx *tx) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_init_attr attr = {}; attr.recv_cq = priv->cq; + attr.send_cq = priv->cq; attr.srq = priv->cm.srq; attr.cap.max_send_wr = ipoib_sendq_size; attr.cap.max_send_sge = 1; attr.sq_sig_type = IB_SIGNAL_ALL_WR; attr.qp_type = IB_QPT_RC; - attr.send_cq = cq; + attr.qp_context = tx; return ib_create_qp(priv->pd, &attr); } @@ -789,21 +784,7 @@ static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn, goto err_tx; } - p->cq = ib_create_cq(priv->ca, ipoib_cm_tx_completion, NULL, p, - ipoib_sendq_size + 1); - if (IS_ERR(p->cq)) { - ret = PTR_ERR(p->cq); - ipoib_warn(priv, "failed to allocate tx cq: %d\n", ret); - goto err_cq; - } - - ret = ib_req_notify_cq(p->cq, IB_CQ_NEXT_COMP); - if (ret) { - ipoib_warn(priv, "failed to request completion notification: %d\n", ret); - goto err_req_notify; - } - - p->qp = ipoib_cm_create_tx_qp(p->dev, p->cq); + p->qp = ipoib_cm_create_tx_qp(p->dev, p); if (IS_ERR(p->qp)) { ret = PTR_ERR(p->qp); ipoib_warn(priv, "failed to allocate tx qp: %d\n", ret); @@ -840,12 +821,8 @@ err_modify: err_id: p->id = NULL; ib_destroy_qp(p->qp); -err_req_notify: err_qp: p->qp = NULL; - ib_destroy_cq(p->cq); -err_cq: - p->cq = NULL; err_tx: return ret; } @@ -854,6 +831,7 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) { struct ipoib_dev_priv *priv = netdev_priv(p->dev); struct ipoib_tx_buf *tx_req; + unsigned long flags; ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n", p->qp ? 
p->qp->qp_num : 0, p->tx_head, p->tx_tail); @@ -864,12 +842,6 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) if (p->qp) ib_destroy_qp(p->qp); - if (p->cq) - ib_destroy_cq(p->cq); - - if (test_bit(IPOIB_FLAG_NETIF_STOPPED, &p->flags)) - netif_wake_queue(p->dev); - if (p->tx_ring) { while ((int) p->tx_tail - (int) p->tx_head < 0) { tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)]; @@ -877,6 +849,12 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) DMA_TO_DEVICE); dev_kfree_skb_any(tx_req->skb); ++p->tx_tail; + spin_lock_irqsave(&priv->tx_lock, flags); + if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && + netif_queue_stopped(p->dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) + netif_wake_queue(p->dev); + spin_unlock_irqrestore(&priv->tx_lock, flags); } kfree(p->tx_ring); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index f2aa923..19a3d3e 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -266,11 +266,10 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) spin_lock_irqsave(&priv->tx_lock, flags); ++priv->tx_tail; - if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) && - priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) { - clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); + if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && + netif_queue_stopped(dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) netif_wake_queue(dev); - } spin_unlock_irqrestore(&priv->tx_lock, flags); if (wc->status != IB_WC_SUCCESS && @@ -282,12 +281,17 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc) { - if (wc->wr_id & IPOIB_CM_OP_SRQ) - ipoib_cm_handle_rx_wc(dev, wc); - else if (wc->wr_id & IPOIB_OP_RECV) - ipoib_ib_handle_rx_wc(dev, wc); - else - ipoib_ib_handle_tx_wc(dev, wc); + if 
(wc->wr_id & IPOIB_OP_CM) { + if (wc->wr_id & IPOIB_OP_RECV) + ipoib_cm_handle_rx_wc(dev, wc); + else + ipoib_cm_handle_tx_wc(dev, wc); + } else { + if (wc->wr_id & IPOIB_OP_RECV) + ipoib_ib_handle_rx_wc(dev, wc); + else + ipoib_ib_handle_tx_wc(dev, wc); + } } void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) @@ -370,10 +374,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, address->last_send = priv->tx_head; ++priv->tx_head; - if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) { + if (++priv->tx_outstanding == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); netif_stop_queue(dev); - set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); } } } @@ -549,6 +552,7 @@ int ipoib_ib_dev_stop(struct net_device *dev) DMA_TO_DEVICE); dev_kfree_skb_any(tx_req->skb); ++priv->tx_tail; + --priv->tx_outstanding; } for (i = 0; i < ipoib_recvq_size; ++i) { diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 19e82db..7c7b136 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -900,7 +900,7 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port) goto out_rx_ring_cleanup; } - /* priv->tx_head & tx_tail are already 0 */ + /* priv->tx_head, tx_tail & tx_outstanding are already 0 */ if (ipoib_ib_dev_init(dev, ca, port)) goto out_tx_ring_cleanup; -- MST From vlad at mellanox.co.il Sat Feb 10 23:58:05 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 11 Feb 2007 09:58:05 +0200 Subject: [openib-general] [PATCH ofed-1.2] ofa_user.spec: fix libehca directory structure In-Reply-To: <200702081832.14862.ossrosch@linux.vnet.ibm.com> References: <200702081832.14862.ossrosch@linux.vnet.ibm.com> Message-ID: <1171180685.5694.2.camel@vladsk-laptop> On Thu, 2007-02-08 at 18:32 +0100, Stefan Roscher wrote: > Correct directory structure according to new driver loading scheme from libibverbs > > > 
Signed-off-by: Stefan Roscher > --- > > > --- ofa_user.spec_old 2007-02-08 09:03:33.000000000 -0800 > +++ ofa_user.spec_new 2007-02-08 09:07:32.000000000 -0800 Applied. -- Vladimir Sokolovsky Mellanox Technologies Ltd. From vlad at mellanox.co.il Sun Feb 11 00:33:10 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 11 Feb 2007 10:33:10 +0200 Subject: [openib-general] [PATCH ofed-1.2] ofa_user.spec: fix installation path for ehca.driver In-Reply-To: <200702091437.02142.ossrosch@linux.vnet.ibm.com> References: <200702091437.02142.ossrosch@linux.vnet.ibm.com> Message-ID: <1171182790.5694.4.camel@vladsk-laptop> On Fri, 2007-02-09 at 14:37 +0100, Stefan Roscher wrote: > Hi Vladimir, > > we tested the newest ofed1.2 package and found out that ehca.driver file is > not copied into /usr/local/ofed/etc/libibverbs.d/ > > This patch add the installation path for ehca.driver to ofa_user.spec. > Please ensure you first apply the ofa_user.spec patch I sent yesterday: > http://openib.org/pipermail/openib-general/2007-February/032736.html > > > Signed-off-by: Stefan Roscher > --- > > > ofa_user.spec | 1 + > 1 files changed, 1 insertion(+) > > > > diff -Nurp ofed_scripts_old/ofa_user.spec ofed_scripts_new/ofa_user.spec > --- ofed_scripts_old/ofa_user.spec 2007-02-09 14:00:38.000000000 +0100 > +++ ofed_scripts_new/ofa_user.spec 2007-02-09 14:02:45.000000000 +0100 > @@ -1165,6 +1165,7 @@ fi > %files -n libehca -f libehca-files > %defattr(-,root,root,-) > %{_libdir}/libehca*.so* > +%config %{_prefix}/etc/libibverbs.d/ehca.driver > # %doc AUTHORS COPYING ChangeLog README > %endif Applied. -- Vladimir Sokolovsky Mellanox Technologies Ltd. 
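A note on the tx_tail/tx_head comparison from the "integer overflow" thread above. The quoted loop casts each unsigned counter to int and then subtracts, so the subtraction itself is signed and can overflow (for example tail = 0x7fffffff, head = 0x80000000). A minimal sketch of an alternative ordering, which subtracts in unsigned arithmetic first and casts only the difference, is below. The helper names are hypothetical and are not part of ipoib; the sketch assumes a 32-bit int and the usual two's-complement unsigned-to-int conversion, which ISO C leaves implementation-defined.

```c
#include <assert.h>

/*
 * Hypothetical helpers for free-running unsigned ring counters.
 * Assumption: 32-bit int, two's-complement conversion to int.
 */

/* Nonzero when tail is logically behind head, even after wraparound.
 * The unsigned subtraction wraps modulo 2^32 and is well defined;
 * only the (small) difference is reinterpreted as signed. */
static int ring_behind(unsigned int tail, unsigned int head)
{
	return (int)(tail - head) < 0;
}

/* Entries still outstanding between tail and head.
 * Meaningful while the true distance stays below 2^31. */
static unsigned int ring_pending(unsigned int tail, unsigned int head)
{
	return head - tail;
}

/* Small self-check covering the plain and the wrapped case. */
static void ring_selfcheck(void)
{
	assert(ring_behind(5u, 10u));           /* tail behind head */
	assert(!ring_behind(10u, 10u));         /* ring empty */
	assert(ring_behind(0xfffffffeu, 2u));   /* head wrapped past zero */
	assert(ring_pending(0xfffffffeu, 2u) == 4u);
}
```

With this ordering a drain loop would read while (ring_behind(tail, head)) { ... ++tail; }. On the compilers the kernel is normally built with, both orderings are expected to produce the same result; the difference is only in what the C standard guarantees.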
From vlad at lists.openfabrics.org Sun Feb 11 02:24:24 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sun, 11 Feb 2007 02:24:24 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070211-0200 daily build status Message-ID: <20070211102424.ACFEAE60808@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.14 Failed: Build failed on ia64 with 
linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’ /home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’: /home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From ogerlitz at voltaire.com Sun Feb 11 03:23:20 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 11 Feb 2007 13:23:20 +0200 Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support In-Reply-To: References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com> Message-ID: <45CEFCA8.4000008@voltaire.com> Roland Dreier wrote: > I merged the "increment port number" and "remove redundant '_wq'" > patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland > > I plan to review to multicast stuff next week and I hope to merge it > for 2.6.21. 
Or, have you or anyone else at Voltaire read over the > code in addition to using it? Do you see anything that should be > cleaned up? OK, I spent some time today on reviewing and playing with the ib_sa: track multicast join/leave requests patch - and have no special comments. I think the two patches are ready for merge, let me know if you have any specific question. Or. From tziporet at mellanox.co.il Sun Feb 11 05:43:10 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 11 Feb 2007 15:43:10 +0200 Subject: [openib-general] Reminder: OFED 1.2 coordination meeting on Mon Feb-12 9am PST Message-ID: <45CF1D6E.4080101@mellanox.co.il> Reminder: OFED 1.2 coordination meeting on Mon Feb-12 9am PST Agenda: * OFED 1.2 Alpha status update Tziporet ------------------------------------------------------------------------------------------- Bridge info: Meeting ID: 2106670 Meeting Password: Global Access Numbers: http://cisco.com/en/US/about/doing_business/conferencing/index.html US/Canada: +1.866.432.9903 United Kingdom: +44.20.8824.0117 India: +91.80.4103.3979 Germany: +49.619.6773.9002 Japan: +81.3.5763.9394 China: +86.10.8515.5666 for world-wide access numbers see: http://openib.org/pipermail/openib-general/2007-January/031282.html From pasha at dev.mellanox.co.il Sun Feb 11 06:52:52 2007 From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha)) Date: Sun, 11 Feb 2007 16:52:52 +0200 Subject: [openib-general] [openfabrics-ewg] MVAPICH 0.9.9-beta release is available In-Reply-To: <200702092228.l19MSGEo006670@xi.cse.ohio-state.edu> References: <200702092228.l19MSGEo006670@xi.cse.ohio-state.edu> Message-ID: <45CF2DC4.8050402@dev.mellanox.co.il> SRPM with latest version of mvapich 0.9.9 (beta 0.9.9) was uploaded to OFED 1.2 repository http://www.openfabrics.org/~pasha/ofed_1_2/mvapich/ Regards, Pasha Dhabaleswar Panda wrote: > The MVAPICH team is pleased to announce the availability of MVAPICH > 0.9.9-beta with the following NEW features: > > - Message coalescing 
support to enable reduction of per Queue-pair > send queues for reduction in memory requirement on large scale > clusters. This design also increases the small message messaging > rate significantly. > > - Designs for avoiding hot-spots in networks of large-scale clusters > > - Multi-pathing support leveraging LMC mechanism > - Multi-port support for enabling user processes to bind to > different IB ports for balanced communication performance > on multi-core platforms > > - Multi-core optimized scalable shared memory design > > - Memory Hook support provided by integration with ptmalloc2 library. > This provides safe release of memory to the Operating System and > is expected to benefit the memory usage of applications that > frequently use malloc and free operations. > > - Optimized, high-performance shared memory aware collective > operations for multi-core platforms > > - Shared-Memory only channel (This interface support is useful for > running MPI jobs on multi-processor systems without using any > high-performance network. For example, multi-core servers, > desktops, and laptops; and clusters with serial nodes.) > > A new "Multiple-pair Bandwidth and Message Rate" test is also > available as a part of OSU_Benchmarks. > > For downloading MVAPICH 0.9.9-beta package and accessing the anonymous > SVN, please visit the following URL: > > http://nowlab.cse.ohio-state.edu/projects/mpi-iba/ > > MVAPICH 0.9.9-beta is also available for OFED 1.2 testing. > > All feedbacks, including bug reports and hints for performance tuning, > are welcome. Please post it to the mvapich-discuss mailing list. 
> > Thanks, > > MVAPICH Team > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > From sweitzen at cisco.com Sun Feb 11 10:44:54 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 11 Feb 2007 10:44:54 -0800 Subject: [openib-general] Problem with install.sh openib-diags OFED-1.2-20070208-1508.tgz Message-ID: I'm using install.sh on RHEL4 U3 x86_64 Preparing... ################################################## kernel-ib-devel ################################################## kernel-ib ################################################## error: Failed dependencies: perl(IBswcountlimits) is needed by openib-diags-1.2.0-pre1.x86_64 ERROR: Failed executing "/bin/rpm -ihv /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/dapl-1.2.0-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/dapl-devel-1.2.0-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibcommon-1.0.2-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibcommon-devel-1.0.2-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibmad-1.0.2-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibmad-devel-1.0.2-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibumad-1.0.2-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibumad-devel-1.0.2-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibverbs-1.1-pre1.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibverbs-devel-1.1-pre1.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibverbs-utils-1.1-pre1.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libmthca-1.0.4-pre.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libmthca-devel-1.0.4-pre.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libopensm-3.0.1-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libopensm-devel-3.0.1-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libosmcomp-3.0.1-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libosmcomp-devel-3.0.1-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libosmvendor-3.0.1-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libosmvendor-devel-3.0.1-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/librdmacm-0.9.0-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/librdmacm-devel-0.9.0-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libsdp-1.1.99-0.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/openib-diags-1.2.0-pre1.x86_64.rpm /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/perftest-1.2-0.x86_64.rpm " Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems From swise at opengridcomputing.com Sun Feb 11 11:58:19 2007 From: swise at opengridcomputing.com (Steve WIse) Date: Sun, 11 Feb 2007 13:58:19 -0600 Subject: [openib-general] [PATCH] ofed_1-2 IWCM - Set iniator depth and responder resources to device max values. Message-ID: <1171223899.4027.1.camel@linux-q667.site> IWCM - Set initiator depth and responder resources to device max values. For OFED 1.2, the IWCM will set the initiator depth and responder resources to the device max values for new connect request events. 
Signed-off-by: Steve Wise --- kernel_patches/fixes/iwcm_ordird.patch | 43 ++++++++++++++++++++++++++++++++ 1 files changed, 43 insertions(+), 0 deletions(-) diff --git a/kernel_patches/fixes/iwcm_ordird.patch b/kernel_patches/fixes/iwcm_ordird.patch new file mode 100644 index 0000000..3a9f643 --- /dev/null +++ b/kernel_patches/fixes/iwcm_ordird.patch @@ -0,0 +1,43 @@ +commit 7175034c7adf6b5fb5ba311929376af7501387a1 +Author: Steve Wise +Date: Sat Feb 10 14:16:35 2007 -0600 + + IWCM - Set iniator depth and responder resources to device max values. + + For OFED 1.2, the IWCM will set the initiator depth and responder + resources to the device max values for new connect request events. + + Signed-off-by: Steve Wise + +diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c +index 9e0ab04..e3afdf8 100644 +--- a/drivers/infiniband/core/cma.c ++++ b/drivers/infiniband/core/cma.c +@@ -1137,6 +1137,7 @@ static int iw_conn_req_handler(struct iw + struct net_device *dev = NULL; + struct rdma_cm_event event; + int ret; ++ struct ib_device_attr attr; + + listen_id = cm_id->context; + atomic_inc(&listen_id->dev_remove); +@@ -1189,10 +1190,19 @@ static int iw_conn_req_handler(struct iw + sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr; + *sin = iw_event->remote_addr; + ++ ret = ib_query_device(conn_id->id.device, &attr); ++ if (ret) { ++ cma_release_remove(conn_id); ++ rdma_destroy_id(new_cm_id); ++ goto out; ++ } ++ + memset(&event, 0, sizeof event); + event.event = RDMA_CM_EVENT_CONNECT_REQUEST; + event.param.conn.private_data = iw_event->private_data; + event.param.conn.private_data_len = iw_event->private_data_len; ++ event.param.conn.initiator_depth = attr.max_qp_init_rd_atom; ++ event.param.conn.responder_resources = attr.max_qp_rd_atom; + ret = conn_id->id.event_handler(&conn_id->id, &event); + if (ret) { + /* User wants to destroy the CM ID */ From mst at mellanox.co.il Sun Feb 11 13:03:07 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 11 Feb 2007 23:03:07 +0200 Subject: [openib-general] [PATCH RFC] use common cq for ipoib cm send side In-Reply-To: <20070211003140.GH32216@mellanox.co.il> References: <20070211003140.GH32216@mellanox.co.il> Message-ID: <20070211210307.GB28231@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: [PATCH RFC] use common cq for ipoib cm send side > > The following untested patch moves all TX processing in IPoIB CM to common CQ. > This should help reduce the number of interrupts for bi-directional traffic > (such as TCP). Is this a good idea? What do others think? > > Signed-off-by: Michael S. Tsirkin FYI, this was just thinking aloud. The version below works fine here but the performance gain seems to be very small (about 1%). The gain with NAPI might be bigger but this is yet to be tested. I'll continue looking into this. Feedback welcome. ipoib.h | 10 +++++-- ipoib_cm.c | 78 +++++++++++++++-------------------------------------------- ipoib_ib.c | 28 ++++++++++++--------- ipoib_main.c | 2 - 4 files changed, 45 insertions(+), 73 deletions(-) ------------ Use common CQ for all TX QPs: keep a per-device counter of outstanding tx WRs, and stop the interface when this counter reaches the send queue size, to avoid CQ overruns. Signed-off-by: Michael S. 
Tsirkin --- diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index eb885ee..ef703c7 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -99,9 +99,9 @@ enum { #define IPOIB_OP_RECV (1ul << 31) #ifdef CONFIG_INFINIBAND_IPOIB_CM -#define IPOIB_CM_OP_SRQ (1ul << 30) +#define IPOIB_OP_CM (1ul << 30) #else -#define IPOIB_CM_OP_SRQ (0) +#define IPOIB_OP_CM (0) #endif /* structs */ @@ -144,7 +144,6 @@ struct ipoib_cm_rx { struct ipoib_cm_tx { struct ib_cm_id *id; - struct ib_cq *cq; struct ib_qp *qp; struct list_head list; struct net_device *dev; @@ -233,6 +232,7 @@ struct ipoib_dev_priv { unsigned tx_tail; struct ib_sge tx_sge; struct ib_send_wr tx_wr; + unsigned tx_outstanding; struct ib_wc ibwc[IPOIB_NUM_WC]; @@ -439,6 +439,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx); void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, unsigned int mtu); void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc); +void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc); #else struct ipoib_cm_tx; @@ -527,6 +528,9 @@ static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *w { } +static inline void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) +{ +} #endif #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 8ee6f06..af36562 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -85,7 +85,7 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id) struct ib_recv_wr *bad_wr; int i, ret; - priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ; + priv->cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV; for (i = 0; i < IPOIB_CM_RX_SG; ++i) priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i]; @@ -346,7 +346,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space, 
void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ; + unsigned int wr_id = wc->wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV); struct sk_buff *skb; struct ipoib_cm_rx *p; unsigned long flags; @@ -433,7 +433,7 @@ static inline int post_send(struct ipoib_dev_priv *priv, priv->tx_sge.addr = addr; priv->tx_sge.length = len; - priv->tx_wr.wr_id = wr_id; + priv->tx_wr.wr_id = wr_id | IPOIB_OP_CM; return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr); } @@ -484,20 +484,19 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ dev->trans_start = jiffies; ++tx->tx_head; - if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) { + if (++priv->tx_outstanding == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n", tx->qp->qp_num); netif_stop_queue(dev); - set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags); } } } -static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx, - struct ib_wc *wc) +void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned int wr_id = wc->wr_id; + struct ipoib_cm_tx *tx = wc->qp->qp_context; + unsigned int wr_id = wc->wr_id & ~IPOIB_OP_CM; struct ipoib_tx_buf *tx_req; unsigned long flags; @@ -522,11 +521,10 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx spin_lock_irqsave(&priv->tx_lock, flags); ++tx->tx_tail; - if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) && - tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) { - clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags); + if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && + netif_queue_stopped(dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) netif_wake_queue(dev); - } if (wc->status != IB_WC_SUCCESS && wc->status != IB_WC_WR_FLUSH_ERR) { @@ -549,11 +547,6 @@ static void 
ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx tx->neigh = NULL; } - /* queue would be re-started anyway when TX is destroyed, - * but it makes sense to do it ASAP here. */ - if (test_and_clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) - netif_wake_queue(dev); - if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { list_move(&tx->list, &priv->cm.reap_list); queue_work(ipoib_workqueue, &priv->cm.reap_task); @@ -567,19 +560,6 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx spin_unlock_irqrestore(&priv->tx_lock, flags); } -static void ipoib_cm_tx_completion(struct ib_cq *cq, void *tx_ptr) -{ - struct ipoib_cm_tx *tx = tx_ptr; - int n, i; - - ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); - do { - n = ib_poll_cq(cq, IPOIB_NUM_WC, tx->ibwc); - for (i = 0; i < n; ++i) - ipoib_cm_handle_tx_wc(tx->dev, tx, tx->ibwc + i); - } while (n == IPOIB_NUM_WC); -} - int ipoib_cm_dev_open(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -699,17 +679,18 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even return 0; } -static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ib_cq *cq) +static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_cm_tx *tx) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_init_attr attr = {}; attr.recv_cq = priv->cq; + attr.send_cq = priv->cq; attr.srq = priv->cm.srq; attr.cap.max_send_wr = ipoib_sendq_size; attr.cap.max_send_sge = 1; attr.sq_sig_type = IB_SIGNAL_ALL_WR; attr.qp_type = IB_QPT_RC; - attr.send_cq = cq; + attr.qp_context = tx; return ib_create_qp(priv->pd, &attr); } @@ -789,21 +770,7 @@ static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn, goto err_tx; } - p->cq = ib_create_cq(priv->ca, ipoib_cm_tx_completion, NULL, p, - ipoib_sendq_size + 1); - if (IS_ERR(p->cq)) { - ret = PTR_ERR(p->cq); - ipoib_warn(priv, "failed to allocate tx cq: %d\n", ret); - goto err_cq; - } - 
- ret = ib_req_notify_cq(p->cq, IB_CQ_NEXT_COMP); - if (ret) { - ipoib_warn(priv, "failed to request completion notification: %d\n", ret); - goto err_req_notify; - } - - p->qp = ipoib_cm_create_tx_qp(p->dev, p->cq); + p->qp = ipoib_cm_create_tx_qp(p->dev, p); if (IS_ERR(p->qp)) { ret = PTR_ERR(p->qp); ipoib_warn(priv, "failed to allocate tx qp: %d\n", ret); @@ -840,12 +807,8 @@ err_modify: err_id: p->id = NULL; ib_destroy_qp(p->qp); -err_req_notify: err_qp: p->qp = NULL; - ib_destroy_cq(p->cq); -err_cq: - p->cq = NULL; err_tx: return ret; } @@ -854,6 +817,7 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) { struct ipoib_dev_priv *priv = netdev_priv(p->dev); struct ipoib_tx_buf *tx_req; + unsigned long flags; ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n", p->qp ? p->qp->qp_num : 0, p->tx_head, p->tx_tail); @@ -864,12 +828,6 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) if (p->qp) ib_destroy_qp(p->qp); - if (p->cq) - ib_destroy_cq(p->cq); - - if (test_bit(IPOIB_FLAG_NETIF_STOPPED, &p->flags)) - netif_wake_queue(p->dev); - if (p->tx_ring) { while ((int) p->tx_tail - (int) p->tx_head < 0) { tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)]; @@ -877,6 +835,12 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) DMA_TO_DEVICE); dev_kfree_skb_any(tx_req->skb); ++p->tx_tail; + spin_lock_irqsave(&priv->tx_lock, flags); + if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && + netif_queue_stopped(p->dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) + netif_wake_queue(p->dev); + spin_unlock_irqrestore(&priv->tx_lock, flags); } kfree(p->tx_ring); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index f2aa923..19a3d3e 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -266,11 +266,10 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) spin_lock_irqsave(&priv->tx_lock, flags); 
++priv->tx_tail; - if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) && - priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) { - clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); + if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && + netif_queue_stopped(dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) netif_wake_queue(dev); - } spin_unlock_irqrestore(&priv->tx_lock, flags); if (wc->status != IB_WC_SUCCESS && @@ -282,12 +281,17 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc) { - if (wc->wr_id & IPOIB_CM_OP_SRQ) - ipoib_cm_handle_rx_wc(dev, wc); - else if (wc->wr_id & IPOIB_OP_RECV) - ipoib_ib_handle_rx_wc(dev, wc); - else - ipoib_ib_handle_tx_wc(dev, wc); + if (wc->wr_id & IPOIB_OP_CM) { + if (wc->wr_id & IPOIB_OP_RECV) + ipoib_cm_handle_rx_wc(dev, wc); + else + ipoib_cm_handle_tx_wc(dev, wc); + } else { + if (wc->wr_id & IPOIB_OP_RECV) + ipoib_ib_handle_rx_wc(dev, wc); + else + ipoib_ib_handle_tx_wc(dev, wc); + } } void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) @@ -370,10 +374,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, address->last_send = priv->tx_head; ++priv->tx_head; - if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) { + if (++priv->tx_outstanding == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); netif_stop_queue(dev); - set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); } } } @@ -549,6 +552,7 @@ int ipoib_ib_dev_stop(struct net_device *dev) DMA_TO_DEVICE); dev_kfree_skb_any(tx_req->skb); ++priv->tx_tail; + --priv->tx_outstanding; } for (i = 0; i < ipoib_recvq_size; ++i) { diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 19e82db..7c7b136 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -900,7 +900,7 @@ int ipoib_dev_init(struct net_device 
*dev, struct ib_device *ca, int port) goto out_rx_ring_cleanup; } - /* priv->tx_head & tx_tail are already 0 */ + /* priv->tx_head, tx_tail & tx_outstanding are already 0 */ if (ipoib_ib_dev_init(dev, ca, port)) goto out_tx_ring_cleanup; -- MST From swise at opengridcomputing.com Sun Feb 11 13:14:49 2007 From: swise at opengridcomputing.com (Steve WIse) Date: Sun, 11 Feb 2007 15:14:49 -0600 Subject: [openib-general] [PATCH] iw_cxgb3 Change cxio semaphore to mutex. Message-ID: <1171228489.4027.4.camel@linux-q667.site> From: Steve Wise Change cxio semaphore to mutex. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/cxio_hal.c | 10 +++++----- drivers/infiniband/hw/cxgb3/cxio_hal.h | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index 19553b3..de3cb15 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -30,9 +30,9 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ -#include #include +#include #include #include #include @@ -527,7 +527,7 @@ static int cxio_hal_init_ctrl_qp(struct memset(rdev_p->ctrl_qp.workq, 0, (1 << T3_CTRL_QP_SIZE_LOG2) * sizeof(union t3_wr)); - init_MUTEX(&rdev_p->ctrl_qp.sem); + mutex_init(&rdev_p->ctrl_qp.lock); init_waitqueue_head(&rdev_p->ctrl_qp.waitq); /* update HW Ctrl QP context */ @@ -570,7 +570,7 @@ static int cxio_hal_destroy_ctrl_qp(stru /* write len bytes of data into addr (32B aligned address) * If data is NULL, clear len byte of memory to zero. 
- * caller aquires the sem before the call + * caller aquires the ctrl_qp lock before the call */ static int cxio_hal_ctrl_qp_write_mem(struct cxio_rdev *rdev_p, u32 addr, u32 len, void *data, int completion) @@ -705,7 +705,7 @@ static int __cxio_tpt_op(struct cxio_rde } } - down_interruptible(&rdev_p->ctrl_qp.sem); + mutex_lock(&rdev_p->ctrl_qp.lock); /* write PBL first if any - update pbl only if pbl list exist */ if (pbl) { @@ -752,7 +752,7 @@ static int __cxio_tpt_op(struct cxio_rde cxio_hal_put_stag(rdev_p->rscp, stag_idx); ret: wptr = rdev_p->ctrl_qp.wptr; - up(&rdev_p->ctrl_qp.sem); + mutex_unlock(&rdev_p->ctrl_qp.lock); if (!err) if (wait_event_interruptible(rdev_p->ctrl_qp.waitq, SEQ32_GE(rdev_p->ctrl_qp.rptr, diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h index 8fb2999..1b97e80 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.h +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h @@ -62,8 +62,8 @@ #define T3_MAX_DEV_NAME_LEN 32 struct cxio_hal_ctrl_qp { u32 wptr; u32 rptr; - struct semaphore sem; /* for the wtpr, can sleep */ - wait_queue_head_t waitq; /* wait for RspQ/CQE msg */ + struct mutex lock; /* for the wtpr, can sleep */ + wait_queue_head_t waitq;/* wait for RspQ/CQE msg */ union t3_wr *workq; /* the work request queue */ dma_addr_t dma_addr; /* pci bus address of the workq */ DECLARE_PCI_UNMAP_ADDR(mapping) From rdreier at cisco.com Sun Feb 11 15:11:38 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 11 Feb 2007 15:11:38 -0800 Subject: [openib-general] [PATCH] iw_cxgb3 Change cxio semaphore to mutex. 
In-Reply-To: <1171228489.4027.4.camel@linux-q667.site> (Steve WIse's message of "Sun, 11 Feb 2007 15:14:49 -0600") References: <1171228489.4027.4.camel@linux-q667.site> Message-ID: Thanks, applied along with the following warning cleanup for archs where u64 is unsigned long instead unsigned long long: diff --git a/drivers/infiniband/hw/cxgb3/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/cxio_dbg.c index dfaa704..5a7306f 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_dbg.c +++ b/drivers/infiniband/hw/cxgb3/cxio_dbg.c @@ -62,7 +62,7 @@ void cxio_dump_tpt(struct cxio_rdev *rdev, u32 stag) data = (u64 *)m->buf; while (size > 0) { - PDBG("TPT %08x: %016llx\n", m->addr, (u64)*data); + PDBG("TPT %08x: %016llx\n", m->addr, (unsigned long long) *data); size -= 8; data++; m->addr += 8; @@ -100,7 +100,7 @@ void cxio_dump_pbl(struct cxio_rdev *rdev, u32 pbl_addr, uint len, u8 shift) data = (u64 *)m->buf; while (size > 0) { - PDBG("PBL %08x: %016llx\n", m->addr, (u64)*data); + PDBG("PBL %08x: %016llx\n", m->addr, (unsigned long long) *data); size -= 8; data++; m->addr += 8; @@ -116,7 +116,8 @@ void cxio_dump_wqe(union t3_wr *wqe) if (size == 0) size = 8; while (size > 0) { - PDBG("WQE %p: %016llx\n", data, be64_to_cpu(*data)); + PDBG("WQE %p: %016llx\n", data, + (unsigned long long) be64_to_cpu(*data)); size--; data++; } @@ -128,7 +129,8 @@ void cxio_dump_wce(struct t3_cqe *wce) int size = sizeof(*wce); while (size > 0) { - PDBG("WCE %p: %016llx\n", data, be64_to_cpu(*data)); + PDBG("WCE %p: %016llx\n", data, + (unsigned long long) be64_to_cpu(*data)); size -= 8; data++; } @@ -159,7 +161,7 @@ void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents) data = (u64 *)m->buf; while (size > 0) { - PDBG("RQT %08x: %016llx\n", m->addr, (u64)*data); + PDBG("RQT %08x: %016llx\n", m->addr, (unsigned long long) *data); size -= 8; data++; m->addr += 8; diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index 19553b3..0531b94 100644 --- 
a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -298,7 +298,7 @@ int cxio_create_qp(struct cxio_rdev *rdev_p, u32 kernel_domain, wq->udb = (u64)rdev_p->rnic_info.udbell_physbase + (wq->qpid << rdev_p->qpshift); PDBG("%s qpid 0x%x doorbell 0x%p udb 0x%llx\n", __FUNCTION__, - wq->qpid, wq->doorbell, wq->udb); + wq->qpid, wq->doorbell, (unsigned long long) wq->udb); return 0; err4: kfree(wq->sq); @@ -553,8 +553,8 @@ static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p) wqe->ctx1 = cpu_to_be64(ctx1); wqe->ctx0 = cpu_to_be64(ctx0); PDBG("CtrlQP dma_addr 0x%llx workq %p size %d\n", - (u64) rdev_p->ctrl_qp.dma_addr, rdev_p->ctrl_qp.workq, - 1 << T3_CTRL_QP_SIZE_LOG2); + (unsigned long long) rdev_p->ctrl_qp.dma_addr, + rdev_p->ctrl_qp.workq, 1 << T3_CTRL_QP_SIZE_LOG2); skb->priority = CPL_PRIORITY_CONTROL; return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); } diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c index 3d7c96f..98b3bdb 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cq.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c @@ -87,7 +87,7 @@ static int iwch_poll_cq_one(struct iwch_dev *rhp, struct iwch_cq *chp, "lo 0x%x cookie 0x%llx\n", __FUNCTION__, CQE_QPID(cqe), CQE_TYPE(cqe), CQE_OPCODE(cqe), CQE_STATUS(cqe), CQE_WRID_HI(cqe), - CQE_WRID_LOW(cqe), cookie); + CQE_WRID_LOW(cqe), (unsigned long long) cookie); if (CQE_TYPE(cqe) == 0) { if (!CQE_STATUS(cqe)) diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c index 5909ec5..2b6cd53 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_mem.c +++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c @@ -163,7 +163,9 @@ int build_phys_page_list(struct ib_phys_buf *buffer_list, ((u64) j << *shift)); PDBG("%s va 0x%llx mask 0x%llx shift %d len %lld pbl_size %d\n", - __FUNCTION__, *iova_start, mask, *shift, *total_size, *npages); + __FUNCTION__, (unsigned long long) *iova_start, + (unsigned long long) mask, *shift, (unsigned 
long long) *total_size, + *npages); return 0; diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index d02cd72..549de0a 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -213,7 +213,7 @@ static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, } PDBG("created cqid 0x%0x chp %p size 0x%0x, dma_addr 0x%0llx\n", chp->cq.cqid, chp, (1 << chp->cq.size_log2), - (u64)chp->cq.dma_addr); + (unsigned long long) chp->cq.dma_addr); return &chp->ibcq; } @@ -323,7 +323,7 @@ static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) struct iwch_ucontext *ucontext; PDBG("%s off 0x%lx addr 0x%llx len %d\n", __FUNCTION__, vma->vm_pgoff, - pgaddr, len); + (unsigned long long) pgaddr, len); if (vma->vm_start & (PAGE_SIZE-1)) { return -EINVAL; @@ -873,7 +873,8 @@ static struct ib_qp *iwch_create_qp(struct ib_pd *pd, PDBG("%s sq_num_entries %d, rq_num_entries %d " "qpid 0x%0x qhp %p dma_addr 0x%llx size %d\n", __FUNCTION__, qhp->attr.sq_num_entries, qhp->attr.rq_num_entries, - qhp->wq.qpid, qhp, (u64)qhp->wq.dma_addr, 1 << qhp->wq.size_log2); + qhp->wq.qpid, qhp, (unsigned long long) qhp->wq.dma_addr, + 1 << qhp->wq.size_log2); return &qhp->ibqp; } diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index b2eb29e..5680d82 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -212,8 +212,8 @@ static inline struct iwch_mm_entry *remove_mmap(struct iwch_ucontext *ucontext, if (mm->addr == addr && mm->len == len) { list_del_init(&mm->entry); spin_unlock(&ucontext->mmap_lock); - PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr, - mm->len); + PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, + (unsigned long long) mm->addr, mm->len); return mm; } } @@ -225,7 +225,8 @@ static inline void insert_mmap(struct iwch_ucontext *ucontext, struct 
iwch_mm_entry *mm) { spin_lock(&ucontext->mmap_lock); - PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr, mm->len); + PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, + (unsigned long long) mm->addr, mm->len); list_add_tail(&mm->entry, &ucontext->mmaps); spin_unlock(&ucontext->mmap_lock); } diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 8b44b69..e066727 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -329,7 +329,7 @@ int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, Q_GENBIT(qhp->wq.wptr, qhp->wq.size_log2), 0, t3_wr_flit_cnt); PDBG("%s cookie 0x%llx wq idx 0x%x swsq idx %ld opcode %d\n", - __FUNCTION__, wr->wr_id, idx, + __FUNCTION__, (unsigned long long) wr->wr_id, idx, Q_PTR2IDX(qhp->wq.sq_wptr, qhp->wq.sq_size_log2), sqp->opcode); wr = wr->next; @@ -381,8 +381,8 @@ int iwch_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr, Q_GENBIT(qhp->wq.wptr, qhp->wq.size_log2), 0, sizeof(struct t3_receive_wr) >> 3); PDBG("%s cookie 0x%llx idx 0x%x rq_wptr 0x%x rw_rptr 0x%x " - "wqe %p \n", __FUNCTION__, wr->wr_id, idx, - qhp->wq.rq_wptr, qhp->wq.rq_rptr, wqe); + "wqe %p \n", __FUNCTION__, (unsigned long long) wr->wr_id, + idx, qhp->wq.rq_wptr, qhp->wq.rq_rptr, wqe); ++(qhp->wq.rq_wptr); ++(qhp->wq.wptr); wr = wr->next; From jgunthorpe at obsidianresearch.com Sun Feb 11 15:09:35 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Sun, 11 Feb 2007 16:09:35 -0700 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> Message-ID: <20070211230935.GT11411@obsidianresearch.com> On Fri, Feb 09, 2007 at 06:08:34PM -0800, Sean Hefty wrote: > >So basically what you are saying is that the TClass and FlowLabel act > >as some kind of global dis-ambiguation that lets all SAs know 
that the
> >tuple MUST be matched with
> >on each side.
>
> Sort of... My reasoning is that if you look at a packet traveling
> from the source QP to the destination QP, and examine the packet in
> some intermediate subnet (say between two routers), then the only
> information that it carries is the
> tuple. This information must be sufficient to direct the routing at
> the endpoints.

Ah, I think I missed the key step in your scheme.. You plan to query the local SM for SGID=remote DGID=local? (ie reversed from 'normal'. I was thinking only about the SGID=local DGID=remote query direction)

Yes, I agree this works in the simple cases. Quite well in fact... The reversed direction of the PR query is very much aligned with the idea that the GRH is only a destination affecting thing.

Let me try to outline to you what I think you are proposing. This is the diagram I am thinking of:

            SA                                   SA'
  Node1 --> (LID 1) Router A ------- Router A' (LID A) ---> Node2
        |-> (LID 2) Router A                             |
        |-> (LID 3) Router B ------- Router B' (LID B) --|

Router A and Router B are independent redundant devices, not a route cloud of some sort. B -> A' is not a possible path.

So your idea is to do:
 PR0: Node 1 asks SA for Node1 -> Node2 reversible path. SA returns SLID=Node1 DLID=1, FlowLabel=Magic Reversible indicator. This path is used for CM GMPs, or for the normal non-routed CM.
 PR1: Detecting a routed situation from PR0, Node 1 asks SA for Node2 -> Node1. SA returns SLID=1 DLID=Node1 and a GRH that configures Router A to use SLID=1. You reverse the local LIDs from that path to get the QP configuration.
 PR2: Node 1 asks SA' for Node1 -> Node2. SA returns SLID=A DLID=Node2.

OK. But what if:
 PR1: Node 1 asks SA for Node2 -> Node1. SA returns SLID=3 DLID=Node1
 PR2: Node 1 asks SA' for Node1 -> Node2. SA returns SLID=A DLID=Node2.

Now the LIDs don't match and the QP won't work. SA' has no idea that SA picked Router B.

> It shouldn't need information about the paths used by packets on the
> remote subnet.
If a subnet has multiple routers into it, they can
> forward packets to the correct router if needed. (Could the routers
> just forward to the end node and insert the expected SLID?)

Right, this is a good way to solve the problem. Going with the example above, SA' returns a GRH that configures Router B' to use SLID=A and the GRH SA returned configures Router A to use SLID=3. Router B' and A both are faking the SLID in the LRH. This effectively defeats the QP SLID check and everything works :>

[Like I said before, this check seems to be a misfeature]

I can think of the following downsides:
 1) Re-reading Michael Krause's email makes me think that defeating the QP SLID check is contrary to the spirit of IBA
 2) Routers now require a GRH->LRH translation table size that is proportional to all the router LIDs in the subnet, not just its own LIDs. [Smart selection of the Flow Label could mitigate this growth though]
 3) The reverse PR query method requires 3 PR queries for the simple case and as many as 5 if you want non-reversible paths.
 4) Some means of remote SA communication needs to be decided pre-standardization :< (I agree that a magic GID seems best)

But... It is the SLID faking that solves the multiple-router-path problem, not the reverse PR. Do you think something like that could be standardized?

I guess the big question I have is if IBA chooses to standardize some other method, how much chance is there that it would also make this unsupportable? Ie by preventing the remote SA communication mechanism or by defining a reverse PR to mean something else? I could easily imagine the reverse PR being defined as a way to ask the local SA about the *remote* LIDs.

[Actually, if you define it that way and use a MultiPathRecord query then there is enough information to return working LIDs for both subnets. The SAs would have to communicate between themselves and the routers using a new protocol, but that is doable.
This does require that a PR be defined so that the LIDs are relative to the subnet of the SGID - not to the local subnet!]

> I'm still trying to find a solution that doesn't violate the
> architecture as defined. I don't see why my idea wouldn't work yet.
> It just requires some unspecified coordination between the local SA
> and local routers.

I'd also very much like to not have to change the passive side to make this work. But this has turned into such a complex problem it seems really hard to predict what will pass through to standardization..

That is the main benefit I see of the small change to the passive side. No matter what is standardized it can be accommodated in the resulting standard, whereas defining a PR with SGID==offsubnet to mean one thing or another seems much more risky.

Jason

From devesh28 at gmail.com Sun Feb 11 21:10:48 2007 From: devesh28 at gmail.com (Devesh Sharma) Date: Mon, 12 Feb 2007 10:40:48 +0530 Subject: [openib-general] Immediate data question In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com> <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net> Message-ID: <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>

On 2/10/07, Tang, Changqing wrote:
>
> >
> > >Not for the receiver, but the sender will be severely slowed down by
> > >having to wait for the RNR timeouts.
> >
> > RNR = Receiver Not Ready so by definition, the data flow
> > isn't going to
> > progress until the receiver is ready to receive data. If a
> > receive QP
> > enters RNR for a RC, then it is likely not progressing as
> > desired.
RNR
> > was initially put in place to enable a receiver to create
> > back pressure to the sender without causing a fatal error
> > condition. It should rarely be entered and therefore should
> > have negligible impact on overall performance however when a
> > RNR occurs, no forward progress will occur so performance is
> > essentially zero.
>
> Mike:
> I still do not quite understand this issue. I have two
> situations that have RNR triggered.
>
> 1. process A and process B is connected with QP. A first post a send to
> B, B does not post receive. Then A and B are doing a long time
> RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE
> message. Finally B will post a receive. Does the first pending send in A
> block all the later RDMA_WRITE ?

According to the IBTA spec, the HCA processes WR entries strictly in the order in which they are posted, so the send will block all WRs posted after it, unless the HCA has multiple processing elements. I think that even then the HCA will maintain the processing order.

> If not, since RNR is triggered
> periodically till B post receive, does it affect the RDMA_WRITE
> performance between A and B ?
>
> 2. extend above to three processes, A connect to B, B connect to C, so B
> has two QPs, but one CQ. A posts a send to B, B does not post receive,
> rather B and C are doing a long time RDMA_WRITE, or send/recv. But B
> must sends RNR periodically to A, right?. So does the pending message
> from A affects B's overall performance between B and C ?
>
> Thank you.
> > --CQ > > > > > > Mike > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From sweitzen at cisco.com Sun Feb 11 21:53:17 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 11 Feb 2007 21:53:17 -0800 Subject: [openib-general] no RDS in OFED 1.2? Message-ID: I don't see RDS in the feature freeze builds yet. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: From tziporet at mellanox.co.il Sun Feb 11 23:30:31 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 12 Feb 2007 09:30:31 +0200 Subject: [openib-general] [openfabrics-ewg] no RDS in OFED 1.2? Message-ID: <6C2C79E72C305246B504CBA17B5500C9A0DDD8@mtlexch01.mtl.com> Vlad is working on this. It will be in the alpha release Tziporet ________________________________ From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Scott Weitzenkamp (sweitzen) Sent: Monday, February 12, 2007 7:53 AM To: openfabrics-ewg at openib.org Cc: openib-general Subject: [openfabrics-ewg] no RDS in OFED 1.2? I don't see RDS in the feature freeze builds yet. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vlad at lists.openfabrics.org Mon Feb 12 02:24:14 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Mon, 12 Feb 2007 02:24:14 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070212-0200 daily build status Message-ID: <20070212102414.E14F9E60806@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.15 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Passed on ia64 with 
linux-2.6.12 Failed: From halr at voltaire.com Mon Feb 12 03:36:10 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Feb 2007 06:36:10 -0500 Subject: [openib-general] Problem with install.sh openib-diags OFED-1.2-20070208-1508.tgz In-Reply-To: References: Message-ID: <1171280132.31538.409786.camel@hal.voltaire.com> On Sun, 2007-02-11 at 13:44, Scott Weitzenkamp (sweitzen) wrote: > I'm using install.sh on RHEL4 U3 x86_64 > > Preparing... > ################################################## > kernel-ib-devel > ################################################## > kernel-ib > ################################################## > error: Failed dependencies: > perl(IBswcountlimits) This is supposed to be IBswcountlimits.pm. I think there was a change for the localtion of this to be under /lib/perl some days ago but not sure whether this change is in the OFED 1.2 install (for OFED-1.2-20070208). Vlad, do you know what is causing this error ? -- Hal > is needed by openib-diags-1.2.0-pre1.x86_64 > ERROR: Failed executing "/bin/rpm -ihv > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-\ > release-4AS-4.1/dapl-1.2.0-0.x86_64.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat\ > -release-4AS-4.1/dapl-devel-1.2.0-0.x86_64.rpm > /tmp/OFED-1.2-20070208-1508/RPMS\ > /redhat-release-4AS-4.1/libibcommon-1.0.2-0.x86_64.rpm > /tmp/OFED-1.2-20070208-1\ > 508/RPMS/redhat-release-4AS-4.1/libibcommon-devel-1.0.2-0.x86_64.rpm > /tmp/OFED-\ > 1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibmad-1.0.2-0.x86_64.rpm /tmp/\ > OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibmad-devel-1.0.2-0.x86_6\ > 4.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibumad-1.0.2-0\ > .x86_64.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibumad-d\ > evel-1.0.2-0.x86_64.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1\ > /libibverbs-1.1-pre1.x86_64.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release\ > -4AS-4.1/libibverbs-devel-1.1-pre1.x86_64.rpm > 
/tmp/OFED-1.2-20070208-1508/RPMS/\ > redhat-release-4AS-4.1/libibverbs-utils-1.1-pre1.x86_64.rpm > /tmp/OFED-1.2-20070\ > 208-1508/RPMS/redhat-release-4AS-4.1/libmthca-1.0.4-pre.x86_64.rpm > /tmp/OFED-1.\ > 2-20070208-1508/RPMS/redhat-release-4AS-4.1/libmthca-devel-1.0.4-pre.x86_64.rpm\ > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libopensm-3.0.1-0.x86_\ > 64.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libopensm-devel-\ > 3.0.1-0.x86_64.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libo\ > smcomp-3.0.1-0.x86_64.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4\ > .1/libosmcomp-devel-3.0.1-0.x86_64.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-\ > release-4AS-4.1/libosmvendor-3.0.1-0.x86_64.rpm > /tmp/OFED-1.2-20070208-1508/RPM\ > S/redhat-release-4AS-4.1/libosmvendor-devel-3.0.1-0.x86_64.rpm > /tmp/OFED-1.2-20\ > 070208-1508/RPMS/redhat-release-4AS-4.1/librdmacm-0.9.0-0.x86_64.rpm > /tmp/OFED-\ > 1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/librdmacm-devel-0.9.0-0.x86_64.rp\ > m > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libsdp-1.1.99-0.x86_6\ > 4.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/openib-diags-1.2.\ > 0-pre1.x86_64.rpm > /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/perft\ > est-1.2-0.x86_64.rpm " > > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > ______________________________________________________________________ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From bugzilla-daemon at lists.openfabrics.org Mon Feb 12 04:40:58 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Mon, 12 Feb 2007 04:40:58 -0800 (PST) Subject: [openib-general] [Bug 351] 
New: Routing table problem in SLES10 when using port #2
Message-ID:

https://bugs.openfabrics.org/show_bug.cgi?id=351

Summary: Routing table problem in SLES10 when using port #2
Product: OpenFabrics Linux
Version: 1.2
Platform: All
OS/Version: SLES 10
Status: NEW
Severity: major
Priority: P1
Component: IPoIB
AssignedTo: bugzilla at openib.org
ReportedBy: yohadd at mellanox.co.il

There is an issue with the routing table on SLES10 when using IB port #2.
After host reboot the routing table contains two entries for 12.X.X.X!
One of the entries is correct and points to ib1; the other one points to ib0.

Route output:

sw087:~ # route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.4.0.0        *               255.255.0.0     U     0      0        0 eth0
12.4.0.0        *               255.255.0.0     U     0      0        0 ib1
12.4.0.0        *               255.255.0.0     U     0      0        0 ib0
link-local      *               255.255.0.0     U     0      0        0 eth0
11.4.0.0        *               255.255.0.0     U     0      0        0 ib0
loopback        *               255.0.0.0       U     0      0        0 lo
default         10.4.0.211      0.0.0.0         UG    0      0        0 eth0

The first entry for 12.4.0.0 points to I/F ib1, so in this configuration IPoIB over 12.4.X.X works fine.
After restarting the ib1 I/F with ifdown/ifup, the routing table changes (the order of the two 12.4.0.0 entries is swapped) and IPoIB over 12.4.X.X no longer works.

Restarting ib1 I/F:

sw087:~ # ifdown ib1
    ib1       device: Mellanox Technologies MT23108 InfiniHost (rev a1)
sw087:~ # ifup ib1
    ib1       device: Mellanox Technologies MT23108 InfiniHost (rev a1)

Route output:

sw087:~ # route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.4.0.0        *               255.255.0.0     U     0      0        0 eth0
12.4.0.0        *               255.255.0.0     U     0      0        0 ib0
12.4.0.0        *               255.255.0.0     U     0      0        0 ib1
link-local      *               255.255.0.0     U     0      0        0 eth0
11.4.0.0        *               255.255.0.0     U     0      0        0 ib0
loopback        *               255.0.0.0       U     0      0        0 lo
default         10.4.0.211      0.0.0.0         UG    0      0        0 eth0

Host info:

sw087:~ # hostinfo
Name =sw087
IP =10.4.3.87
CpuNum =4
CpuVendor =GenuineIntel
CpuModel = Intel(R) Xeon(TM) CPU 3.20GHz
CpuMhz =3200.190
MemSizeKb =4047700
MachType =x86_64
KernelRev =2.6.16.21-0.8-smp
ChipSet =Intel Corporation E7520 Memory Controller Hub (rev 0c)
Os =Welcome to SUSE Linux Enterprise Server 10 (x86_64) - Kernel \r (\l).
IBDevsNum =1
HCA0Name =mthca0
HCA0Desc =sw087 HCA-1
HCA0Type =MT23108
HCA0FWVer =3.5.0
HCA0PSID =MT_0030000001
HCA0GUIDS =NODE:0x0002c9871297bce0;SYS:0x0002c9871297bce3
HCA0Ports =1:0x0002c9871297bce1:0x0:11.4.3.87:DOWN;2:0x0002c9871297bce2:0x0:12.4.3.87:INIT
HCA1Name =NONE
HCA1Desc =NONE
HCA1Type =NONE
HCA1FWVer =NONE
HCA1PSID =NONE
HCA1GUIDS =NONE
HCA1Ports =NONE
IBStack =/usr/local/
IBStackType =ofed
IBStackVer =OFED-1.2-20070211-1558
IBMPI =/usr/local//mpi
MST_BUILD =4.3.6
IBADM_BUILD =IBADM 2.1.0, 20060720-1410
WRITE_BW =/usr/local/bin/ib_write_bw
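[Editorial sketch, not part of the original report.] The symptom in bug 351 is that the same destination/netmask pair shows up on two interfaces, and the report suggests that whichever entry is listed first decides whether IPoIB traffic works. A minimal way to spot this condition, assuming the classic `route` column layout shown in the report, is to group the rows by destination, netmask and metric and flag any group that spans more than one interface. The function name `find_duplicate_routes` and the embedded sample text are illustrative assumptions; the sample rows are copied from the second route listing in the report.

```python
from collections import defaultdict

# Sample text copied from the second "route" listing in the bug report.
ROUTE_OUTPUT = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.4.0.0        *               255.255.0.0     U     0      0        0 eth0
12.4.0.0        *               255.255.0.0     U     0      0        0 ib0
12.4.0.0        *               255.255.0.0     U     0      0        0 ib1
link-local      *               255.255.0.0     U     0      0        0 eth0
11.4.0.0        *               255.255.0.0     U     0      0        0 ib0
loopback        *               255.0.0.0       U     0      0        0 lo
default         10.4.0.211      0.0.0.0         UG    0      0        0 eth0
"""


def find_duplicate_routes(route_text):
    """Map (destination, genmask, metric) to its interfaces, keeping only
    the routes that appear on more than one distinct interface.

    Hypothetical helper: it assumes the eight-column layout printed by the
    classic net-tools "route" command, as seen in the report.
    """
    interfaces = defaultdict(list)
    for line in route_text.splitlines():
        fields = line.split()
        # Data rows have exactly eight columns; skip the title and header rows.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        dest, _gateway, genmask, _flags, metric, _ref, _use, iface = fields
        interfaces[(dest, genmask, metric)].append(iface)
    return {
        route: ifaces
        for route, ifaces in interfaces.items()
        if len(set(ifaces)) > 1
    }


for (dest, genmask, metric), ifaces in find_duplicate_routes(ROUTE_OUTPUT).items():
    print(f"{dest}/{genmask} metric {metric} is routed via {', '.join(ifaces)}")
```

The interface list preserves the order of the table, so the first element is the entry that is listed first. On the host in the report, port 2 (12.4.3.87) belongs to ib1, so a list starting with ib0 would correspond to the broken state described above.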
From vlad at mellanox.co.il Mon Feb 12 05:05:56 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 12 Feb 2007 15:05:56 +0200 Subject: [openib-general] Problem with install.sh openib-diags OFED-1.2-20070208-1508.tgz In-Reply-To: <1171280132.31538.409786.camel@hal.voltaire.com> References: <1171280132.31538.409786.camel@hal.voltaire.com> Message-ID: <1171285556.6265.6.camel@vladsk-laptop> On Mon, 2007-02-12 at 06:36 -0500, Hal Rosenstock wrote: > On Sun, 2007-02-11 at 13:44, Scott Weitzenkamp (sweitzen) wrote: > > I'm using install.sh on RHEL4 U3 x86_64 > > > > Preparing... > > ################################################## > > kernel-ib-devel > > ################################################## > > kernel-ib > > ################################################## > > error: Failed dependencies: > > perl(IBswcountlimits) > > This is supposed to be IBswcountlimits.pm. I think there was a change > for the localtion of this to be under /lib/perl some days ago > but not sure whether this change is in the OFED 1.2 install (for > OFED-1.2-20070208). > > Vlad, do you know what is causing this error ? > > -- Hal I fixed this by adding the following line to ofa_user.spec file: Provides: perl(IBswcountlimits) -- Vladimir Sokolovsky Mellanox Technologies Ltd. From suri at baymicrosystems.com Mon Feb 12 06:27:12 2007 From: suri at baymicrosystems.com (Suresh Shelvapille) Date: Mon, 12 Feb 2007 09:27:12 -0500 Subject: [openib-general] patches to 2.6.19.1 kernel for switch Operation In-Reply-To: <1171288297.31538.417657.camel@hal.voltaire.com> References: <000601c7419f$d4470c60$ff0da8c0@amr.corp.intel.com> <1170072757.4555.242192.camel@hal.voltaire.com> <039701c7494b$6bd5d860$1914a8c0@surioffice> <1171050441.31538.180858.camel@hal.voltaire.com> <048101c74c91$e0f54dd0$1914a8c0@surioffice> <1171288297.31538.417657.camel@hal.voltaire.com> Message-ID: <04ba01c74eb1$e77fd180$1914a8c0@surioffice> Hal: > > Ref: comment on mad.c (ib_mad_recv_done_handler(). 
> > > > Even if I make the relevant changes to smi.c functions how do I get the > > packet to get forwarded, without making additional changes in this > function? > > > > Meaning, when smi_handle_dr_smp_send(),smi_check_forward_dr_smp() are > called > > and you determine that the packet has to be forwarded instead of > consuming > > where do you actually do the send? I think this chain is missing! > > My initial thought was what I wrote but in looking at this further, as > you point out, the SMI routines are only updating the packet and > indicating its disposition. The actual sending needs to be elsewhere. > I'm not sure what the code ends up looking like with the changes > suggested and would just like this to look as clean as possible and use > the SMI routines where appropriate here. Does this make sense ? > I am not sure I follow this last statement. From suri at baymicrosystems.com Mon Feb 12 06:13:25 2007 From: suri at baymicrosystems.com (Suresh Shelvapille) Date: Mon, 12 Feb 2007 09:13:25 -0500 Subject: [openib-general] FW: patches to 2.6.19.1 kernel for switch Operation Message-ID: <04b901c74eaf$f934aa10$1914a8c0@surioffice> Just copying the list. -----Original Message----- From: Suresh Shelvapille [mailto:suri at baymicrosystems.com] Sent: Friday, February 09, 2007 4:33 PM To: 'Hal Rosenstock' Subject: RE: patches to 2.6.19.1 kernel for switch Operation Hal: Many thanks for your response, Ref: comment on mad.c (ib_mad_recv_done_handler(). Even if I make the relevant changes to smi.c functions how do I get the packet to get forwarded, without making additional changes in this function? Meaning, when smi_handle_dr_smp_send(),smi_check_forward_dr_smp() are called and you determine that the packet has to be forwarded instead of consuming where do you actually do the send? I think this chain is missing! 
Thanks, Suri > + if (!agent_send_response(&response->mad.mad, > + &response->grh, wc, > + port_priv->device, > + port_num, > + qp_info->qp->qp_num)) > + response = NULL; > > Per the above change, it appears that smi_check_forward_dr_smp and > smi_handle_dr_smp_send are no longer used at least here > (smi_check_forward_dr_smp is not used at all with this change). Couldn't > these be fixed to do the right thing for this case (as well as existing > cases) ? I'm not sure your changes work for end ports (CA and router > ports). > > Also, based on smi comments below, there might also be changes to > following: > + if (!ib_get_smp_direction(&recv->mad.smp)) > + port_num = > recv->mad.smp.initial_path[recv->mad.smp.hop_ptr+1]; > + else > + port_num = > recv->mad.smp.return_path[recv->mad.smp.hop_ptr-1]; > + > From halr at voltaire.com Mon Feb 12 06:43:26 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Feb 2007 09:43:26 -0500 Subject: [openib-general] FW: patches to 2.6.19.1 kernel for switch Operation In-Reply-To: <04b901c74eaf$f934aa10$1914a8c0@surioffice> References: <04b901c74eaf$f934aa10$1914a8c0@surioffice> Message-ID: <1171291368.31538.420357.camel@hal.voltaire.com> On Mon, 2007-02-12 at 09:13, Suresh Shelvapille wrote: > Just copying the list. > > -----Original Message----- > From: Suresh Shelvapille [mailto:suri at baymicrosystems.com] > Sent: Friday, February 09, 2007 4:33 PM > To: 'Hal Rosenstock' > Subject: RE: patches to 2.6.19.1 kernel for switch Operation > > > > Hal: > > Many thanks for your response, > > Ref: comment on mad.c (ib_mad_recv_done_handler(). > > Even if I make the relevant changes to smi.c functions how do I get the > packet to get forwarded, without making additional changes in this function? > > Meaning, when smi_handle_dr_smp_send(),smi_check_forward_dr_smp() are called > and you determine that the packet has to be forwarded instead of consuming > where do you actually do the send? I think this chain is missing! 
My initial thought was what I wrote but in looking at this further, as you point out, the SMI routines are only updating the packet and indicating its disposition. The actual sending needs to be elsewhere. I'm not sure what the code ends up looking like with the changes suggested and would just like this to look as clean as possible and use the SMI routines where appropriate here. Does this make sense ? -- Hal > Thanks, > Suri > > > + if (!agent_send_response(&response->mad.mad, > > + &response->grh, wc, > > + port_priv->device, > > + port_num, > > + qp_info->qp->qp_num)) > > + response = NULL; > > > > Per the above change, it appears that smi_check_forward_dr_smp and > > smi_handle_dr_smp_send are no longer used at least here > > (smi_check_forward_dr_smp is not used at all with this change). Couldn't > > these be fixed to do the right thing for this case (as well as existing > > cases) ? I'm not sure your changes work for end ports (CA and router > > ports). > > > > Also, based on smi comments below, there might also be changes to > > following: > > + if (!ib_get_smp_direction(&recv->mad.smp)) > > + port_num = > > recv->mad.smp.initial_path[recv->mad.smp.hop_ptr+1]; > > + else > > + port_num = > > recv->mad.smp.return_path[recv->mad.smp.hop_ptr-1]; > > + > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Mon Feb 12 06:59:14 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Feb 2007 08:59:14 -0600 Subject: [openib-general] OFED 1.2 build problem Message-ID: <1171292354.16167.9.camel@stevo-desktop> Dunno if this has already been resolved? Building the 20070208-1508 OFED 1.2 kit. RHEL3U4 with that distro's kernel. Ran build.sh and selected "all". 
It fails building ipath: /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c:44:22: linux/io.h: No such file or directory /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c: In function `ipath_diag_open': /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c:283: warning: implicit declaration of function `mutex_lock' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c:308: warning: implicit declaration of function `mutex_unlock' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c: In function `ipath_diagpkt_write': /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c:429: warning: implicit declaration of function `__iowrite32_copy' make[4]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.o] Error 1 make[3]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath] Error 2 make[2]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband] Error 2 make[1]: *** [_module_/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2] Error 2 make[1]: Leaving directory `/usr/src/kernels/2.6.9-42.EL-smp-x86_64' make: *** [kernel] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.66790 (%install) From halr at voltaire.com Mon Feb 12 07:18:46 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Feb 2007 10:18:46 -0500 Subject: [openib-general] [PATCH TRIVIAL] opensm: remove #ifdef __WIN__ in not shared file. In-Reply-To: <20070208231412.GA22807@sashak.voltaire.com> References: <20070208231412.GA22807@sashak.voltaire.com> Message-ID: <1171293506.31538.422142.camel@hal.voltaire.com> On Thu, 2007-02-08 at 18:14, Sasha Khapyorsky wrote: > opensm/main.c is not shared by win OpenSM, and #ifdef __WIN__ is not > needed here. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). 
-- Hal From mshefty at ichips.intel.com Mon Feb 12 08:19:34 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Feb 2007 08:19:34 -0800 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: <1171135423.11017.61.camel@stevo-desktop> References: <1171035668.26453.11.camel@trinity.ogc.int> <1171135423.11017.61.camel@stevo-desktop> Message-ID: <45D09396.5060603@ichips.intel.com> > This design is based on the RDMA_CM and IB_CM behavior. If the app > issues the destroy via rdma_destroy_cm_id, then we block that thread > until all references are gone. If the app returns non-zero in a > callback for a given cm_id, then the CM owns destroying the cm_id and > the application is done with it. That's the short of it. Here's the > long of it: Note that the goal of this behavior is simply to ensure that no thread will touch any code in their callback after destroying their cm_id. That is all that needs to be guaranteed, if this helps any. - Sean From swise at opengridcomputing.com Mon Feb 12 08:20:07 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Feb 2007 10:20:07 -0600 Subject: [openib-general] [PATCH] ofed_1-2 IWCM - Set iniator depth and responder resources to device max values. In-Reply-To: <1171223899.4027.1.camel@linux-q667.site> References: <1171223899.4027.1.camel@linux-q667.site> Message-ID: <1171297207.16167.24.camel@stevo-desktop> BTW: We need this for the alpha1 build or DAPL applications won't work over iWARP devices. Steve. On Sun, 2007-02-11 at 13:58 -0600, Steve WIse wrote: > IWCM - Set initiator depth and responder resources to device max values. > > For OFED 1.2, the IWCM will set the initiator depth and responder > resources to the device max values for new connect request events. 
> > > Signed-off-by: Steve Wise > --- > > kernel_patches/fixes/iwcm_ordird.patch | 43 ++++++++++++++++++++++++++++++++ > 1 files changed, 43 insertions(+), 0 deletions(-) > > diff --git a/kernel_patches/fixes/iwcm_ordird.patch b/kernel_patches/fixes/iwcm_ordird.patch > new file mode 100644 > index 0000000..3a9f643 > --- /dev/null > +++ b/kernel_patches/fixes/iwcm_ordird.patch > @@ -0,0 +1,43 @@ > +commit 7175034c7adf6b5fb5ba311929376af7501387a1 > +Author: Steve Wise > +Date: Sat Feb 10 14:16:35 2007 -0600 > + > + IWCM - Set iniator depth and responder resources to device max values. > + > + For OFED 1.2, the IWCM will set the initiator depth and responder > + resources to the device max values for new connect request events. > + > + Signed-off-by: Steve Wise > + > +diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > +index 9e0ab04..e3afdf8 100644 > +--- a/drivers/infiniband/core/cma.c > ++++ b/drivers/infiniband/core/cma.c > +@@ -1137,6 +1137,7 @@ static int iw_conn_req_handler(struct iw > + struct net_device *dev = NULL; > + struct rdma_cm_event event; > + int ret; > ++ struct ib_device_attr attr; > + > + listen_id = cm_id->context; > + atomic_inc(&listen_id->dev_remove); > +@@ -1189,10 +1190,19 @@ static int iw_conn_req_handler(struct iw > + sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr; > + *sin = iw_event->remote_addr; > + > ++ ret = ib_query_device(conn_id->id.device, &attr); > ++ if (ret) { > ++ cma_release_remove(conn_id); > ++ rdma_destroy_id(new_cm_id); > ++ goto out; > ++ } > ++ > + memset(&event, 0, sizeof event); > + event.event = RDMA_CM_EVENT_CONNECT_REQUEST; > + event.param.conn.private_data = iw_event->private_data; > + event.param.conn.private_data_len = iw_event->private_data_len; > ++ event.param.conn.initiator_depth = attr.max_qp_init_rd_atom; > ++ event.param.conn.responder_resources = attr.max_qp_rd_atom; > + ret = conn_id->id.event_handler(&conn_id->id, &event); > + if (ret) { > + /* User wants to 
destroy the CM ID */ > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From changquing.tang at hp.com Mon Feb 12 08:21:57 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 12 Feb 2007 16:21:57 -0000 Subject: [openib-general] Immediate data question In-Reply-To: <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com> <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net> <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com> Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403685756@G3W0634.americas.hpqcorp.net> > > 1. process A and process B is connected with QP. A first > post a send > > to B, B does not post receive. Then A and B are doing a long time > > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE > > message. Finally B will post a receive. Does the first > pending send in > > A block all the later RDMA_WRITE ? > According to IBTA spec HCA will process WR entries in strict > order in which they are posted so the send will block all WR > posted after this send, Until-unless HCA has multiple > processing elements, I think even then processing order will > be maintained by HCA Thanks, I can not use such code style. > > > > 2. extend above to three processes, A connect to B, B > connect to C, so > > B has two QPs, but one CQ. A posts a send to B, B does not post > > receive, rather B and C are doing a long time RDMA_WRITE, or > > send/recv. 
But B must sends RNR periodically to A, right?. > So does the > > pending message from A affects B's overall performance > between B and C ? Do you have any idea about this second situation ? --CQ > > > > Thank you. > > > > --CQ > > > > > > > > > > Mike > > > > > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > > From tom at opengridcomputing.com Mon Feb 12 08:37:33 2007 From: tom at opengridcomputing.com (Tom Tucker) Date: Mon, 12 Feb 2007 10:37:33 -0600 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: <45D09396.5060603@ichips.intel.com> References: <1171035668.26453.11.camel@trinity.ogc.int> <1171135423.11017.61.camel@stevo-desktop> <45D09396.5060603@ichips.intel.com> Message-ID: <1171298253.12228.9.camel@trinity.ogc.int> On Mon, 2007-02-12 at 08:19 -0800, Sean Hefty wrote: > > This design is based on the RDMA_CM and IB_CM behavior. If the app > > issues the destroy via rdma_destroy_cm_id, then we block that thread > > until all references are gone. If the app returns non-zero in a > > callback for a given cm_id, then the CM owns destroying the cm_id and > > the application is done with it. That's the short of it. Here's the > > long of it: > > Note that the goal of this behavior is simply to ensure that no thread will > touch any code in their callback after destroying their cm_id. That is all that > needs to be guaranteed, if this helps any. It help a lot actually. We've discussed simplifying this code by not blocking the destroy and guaranteeing that events received after the destroy are never delivered, but we didn't want to do something this drastic without some time to get it right. 
>
> - Sean

From ossrosch at linux.vnet.ibm.com Mon Feb 12 08:36:34 2007
From: ossrosch at linux.vnet.ibm.com (Stefan Roscher)
Date: Mon, 12 Feb 2007 17:36:34 +0100
Subject: [openib-general] 32-bit build for ppc64 is required
Message-ID: <200702121736.35468.ossrosch@linux.vnet.ibm.com>

Hi,

after building the latest OFED build package we noticed that on PPC64 only 64-bit libraries were built. Because we have customers using older userspace applications which are certified for 32-bit, we think additional 32-bit support is a requirement for 64-bit builds.

If OFED 1.2 supports 32-bit on ppc64, we have to change the install directory. I would suggest installing 32-bit binaries into the /usr/local/ofed/bin32 directory, so that no changes to the current naming conventions have to be made. The libraries are installed in the /usr/local/ofed/lib directory.

Feedback appreciated.

Kind regards
Stefan Roscher

From tziporet at mellanox.co.il Mon Feb 12 08:42:10 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 12 Feb 2007 18:42:10 +0200
Subject: [openib-general] OFED 1.2 components list - for the meeting today
Message-ID: <45D098E2.6000804@mellanox.co.il>

This is the full OFED 1.2 components list that we will review in the meeting

Tziporet

# Kernel
ib_verbs (core)
ib_mthca
ib_ipoib
ib_ipath - currently works on 2.6.20 only. Backport patches cannot be applied
ib_iser
ib_sdp
ib_srp
ib_ehca - PPC only
cxgb3
vnic
rds - currently works on kernel 2.6.20 and 2.6.19
ib-bonding - RHEL4UP3 & SLES10

# User libraries
libibverbs
libibcm
libmthca
libipathverbs
libcxgb3
libsdp
libehca
sdpnetstat
libibcommon
libibmad
libibumad
libopensm
libosmcomp
libosmvendor
librdmacm
dapl - not working with iWARP

# User utilities
perftest
mstflint
ibutils
opensm
qlvnictools
openib-diags
srptools
ipoibtools
tvflash

# MPI:
mvapich
mvapich2 - Build issue
openmpi
mpitests

# OFED specific:
ofed_docs - taken from 1.1 - not yet updated for 1.2
ofed_scripts

From halr at voltaire.com Mon Feb 12 08:49:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Feb 2007 11:49:15 -0500
Subject: [openib-general] OFED 1.2 components list - for the meeting today
In-Reply-To: <45D098E2.6000804@mellanox.co.il>
References: <45D098E2.6000804@mellanox.co.il>
Message-ID: <1171298946.31538.427171.camel@hal.voltaire.com>

On Mon, 2007-02-12 at 11:42, Tziporet Koren wrote:
> This is the full OFED 1.2 components list that we will review in the meeting
>
> Tziporet
>
> # Kernel
> ib_verbs (core)
> ib_mthca
> ib_ipoib
> ib_ipath - currently works on 2.6.20 only. Backport patches cannot be applied
> ib_iser
> ib_sdp
> ib_srp
> ib_ehca - PPC only
> cxgb3
> vnic
> rds - currently works on kernel 2.6.20 and 2.6.19
> ib-bonding - RHEL4UP3 & SLES10

Was ib_madeye carried over from OFED 1.1 or does this need to be added for OFED 1.2?
-- Hal > # User libraries > libibverbs > libibcm > libmthca > libipathverbs > libcxgb3 > libsdp > libehca > sdpnetstat > libibcommon > libibmad > libibumad > libopensm > libosmcomp > libosmvendor > librdmacm > dapl - not working with iWARP > > # User utilities > perftest > mstflint > ibutils > opensm > qlvnictools > openib-diags > srptools > ipoibtools > tvflash > > # MPI: > mvapich > mvapich2 - Build issue > openmpi > mpitests > > # OFED specific: > ofed_docs - taken from 1.1 - not yet updated for 1.2 > ofed_scripts > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Mon Feb 12 08:55:36 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Feb 2007 10:55:36 -0600 Subject: [openib-general] cxgb3 compilation fails on RHEL4.0U3 In-Reply-To: <1171296412.6265.24.camel@vladsk-laptop> References: <1171296412.6265.24.camel@vladsk-laptop> Message-ID: <1171299336.16167.31.camel@stevo-desktop> I only backported to RHEL4U4 since that was the supported platform. Is OFED 1.2 supporting U3 too? I can add the backport if needed. 
On Mon, 2007-02-12 at 18:06 +0200, Vladimir Sokolovsky wrote: > Hi Steve, > I got the following compilation failure on RHEL4.0U3 (2.6.9-34.ELsmp): > > gcc -Wp,-MD,/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/.cxgb3_offload.o.d -nostdinc -iwithprefix include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.9_U3/include/ -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include -Iinclude -include include/linux/autoconf.h -include /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include/linux/autoconf.h -D__KERNEL__ -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.9_U3/include/ -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include -Iinclude -include include/linux/autoconf.h -include /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include/linux/autoconf.h -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Os -fomit-frame-pointer -g -Wdeclaration-after-statement -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare - f! 
> unit-at-a-time -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/debug -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/cxgb3/core -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3 -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/rds -DMODULE -DKBUILD_BASENAME=cxgb3_offload -DKBUILD_MODNAME=cxgb3 -c -o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/.tmp_cxgb3_offload.o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:57: error: syntax error before "adapter_list_lock" > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:57: warning: type defaults to `int' in declaration of `adapter_list_lock' > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:57: error: incompatible types in initialization > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:57: error: initializer element is not constant > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:57: warning: data definition has no type or storage class > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: In function `is_offloading': > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:885: warning: passing arg 1 of `_read_lock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:889: warning: passing arg 1 of `_read_unlock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:894: warning: passing arg 1 of `_read_unlock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: In function `add_adapter': > 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1062: warning: passing arg 1 of `_write_lock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1064: warning: passing arg 1 of `_write_unlock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: In function `remove_adapter': > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1069: warning: passing arg 1 of `_write_lock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1071: warning: passing arg 1 of `_write_unlock_bh' from incompatible pointer type > make[3]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.o] Error 1 > make[2]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3] Error 2 > make[1]: *** [_module_/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2] Error 2 > make[1]: Leaving directory `/usr/src/kernels/2.6.9-34.EL-smp-x86_64' > make: *** [kernel] Error 2 > > From mshefty at ichips.intel.com Mon Feb 12 09:23:06 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Feb 2007 09:23:06 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070211230935.GT11411@obsidianresearch.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> Message-ID: <45D0A27A.2010302@ichips.intel.com> > Ah, I think I missed the key step in your scheme.. You plan to query > the local SM for SGID=remote DGID=local? (ie reversed from 'normal'. I > was thinking only about the SGID=local DGID=remote query direction) I'm not sure that the query needs the GIDs reversed, as long as the path is reversible. 
So, the local query would be: SGID=local, DGID=remote, reversible=1 And the remote query would be: SGID=local, DGID=remote, reversible=1, TClass & FlowLabel=from previous query response Use of reversible indicates that the remote side can send a packet back, and it will be received successfully at the local side. This seems to imply information about the local routing tables and GID to LID mappings. That is, packets traveling from the SGID->DGID and DGID->SGID use the same local LID pair. > SA SA' > Node1 --> (LID 1) Router A ------- Router A' (LID A) ---> Node2 > |-> (LID 2) Router A | > |-> (LID 3) Router B ------- Router B' (LID B) --| > > Router A and Router B are independent redundant devices, not a route > cloud of some sort. B -> A' is not a possible path. Since A' and B' connect to the same subnet, B -> A' should be a valid path. > So your idea is to do: > PR0: Node 1 asks SA for Node1 -> Node2 reversable path. > SA returns SLID=Node1 DLID=1, FlowLabel=Magic Reversable > indicator. This path is used for CM GMPs, or for the > normal non-routed CM. > PR1: Detecting a routed situation from PR0, > Node 1 asks SA for Node2 -> Node1. SA returns SLID=1 > DLID=Node1 and a GRH that configures Router A to use SLID=1 > You reverse the local LIDS from that path to get the QP > configuration. I think PR0 and PR1 could be the same. > I can think of the following downsides: > 1) Re-reading Michael Krause's email makes me think that defeating > the QP SLID check is contrary to the spirit of IBA I don't think we need to defeat the QP SLID check if we want extra routing, but having redundant routers use the same link layer address isn't necessarily a bad thing. > 4) Some means of remote SA communication needs to be decided > pre-standardization :< (I agree that a magic GID seems best) I think this is the first thing that must be solved, regardless of other details. We should see if we can at least get agreement on this, and if there are any issues. 
> But this has turned into such a complex problem it seems really hard > to predict what will pass through to standardization.. That is the > main benefit I see of the small change to the passive side. No matter > what is standardized it can be accommodated in the resulting > standard, whereas defining a PR with SGID==offsubnet to mean one thing > or another seems much more risky. I think the only thing we're asking for so far is a magic GID, unless I'm reading too much into what a reversible path indicates. - Sean From robert.j.woodruff at intel.com Mon Feb 12 09:58:43 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Mon, 12 Feb 2007 09:58:43 -0800 Subject: [openib-general] OFED 1.2 components list - for the meeting today In-Reply-To: <45D098E2.6000804@mellanox.co.il> Message-ID: BTW. Is the ibdiagui code going to be part of this release? I did not see it in the list below, or is it just part of openib-diags? I thought that we discussed this as an OFED 1.2 feature. I have someone that is interested in trying it out. woody -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Monday, February 12, 2007 8:42 AM To: OPENIB; EWG Subject: [openib-general] OFED 1.2 components list - for the meeting today This is the full OFED 1.2 components list that we will review in the meeting Tziporet # Kernel ib_verbs (core) ib_mthca ib_ipoib ib_ipath - currently works on 2.6.20 only.
Backport patches cannot be applied ib_iser ib_sdp ib_srp ib_ehca - PPC only cxgb3 vnic rds - currently works on kernel 2.6.20 and 2.6.19 ib-bonding - RHEL4UP3 & SLES10 # User libraries libibverbs libibcm libmthca libipathverbs libcxgb3 libsdp libehca sdpnetstat libibcommon libibmad libibumad libopensm libosmcomp libosmvendor librdmacm dapl - not working with iWARP # User utilities perftest mstflint ibutils opensm qlvnictools openib-diags srptools ipoibtools tvflash # MPI: mvapich mvapich2 - Build issue openmpi mpitests # OFED specific: ofed_docs - taken from 1.1 - not yet updated for 1.2 ofed_scripts _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From swise at opengridcomputing.com Mon Feb 12 10:07:49 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Feb 2007 12:07:49 -0600 Subject: [openib-general] cxgb3 compilation fails on RHEL4.0U3 In-Reply-To: <1171302989.12725.0.camel@vladsk-laptop> References: <1171296412.6265.24.camel@vladsk-laptop> <1171299336.16167.31.camel@stevo-desktop> <1171302989.12725.0.camel@vladsk-laptop> Message-ID: <1171303669.16167.48.camel@stevo-desktop> On Mon, 2007-02-12 at 19:56 +0200, Vladimir Sokolovsky wrote: > On Mon, 2007-02-12 at 10:55 -0600, Steve Wise wrote: > > I only backported to RHEL4U4 since that was the supported platform. > > > > Is OFED 1.2 supporting U3 too? > > > > I can add the backport if needed. > > > > > RHEL4U3 is not officially supported but there are some patches for cxgb3 > under kernel_patches/backport/2.6.9_U3: > > kernel_patches/backport/2.6.9_U3/cxgb3_main_to_2_6_13.patch > kernel_patches/backport/2.6.9_U3/cxgb3_makefile_to_2_6_19.patch Looks like Michael added this with commit: ea110866d640317fe889abdc3aaba317ae20da65 For alpha1, please just don't build cxgb3/libcxgb3 for RHEL4U3. Steve.
From todd.rimmer at qlogic.com Mon Feb 12 10:17:38 2007 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Mon, 12 Feb 2007 12:17:38 -0600 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45D0A27A.2010302@ichips.intel.com> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE061191A5D8F@EPEXCH2.qlogic.org> > From: Sean Hefty > Sent: Monday, February 12, 2007 12:23 PM > To: Jason Gunthorpe; Hal Rosenstock > Cc: openib-general at openib.org > Subject: Re: [openib-general] Problem is routing CM REQ There has been a lot of good discussion and proposed designs for this solution. I think it would be very helpful for Sean and Jason to put together a single living document (which could become a kernel Documents/ file later) summarizing the present proposed solution and the expectations from each component (router, SM/SA, CM, etc). That would certainly be a lot easier to follow than attempting to piece together the conclusions from this long email chain. It would also likely avoid omissions and allow for easier review by a larger audience. Thank you, Todd Rimmer
From halr at voltaire.com Mon Feb 12 11:07:57 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Feb 2007 14:07:57 -0500 Subject: [openib-general] OFED 1.2 components list - for the meeting today In-Reply-To: References: Message-ID: <1171307245.31538.434613.camel@hal.voltaire.com> On Mon, 2007-02-12 at 12:58, Woodruff, Robert J wrote: > BTW. > > Is the ibdiagui code going to be part of this release. > I did not see it in the list below or is it just part of > the openib-diags ? It's part of ibutils. -- Hal > I thought that we discussed this as an OFED 1.2 feature. > I have someone that is interested in trying it out. > > woody > > > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren > Sent: Monday, February 12, 2007 8:42 AM > To: OPENIB; EWG > Subject: [openib-general] OFED 1.2 components list - for the meeting > today > > This is the full OFED 1.2 components list that we will review in the > meeting > > Tziporet > > # Kernel > ib_verbs (core) > ib_mthca > ib_ipoib > ib_ipath - currently works on 2.6.20 only.
Backport patches cannot > be applied > ib_iser > ib_sdp > ib_srp > ib_ehca - PPC only > cxgb3 > vnic > rds - currently works on kernel 2.6.20 and 2.6.19 > ib-bonding - RHEL4UP3 & SLES10 > > # User libraries > libibverbs > libibcm > libmthca > libipathverbs > libcxgb3 > libsdp > libehca > sdpnetstat > libibcommon > libibmad > libibumad > libopensm > libosmcomp > libosmvendor > librdmacm > dapl - not working with iWARP > > # User utilities > perftest > mstflint > ibutils > opensm > qlvnictools > openib-diags > srptools > ipoibtools > tvflash > > # MPI: > mvapich > mvapich2 - Build issue > openmpi > mpitests > > # OFED specific: > ofed_docs - taken from 1.1 - not yet updated for 1.2 > ofed_scripts > From rdreier at cisco.com Mon Feb 12 11:11:06 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Feb 2007 11:11:06 -0800 Subject: [openib-general] [PATCH 4 of 4] IB/mthca: give reserved MTTs a separate cache line In-Reply-To: <20070210211726.GE14903@mellanox.co.il> (Michael S. Tsirkin's message of "Sat, 10 Feb 2007 23:17:26 +0200") References: <20070210211726.GE14903@mellanox.co.il> Message-ID: Thanks, applied as 2 separate patches. From swise at opengridcomputing.com Mon Feb 12 11:30:10 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Feb 2007 13:30:10 -0600 Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace.
Message-ID: <1171308610.16167.69.camel@stevo-desktop> Roland, can you review this? ----- From: Steve Wise Currently iw_cxgb3 uses the physical address as the key/offset to return to the user process for maping kernel memory into userspace. The user process then calls mmap() using this key as the offset. Because the physical address is 64 bits, this introduces a problem with 32-bit userspace, which might not be able to pass an arbitrary 64-bit address back into the kernel (since mmap2() is limited to a 32-bit number of pages for the offset, which limits it to 44-bit addresses). Change the mmap logic to use a u32 counter as the offset for mapping. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 66 +++++++++++++++++---------- drivers/infiniband/hw/cxgb3/iwch_provider.h | 13 +++-- drivers/infiniband/hw/cxgb3/iwch_user.h | 6 +- 3 files changed, 52 insertions(+), 33 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index d02cd72..b2c88d6 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -115,7 +115,7 @@ static struct ib_ucontext *iwch_alloc_uc struct iwch_dev *rhp = to_iwch_dev(ibdev); PDBG("%s ibdev %p\n", __FUNCTION__, ibdev); - context = kmalloc(sizeof(*context), GFP_KERNEL); + context = kzalloc(sizeof(*context), GFP_KERNEL); if (!context) return ERR_PTR(-ENOMEM); cxio_init_ucontext(&rhp->rdev, &context->uctx); @@ -141,13 +141,14 @@ static int iwch_destroy_cq(struct ib_cq } static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, - struct ib_ucontext *context, + struct ib_ucontext *ib_context, struct ib_udata *udata) { struct iwch_dev *rhp; struct iwch_cq *chp; struct iwch_create_cq_resp uresp; struct iwch_create_cq_req ureq; + struct iwch_ucontext *ucontext = NULL; PDBG("%s ib_dev %p entries %d\n", __FUNCTION__, ibdev, entries); rhp = to_iwch_dev(ibdev); @@ -155,12 +156,15 @@ static struct ib_cq 
*iwch_create_cq(stru if (!chp) return ERR_PTR(-ENOMEM); - if (context && !t3a_device(rhp)) { - if (ib_copy_from_udata(&ureq, udata, sizeof (ureq))) { - kfree(chp); - return ERR_PTR(-EFAULT); + if (ib_context) { + ucontext = to_iwch_ucontext(ib_context); + if (!t3a_device(rhp)) { + if (ib_copy_from_udata(&ureq, udata, sizeof (ureq))) { + kfree(chp); + return ERR_PTR(-EFAULT); + } + chp->user_rptr_addr = (u32 __user *)(unsigned long)ureq.user_rptr_addr; } - chp->user_rptr_addr = (u32 __user *)(unsigned long)ureq.user_rptr_addr; } if (t3a_device(rhp)) { @@ -190,7 +194,7 @@ static struct ib_cq *iwch_create_cq(stru init_waitqueue_head(&chp->wait); insert_handle(rhp, &rhp->cqidr, chp, chp->cq.cqid); - if (context) { + if (ucontext) { struct iwch_mm_entry *mm; mm = kmalloc(sizeof *mm, GFP_KERNEL); @@ -200,16 +204,20 @@ static struct ib_cq *iwch_create_cq(stru } uresp.cqid = chp->cq.cqid; uresp.size_log2 = chp->cq.size_log2; - uresp.physaddr = virt_to_phys(chp->cq.queue); + spin_lock(&ucontext->mmap_lock); + uresp.key = ucontext->key; + ucontext->key += PAGE_SIZE; + spin_unlock(&ucontext->mmap_lock); if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) { kfree(mm); iwch_destroy_cq(&chp->ibcq); return ERR_PTR(-EFAULT); } - mm->addr = uresp.physaddr; + mm->key = uresp.key; + mm->addr = virt_to_phys(chp->cq.queue); mm->len = PAGE_ALIGN((1UL << uresp.size_log2) * sizeof (struct t3_cqe)); - insert_mmap(to_iwch_ucontext(context), mm); + insert_mmap(ucontext, mm); } PDBG("created cqid 0x%0x chp %p size 0x%0x, dma_addr 0x%0llx\n", chp->cq.cqid, chp, (1 << chp->cq.size_log2), @@ -316,14 +324,14 @@ static int iwch_arm_cq(struct ib_cq *ibc static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) { int len = vma->vm_end - vma->vm_start; - u64 pgaddr = vma->vm_pgoff << PAGE_SHIFT; + u32 key = vma->vm_pgoff << PAGE_SHIFT; struct cxio_rdev *rdev_p; int ret = 0; struct iwch_mm_entry *mm; struct iwch_ucontext *ucontext; - PDBG("%s off 0x%lx addr 0x%llx len %d\n", 
__FUNCTION__, vma->vm_pgoff, - pgaddr, len); + PDBG("%s pgoff 0x%lx key 0x%x len %d\n", __FUNCTION__, vma->vm_pgoff, + key, len); if (vma->vm_start & (PAGE_SIZE-1)) { return -EINVAL; @@ -332,13 +340,13 @@ static int iwch_mmap(struct ib_ucontext rdev_p = &(to_iwch_dev(context->device)->rdev); ucontext = to_iwch_ucontext(context); - mm = remove_mmap(ucontext, pgaddr, len); + mm = remove_mmap(ucontext, key, len); if (!mm) return -EINVAL; kfree(mm); - if ((pgaddr >= rdev_p->rnic_info.udbell_physbase) && - (pgaddr < (rdev_p->rnic_info.udbell_physbase + + if ((mm->addr >= rdev_p->rnic_info.udbell_physbase) && + (mm->addr < (rdev_p->rnic_info.udbell_physbase + rdev_p->rnic_info.udbell_len))) { /* @@ -351,15 +359,17 @@ static int iwch_mmap(struct ib_ucontext vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND; vma->vm_flags &= ~VM_MAYREAD; - ret = io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, - len, vma->vm_page_prot); + ret = io_remap_pfn_range(vma, vma->vm_start, + mm->addr >> PAGE_SHIFT, + len, vma->vm_page_prot); } else { /* * Map WQ or CQ contig dma memory... 
*/ - ret = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, - len, vma->vm_page_prot); + ret = remap_pfn_range(vma, vma->vm_start, + mm->addr >> PAGE_SHIFT, + len, vma->vm_page_prot); } return ret; @@ -853,18 +863,24 @@ static struct ib_qp *iwch_create_qp(stru uresp.size_log2 = qhp->wq.size_log2; uresp.sq_size_log2 = qhp->wq.sq_size_log2; uresp.rq_size_log2 = qhp->wq.rq_size_log2; - uresp.physaddr = virt_to_phys(qhp->wq.queue); - uresp.doorbell = qhp->wq.udb; + spin_lock(&ucontext->mmap_lock); + uresp.key = ucontext->key; + ucontext->key += PAGE_SIZE; + uresp.db_key = ucontext->key; + ucontext->key += PAGE_SIZE; + spin_unlock(&ucontext->mmap_lock); if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) { kfree(mm1); kfree(mm2); iwch_destroy_qp(&qhp->ibqp); return ERR_PTR(-EFAULT); } - mm1->addr = uresp.physaddr; + mm1->key = uresp.key; + mm1->addr = virt_to_phys(qhp->wq.queue); mm1->len = PAGE_ALIGN(wqsize * sizeof (union t3_wr)); insert_mmap(ucontext, mm1); - mm2->addr = uresp.doorbell & PAGE_MASK; + mm2->key = uresp.db_key; + mm2->addr = qhp->wq.udb & PAGE_MASK; mm2->len = PAGE_SIZE; insert_mmap(ucontext, mm2); } diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index b2eb29e..463e746 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -184,6 +184,7 @@ struct ib_qp *iwch_get_qp(struct ib_devi struct iwch_ucontext { struct ib_ucontext ibucontext; struct cxio_ucontext uctx; + u32 key; spinlock_t mmap_lock; struct list_head mmaps; }; @@ -196,11 +197,12 @@ static inline struct iwch_ucontext *to_i struct iwch_mm_entry { struct list_head entry; u64 addr; + u32 key; unsigned len; }; static inline struct iwch_mm_entry *remove_mmap(struct iwch_ucontext *ucontext, - u64 addr, unsigned len) + u32 key, unsigned len) { struct list_head *pos, *nxt; struct iwch_mm_entry *mm; @@ -209,11 +211,11 @@ static inline struct iwch_mm_entry *remo list_for_each_safe(pos, nxt, 
&ucontext->mmaps) { mm = list_entry(pos, struct iwch_mm_entry, entry); - if (mm->addr == addr && mm->len == len) { + if (mm->key == key && mm->len == len) { list_del_init(&mm->entry); spin_unlock(&ucontext->mmap_lock); - PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr, - mm->len); + PDBG("%s addr 0x%llx key 0x%x len %d\n", + __FUNCTION__, mm->addr, mm->key, mm->len); return mm; } } @@ -225,7 +227,8 @@ static inline void insert_mmap(struct iw struct iwch_mm_entry *mm) { spin_lock(&ucontext->mmap_lock); - PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr, mm->len); + PDBG("%s addr 0x%llx key 0x%x len %d\n", __FUNCTION__, + mm->addr, mm->key, mm->len); list_add_tail(&mm->entry, &ucontext->mmaps); spin_unlock(&ucontext->mmap_lock); } diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h index 14e1517..c4e7fbe 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_user.h +++ b/drivers/infiniband/hw/cxgb3/iwch_user.h @@ -47,14 +47,14 @@ struct iwch_create_cq_req { }; struct iwch_create_cq_resp { - __u64 physaddr; + __u64 key; __u32 cqid; __u32 size_log2; }; struct iwch_create_qp_resp { - __u64 physaddr; - __u64 doorbell; + __u64 key; + __u64 db_key; __u32 qpid; __u32 size_log2; __u32 sq_size_log2; From rdreier at cisco.com Mon Feb 12 11:58:13 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Feb 2007 11:58:13 -0800 Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace. In-Reply-To: <1171308610.16167.69.camel@stevo-desktop> (Steve Wise's message of "Mon, 12 Feb 2007 13:30:10 -0600") References: <1171308610.16167.69.camel@stevo-desktop> Message-ID: Looks mostly sane (assuming it works on 32-bit userspace on 64-bit kernel now), but: > - context = kmalloc(sizeof(*context), GFP_KERNEL); > + context = kzalloc(sizeof(*context), GFP_KERNEL); Why do you need this? Is this an unrelated change? - R. 
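The key scheme described in the patch above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: the `sketch_*` names are hypothetical, 4 KiB pages are assumed, a singly linked list stands in for the kernel `list_head`, and the `mmap_lock` spinlock is left out. It shows the 32 + 12 = 44 bit limit from the commit message, the per-context u32 counter that hands out one page-sized key per mapping, and a lookup that copies the address out of the entry before freeing it.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12u                      /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Address bits reachable when the mmap offset is a 32-bit count of pages. */
static unsigned sketch_mmap2_addr_bits(void)
{
    return 32u + SKETCH_PAGE_SHIFT;                /* 32 + 12 = 44 */
}

/* Hypothetical stand-in for the per-mapping bookkeeping entry. */
struct sketch_mm_entry {
    struct sketch_mm_entry *next;
    uint64_t addr;                                 /* physical address, kept private */
    uint32_t key;                                  /* opaque offset given to userspace */
    unsigned len;
};

/* Hypothetical stand-in for the per-process context. */
struct sketch_ucontext {
    uint32_t key;                                  /* next key to hand out, starts at 0 */
    struct sketch_mm_entry *mmaps;
};

/* Record a mapping and return the key that userspace would pass to mmap(). */
static int sketch_insert_mmap(struct sketch_ucontext *uc, uint64_t addr,
                              unsigned len, uint32_t *key_out)
{
    struct sketch_mm_entry *mm = malloc(sizeof(*mm));

    if (!mm)
        return -1;
    mm->addr = addr;
    mm->len = len;
    mm->key = uc->key;
    uc->key += SKETCH_PAGE_SIZE;                   /* one page-sized step per key */
    mm->next = uc->mmaps;
    uc->mmaps = mm;
    *key_out = mm->key;
    return 0;
}

/* Find an entry by key and length, unlink it, and return its address. */
static int sketch_take_mmap(struct sketch_ucontext *uc, uint32_t key,
                            unsigned len, uint64_t *addr_out)
{
    struct sketch_mm_entry **pp;

    for (pp = &uc->mmaps; *pp; pp = &(*pp)->next) {
        struct sketch_mm_entry *mm = *pp;

        if (mm->key == key && mm->len == len) {
            *pp = mm->next;
            *addr_out = mm->addr;                  /* copy out before freeing */
            free(mm);
            return 0;
        }
    }
    return -1;                                     /* unknown key or wrong length */
}
```

With a zero-initialized context, the first two inserts in this sketch return keys 0 and 4096 regardless of how large the recorded addresses are, so a 32-bit process never has to carry a 64-bit physical address through the mmap offset.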
From swise at opengridcomputing.com Mon Feb 12 12:04:30 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Feb 2007 14:04:30 -0600 Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace. In-Reply-To: References: <1171308610.16167.69.camel@stevo-desktop> Message-ID: <1171310670.16167.89.camel@stevo-desktop> On Mon, 2007-02-12 at 11:58 -0800, Roland Dreier wrote: > Looks mostly sane (assuming it works on 32-bit userspace on 64-bit > kernel now), but: > > > - context = kmalloc(sizeof(*context), GFP_KERNEL); > > + context = kzalloc(sizeof(*context), GFP_KERNEL); > > Why do you need this? Is this an unrelated change? > Because the key generator u32 is in the context now, and the kzalloc() initializes it. I could have done: context->key = 0; But km -> kz was less typing. ;-) Steve. From rdreier at cisco.com Mon Feb 12 12:08:15 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Feb 2007 12:08:15 -0800 Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace. In-Reply-To: <1171310670.16167.89.camel@stevo-desktop> (Steve Wise's message of "Mon, 12 Feb 2007 14:04:30 -0600") References: <1171308610.16167.69.camel@stevo-desktop> <1171310670.16167.89.camel@stevo-desktop> Message-ID: > Because the key generator u32 is in the context now, and the kzalloc() > initializes it. I could have done: > > context->key = 0; > > But km -> kz was less typing. ;-) OK, got it. Anyway as I said, from a quick read the changes look sane, with the assumption that they work. From swise at opengridcomputing.com Mon Feb 12 12:19:36 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Feb 2007 14:19:36 -0600 Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace. 
In-Reply-To: References: <1171308610.16167.69.camel@stevo-desktop> <1171310670.16167.89.camel@stevo-desktop> Message-ID: <1171311576.16167.91.camel@stevo-desktop> On Mon, 2007-02-12 at 12:08 -0800, Roland Dreier wrote: > > Because the key generator u32 is in the context now, and the kzalloc() > > initializes it. I could have done: > > > > context->key = 0; > > > > But km -> kz was less typing. ;-) > > OK, got it. Anyway as I said, from a quick read the changes look > sane, with the assumption that they work. I tested and it works. Do you want to pull this in before you push the driver upstream? Do I need to repost it? Thanks, Steve. From rdreier at cisco.com Mon Feb 12 12:20:31 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Feb 2007 12:20:31 -0800 Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace. In-Reply-To: <1171311576.16167.91.camel@stevo-desktop> (Steve Wise's message of "Mon, 12 Feb 2007 14:19:36 -0600") References: <1171308610.16167.69.camel@stevo-desktop> <1171310670.16167.89.camel@stevo-desktop> <1171311576.16167.91.camel@stevo-desktop> Message-ID: Steve> I tested and it works. Do you want to pull this in before Steve> you push the driver upstream? Do I need to repost it? I'll grab it and merge it in. I expect to ask Linus to pull later today. - R. From rdreier at cisco.com Mon Feb 12 12:23:29 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Feb 2007 12:23:29 -0800 Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace. In-Reply-To: <1171311576.16167.91.camel@stevo-desktop> (Steve Wise's message of "Mon, 12 Feb 2007 14:19:36 -0600") References: <1171308610.16167.69.camel@stevo-desktop> <1171310670.16167.89.camel@stevo-desktop> <1171311576.16167.91.camel@stevo-desktop> Message-ID: Actually, that patch doesn't apply because of the "%llx" warning fixes I pushed out. 
And git-apply also complains about trailing whitespace. Can you resend a version that applies to my for-2.6.21 branch? Thanks From rdreier at cisco.com Mon Feb 12 12:25:30 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Feb 2007 12:25:30 -0800 Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree In-Reply-To: <20070210211508.GD14903@mellanox.co.il> (Michael S. Tsirkin's message of "Sat, 10 Feb 2007 23:15:08 +0200") References: <20070210211508.GD14903@mellanox.co.il> Message-ID: > + sg_set_buf(mem, buf, PAGE_SIZE << order); > + BUG_ON(mem->offset); > + sg_dma_len(mem) = PAGE_SIZE << order; What am I missing? Any reason to set sg_dma_len() again after sg_set_buf()? From jgunthorpe at obsidianresearch.com Mon Feb 12 12:56:34 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Mon, 12 Feb 2007 13:56:34 -0700 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45D0A27A.2010302@ichips.intel.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> Message-ID: <20070212205634.GW11411@obsidianresearch.com> On Mon, Feb 12, 2007 at 09:23:06AM -0800, Sean Hefty wrote: > >Ah, I think I missed the key step in your scheme.. You plan to query > >the local SM for SGID=remote DGID=local? (ie reversed from 'normal'. I > >was thinking only about the SGID=local DGID=remote query direction) > > I'm not sure that the query needs the GIDs reversed, as long as the path is > reversible. So, the local query would be: > > SGID=local, DGID=remote, reversible=1 (to SA) > > And the remote query would be: > > SGID=local, DGID=remote, reversible=1, (to SA') > TClass & FlowLabel=from previous query response 1) What does the TClass and FlowLabel returned from SGID=local DGID=remote mean? Do you use it in the Node1 -> Node2 direction or the Node2 -> Node1 direction or both?
1a) If it is Node1 -> Node2 then the local SA has to query SA' to figure out what FlowLabel to return. 1b) If it is for both directions then somehow SA, SA' and all four router ports need to agree on global flowlabels. 2) In the 2nd query, passing SGID=local, DGID=remote is 'reversed' since SGID=local is the wrong subnet for SA'. I think defining this to mean something is risky. 2b) A PR query with TClass and FlowLabel present in the query is currently expected to return an answer with those fields matching. That implies #1b.. So, here is how I see this working.. - There is a single well known 'reversible' flowlabel. When a router processes a GRH with that flowlabel it produces a packet that has a SLID that is always the same, no matter what router port is used (A' or B' in my example). The LRH is also reversible according to the rules in IBA. A well known value side-steps the global information problem and allows the GRH to be reversible. - Whenever a PR has reversible=1 the result returns the well known flowlabel. The router LID is always the single shared SLID. - To get a more optimal path the following sequence of queries is used: to SA: SGID=Node1 DGID=Node2 [In the background SA asks SA' what flow label to use] to SA': SGID=Node1 DGID=Node2 FlowLabel=(from above) to SA': SGID=Node2 DGID=Node1 SLID=(dlid from above) [In the background SA' asks SA what flow label to use] to SA: SGID=Node2 DGID=Node1 FlowLabel=(from above) It is almost guaranteed that the FlowLabel will be asymmetric. This is to keep the flowlabel space local to each subnet. In the background queries SA and SA' also examine the global route topology to select an optimal no-spoof needed router LID. The background exchange is how the disambiguation problem with multiple-router paths is solved.
Implicit in this are five IBA-affecting things: - that PRs with SGID=non-local mean something specific - PRs with DGID=non-local cause the SA to communicate with the remote SA to learn the GRH's FlowLabel (except in the case where reversible=1) - clients can communicate with remote SAs - Routers do the SLID spoofing you outlined. - SAs and routers collaborate quite closely on how the router produces a LRH. In particular the SA controls the SLID spoofing A new query type or maybe some kind of modified multi-path-record query could be defined by IBA to reduce the 6 exchanges required to something more efficient. Does this match what you are thinking? > > SA SA' > >Node1 --> (LID 1) Router A ------- Router A' (LID A) ---> Node2 > > |-> (LID 2) Router A | > > |-> (LID 3) Router B ------- Router B' (LID B) --| > > > >Router A and Router B are independent redundant devices, not a route > >cloud of some sort. B -> A' is not a possible path. > > Since A' and B' connect to the same subnet, B -> A' should be a valid path. Please don't dismiss this case as it is a simple case of a more generalized problem. People will want to deploy primary and secondary routers (like dual star switching) that don't intercommunicate for reliability. The B -> A' path does not exist because the A and B routers are separate non-linked devices and not just 4 ports on one large router. [A more general view would be a router ring architecture where the clockwise and counterclockwise paths use different hardware/cables] There is a lot of complex work in the router and SA side to make this kind of topology work, but it is critical that the clients use path queries that can provide enough data to the SA and return enough data to the client to support this.
> >I can think of the following downsides: > > 1) Re-reading Michael Krause's email makes me think that defeating > > the QP SLID check is contrary to the spirit of IBA > > I don't think we need to defeat the QP SLID check if we want extra routing, > but having redundant routers use the same link layer address isn't > necessarily a bad thing. Well, it is one and the same, the SLID is really only used in the QP SLID check so changing it around only serves to defeat that check. Jason From tziporet at mellanox.co.il Mon Feb 12 13:15:37 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 12 Feb 2007 23:15:37 +0200 Subject: [openib-general] OFED 1.2 build problem In-Reply-To: <1171292354.16167.9.camel@stevo-desktop> References: <1171292354.16167.9.camel@stevo-desktop> Message-ID: <45D0D8F9.9060908@mellanox.co.il> Steve Wise wrote: > Dunno if this has already been resolved? > > Building the 20070208-1508 OFED 1.2 kit. > RHEL3U4 with that distro's kernel. > Ran build.sh and selected "all". > > The ipath driver does not have any backport patches. I hope they will have some today. Tziporet From krause at cup.hp.com Mon Feb 12 13:06:27 2007 From: krause at cup.hp.com (Michael Krause) Date: Mon, 12 Feb 2007 13:06:27 -0800 Subject: [openib-general] dapl broken for iWARP In-Reply-To: References: Message-ID: <6.2.0.14.2.20070212130325.08f31f10@esmail.cup.hp.com> At 07:29 AM 2/9/2007, Kanevsky, Arkady wrote: >Mike, >this is not a DAPL issue. >There are 2 ways to deal with it. >One is for all ULPs to use private data to exchange CM info. >yes, some ULPs, like SDP do that in hello world message. > >Another is to let CM handle it. >This way ULP does not have to deal with it. >This is analogous to the IBTA CM IP addressing Annex. >It ensures backwards compatibility and does not break any existing apps >which use MPA as specified by IETF. > >No need to bother IETF until we have it working.
Given what it took to get MPA specified, I don't see changing the specification for this as likely welcomed by many. The ULP used within the IETF are largely able to solve this problem at their login exchange so unless there is some ground swell of IETF ULP that can't solve it as these do, I think this may be a challenge to gain any traction. Mike >Thanks, > >Arkady Kanevsky email: arkady at netapp.com >Network Appliance Inc. phone: 781-768-5395 >1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 >Waltham, MA 02451 central phone: 781-768-5300 > > > > -----Original Message----- > > From: Michael Krause [mailto:krause at cup.hp.com] > > Sent: Thursday, February 08, 2007 4:27 PM > > To: Kanevsky, Arkady; Steve Wise; Arlin Davis > > Cc: openib-general > > Subject: Re: [openib-general] dapl broken for iWARP > > > > At 07:43 AM 2/8/2007, Kanevsky, Arkady wrote: > > >That is correct. > > >I am working with Krishna on it. > > >Expect patches soon. > > > > > >By the way the problem is not DAPL specific and so is a proposed > > >solution. > > > > > >There are 3 aspects of the solution. > > >One is APIs. We suggest that we do not augment these. > > >That is a connection requestor sets its QP RDMA ORD and IRD. > > >When connection is established user can check the QP RDMA > > ORD and IRD > > >to see what he has now to use over the connection. > > >We may consider to extend QP attributes to support transport > > specific > > >parameters passing in the future. > > >For example, iWARP MPA CRC request. > > > > > >Second is the semantic that CM provides. > > >The proposal is to match IBCM semantic. > > >That is CM guarantee that local IRD is >= remote ORD. > > >This guarantees that incoming RDMA Read requests will not > > overwhelm the > > >QP RDMA Read capabilities. > > >Again there is not changes to IBCM only to IWCM. > > >Notice that as part of this IWCM will pass down to driver > > and extract > > >from driver needed info. 
> > > > > >The final part is iWARP CM extension to exchange RDMA ORD, IRD. > > >This is similar to IBTA Annex for IP Addressing. > > >The harder part that this will eventually require IETF MPA spec > > >extension, and the fact that MPA protocol is implemented in > > RNIC HW by > > >many vendors, and hence can not be done by IWCM itself. > > > > We looked at this quite a bit during the creation of the > > specification. All of the targeted usage models exchange > > this information > > as part of their "hello" or login exchanges. As such, the > > "hum" was to > > not change MPA to communicate such information and leave it > > to software to > > exchange these values through existing mechanisms. I > > seriously doubt > > there will be much support for modifying the MPA > > specification at this stage since the implementations are > > largely complete and a modification would have to deal with > > the legacy interoperability issue which likely would be > > solved in software any way. It would be simpler to simply > > modify the underlying DAPL implementation to exchange the > > information and keep this hidden from both the application > > and the RNIC providers. > > > > Mike > > > > > > >Thanks, > > > > > >Arkady Kanevsky email: arkady at netapp.com > > >Network Appliance Inc. phone: 781-768-5395 > > >1601 Trapelo Rd. - Suite 16. 
Fax: 781-895-1195 > > >Waltham, MA 02451 central phone: 781-768-5300 > > > > > > > > > > -----Original Message----- > > > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > > > Sent: Wednesday, February 07, 2007 6:12 PM > > > > To: Arlin Davis > > > > Cc: openib-general > > > > Subject: Re: [openib-general] dapl broken for iWARP > > > > > > > > On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote: > > > > > Steve Wise wrote: > > > > > > > > > > >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: > > > > > > > > > > > > > > > > > >>Arlin, > > > > > >> > > > > > >>The OFED dapl code is assuming the responder_resources and > > > > > >>initiator_depth passed up on a connection request event > > > > are from the > > > > > >>remote peer. This doesn't happen for iWARP. In the > > > > current iWARP > > > > > >>specifications, its up to the application to exchange this > > > > > >>information somehow. So these are defaulting to 0 on the > > > > server side > > > > > >>of any dapl connection over iWARP. > > > > > >> > > > > > >>This is a fairly recent change, I think. We need to > > come up with > > > > > >>some way to deal with this for OFED 1.2 IMO. > > > > > >> > > > > > >> > > > > > Yes, this was changed recently to sync up with the > > rdma_cm changes > > > > > that exposed the values. > > > > > > > > > > >> > > > > > >> > > > > > > > > > > > >The IWCM could set these to the device max values for instance. > > > > > > > > > > > > > > > > > That would work fine as long as you know the remote > > > > settings will be > > > > > equal or better. The provider just sets the min of > > local device max > > > > > values and the remote values provided with the request. > > > > > > > > > > > > > I know Krishna Kumar is working on a solution for exchanging > > > > this info in private data so the IWCM can "do the right > > > > thing". Stay tuned for a patch series to review for this. > > > > But this functionality is definitely post OFED-1.2. 
> > > > > > > > > > > > So for the OFED-1.2, I will set these to the device max > > in the IWCM. > > > > Assuming the other side is OFED 1.2 DAPL, then it will work fine. > > > > > > > > Steve. > > > > > > > > > > > > > > > > _______________________________________________ > > > > openib-general mailing list > > > > openib-general at openib.org > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > To unsubscribe, please visit > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > >_______________________________________________ > > >openib-general mailing list > > >openib-general at openib.org > > >http://openib.org/mailman/listinfo/openib-general > > > > > >To unsubscribe, please visit > > >http://openib.org/mailman/listinfo/openib-general > > > > From krause at cup.hp.com Mon Feb 12 13:14:28 2007 From: krause at cup.hp.com (Michael Krause) Date: Mon, 12 Feb 2007 13:14:28 -0800 Subject: [openib-general] Immediate data question In-Reply-To: <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.co m> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com> <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net> <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com> Message-ID: <6.2.0.14.2.20070212130704.09018a60@esmail.cup.hp.com> At 09:10 PM 2/11/2007, Devesh Sharma wrote: >On 2/10/07, Tang, Changqing wrote: >> > > >> > >Not for the receiver, but the sender will be severely slowed down by >> > >having to wait for the RNR timeouts. >> > >> > RNR = Receiver Not Ready so by definition, the data flow >> > isn't going to >> > progress until the receiver is ready to receive data. 
If a >> > receive QP >> > enters RNR for a RC, then it is likely not progressing as >> > desired. RNR >> > was initially put in place to enable a receiver to create >> > back pressure to the sender without causing a fatal error >> > condition. It should rarely be entered and therefore should >> > have negligible impact on overall performance however when a >> > RNR occurs, no forward progress will occur so performance is >> > essentially zero. >> >>Mike: >> I still do not quite understand this issue. I have two >>situations that have RNR triggered. >> >>1. process A and process B is connected with QP. A first post a send to >>B, B does not post receive. Then A and B are doing a long time >>RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE >>message. Finally B will post a receive. Does the first pending send in A >>block all the later RDMA_WRITE ? >According to IBTA spec HCA will process WR entries in strict order in >which they are posted so the send will block all WR posted after this >send, Until-unless HCA has multiple processing elements, I think even >then processing order will be maintained by HCA >If not, since RNR is triggered The source HCA is responsible for processing work requests in the order they are posted. If the SEND cannot proceed and receives a RNR, then the subsequent RDMA Write should not proceed, i.e. the sequence numbers that define the valid window will not progress and given IB requires strong ordering within the fabric, nothing sent subsequently should be made visible at the sink HCA. In your example, if A is sending a SEND followed by a RDMA Write, the first check should have been that B had provided an ACK with a credit indicating that a SEND is allowed. If B subsequently removed access to the buffer that had to be posted to provide that credit, then it should trigger a RNR NAK and the subsequent RDMA Writes should not be visible at B since there is now an effective hole in the transmission stream.
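The ordering rule described in the reply above can be sketched as a toy model in C. This is only an illustration under stated assumptions: the types and the function `process_send_queue` are hypothetical and are not part of the verbs API, and the model ignores retries, timers and the RNR retry count. It shows just one point: work requests complete in posting order, so a SEND that draws an RNR NAK holds back every RDMA Write posted after it on the same QP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of one RC send queue; not the verbs API. */
enum wr_opcode { WR_SEND, WR_RDMA_WRITE };

struct wr {
	enum wr_opcode opcode;
};

/*
 * Walk the send queue in posting order.  A SEND consumes one receive
 * buffer posted at the responder; if none is available the responder
 * answers with an RNR NAK and the requester stops there, so nothing
 * posted after that SEND becomes visible at the responder.
 * Returns the number of work requests that completed.
 */
static size_t process_send_queue(const struct wr *sq, size_t n,
				 unsigned int *recvs_posted)
{
	size_t done;

	assert(recvs_posted != NULL);
	assert(sq != NULL || n == 0);

	for (done = 0; done < n; done++) {
		if (sq[done].opcode == WR_SEND) {
			if (*recvs_posted == 0)
				break;	/* RNR NAK: head-of-line blocked */
			(*recvs_posted)--;
		}
		/* An RDMA Write needs no receive buffer at the responder. */
	}
	return done;
}
```

With a queue of SEND, RDMA Write, RDMA Write and no receive posted, the model completes nothing; once one receive is posted, all three complete in order. A second QP would have its own queue and its own counter, which matches the statement that flows on different connections are not ordered against each other.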
>>periodically till B post receive, does it affect the RDMA_WRITE >>performance between A and B ? >> >>2. extend above to three processes, A connect to B, B connect to C, so B >>has two QPs, but one CQ. A posts a send to B, B does not post receive, >>rather B and C are doing a long time RDMA_WRITE, or send/recv. But B >>must sends RNR periodically to A, right?. So does the pending message >>from A affects B's overall performance between B and C ? Neither IB nor iWARP provide any ordering guarantees between different data flows. This is strictly under application control. Hence, if a RNR NAK or whatever occurs on a RC between A and B, then it has no impact on what occurs between A and C or B and C. It is simply outside the scope of either technology to address. Mike From swise at opengridcomputing.com Mon Feb 12 13:15:53 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Feb 2007 15:15:53 -0600 Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace. In-Reply-To: References: <1171308610.16167.69.camel@stevo-desktop> <1171310670.16167.89.camel@stevo-desktop> <1171311576.16167.91.camel@stevo-desktop> Message-ID: <1171314953.16167.96.camel@stevo-desktop> On Mon, 2007-02-12 at 12:23 -0800, Roland Dreier wrote: > Actually, that patch doesn't apply because of the "%llx" warning fixes > I pushed out. And git-apply also complains about trailing > whitespace. Can you resend a version that applies to the my > for-2.6.21 branch? > > Thanks Here it is... Don't use the physical address for mapping memory into userspace. From: Steve Wise Currently iw_cxgb3 uses the physical address as the key/offset to return to the user process for maping kernel memory into userspace. The user process then calls mmap() using this key as the offset. 
Because the physical address is 64 bits, this introduces a problem with 32-bit userspace, which might not be able to pass an arbitrary 64-bit address back into the kernel (since mmap2() is limited to a 32-bit number of pages for the offset, which limits it to 44-bit addresses). Change the mmap logic to use a u32 counter as the offset for mapping. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 66 +++++++++++++++++---------- drivers/infiniband/hw/cxgb3/iwch_provider.h | 14 +++--- drivers/infiniband/hw/cxgb3/iwch_user.h | 6 +- 3 files changed, 52 insertions(+), 34 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 549de0a..2e05e94 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -115,7 +115,7 @@ static struct ib_ucontext *iwch_alloc_uc struct iwch_dev *rhp = to_iwch_dev(ibdev); PDBG("%s ibdev %p\n", __FUNCTION__, ibdev); - context = kmalloc(sizeof(*context), GFP_KERNEL); + context = kzalloc(sizeof(*context), GFP_KERNEL); if (!context) return ERR_PTR(-ENOMEM); cxio_init_ucontext(&rhp->rdev, &context->uctx); @@ -141,13 +141,14 @@ static int iwch_destroy_cq(struct ib_cq } static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, - struct ib_ucontext *context, + struct ib_ucontext *ib_context, struct ib_udata *udata) { struct iwch_dev *rhp; struct iwch_cq *chp; struct iwch_create_cq_resp uresp; struct iwch_create_cq_req ureq; + struct iwch_ucontext *ucontext = NULL; PDBG("%s ib_dev %p entries %d\n", __FUNCTION__, ibdev, entries); rhp = to_iwch_dev(ibdev); @@ -155,12 +156,15 @@ static struct ib_cq *iwch_create_cq(stru if (!chp) return ERR_PTR(-ENOMEM); - if (context && !t3a_device(rhp)) { - if (ib_copy_from_udata(&ureq, udata, sizeof (ureq))) { - kfree(chp); - return ERR_PTR(-EFAULT); + if (ib_context) { + ucontext = to_iwch_ucontext(ib_context); + if (!t3a_device(rhp)) { + if 
(ib_copy_from_udata(&ureq, udata, sizeof (ureq))) { + kfree(chp); + return ERR_PTR(-EFAULT); + } + chp->user_rptr_addr = (u32 __user *)(unsigned long)ureq.user_rptr_addr; } - chp->user_rptr_addr = (u32 __user *)(unsigned long)ureq.user_rptr_addr; } if (t3a_device(rhp)) { @@ -190,7 +194,7 @@ static struct ib_cq *iwch_create_cq(stru init_waitqueue_head(&chp->wait); insert_handle(rhp, &rhp->cqidr, chp, chp->cq.cqid); - if (context) { + if (ucontext) { struct iwch_mm_entry *mm; mm = kmalloc(sizeof *mm, GFP_KERNEL); @@ -200,16 +204,20 @@ static struct ib_cq *iwch_create_cq(stru } uresp.cqid = chp->cq.cqid; uresp.size_log2 = chp->cq.size_log2; - uresp.physaddr = virt_to_phys(chp->cq.queue); + spin_lock(&ucontext->mmap_lock); + uresp.key = ucontext->key; + ucontext->key += PAGE_SIZE; + spin_unlock(&ucontext->mmap_lock); if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) { kfree(mm); iwch_destroy_cq(&chp->ibcq); return ERR_PTR(-EFAULT); } - mm->addr = uresp.physaddr; + mm->key = uresp.key; + mm->addr = virt_to_phys(chp->cq.queue); mm->len = PAGE_ALIGN((1UL << uresp.size_log2) * sizeof (struct t3_cqe)); - insert_mmap(to_iwch_ucontext(context), mm); + insert_mmap(ucontext, mm); } PDBG("created cqid 0x%0x chp %p size 0x%0x, dma_addr 0x%0llx\n", chp->cq.cqid, chp, (1 << chp->cq.size_log2), @@ -316,14 +324,14 @@ static int iwch_arm_cq(struct ib_cq *ibc static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) { int len = vma->vm_end - vma->vm_start; - u64 pgaddr = vma->vm_pgoff << PAGE_SHIFT; + u32 key = vma->vm_pgoff << PAGE_SHIFT; struct cxio_rdev *rdev_p; int ret = 0; struct iwch_mm_entry *mm; struct iwch_ucontext *ucontext; - PDBG("%s off 0x%lx addr 0x%llx len %d\n", __FUNCTION__, vma->vm_pgoff, - (unsigned long long) pgaddr, len); + PDBG("%s pgoff 0x%lx key 0x%x len %d\n", __FUNCTION__, vma->vm_pgoff, + key, len); if (vma->vm_start & (PAGE_SIZE-1)) { return -EINVAL; @@ -332,13 +340,13 @@ static int iwch_mmap(struct ib_ucontext rdev_p = 
&(to_iwch_dev(context->device)->rdev); ucontext = to_iwch_ucontext(context); - mm = remove_mmap(ucontext, pgaddr, len); + mm = remove_mmap(ucontext, key, len); if (!mm) return -EINVAL; kfree(mm); - if ((pgaddr >= rdev_p->rnic_info.udbell_physbase) && - (pgaddr < (rdev_p->rnic_info.udbell_physbase + + if ((mm->addr >= rdev_p->rnic_info.udbell_physbase) && + (mm->addr < (rdev_p->rnic_info.udbell_physbase + rdev_p->rnic_info.udbell_len))) { /* @@ -351,15 +359,17 @@ static int iwch_mmap(struct ib_ucontext vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND; vma->vm_flags &= ~VM_MAYREAD; - ret = io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, - len, vma->vm_page_prot); + ret = io_remap_pfn_range(vma, vma->vm_start, + mm->addr >> PAGE_SHIFT, + len, vma->vm_page_prot); } else { /* * Map WQ or CQ contig dma memory... */ - ret = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, - len, vma->vm_page_prot); + ret = remap_pfn_range(vma, vma->vm_start, + mm->addr >> PAGE_SHIFT, + len, vma->vm_page_prot); } return ret; @@ -853,18 +863,24 @@ static struct ib_qp *iwch_create_qp(stru uresp.size_log2 = qhp->wq.size_log2; uresp.sq_size_log2 = qhp->wq.sq_size_log2; uresp.rq_size_log2 = qhp->wq.rq_size_log2; - uresp.physaddr = virt_to_phys(qhp->wq.queue); - uresp.doorbell = qhp->wq.udb; + spin_lock(&ucontext->mmap_lock); + uresp.key = ucontext->key; + ucontext->key += PAGE_SIZE; + uresp.db_key = ucontext->key; + ucontext->key += PAGE_SIZE; + spin_unlock(&ucontext->mmap_lock); if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) { kfree(mm1); kfree(mm2); iwch_destroy_qp(&qhp->ibqp); return ERR_PTR(-EFAULT); } - mm1->addr = uresp.physaddr; + mm1->key = uresp.key; + mm1->addr = virt_to_phys(qhp->wq.queue); mm1->len = PAGE_ALIGN(wqsize * sizeof (union t3_wr)); insert_mmap(ucontext, mm1); - mm2->addr = uresp.doorbell & PAGE_MASK; + mm2->key = uresp.db_key; + mm2->addr = qhp->wq.udb & PAGE_MASK; mm2->len = PAGE_SIZE; 
insert_mmap(ucontext, mm2); } diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index 5680d82..61e3278 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -184,6 +184,7 @@ struct ib_qp *iwch_get_qp(struct ib_devi struct iwch_ucontext { struct ib_ucontext ibucontext; struct cxio_ucontext uctx; + u32 key; spinlock_t mmap_lock; struct list_head mmaps; }; @@ -196,11 +197,12 @@ static inline struct iwch_ucontext *to_i struct iwch_mm_entry { struct list_head entry; u64 addr; + u32 key; unsigned len; }; static inline struct iwch_mm_entry *remove_mmap(struct iwch_ucontext *ucontext, - u64 addr, unsigned len) + u32 key, unsigned len) { struct list_head *pos, *nxt; struct iwch_mm_entry *mm; @@ -209,11 +211,11 @@ static inline struct iwch_mm_entry *remo list_for_each_safe(pos, nxt, &ucontext->mmaps) { mm = list_entry(pos, struct iwch_mm_entry, entry); - if (mm->addr == addr && mm->len == len) { + if (mm->key == key && mm->len == len) { list_del_init(&mm->entry); spin_unlock(&ucontext->mmap_lock); - PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, - (unsigned long long) mm->addr, mm->len); + PDBG("%s key 0x%x addr 0x%llx len %d\n", __FUNCTION__, + key, (unsigned long long) mm->addr, mm->len); return mm; } } @@ -225,8 +227,8 @@ static inline void insert_mmap(struct iw struct iwch_mm_entry *mm) { spin_lock(&ucontext->mmap_lock); - PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, - (unsigned long long) mm->addr, mm->len); + PDBG("%s key 0x%x addr 0x%llx len %d\n", __FUNCTION__, + mm->key, (unsigned long long) mm->addr, mm->len); list_add_tail(&mm->entry, &ucontext->mmaps); spin_unlock(&ucontext->mmap_lock); } diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h index 14e1517..c4e7fbe 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_user.h +++ b/drivers/infiniband/hw/cxgb3/iwch_user.h @@ -47,14 +47,14 @@ struct iwch_create_cq_req { }; 
struct iwch_create_cq_resp { - __u64 physaddr; + __u64 key; __u32 cqid; __u32 size_log2; }; struct iwch_create_qp_resp { - __u64 physaddr; - __u64 doorbell; + __u64 key; + __u64 db_key; __u32 qpid; __u32 size_log2; __u32 sq_size_log2; From tziporet at mellanox.co.il Mon Feb 12 13:14:01 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 12 Feb 2007 23:14:01 +0200 Subject: [openib-general] OFED 1.2 components list - for the meeting today In-Reply-To: <1171307245.31538.434613.camel@hal.voltaire.com> References: <1171307245.31538.434613.camel@hal.voltaire.com> Message-ID: <45D0D899.4000505@mellanox.co.il> Hal Rosenstock wrote: > On Mon, 2007-02-12 at 12:58, Woodruff, Robert J wrote: > >> BTW. >> >> Is the ibdiagui code going to be part of this release. >> I did not see it in the list below or is it just part of >> the openib-diags ? >> > > It's part of ibutils. > And already part of OFED 1.2 >> I thought that we discussed this as an OFED 1.2 feature. >> I have someone that is interested in trying it out. >> You can try it now. Tziporet From rdreier at cisco.com Mon Feb 12 13:43:31 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Feb 2007 13:43:31 -0800 Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace. In-Reply-To: <1171314953.16167.96.camel@stevo-desktop> (Steve Wise's message of "Mon, 12 Feb 2007 15:15:53 -0600") References: <1171308610.16167.69.camel@stevo-desktop> <1171310670.16167.89.camel@stevo-desktop> <1171311576.16167.91.camel@stevo-desktop> <1171314953.16167.96.camel@stevo-desktop> Message-ID: OK, merged into for-2.6.21 and pushed out. 
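The key scheme in Steve's patch can be sketched in plain user-space C. This is a minimal sketch under stated assumptions, not the driver code: every name ending in `_sketch`, plus `alloc_key` and `offset_addr_bits`, is hypothetical; a fixed array stands in for the kernel list; the page size is assumed to be 4 KiB; and the spinlock that protects the counter and the list in the patch is left out. The last helper spells out the arithmetic behind the 44-bit limit in the commit message: a 32-bit page offset plus a 12-bit page shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)
#define SKETCH_MAX_MMAPS  8	/* assumption: fixed table instead of a list */

struct mm_entry_sketch {
	uint32_t key;		/* opaque offset handed to user space */
	uint64_t addr;		/* physical address, never leaves the "kernel" */
	unsigned int len;
	int in_use;
};

struct ucontext_sketch {
	uint32_t next_key;	/* per-context counter, starts at 0 */
	struct mm_entry_sketch mmaps[SKETCH_MAX_MMAPS];
};

/* Hand out the next key; keys advance one page at a time. */
static uint32_t alloc_key(struct ucontext_sketch *uc)
{
	uint32_t key;

	assert(uc != NULL);
	key = uc->next_key;
	uc->next_key += SKETCH_PAGE_SIZE;
	return key;
}

/* Record key -> (addr, len).  Returns 0 on success, -1 if the table is full. */
static int insert_mmap_sketch(struct ucontext_sketch *uc, uint32_t key,
			      uint64_t addr, unsigned int len)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_MMAPS; i++) {
		if (!uc->mmaps[i].in_use) {
			uc->mmaps[i] = (struct mm_entry_sketch){ key, addr, len, 1 };
			return 0;
		}
	}
	return -1;
}

/*
 * Look up and consume the entry for (key, len).  The address is copied
 * out before the slot is released, so the caller never touches an entry
 * after it has been freed.  Returns 0 on success, -1 if nothing matches.
 */
static int remove_mmap_sketch(struct ucontext_sketch *uc, uint32_t key,
			      unsigned int len, uint64_t *addr_out)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_MMAPS; i++) {
		struct mm_entry_sketch *mm = &uc->mmaps[i];

		if (mm->in_use && mm->key == key && mm->len == len) {
			*addr_out = mm->addr;
			mm->in_use = 0;
			return 0;
		}
	}
	return -1;
}

/* Widest byte address reachable through a page-granular offset. */
static unsigned int offset_addr_bits(unsigned int pgoff_bits,
				     unsigned int page_shift)
{
	return pgoff_bits + page_shift;
}
```

In this model the first two keys are 0 and 4096, a lookup succeeds only when both key and length match, and each entry can be consumed once. Copying the address out before releasing the entry may be worth comparing against the posted `iwch_mmap` hunk, where `kfree(mm)` appears ahead of the `mm->addr` reads.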
From halr at voltaire.com Mon Feb 12 14:40:15 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Feb 2007 17:40:15 -0500 Subject: [openib-general] patches to 2.6.19.1 kernel for switch Operation In-Reply-To: <04ba01c74eb1$e77fd180$1914a8c0@surioffice> References: <000601c7419f$d4470c60$ff0da8c0@amr.corp.intel.com> <1170072757.4555.242192.camel@hal.voltaire.com> <039701c7494b$6bd5d860$1914a8c0@surioffice> <1171050441.31538.180858.camel@hal.voltaire.com> <048101c74c91$e0f54dd0$1914a8c0@surioffice> <1171288297.31538.417657.camel@hal.voltaire.com> <04ba01c74eb1$e77fd180$1914a8c0@surioffice> Message-ID: <1171319946.31538.446427.camel@hal.voltaire.com> Suri, On Mon, 2007-02-12 at 09:27, Suresh Shelvapille wrote: > Hal: > > > > Ref: comment on mad.c (ib_mad_recv_done_handler(). > > > > > > Even if I make the relevant changes to smi.c functions how do I get the > > > packet to get forwarded, without making additional changes in this > > function? > > > > > > Meaning, when smi_handle_dr_smp_send(),smi_check_forward_dr_smp() are > > called > > > and you determine that the packet has to be forwarded instead of > > consuming > > > where do you actually do the send? I think this chain is missing! > > > > My initial thought was what I wrote but in looking at this further, as > > you point out, the SMI routines are only updating the packet and > > indicating its disposition. The actual sending needs to be elsewhere. > > I'm not sure what the code ends up looking like with the changes > > suggested and would just like this to look as clean as possible and use > > the SMI routines where appropriate here. Does this make sense ? > > > I am not sure I follow this last statement. I was trying to say that the send needs to be elsewhere from the SMI code for the forward case so it may go in the routine where you placed it. 
I was also trying to say that I'm not 100% sure what this could look like until the other changes described are made so this may take two more iterations rather than one. Is that any clearer? -- Hal From mshefty at ichips.intel.com Mon Feb 12 14:47:42 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Feb 2007 14:47:42 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070212205634.GW11411@obsidianresearch.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> Message-ID: <45D0EE8E.4030906@ichips.intel.com> > 1) What does the TClass and FlowLabel returned from SGID=local > DGID=remote mean? > Do you use it in the Node1 -> Node2 direction or the Node2 -> Node1 direction > or both? Maybe it would help if we can agree on a set of expectations. These are what I am thinking: 1. An SA should be able to respond to a valid PR query if at least one of the GIDs in the path record is local. 2. The LIDs in a PR are relative to the subnet of the SA that returned the record. 3. An IB router should not fail over transparently to QPs sending traffic through that router. 4. A PR from the local SA with reversible=1 indicates that data sent from the remote GID to the local GID using the PR TC and FL will route locally using the specified LID pair. This holds whether the PR SGID is local or remote. 5. A PR from a remote SA with reversible=1 indicates that data sent from the local GID to the remote GID using the PR TC and FL will route remotely using the specified LID pair. This holds whether the PR SGID is local or remote. 6. A PR with reversible=0 is relative to the SA's subnet. The SGID->DGID data flow over the PR TC and FL indicates the SLID->DLID mapping for that subnet. Do your expectations differ from these?
The use of reversible between subnets is what's concerning me. It may be that an SA could not return any paths as reversible between two subnets without using some trick like what you mentioned. These add a requirement on the SA that they must be aware of the routes packets take between two GIDs using a given TC and FL, but I don't believe that this necessarily forces SA to SA communication. The SA may only need to exchange information with a router...? > Implicit in this are five IBA affecting things: > - that PRs with SGID=non-local mean something specific I don't think that we're changing any of the meanings of the fields though. > - Routers do the SLID spoofing you outlined. I'm not sure this is something that we do want now. APM should really handle path failover. > There is alot of complex work in the router and SA side to make this > kind of topology work, but it is critical that the clients use path > queries that can provide enough data to the SA and return enough data > to the client to support this. I'm still deciding if the existing path record attribute is sufficient. - Sean From rdreier at cisco.com Mon Feb 12 15:08:53 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Feb 2007 15:08:53 -0800 Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree In-Reply-To: <20070210211508.GD14903@mellanox.co.il> (Michael S. Tsirkin's message of "Sat, 10 Feb 2007 23:15:08 +0200") References: <20070210211508.GD14903@mellanox.co.il> Message-ID: Queued for 2.6.21, although I think a further cleanup would be: > mdev->mr_table.mpt_table = mthca_alloc_icm_table(mdev, init_hca->mpt_base, > dev_lim->mpt_entry_sz, > mdev->limits.num_mpts, > - mdev->limits.reserved_mrws, 1); > + mdev->limits.reserved_mrws, > + 1, 1); instead of having use_lowmem and use_coherent be separate parameters, we should probably convert it to a type parameter, and have MTHCA_ICM_TABLE_HIGHMEM, _LOWMEM and _COHERENT. 
That would make these calls a lot easier to read and get correct. - R. From swise at opengridcomputing.com Mon Feb 12 15:24:07 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Feb 2007 17:24:07 -0600 Subject: [openib-general] issues with compilation of ofed 1.2 In-Reply-To: <1170973693.19297.2.camel@firewall.xsintricity.com> References: <45C9EE31.2040602@voltaire.com> <6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com> <1170973693.19297.2.camel@firewall.xsintricity.com> Message-ID: <1171322647.28500.41.camel@stevo-desktop> I still get this error building on rhel5b2 with the latest from the ofa git trees: ERROR: The sysfsutils-devel package is required to build libibverbs_devel RPM [root at vic12 OFED-1.2-stevo]# rpm -qa|grep sysfs libsysfs-2.0.0-6 libsysfs-devel-2.0.0-6 libsysfs-2.0.0-6 sysfsutils-2.0.0-6 libsysfs-devel-2.0.0-6 I installed all the sysfs rpms I could find. So is there some dependency problem here in the OFED build script that is looking for the wrong rpm in rhel5? Is there a bug to track this issue? Steve. On Thu, 2007-02-08 at 17:28 -0500, Doug Ledford wrote: > On Thu, 2007-02-08 at 09:02 +0200, Moni Levy wrote: > > Doug, > > On 2/7/07, Yosef Etigin wrote: > > > 7. On RHAS5 beta 2, the setup requires sysfstuils-devel RPM which is not included in this distro. > > > > Can you please help us with that ? > > The value of the sysfsutils is far overshadowed by the value of libsysfs > (and libsysfs is far more commonly used). So, in RHEL5, the rpm package > names reflect this: > > libsysfs > sysfsutils (I think, might be libsysfs-utils) > libsysfs-devel > > It's all still there, just a different name. 
> > > -- Moni > > > > > > > > -- > > > Yosef Etigin > > > Alex Tabachnik > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From krause at cup.hp.com Mon Feb 12 15:31:15 2007 From: krause at cup.hp.com (Michael Krause) Date: Mon, 12 Feb 2007 15:31:15 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070212205634.GW11411@obsidianresearch.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> Message-ID: <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com> At 12:56 PM 2/12/2007, Jason Gunthorpe wrote: >On Mon, Feb 12, 2007 at 09:23:06AM -0800, Sean Hefty wrote: > > >Ah, I think I missed the key step in your scheme.. You plan to query > > >the local SM for SGID=remote DGID=local? (ie reversed from 'normal'. I > > >was thinking only about the SGID=local DGID=remote query direction) > > > > I'm not sure that the query needs the GIDs reversed, as long as the > path is > > reversible. So, the local query would be: > > > > SGID=local, DGID=remote, reversible=1 (to SA) > > > > And the remote query would be: > > > > SGID=local, DGID=remote, reversible=1, (to SA') > > TClass & FlowLabel=from previous query response > >1) What does the TClass and FlowLabel returned from SGID=local > DGID=remote mean? > Do you use it in the Node1 -> Node2 direction or the Node2 -> Node1 > direction > or both? >1a) If it is Node1 -> Node2 then the local SA has to query SA' to figure > what FlowLabel to return. >1b) If it is for both directions then somehow SA, SA' and all four > router ports need to agree on global flowlabels. 
>2) In the 2nd query, passing SGID=local, DGID=remote is 'reversed' > since SGID=local is the wrong subnet for SA'. > I think defining this to mean something is risky. >2b) A PR query with TClass and FlowLabel present in the query is > currently expected to return an answer with those fields matching. > That implies #1b.. TClass is intended to communicate the end-to-end QoS desired. TClass is then mapped to a SL that is local to each subnet. A flow label is intended to much the same as in the IP world and is left, in essence, to routers to manage. An endnode look up should be to find the address vector to the remote. A look up may return multiple vectors. The SLID would correspond to each local subnet router port that acts as a first-hop destination to the remote subnet. I don't see why the router protocol would not simply enable all paths on the local subnet to a given remote subnet be acquired. All of the work is kept local to the SA / SM in the source subnet when determining a remote path to take. Why is there any need to define more than just this? Define a router protocol to communicate the each subnet's prefix, TClass, etc. and apply KISS. A management entity that wanted to manage out each subnet provides router management in terms of route selection, etc. can be constructed by using the existing protocols / tools combined with a new router protocol which only does DGID to next hop SLID mapping. Mike >So, here is how I see this working.. > >- There is a single well known 'reversible' flowlabel. When a router > processes a GRH with that flowlabel it produces a packet that > has a SLID that is always the same, no matter what router port is > used (A' or B' in my example). The LRH is also reversible according > to the rules in IBA. > > A well known value side-steps the global information problem and > allows the GRH to be reversible. >- Whenever a PR has reversible=1 the result returns the well known flowlabel. > The router LID is always the single shared SLID. 
>- To get a more optimal path the following sequence of queries are used: > to SA: SGID=Node1 DGID=Node2 > [In the background SA asks SA' what flow label to use] > to SA': SGID=Node1 DGID=Node2 FlowLabel=(from above) > to SA': SGID=Node2 DGID=Node1 SLID=(dlid from above) > [In the background SA' asks SA what flow label to use] > to SA: SGID=Node2 DGID=Node1 FlowLabel=(from above) > > It is almost guarenteed that the FlowLabel will be asymetric. This > is to keep the flowlabel space local to each subnet. > > In the background quries SA and SA' also examine the global route > topology to select an optimal no-spoof needed router LID. The > background exchange is how the disambiguation problem with > multiple-router path is solved. > >Implicit in this are five IBA affecting things: > - that PRs with SGID=non-local mean something specific > - PRs with DGID=non-local cause the SA to communicate with the remote > SA to learn the GRH's FlowLabel > (except in the case where reversible=1) > - clients can communicate with remote SA's > - Routers do the SLID spoofing you outlined. > - SA's and routers collaborate quite closely on how the > router produces a LRH. In particular the SA controls the SLID > spoofing > >A new query type or maybe some kind of modified multi-path-record >query could be defined by IBA to reduce the 6 exchanges required to >something more efficient. > >Does this match what you are thinking? > > > > SA SA' > > >Node1 --> (LID 1) Router A ------- Router A' (LID A) ---> Node2 > > > |-> (LID 2) Router A | > > > |-> (LID 3) Router B ------- Router B' (LID B) --| > > > > > >Router A and Router B are independent redundant devices, not a route > > >cloud of some sort. B -> A' is not a possible path. > > > > Since A' and B' connect to the same subnet, B -> A' should be a valid path. > >Please don't dismiss this case as it is a simple case of a more >generalized problem. 
People will want to deploy primay and seconday >routers (like dual star switching) that don't intercommunicate for >reliability. The B -> A' path does not exist because the A and B >routers are seperate non-linked devices and not just 4 ports on one >large router. [A more general view would be a router ring architecture >where the clockwise and counterclockwise paths use different >hardware/cables] > >There is alot of complex work in the router and SA side to make this >kind of topology work, but it is critical that the clients use path >queries that can provide enough data to the SA and return enough data >to the client to support this. > > > >I can think of the following downsides: > > > 1) Re-reading Michael Krause's email makes me think that defeating > > > the QP SLID check is contrary to the spirit of IBA > > > > I don't think we need to defeat the QP SLID check if we want extra > routing, > > but having redundant routers use the same link layer address isn't > > necessarily a bad thing. > >Well, it is one and the same, the SLID is really only used in the QP >SLID check so changing it around only serves to defeat that check. > >Jason > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general From swise at opengridcomputing.com Mon Feb 12 15:35:54 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Feb 2007 17:35:54 -0600 Subject: [openib-general] issues with compilation of ofed 1.2 In-Reply-To: <1171322647.28500.41.camel@stevo-desktop> References: <45C9EE31.2040602@voltaire.com> <6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com> <1170973693.19297.2.camel@firewall.xsintricity.com> <1171322647.28500.41.camel@stevo-desktop> Message-ID: <1171323354.28500.48.camel@stevo-desktop> Ok, I hacked around this by changing the build_env.sh. 
But I think build_env.sh will have to distinguish between rhel5 and all other redhat releases so it can correctly set the prerequisite rpms... Steve. On Mon, 2007-02-12 at 17:24 -0600, Steve Wise wrote: > I still get this error building on rhel5b2 with the latest from the ofa > git trees: > > ERROR: The sysfsutils-devel package is required to build libibverbs_devel RPM > [root at vic12 OFED-1.2-stevo]# rpm -qa|grep sysfs > libsysfs-2.0.0-6 > libsysfs-devel-2.0.0-6 > libsysfs-2.0.0-6 > sysfsutils-2.0.0-6 > libsysfs-devel-2.0.0-6 > > > I installed all the sysfs rpms I could find. So is there some > dependency problem here in the OFED build script that is looking for the > wrong rpm in rhel5? > > Is there a bug to track this issue? > > Steve. > > > On Thu, 2007-02-08 at 17:28 -0500, Doug Ledford wrote: > > On Thu, 2007-02-08 at 09:02 +0200, Moni Levy wrote: > > > Doug, > > > On 2/7/07, Yosef Etigin wrote: > > > > 7. On RHAS5 beta 2, the setup requires sysfsutils-devel RPM which is not included in this distro. > > > > > > Can you please help us with that? > > > > The value of the sysfsutils is far overshadowed by the value of libsysfs > > (and libsysfs is far more commonly used). So, in RHEL5, the rpm package > > names reflect this: > > > > libsysfs > > sysfsutils (I think, might be libsysfs-utils) > > libsysfs-devel > > > > It's all still there, just a different name. 
> > > > > -- Moni > > > > > > > > > > > -- > > > > Yosef Etigin > > > > Alex Tabachnik > > > > From mshefty at ichips.intel.com Mon Feb 12 15:48:24 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Feb 2007 15:48:24 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com> Message-ID: <45D0FCC8.4090304@ichips.intel.com> > An endnode look up should be to find the address > vector to the remote. A look up may return multiple vectors. The > SLID would correspond to each local subnet router port that acts as a > first-hop destination to the remote subnet. I don't see why the > router protocol would not simply enable all paths on the local subnet to > a given remote subnet be acquired. All of the work is kept local to the > SA / SM in the source subnet when determining a remote path to take. > Why is there any need to define more than just this? For an RC QP, we need at least two sets of LIDs. In the simplest case, we need the SLID/router DLID for the local subnet, and the router SLID/DLID for the remote subnet. The problem is in obtaining the SLID/DLID for the remote subnet. 
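
As a rough illustration of the "two sets of LIDs" point, here is a minimal sketch in C. The structure and function names are hypothetical and are not taken from the IB verbs or CM code; the sketch only assumes that the local pair comes from the local SA and that the remote pair has to be learned separately before a CM REQ could carry it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one SLID/DLID pair as seen in the LRH on a single subnet. */
struct lid_pair {
    uint16_t slid;
    uint16_t dlid;
};

/* Hypothetical: what an RC connection across one router hop needs. */
struct routed_rc_path {
    struct lid_pair local;   /* local subnet: node SLID -> router DLID  */
    struct lid_pair remote;  /* remote subnet: router SLID -> node DLID */
    bool remote_known;       /* false until the remote pair is obtained */
};

/* Fill in the half that the local SA can answer on its own. */
static void routed_rc_path_init(struct routed_rc_path *p,
                                uint16_t node_slid, uint16_t router_dlid)
{
    assert(p);
    p->local.slid = node_slid;
    p->local.dlid = router_dlid;
    p->remote.slid = 0;
    p->remote.dlid = 0;
    p->remote_known = false;
}

/* Record the remote-subnet pair once some other query has produced it. */
static void routed_rc_path_set_remote(struct routed_rc_path *p,
                                      uint16_t router_slid, uint16_t node_dlid)
{
    assert(p);
    p->remote.slid = router_slid;
    p->remote.dlid = node_dlid;
    p->remote_known = true;
}

/* The path is usable for an RC connection only when both halves are set. */
static bool routed_rc_path_ready(const struct routed_rc_path *p)
{
    return p->remote_known;
}
```

The sketch deliberately leaves open where the remote pair comes from, since that is the open question here.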
- Sean From krause at cup.hp.com Mon Feb 12 15:35:45 2007 From: krause at cup.hp.com (Michael Krause) Date: Mon, 12 Feb 2007 15:35:45 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45D0EE8E.4030906@ichips.intel.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <45D0EE8E.4030906@ichips.intel.com> Message-ID: <6.2.0.14.2.20070212153253.08e8c7b8@esmail.cup.hp.com> At 02:47 PM 2/12/2007, Sean Hefty wrote: > > 1) What does the TClass and FlowLabel returned from SGID=local > > DGID=remote mean? > > Do you use it in the Node1 -> Node2 direction or the Node2 -> Node1 > direction > > or both? > >Maybe it would help if we can agree on a set of expectations. These are >what I >am thinking: > >1. An SA should be able to respond to a valid PR query if at least one of the >GIDs in the path record is local. > >2. The LIDs in a PR are relative to the SA's subnet that returned the record. > >3. An IB router should not failover transparently to QPs sending traffic >through >that router. There is no reason for such a restriction. APM can work with routers and the IB protocol will recover from any out of order packet processing just fine. >4. A PR from the local SA with reversible=1 indicates that data sent from the >remote GID to the local GID using the PR TC and FL will route locally >using the >specified LID pair. This holds whether the PR SGID is local or remote. > >5. A PR from a remote SA with reversible=1 indicates that data sent from the >local GID to the remote GID using the PR TC and FL will route remotely >using the >specified LID pair. This holds whether the PR SGID is local or remote. > >6. A PR with reversible=0 is relative to SA's subnet. The SGID->DGID data >flow >over the PR TC and FL indicates the SLID->DLID mapping for that subnet. 
> >Do your expectations differ from these? > >The use of reversible between subnets is what's concerning me. It may be >that >an SA could not return any paths as reversible between two subnets without >using >some trick like what you mentioned. > >These add a requirement on the SA that they must be aware of the routes >packets >take between two GIDs using a given TC and FL, but I don't believe that this >necessarily forces SA to SA communication. The SA may only need to exchange >information with a router...? It should not force SA to SA communication. Such communication is overly complex and will be a major issue to control and manage in the end. Further, security concerns, partition management, etc. are complex enough as it is without adding more fuel to the fire. > > Implicit in this are five IBA affecting things: > > - that PRs with SGID=non-local mean something specific > >I don't think that we're changing any of the meanings of the fields though. > > > - Routers do the SLID spoofing you outlined. > >I'm not sure this is something that we do want now. APM should really handle >path failover. > > > There is a lot of complex work in the router and SA side to make this > > kind of topology work, but it is critical that the clients use path > > queries that can provide enough data to the SA and return enough data > > to the client to support this. > >I'm still deciding if the existing path record attribute is sufficient. Our original IB router work, I believe, drove some of what is in the current records, so I suspect they are fine as is. 
Mike From jgunthorpe at obsidianresearch.com Mon Feb 12 15:54:04 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Mon, 12 Feb 2007 16:54:04 -0700 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45D0EE8E.4030906@ichips.intel.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <45D0EE8E.4030906@ichips.intel.com> Message-ID: <20070212235404.GX11411@obsidianresearch.com> On Mon, Feb 12, 2007 at 02:47:42PM -0800, Sean Hefty wrote: > Maybe it would help if we can agree on a set of expectations. These are > what I am thinking: > > 1. An SA should be able to respond to a valid PR query if at least one of > the GIDs in the path record is local. > > 2. The LIDs in a PR are relative to the SA's subnet that returned the > record. > > 3. An IB router should not failover transparently to QPs sending traffic > through that router. OK to these > 4. A PR from the local SA with reversible=1 indicates that data sent from > the remote GID to the local GID using the PR TC and FL will route locally > using the specified LID pair. This holds whether the PR SGID is local or > remote. > 5. A PR from a remote SA with reversible=1 indicates that data sent from > the local GID to the remote GID using the PR TC and FL will route remotely > using the specified LID pair. This holds whether the PR SGID is local or > remote. I can't think how to actually implement these restrictions in the general case without SLID spoofing and the general method I outlined in my prior email. *Especially* reversible - which by definition requires the FL and TC to be the same on both directions of the path! > 6. A PR with reversible=0 is relative to SA's subnet. The SGID->DGID data > flow over the PR TC and FL indicates the SLID->DLID mapping for that subnet. 
Think about this - it is backwards for the UD case. You have specified that the SGID->DGID direction uses the returned SLID/DLID which are ensured by the flowlabel in the GRH. But the local side only controls what it sends. How does this GRH get to the remote side? In UD the returned GRH from the PR controls the selection of LID on the DGID's subnet. That is how it must be. QPs have a specific definition of where the GRH comes from, and for a local PR query with SGID=myself the GRH programmed into the QP must come from that query. This is necessary for UD and I don't think it can be changed around. Plus, in the multi-router path, the GRH alone does not contain the information to know which physical router port the flow exits from. (See prior diagram) - so the SLID spoofing is the only way to fix things up if the PR queries are left unchanged. > The use of reversible between subnets is what's concerning me. It may be > that an SA could not return any paths as reversible between two subnets > without using some trick like what you mentioned. I really don't see how it can work any other way right now.. > These add a requirement on the SA that they must be aware of the routes > packets take between two GIDs using a given TC and FL, but I don't believe > that this necessarily forces SA to SA communication. The SA may only need > to exchange information with a router...? The major problem is that there are multiple router paths that a given GRH can take that are only fully disambiguated by the router lid at the sender. > > - Routers do the SLID spoofing you outlined. > > I'm not sure this is something that we do want now. APM should really > handle path failover. It has absolutely nothing to do with failover. This is necessary to make multiple router paths work at all. It is necessary for reversible to work with multiple routers at all. 
> >There is a lot of complex work in the router and SA side to make this > >kind of topology work, but it is critical that the clients use path > >queries that can provide enough data to the SA and return enough data > >to the client to support this. > > I'm still deciding if the existing path record attribute is sufficient. I'm of the opinion that it isn't a good fit. Look at how tortured things are just because the PR does not have enough information to let the SA answer in the best way. Jason From jgunthorpe at obsidianresearch.com Mon Feb 12 16:10:45 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Mon, 12 Feb 2007 17:10:45 -0700 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com> Message-ID: <20070213001045.GY11411@obsidianresearch.com> On Mon, Feb 12, 2007 at 03:31:15PM -0800, Michael Krause wrote: > TClass is intended to communicate the end-to-end QoS desired. TClass is > then mapped to an SL that is local to each subnet. A flow label is > intended to be much the same as in the IP world and is left, in essence, to > routers to manage. An endnode look up should be to find the address > vector to the remote. A look up may return multiple vectors. The SLID > would correspond to each local subnet router port that acts as a first-hop > destination to the remote subnet. I don't see why the router protocol > would not simply enable all paths on the local subnet to a given remote > subnet be acquired. All of the work is kept local to the SA / SM in the > source subnet when determining a remote path to take. Why is there any > need to define more than just this? 
Define a router protocol to > communicate each subnet's prefix, TClass, etc. and apply KISS. A > management entity that wanted to manage out each subnet provides router > management in terms of route selection, etc. can be constructed by using > the existing protocols / tools combined with a new router protocol which > only does DGID to next hop SLID mapping. All of this complexity is due to the RC QP requirement that the SLID of an incoming LRH match the DLID programmed into the QP. Translated into a network with routers this means that for an RC flow to successfully work both the *forward* and *reverse* direction must traverse the same router *LID* not just *port* on both subnets. Please see the little ASCII diagram I drew in a prior email to understand my concern. There is no such restriction in a real IP network. It would be akin to having a host match the source MAC address in the Ethernet frame to double check that it came from the router port it is sending outgoing packets to. Which means simple one-sided solutions from IP land don't work here. Things work exactly the way you outline today for UD. They don't work at all for the general case of RC. Get rid of the QP requirement and things work the way you outline for RC too. Keep it in and you must use the FlowLabel to force the flows onto the right router LID. That is why I said previously that the QP matching rules are a mistake. The best way to solve this is to change C9-54 to only be in effect if the GRH is not present. CM also introduces the much smaller problem of getting the LIDs to the passive side - but that cannot be solved without a broad solution to the RC QP SLID matching problem. 
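
To make the matching rule concrete, here is a minimal sketch in C of the check as described above and of the suggested relaxation. The structures and function names are hypothetical simplifications, not real HCA, verbs, or driver definitions, and the LID values are taken from the diagram (Router A at LID 1, Router B at LID 3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the one QP field that matters for this check. */
struct rc_qp_ctx {
    uint16_t path_dlid;  /* DLID programmed into the QP (the router LID) */
};

/* Hypothetical: the parts of an incoming packet the check looks at. */
struct rx_pkt {
    uint16_t lrh_slid;   /* SLID from the incoming LRH */
    bool has_grh;        /* true when the packet carries a GRH */
};

/* Rule as described: the incoming LRH SLID must equal the QP's DLID. */
static bool slid_check_strict(const struct rc_qp_ctx *qp,
                              const struct rx_pkt *pkt)
{
    return pkt->lrh_slid == qp->path_dlid;
}

/* Suggested relaxation: only enforce the match when no GRH is present. */
static bool slid_check_relaxed(const struct rc_qp_ctx *qp,
                               const struct rx_pkt *pkt)
{
    if (pkt->has_grh)
        return true;
    return slid_check_strict(qp, pkt);
}

/* Node1 sends via Router A (LID 1); the reply comes back via Router B (LID 3). */
static void slid_check_demo(void)
{
    struct rc_qp_ctx qp = { .path_dlid = 1 };
    struct rx_pkt reply_via_b = { .lrh_slid = 3, .has_grh = true };

    assert(!slid_check_strict(&qp, &reply_via_b));  /* dropped today */
    assert(slid_check_relaxed(&qp, &reply_via_b));  /* accepted if relaxed */
}
```

With the strict rule the reply through the second router is rejected even though the GIDs are correct, which is the asymmetric-return case the diagram is meant to show.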
Jason From rdreier at cisco.com Mon Feb 12 16:18:23 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Feb 2007 16:18:23 -0800 Subject: [openib-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will add the new cxgb3 RDMA driver for Chelsio T3 NICs, as well as IPoIB connected mode and various other smaller changes: Ahmed S. Darwish (1): IB/core: Use ARRAY_SIZE macro for mandatory_table Akinobu Mita (1): IB/ehca: Fix memleak on module unloading David Howells (1): IB/mthca: Work around gcc bug on sparc64 Michael S. Tsirkin (6): IPoIB: Connected mode experimental support IB/mthca: Fix reserved MTTs calculation on mem-free HCAs IB/mthca: Give reserved MTTs a separate cache line IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs IB/mthca: Merge MR and FMR space on 64-bit systems IB/mthca: Always fill MTTs from CPU Roland Dreier (1): IB/mthca: Use correct structure size in call to memset() Sean Hefty (2): RDMA/cma: Increment port number after close to avoid re-use IB: Remove redundant "_wq" from workqueue names Steve Wise (1): RDMA/cxgb3: Add driver for Chelsio T3 RNIC drivers/infiniband/Kconfig | 1 + drivers/infiniband/Makefile | 1 + drivers/infiniband/core/addr.c | 2 +- drivers/infiniband/core/cma.c | 68 +- drivers/infiniband/core/device.c | 3 +- drivers/infiniband/hw/cxgb3/Kconfig | 27 + drivers/infiniband/hw/cxgb3/Makefile | 12 + drivers/infiniband/hw/cxgb3/cxio_dbg.c | 207 +++ drivers/infiniband/hw/cxgb3/cxio_hal.c | 1280 +++++++++++++++ drivers/infiniband/hw/cxgb3/cxio_hal.h | 201 +++ drivers/infiniband/hw/cxgb3/cxio_resource.c | 331 ++++ drivers/infiniband/hw/cxgb3/cxio_resource.h | 70 + drivers/infiniband/hw/cxgb3/cxio_wr.h | 685 ++++++++ drivers/infiniband/hw/cxgb3/iwch.c | 189 +++ 
drivers/infiniband/hw/cxgb3/iwch.h | 177 ++ drivers/infiniband/hw/cxgb3/iwch_cm.c | 2081 ++++++++++++++++++++++++ drivers/infiniband/hw/cxgb3/iwch_cm.h | 223 +++ drivers/infiniband/hw/cxgb3/iwch_cq.c | 225 +++ drivers/infiniband/hw/cxgb3/iwch_ev.c | 231 +++ drivers/infiniband/hw/cxgb3/iwch_mem.c | 172 ++ drivers/infiniband/hw/cxgb3/iwch_provider.c | 1203 ++++++++++++++ drivers/infiniband/hw/cxgb3/iwch_provider.h | 367 +++++ drivers/infiniband/hw/cxgb3/iwch_qp.c | 1007 ++++++++++++ drivers/infiniband/hw/cxgb3/iwch_user.h | 67 + drivers/infiniband/hw/cxgb3/tcb.h | 632 +++++++ drivers/infiniband/hw/ehca/ehca_irq.c | 2 + drivers/infiniband/hw/mthca/mthca_cmd.c | 6 +- drivers/infiniband/hw/mthca/mthca_dev.h | 2 + drivers/infiniband/hw/mthca/mthca_main.c | 40 +- drivers/infiniband/hw/mthca/mthca_memfree.c | 127 ++- drivers/infiniband/hw/mthca/mthca_memfree.h | 9 +- drivers/infiniband/hw/mthca/mthca_mr.c | 110 ++- drivers/infiniband/hw/mthca/mthca_profile.c | 2 +- drivers/infiniband/hw/mthca/mthca_provider.c | 14 +- drivers/infiniband/hw/mthca/mthca_provider.h | 1 + drivers/infiniband/hw/mthca/mthca_qp.c | 2 +- drivers/infiniband/hw/mthca/mthca_srq.c | 9 +- drivers/infiniband/ulp/ipoib/Kconfig | 16 +- drivers/infiniband/ulp/ipoib/Makefile | 1 + drivers/infiniband/ulp/ipoib/ipoib.h | 215 +++ drivers/infiniband/ulp/ipoib/ipoib_cm.c | 1237 ++++++++++++++ drivers/infiniband/ulp/ipoib/ipoib_ib.c | 29 +- drivers/infiniband/ulp/ipoib/ipoib_main.c | 63 +- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 4 +- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 40 +- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 2 + 46 files changed, 11279 insertions(+), 114 deletions(-) create mode 100644 drivers/infiniband/hw/cxgb3/Kconfig create mode 100644 drivers/infiniband/hw/cxgb3/Makefile create mode 100644 drivers/infiniband/hw/cxgb3/cxio_dbg.c create mode 100644 drivers/infiniband/hw/cxgb3/cxio_hal.c create mode 100644 drivers/infiniband/hw/cxgb3/cxio_hal.h create mode 100644 
drivers/infiniband/hw/cxgb3/cxio_resource.c create mode 100644 drivers/infiniband/hw/cxgb3/cxio_resource.h create mode 100644 drivers/infiniband/hw/cxgb3/cxio_wr.h create mode 100644 drivers/infiniband/hw/cxgb3/iwch.c create mode 100644 drivers/infiniband/hw/cxgb3/iwch.h create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cm.c create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cm.h create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cq.c create mode 100644 drivers/infiniband/hw/cxgb3/iwch_ev.c create mode 100644 drivers/infiniband/hw/cxgb3/iwch_mem.c create mode 100644 drivers/infiniband/hw/cxgb3/iwch_provider.c create mode 100644 drivers/infiniband/hw/cxgb3/iwch_provider.h create mode 100644 drivers/infiniband/hw/cxgb3/iwch_qp.c create mode 100644 drivers/infiniband/hw/cxgb3/iwch_user.h create mode 100644 drivers/infiniband/hw/cxgb3/tcb.h create mode 100644 drivers/infiniband/ulp/ipoib/ipoib_cm.c From mshefty at ichips.intel.com Mon Feb 12 16:45:33 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Feb 2007 16:45:33 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070212235404.GX11411@obsidianresearch.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <45D0EE8E.4030906@ichips.intel.com> <20070212235404.GX11411@obsidianresearch.com> Message-ID: <45D10A2D.10104@ichips.intel.com> >>4. A PR from the local SA with reversible=1 indicates that data sent from >>the remote GID to the local GID using the PR TC and FL will route locally >>using the specified LID pair. This holds whether the PR SGID is local or >>remote. > >>5. A PR from a remote SA with reversible=1 indicates that data sent from >>the local GID to the remote GID using the PR TC and FL will route remotely >>using the specified LID pair. 
This holds whether the PR SGID is local or >>remote. > > I can't think how to actually implement these restrictions in the > general case without SLID spoofing and the general method I outlined > in my prior email. But you agree with the expectations, and what reversible indicates? Or are you claiming that reversible paths between different subnets are undefined, or mean something different than specified in 13.5.4? (E.g. reversible applies only at the network level if global routing is used.) > Think about this - it is backwards for the UD case. You have specified > that the SGID->DGID direction uses the returned SLID/DLID which are > ensured by the flowlabel in the GRH. But the local side only controls > what it sends. How does this GRH get to the remote side? In UD the > returned GRH from the PR controls the selection of LID on the DGID's > subnet. That is how it must be. I'm not following you here. For UD, query the local SA, then direct the send to the router LID. I would only query the remote SA for RC, in order to get the remote LID information to put into the CM REQ. > The major problem is that there are multiple router paths that a given > GRH can take that are only fully disambiguated by the router lid at > the sender. But doesn't 19.2.4.1 imply that once a router selects a path, it will continue to use that same path for similar packets? So, if we inject a GRH into the internetwork from the source router, then isn't a single path followed to the remote endpoint? Relaxing 9.6.1.5 seems like a nice solution to most of the problems, but it also seems like one that would fail to work with any existing HCAs. 
- Sean From dledford at redhat.com Mon Feb 12 17:20:26 2007 From: dledford at redhat.com (Doug Ledford) Date: Mon, 12 Feb 2007 20:20:26 -0500 Subject: [openib-general] issues with compilation of ofed 1.2 In-Reply-To: <1171323354.28500.48.camel@stevo-desktop> References: <45C9EE31.2040602@voltaire.com> <6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com> <1170973693.19297.2.camel@firewall.xsintricity.com> <1171322647.28500.41.camel@stevo-desktop> <1171323354.28500.48.camel@stevo-desktop> Message-ID: <1171329626.3161.36.camel@fc6.xsintricity.com> On Mon, 2007-02-12 at 17:35 -0600, Steve Wise wrote: > Ok, I hacked around this by changing the build_env.sh. > > But I think build_env.sh will have to distinguish between rhel5 and all > other redhat releases so it can correctly set the prerequisite rpms... Yes, it will. > Steve. > > On Mon, 2007-02-12 at 17:24 -0600, Steve Wise wrote: > > I still get this error building on rhel5b2 with the latest from the ofa > > git trees: > > > > ERROR: The sysfsutils-devel package is required to build libibverbs_devel RPM > > [root at vic12 OFED-1.2-stevo]# rpm -qa|grep sysfs > > libsysfs-2.0.0-6 > > libsysfs-devel-2.0.0-6 > > libsysfs-2.0.0-6 > > sysfsutils-2.0.0-6 > > libsysfs-devel-2.0.0-6 > > > > > > I installed all the sysfs rpms I could find. So is there some > > dependency problem here in the OFED build script that is looking for the > > wrong rpm in rhel5? > > > > Is there a bug to track this issue? > > > > Steve. > > > > > > On Thu, 2007-02-08 at 17:28 -0500, Doug Ledford wrote: > > > On Thu, 2007-02-08 at 09:02 +0200, Moni Levy wrote: > > > > Doug, > > > > On 2/7/07, Yosef Etigin wrote: > > > > > 7. On RHAS5 beta 2, the setup requires sysfsutils-devel RPM which is not included in this distro. > > > > > > > > Can you please help us with that? > > > > > > The value of the sysfsutils is far overshadowed by the value of libsysfs > > > (and libsysfs is far more commonly used). 
So, in RHEL5, the rpm package > > > names reflect this: > > > > > > libsysfs > > > sysfsutils (I think, might be libsysfs-utils) > > > libsysfs-devel > > > > > > It's all still there, just a different name. > > > > > > > -- Moni > > > > > > > > > > > > > > -- > > > > > Yosef Etigin > > > > > Alex Tabachnik > > > > > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From jgunthorpe at obsidianresearch.com Mon Feb 12 18:03:30 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Mon, 12 Feb 2007 19:03:30 -0700 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45D10A2D.10104@ichips.intel.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <45D0EE8E.4030906@ichips.intel.com> <20070212235404.GX11411@obsidianresearch.com> <45D10A2D.10104@ichips.intel.com> Message-ID: <20070213020330.GZ11411@obsidianresearch.com> On Mon, Feb 12, 2007 at 04:45:33PM -0800, Sean Hefty wrote: > >>4. 
A PR from the local SA with reversible=1 indicates that data sent from > >>the remote GID to the local GID using the PR TC and FL will route locally > >>using the specified LID pair. This holds whether the PR SGID is local or > >>remote. > > > >>5. A PR from a remote SA with reversible=1 indicates that data sent from > >>the local GID to the remote GID using the PR TC and FL will route > >>remotely using the specified LID pair. This holds whether the PR SGID is > >>local or remote. > > > >I can't think how to actually implement these restrictions in the > >general case without SLID spoofing and the general method I outlined > >in my prior email. > > But you agree with the expectations, and what reversible indicates? Or are > you claiming that reversible paths between different subnets are undefined, > or mean something different than specified in 13.5.4? (E.g. reversible > applies only at the network level if global routing is used.) I think pure reversible paths are a good idea to support on routed paths - meaning strictly the definition from 13.5.4. That is, a GMP sender can request a PR with reversible=1 and know that if the receiver applies 13.5.4 then the reply packet will get back to the receiver. Note: As per the QP LID matching rules the SLID is not matched for UD - so a reversible PR would not have to guarantee the return path router SLID on the local side. What your #4 and #5 are talking about is not just that, but also PR queries that can unambiguously identify the LID selections of the router in advance. That is hugely different! IMHO, just because a reversible path exists and will be used by the router shouldn't be taken to mean that it is the only one or that the SA can tell you which of many possible choices it will be. > >Think about this - it is backwards for the UD case. You have specified > >that the SGID->DGID direction uses the returned SLID/DLID which are > >ensured by the flowlabel in the GRH. 
But the local side only controls > >what it sends. How does this GRH get to the remote side? In UD the > >returned GRH from the PR controls the selection of LID on the DGID's > >subnet. That is how it must be. > > I'm not following you here. For UD, query the local SA, then direct the > send to the router LID. I would only query the remote SA for RC, in order > to get the remote LID information to put into the CM REQ. I'm talking about the locality of information in the PR. Eg: PR query to SA: SGID=Node1 DGID=Node2 ==> Flowlabel=XX SLID=Node1 DLID=1 What direction does FlowLabel=xx refer to? Do you put it in the local side's QP or do you put it in the CM REQ? The use model that UD defines says it is to go in the QP, not the CM REQ. It also more or less requires that the remote SA have a hand in selecting the FlowLabel since the router on the Node2 subnet is the one that acts on it. When I read your mails I get the impression you want to put the FlowLabel from the local PR in the CM REQ - which makes huge amounts of sense, but is not really what is set out in IBA I feel. :< Staying aligned with the UD use model for PRs is why I outlined a solution that required the local SA to consult the remote SA to get the FlowLabel. > >The major problem is that there are multiple router paths that a given > >GRH can take that are only fully disambiguated by the router lid at > >the sender. > > But doesn't 19.2.4.1 imply that once a router selects a path, it will > continue to use that same path for similar packets? So, if we inject a GRH > into the internetwork from the source router, then isn't a single path > followed to the remote endpoint? Yes. Absolutely. I view this problem not as if there is an existing fixed path, but trying to find a way to support unambiguous identification of that path when the DGID alone is not enough information. 
[Ingress port, DGID, Flowlabel and TClass are the minimum required set AFAIK] BTW, 19.2.4.1 seems to imply that nothing in the spec is going to cause a problem for the router's path selection since 'a session is used in a deliberately vague way'. My reading of 9.6.1.5 makes me pretty sure it causes a problem due to the LRH.SLID matching - you also agree, right? > Relaxing 9.6.1.5 seems like a nice solution to most of the problems, but it > also seems like one that would fail to work with any existing HCAs. I agree. In fact until your mail last week I was operating under the assumption (reinforced by text like 19.2.4.1) that nothing like 9.6.1.5 existed in the spec. It wouldn't surprise me if the spec writers intended things to work as though 9.6.1.5 didn't cause this problem and reworked it. If so then cards that can't be fixed with a firmware upgrade wouldn't support multiple routed paths, but would support the simple single router LID case. That might be acceptable. If so then I'd expect also for a SGID=off-subnet query to return the remote LIDs to make CM work properly with existing conforming implementations (that use 3 PR queries to get non-reversible paths ;>). Jason From krkumar2 at in.ibm.com Mon Feb 12 19:31:42 2007 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Tue, 13 Feb 2007 09:01:42 +0530 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: <1171061217.4525.15.camel@stevo-desktop> Message-ID: Steve, I was doing "random%5 == 0" or some such and failing in iw_conn_req_handler(). There was no other explicit test case other than running rdma_bw using this hack. thanks, - KK Steve Wise wrote on 02/10/2007 04:16:57 AM: > > > All 4 above cases were tested by injecting random error in > > iw_conn_req_handler() and running rdma_bw/krping, they were > > confirmed. I added the BUG_ON() to confirm the earlier check > > for id_priv->refcount==0 should always be true (and could be > > removed). 
> > Can you post the test case you're using for this? > > > Steve. > > From krkumar2 at in.ibm.com Mon Feb 12 21:06:56 2007 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Tue, 13 Feb 2007 10:36:56 +0530 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler() In-Reply-To: <1171142795.11017.71.camel@stevo-desktop> Message-ID: Hi Steve, Thanks for the explanation. I reviewed the patch and had two comments : 1. When the set_bit(CALLBACK_DESTROY) was done, the refcount could be such so that the free_cm_id is not called, resulting in cm_id having work entries. But there are two places where a BUG_ON(!list_empty(work_list)) was added (before doing a free_cm_id()), both under check for CALLBACK_DESTROY. Isn't it possible for these BUG_ON's to get hit ? This is an error case and may not hit in normal testing. 2. > @@ -647,10 +650,11 @@ static void cm_conn_req_handler(struct i > /* Call the client CM handler */ > ret = cm_id->cm_handler(cm_id, iw_event); > if (ret) { > + iw_cm_reject(cm_id, NULL, 0); > set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); > destroy_cm_id(cm_id); > if (atomic_read(&cm_id_priv->refcount)==0) > - kfree(cm_id); > + free_cm_id(cm_id_priv); > } Though this is not a bug, the code just above this calls iw_destroy_cm_id() if alloc_work_entries() failed. Is it possible for the provider to get a reference to this cm_id during the failed call to the client CM handler ? I didn't think so, which is why in my original patch I had simply called iw_destroy_cm_id here. I had read your explanation about the provider possibly acquiring a reference, but in this place aren't we calling iw_conn_req_handler() which in turn cannot go to the device and cache a reference count ? The rest of the patch looks great. I am going to test this today and will post the results. Thanks, - KK > On Sat, 2007-02-10 at 14:36 -0600, Steve Wise wrote: > > ugh. > > > > There is at least one bug in this patch. 
I cannot call iw_cm_reject() > > inside destroy_cm_id() because both functions grab the iw_cm lock... > > > > > > This patch puts the iw_cm_reject() calls back in > cm_conn_req_handler()... > > > --- > > iw_cm_id destruction race condition fixes. > > From: Steve Wise > > Several changes: > > - iwcm_deref_id() always wakes up if there's another reference. > > - clean up race condition in cm_work_handler(). > > - create static void free_cm_id() which deallocs the work entries and then > kfrees the cm_id memory. This reduces code replication. > > - rem_ref() if this is the last reference -and- the IWCM owns freeing the > cm_id, then free it. > > Signed-off-by: Steve Wise > Signed-off-by: Tom Tucker > --- > > drivers/infiniband/core/iwcm.c | 47 > +++++++++++++++++++++------------------- > 1 files changed, 25 insertions(+), 22 deletions(-) > > diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c > index 1039ad5..891d1fa 100644 > --- a/drivers/infiniband/core/iwcm.c > +++ b/drivers/infiniband/core/iwcm.c > @@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c > return 0; > } > > +static void free_cm_id(struct iwcm_id_private *cm_id_priv) > +{ > + dealloc_work_entries(cm_id_priv); > + kfree(cm_id_priv); > +} > + > /* > * Release a reference on cm_id. 
If the last reference is being > * released, enable the waiting thread (in iw_destroy_cm_id) to > @@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c > */ > static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv) > { > - int ret = 0; > - > BUG_ON(atomic_read(&cm_id_priv->refcount)==0); > if (atomic_dec_and_test(&cm_id_priv->refcount)) { > BUG_ON(!list_empty(&cm_id_priv->work_list)); > - if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) { > - BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING); > - BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, > - &cm_id_priv->flags)); > - ret = 1; > - } > complete(&cm_id_priv->destroy_comp); > + return 1; > } > > - return ret; > + return 0; > } > > static void add_ref(struct iw_cm_id *cm_id) > @@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_ > { > struct iwcm_id_private *cm_id_priv; > cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); > - iwcm_deref_id(cm_id_priv); > + if (iwcm_deref_id(cm_id_priv) && > + test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { > + BUG_ON(!list_empty(&cm_id_priv->work_list)); > + free_cm_id(cm_id_priv); > + } > } > > static int cm_event_handler(struct iw_cm_id *cm_id, struct > iw_cm_event *event); > @@ -355,7 +358,9 @@ static void destroy_cm_id(struct iw_cm_i > case IW_CM_STATE_CONN_RECV: > /* > * App called destroy before/without calling accept after > - * receiving connection request event notification. > + * receiving connection request event notification or > + * returned non zero from the event callback function. > + * In either case, must tell the provider to reject. 
> */ > cm_id_priv->state = IW_CM_STATE_DESTROYING; > break; > @@ -391,9 +396,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c > > wait_for_completion(&cm_id_priv->destroy_comp); > > - dealloc_work_entries(cm_id_priv); > - > - kfree(cm_id_priv); > + free_cm_id(cm_id_priv); > } > EXPORT_SYMBOL(iw_destroy_cm_id); > > @@ -647,10 +650,11 @@ static void cm_conn_req_handler(struct i > /* Call the client CM handler */ > ret = cm_id->cm_handler(cm_id, iw_event); > if (ret) { > + iw_cm_reject(cm_id, NULL, 0); > set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); > destroy_cm_id(cm_id); > if (atomic_read(&cm_id_priv->refcount)==0) > - kfree(cm_id); > + free_cm_id(cm_id_priv); > } > > out: > @@ -854,13 +858,12 @@ static void cm_work_handler(struct work_ > destroy_cm_id(&cm_id_priv->id); > } > BUG_ON(atomic_read(&cm_id_priv->refcount)==0); > - if (iwcm_deref_id(cm_id_priv)) > - return; > - > - if (atomic_read(&cm_id_priv->refcount)==0 && > - test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { > - dealloc_work_entries(cm_id_priv); > - kfree(cm_id_priv); > + if (iwcm_deref_id(cm_id_priv)) { > + if (test_bit(IWCM_F_CALLBACK_DESTROY, > + &cm_id_priv->flags)) { > + BUG_ON(!list_empty(&cm_id_priv->work_list)); > + free_cm_id(cm_id_priv); > + } > return; > } > spin_lock_irqsave(&cm_id_priv->lock, flags); > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Mon Feb 12 21:45:19 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Feb 2007 21:45:19 -0800 Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree In-Reply-To: <20070210211508.GD14903@mellanox.co.il> (Michael S. 
Tsirkin's message of "Sat, 10 Feb 2007 23:15:08 +0200") References: <20070210211508.GD14903@mellanox.co.il> Message-ID: OK, I already merged this but now I'm thinking it's somewhat buggy: > + if (coherent) > + ret = mthca_alloc_icm_coherent(&dev->pdev->dev, > + &chunk->mem[chunk->npages], > + cur_order, gfp_mask); > + else > + ret = mthca_alloc_icm_pages(&chunk->mem[chunk->npages], > + cur_order, gfp_mask); > > - if (++chunk->npages == MTHCA_ICM_CHUNK_LEN) { > + if (!ret) { > + ++chunk->npages; > + > + if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) { > chunk->nsg = pci_map_sg(dev->pdev, chunk->mem, I don't see anything that ever bumps chunk->nsg if we're allocating a coherent region and we end up needing more than one allocation to do it. Maybe something like this on top of the patch? diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c index 0b9d053..48f7c65 100644 --- a/drivers/infiniband/hw/mthca/mthca_memfree.c +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -175,7 +175,9 @@ struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages, if (!ret) { ++chunk->npages; - if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) { + if (coherent) + ++chunk->nsg; + else if (chunk->npages == MTHCA_ICM_CHUNK_LEN) { chunk->nsg = pci_map_sg(dev->pdev, chunk->mem, chunk->npages, PCI_DMA_BIDIRECTIONAL); From erezz at voltaire.com Mon Feb 12 22:21:57 2007 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 13 Feb 2007 08:21:57 +0200 Subject: [openib-general] OFED 1.2 components list - for the meeting today In-Reply-To: <45D098E2.6000804@mellanox.co.il> References: <45D098E2.6000804@mellanox.co.il> Message-ID: <45D15905.8010204@voltaire.com> Tziporet Koren wrote: > This is the full OFED 1.2 components list that we will review in the meeting > > Tziporet > > # Kernel > ib_verbs (core) > ib_mthca > ib_ipoib > ib_ipath - currently works on 2.6.20 only. 
Backport patches cannot be applied > ib_iser > ib_sdp > ib_srp > ib_ehca - PPC only > cxgb3 > vnic > rds - currently works on kernel 2.6.20 and 2.6.19 > ib-bonding - RHEL4UP3 & SLES10 > > # User libraries > libibverbs > libibcm > libmthca > libipathverbs > libcxgb3 > libsdp > libehca > sdpnetstat > libibcommon > libibmad > libibumad > libopensm > libosmcomp > libosmvendor > librdmacm > dapl - not working with iWARP > > # User utilities > perftest > mstflint > ibutils > opensm > qlvnictools > openib-diags > srptools > ipoibtools > tvflash > > open-iscsi is missing and should be placed under "User utilities". > # MPI: > mvapich > mvapich2 - Build issue > openmpi > mpitests > > # OFED specific: > ofed_docs - taken from 1.1 - not yet updated for 1.2 > ofed_scripts > > From erezz at voltaire.com Mon Feb 12 22:26:04 2007 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 13 Feb 2007 08:26:04 +0200 Subject: [openib-general] ofa_1_2_kernel 20070212-0200 daily build status In-Reply-To: <20070212102414.E14F9E60806@openfabrics.org> References: <20070212102414.E14F9E60806@openfabrics.org> Message-ID: <45D159FC.50806@voltaire.com> vlad at lists.openfabrics.org wrote: > This email was generated automatically, please do not reply > > > Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod > Vlad, We talked about adding open-iscsi over iSER to this daily build several weeks ago. Can you tell me when you will be able to do that? It is really important for us.
Thanks, Erez From karun.sharma at qlogic.com Mon Feb 12 22:29:23 2007 From: karun.sharma at qlogic.com (Karun Sharma) Date: Tue, 13 Feb 2007 00:29:23 -0600 Subject: [openib-general] new OFED 1.2 package References: <45CB938E.5040305@mellanox.co.il> Message-ID: Not able to install OFED1.2 on SLES10 machines (x86_64) even after disabling ipath. Attached are the logs generated by install script. Observed some error with open-iscsi module. Disabling this module also doesn't help. Thanks Karun ________________________________ From: openib-general-bounces at openib.org on behalf of Tziporet Koren Sent: Fri 2/9/2007 2:48 AM To: EWG; OPENIB Subject: [openib-general] new OFED 1.2 package New OFED package was uploaded to the OFA server: http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070208-1508.tgz Many of the issues reported on the previous version are resolved (bugzilla will be updated next week). Since we had lab restructuring we did only basic tests on RHEL up4 and SLES10 (x86 and x86_64) All - we are going for our weekend now. Please report all issues you encounter so we will be able to fix and do the alpha release on Monday. Thanks, Tziporet & Vlad -------------- next part -------------- A non-text attachment was scrubbed...
Name: error.log Type: application/octet-stream Size: 1454707 bytes Desc: error.log URL: From vlad at dev.mellanox.co.il Mon Feb 12 11:24:21 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 12 Feb 2007 21:24:21 +0200 Subject: [openib-general] MVAPICH2 SRPM update and install files patch In-Reply-To: <45CE1C1C.70406@cse.ohio-state.edu> References: <45CE1C1C.70406@cse.ohio-state.edu> Message-ID: <1171308261.12725.9.camel@vladsk-laptop> On Sat, 2007-02-10 at 14:25 -0500, Shaun Rowland wrote: > I updated the latest MVAPICH2 SRPM: > > https://www.openfabrics.org/~rowland/ofed_1_2/ > > I am including a patch to the latest ofed_1_2_scripts git files. Since > these files are the same as those used in the OFED-1.2-20070208-1508.tgz > package, this patch can also be applied there. This patch is required to > use the new MVAPICH2 SRPM file and should not be used with the older > versions. Hi Shaun, Mvapich2 RPM build fails. Please fix the files list in mvapich2.spec. You need to put the path to mvapich2 directory instead of prefix. %{prefix} includes all OFED's files. mvapich2.spec: %files %{_prefix} -- Vladimir Sokolovsky Mellanox Technologies Ltd. From dotanb at dev.mellanox.co.il Tue Feb 13 00:17:28 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 13 Feb 2007 10:17:28 +0200 Subject: [openib-general] [GIT PULL] please pull infiniband.git In-Reply-To: References: Message-ID: <45D17418.3000508@dev.mellanox.co.il> Hi Roland. Roland Dreier wrote: > Linus, please pull from > > master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus > > This tree is also available from kernel.org mirrors at: > > git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus > > This will add the new cxgb3 RDMA driver for Chelsio T3 NICs, as well > as IPoIB connected mode and various other smaller changes: > > Ahmed S. 
Darwish (1): > IB/core: Use ARRAY_SIZE macro for mandatory_table > > Akinobu Mita (1): > IB/ehca: Fix memleak on module unloading > > David Howells (1): > IB/mthca: Work around gcc bug on sparc64 > > Michael S. Tsirkin (6): > IPoIB: Connected mode experimental support > IB/mthca: Fix reserved MTTs calculation on mem-free HCAs > IB/mthca: Give reserved MTTs a separate cache line > IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs > IB/mthca: Merge MR and FMR space on 64-bit systems > IB/mthca: Always fill MTTs from CPU > > Roland Dreier (1): > IB/mthca: Use correct structure size in call to memset() > > Sean Hefty (2): > RDMA/cma: Increment port number after close to avoid re-use > IB: Remove redundant "_wq" from workqueue names > > Steve Wise (1): > RDMA/cxgb3: Add driver for Chelsio T3 RNIC > What about the patch that i sent on "Allow the following QP state transition : reset --> reset"? thanks Dotan From erezz at voltaire.com Tue Feb 13 01:42:21 2007 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 13 Feb 2007 11:42:21 +0200 Subject: [openib-general] new OFED 1.2 package In-Reply-To: References: <45CB938E.5040305@mellanox.co.il> Message-ID: <45D187FD.4070500@voltaire.com> Karun Sharma wrote: > Not able to install OFED1.2 on SLES10 machines (x86_64) even after > disabling ipath. > Attached are the logs generated by install script. Observed some error > with open-iscsi module. Disabling this module also doesn't help. I made a fix in open-iscsi build. It should work once it is merged into the new OFED build. Let me know if it doesn't work. 
Erez From vlad at lists.openfabrics.org Tue Feb 13 02:23:48 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Tue, 13 Feb 2007 02:23:48 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070213-0200 daily build status Message-ID: <20070213102349.5FECBE60809@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on ia64 with 
linux-2.6.16 Failed: From devesh28 at gmail.com Tue Feb 13 05:37:22 2007 From: devesh28 at gmail.com (Devesh Sharma) Date: Tue, 13 Feb 2007 19:07:22 +0530 Subject: [openib-general] Immediate data question In-Reply-To: <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com> <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net> <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com> Message-ID: <309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com> On 2/12/07, Devesh Sharma wrote: > On 2/10/07, Tang, Changqing wrote: > > > > > > > >Not for the receiver, but the sender will be severely slowed down by > > > >having to wait for the RNR timeouts. > > > > > > RNR = Receiver Not Ready so by definition, the data flow > > > isn't going to > > > progress until the receiver is ready to receive data. If a > > > receive QP > > > enters RNR for a RC, then it is likely not progressing as > > > desired. RNR > > > was initially put in place to enable a receiver to create > > > back pressure to the sender without causing a fatal error > > > condition. It should rarely be entered and therefore should > > > have negligible impact on overall performance however when a > > > RNR occurs, no forward progress will occur so performance is > > > essentially zero. > > > > Mike: > > I still do not quite understand this issue. I have two > > situations that have RNR triggered. > > > > 1. process A and process B is connected with QP. A first post a send to > > B, B does not post receive. Then A and B are doing a long time > > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE > > message. 
Finally B will post a receive. Does the first pending send in A > > block all the later RDMA_WRITE ? > According to the IBTA spec, the HCA processes WR entries in the strict order in > which they are posted, so the send will block all WRs posted after it, > unless the HCA has multiple processing elements; I think even > then the processing order will be maintained by the HCA > If not, since RNR is triggered > > periodically till B post receive, does it affect the RDMA_WRITE > > performance between A and B ? > > > > 2. extend above to three processes, A connect to B, B connect to C, so B > > has two QPs, but one CQ.A posts a send to B, B does not post receive, Post ordering across QPs is not guaranteed, hence sharing a CQ or using different CQs will not affect anything. > > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C, it _may_ affect performance, since the load is on the same HCA. In the case of Send/Recv it again _may_ affect performance, for the same reason. > > must sends RNR periodically to A, right?. So does the pending message > > from A affects B's overall performance between B and C ? But the RNR NAK does not last very long; possibly you will not even be able to observe this performance hit. The moment rnr_counter expires, the connection will be broken! > > > > Thank you.
> > > > --CQ > > > > > > > > > > Mike > > > > > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > From jsquyres at cisco.com Tue Feb 13 06:07:03 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 13 Feb 2007 09:07:03 -0500 Subject: [openib-general] uDAPL in OFED 1.1 question Message-ID: I have an OFED 1.1 cluster where something odd is happening in the udapl Open MPI plugin (I'm not excluding the possibility that we have a bug in the OMPI udapl plugin -- I'm just trying to understand some uDAPL behavior). In some cases, we are getting back the error DAT_CONN_QUAL_IN_USE from dat_ep_create(). However, someone more knowledgeable about udapl than me said that the spec says that DAT_CONN_QUAL_IN_USE should only be reported back from a call to dat_psp_create() or dat_rsp_create(). Can someone tell me exactly what dat_ep_create() returning DAT_CONN_QUAL_IN_USE means? Thanks. -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From halr at voltaire.com Tue Feb 13 07:15:04 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Feb 2007 10:15:04 -0500 Subject: [openib-general] OSM QoS policy file In-Reply-To: <45C72515.8090100@dev.mellanox.co.il> References: <45C72515.8090100@dev.mellanox.co.il> Message-ID: <1171379703.22446.15877.camel@hal.voltaire.com> Hi Yevgeny, Sorry for the slow response; I've been consumed getting ready for OFED 1.2 alpha. On Mon, 2007-02-05 at 07:37, Yevgeny Kliteynik wrote: > Hi Hal. > > I added osm/doc/qos-policy.txt file with the description of the QoS > policy file, and an example of such file (with more comments inside). > I'm sure you'll have questions and corrections regarding this file, > so for now, to make our work easier, I'm not sending it as patch, > but just as text. 
Please review the file. Thanks for doing this. This helps but I still do have a number of questions on it as you expected. See below for specifics. It would be nice to turn this into a DTD when things get closer to finalizing so XML configs could readily be validated. Can you do this ? I'd also like to see a futures/todo list. I think we've discussed a few topics which fall into this category. Thanks. -- Hal > Thanks > > -- Yevgeny > > ============================================================= > > QoS Policy File > =============== > > The QoS policy file is divided into 4 sub sections: > > - Port Group: a set of CAs, Routers or Switches that share > the same settings. A port group might be a partition > defined by the partition manager policy in terms of > GUIDs. Future implementations might provide support > for NodeDescription based definition of port groups. IMO, should this group be a separate schema on which this (and partitions and perhaps other things) are based ? > - Fabric Setup: > Defines how the SL2VL and VLArb tables should be set up. > This policy definition assumes the computation of target > behavior should be performed outside of OpenSM. Rather than fabric setup, is this better named QoS Setup (which seems consistent with the tag used below) or QoS Fabric Setup ? Also, what is the relation of this group to the port group ? > - QoS-Levels Definition: > This section defines the possible sets of parameters for > QoS that a client might be mapped to. Each set holds: SL > and optionally: Max MTU, Max Rate, Packet Lifetime and > QoS Class. > > - Matching Rules: > A list of rules that match an incoming PathRecord request > to a QoS-Level. The rules are processed in order such that > the first match is applied. Each rule is built out of a set > of match expressions which should all match for the rule > to apply.
The matching expressions are defined for the > following fields: > - SRC and DST to lists of port groups > - Service-ID to a list of Service-ID or Service-ID ranges > - QoS Class to a list of QoS Class values or ranges > > > Example of the QoS policy file > ============================== > > > > > > > > Storage > I would think the name is for logging. use also ? > our SRP storage targets > 0x1000000000000001 > 0x1000000000000002 > > > Virtual Servers > node desc and IB port # > > vs1/HCA-1/P1 > vs3/HCA-1/P1 > vs3/HCA-2/P1 Shouldn't this be CA rather than HCA ? I think this may also cover routers too. Also, any support for switches ? > > > > Partition 1 > default settings > Part1 Thiswould correlate to the partition named Part1 in the partition configuration. Should pkey based port groups be supported as well ? Just wondering... The current partition config indicates a set of port GUIDs and whether they are full or limited members. As mentioned before, I would prefer that this heads towards a port grouping schema on which both partitions and QoS and perhaps other things depend. > > > > Routers > all routers > ROUTER > > This grouping is similar to existing QoS support. For switches, there are external/physical ports and extended switch port 0 which are different. Base switch port 0 does not support QoS. > > > > > Part1 > > * > > * > > 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 > > > > Storage1 > > Storage2 > 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0 > > > > > > > > > Storage > > Storage > > 0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1 > 8:255,9:127,10:63,11:31,12:15,13:7,14:3 > 10 > > > > > > > > > > 1 > for the lowest priority comm > 16 SL 16 is not valid. > 123 Will this take hex as well as decimal ? > 16 > > > > 2 > low latency best bandwidth > 0 > 7 > > > > 3 > just an example > 0 > 32 > 1 > 1 > 12 > > > > > > > > 1 Can this be rather than or can't the keywords be duplicated ? 
> low latency by class 7-9 or 11 > 7-9,11 > > > > 2 > Storage targets connection> > Storage Is destination a port group and used for matching destination GID or LID on SA PR/MPR lookups ? > 22,4719-5000 > > > > 3 > bla bla > Storage Is source a port group and used for matching source GID or LID on SA PR/MPR lookups ? > > > > > > > Explanation of some fields > ========================== > > Most of the tags meaning is either intuitive or explained by the > comments along the file. One section that deserves a special > explanation is SL2VL tables definition - . > > In general, VL is a function of in-port (the port that the packet > has entered through), out-port (the port that the packet is supposed > to come out from) and the SL. > In OpenSM, SL2VL table is defined on every port, where this port is > an out-port. Hence, on every port, SL2VL table is defined as function > of in-port and SL. Would the syntax work for any SM ? Are the below tags applicable to more than switches or only switches ? > n,m Will it take n-m too (port range) ? Might be more concise for some configs. > This means that of all the ports of the specified port group, define > SL2VL tables where to-ports are ports number n and m. Since SL2VL > table is defined per out-port, using effectively means defining > SL2VL table on ports n and m. > In order to specify that SL2VL table should be defined on all the > ports, an asterisk (*) can be used. > > i,j Will this take i-j too (port range) ? Might be more concise for some configs. > This means that of all the ports of the specified port group that were > not filtered out by the value, define SL2VL table only for entries > where from-ports are ports number i and j. > In order to specify that SL2VL table should be defined for all the in-ports, > an asterisk (*) can be used. 
> > To specify that all the SL2VL table entries should be defined for all > the ports of a certain group, use the following: > port_group > * > * > > PortGroupName > > This is a combination of keyword (that can be found in VLArb tables > definition) and keyword. > PortGroupName means that the ports that we're talking about > are all the ports that are connected to ports that belong to PortGroupName. > Essentially, PortGroupName means the following: > list_of_all_the_ports_that_are_connected_to_group_PortGroupName > > Example of usage of : > A user has a set of 'special' nodes (e.g. storage nodes), and all the > traffic to these nodes has to get a specific VL. The solution is to define a port > group (e.g. "Storage") that will include all the ports of these nodes, and then > to configure SL2VL tables on all the switch ports that are connected to the > Storage port group by specifying Storage > > PortGroupName > > Similar to , is a combination of and > keywords. Is omission of these keywords treated as a wildcard (*) ? After initial read of this, I have the following higher level questions/thoughts: How are trunk (switch to switch) links handled by the QoS syntax ? I also need to think more about the across ramifications. Is it really simpler to use this syntax than to specify the specific ports in question ? From swise at opengridcomputing.com Tue Feb 13 07:30:10 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 13 Feb 2007 09:30:10 -0600 Subject: [openib-general] mvapich2 ofed 1.2 problem Message-ID: <1171380610.15471.25.camel@stevo-desktop> Hey Roland, Does this stack indicate that libibverbs is accessing a 1.0 provider? cxgb3 shouldn't be 1.0 right? Core was generated by `IMB_2.3/src/IMB-MPI1'. Program terminated with signal 11, Segmentation fault. ...
(gdb) bt #0 __ibv_alloc_pd (context=0x1) at src/verbs.c:143 #1 0x00002b832d4d4381 in __ibv_alloc_pd_1_0 (context=0x617830) at src/compat-1_0.c:572 #2 0x00002b832cfef04e in rdma_cm_init_pd_cq () from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so #3 0x00002b832cfef415 in rdma_cm_create_qp () from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so #4 0x00002b832cfefa37 in ib_cma_event_handler () from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so #5 0x00002b832cfefcc0 in cm_thread () from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so #6 0x0000003cd9406305 in start_thread () from /lib64/libpthread.so.0 #7 0x0000003cd88cd66d in clone () from /lib64/libc.so.6 #8 0x0000000000000000 in ?? () (gdb) p *context Cannot access memory at address 0x1 (gdb) up #1 0x00002b832d4d4381 in __ibv_alloc_pd_1_0 (context=0x617830) at src/compat-1_0.c:572 572 src/compat-1_0.c: No such file or directory. in src/compat-1_0.c (gdb) p *context $1 = {device = 0x617100, ops = { query_device = 0x2b832dcf2bc0 , query_port = 0x2b832dcf2ba0 , alloc_pd = 0x2b832dcf2b30 , dealloc_pd = 0x2b832dcf2af0 , reg_mr = 0x2b832dcf29b0 , dereg_mr = 0x2b832dcf2c30 , create_cq = 0x2b832dcf3050 , poll_cq = 0x2b832dcf1770 , req_notify_cq = 0x2b832dcf10c0 , cq_event = 0, resize_cq = 0x2b832dcf2870 , destroy_cq = 0x2b832dcf2f50 , create_srq = 0x2b832dcf2880 , modify_srq = 0x2b832dcf2890 , query_srq = 0, destroy_srq = 0x2b832dcf28a0 , post_srq_recv = 0x2b832dcf28b0 , create_qp = 0x2b832dcf2d30 , query_qp = 0, modify_qp = 0x2b832dcf2900 , destroy_qp = 0x2b832dcf3200 , post_send = 0x2b832dcf1fa0 , post_recv = 0x2b832dcf2460 , create_ah = 0x2b832dcf28c0 , destroy_ah = 0x2b832dcf28d0 , attach_mcast = 0x2b832dcf28e0 , detach_mcast = 0x2b832dcf28f0 }, cmd_fd = 768552128, async_fd = 11139, num_comp_vectors = 8, real_context = 0x1} From rdreier at cisco.com Tue Feb 13 09:07:05 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 13 Feb 2007 09:07:05 -0800 Subject: [openib-general] 
mvapich2 ofed 1.2 problem In-Reply-To: <1171380610.15471.25.camel@stevo-desktop> (Steve Wise's message of "Tue, 13 Feb 2007 09:30:10 -0600") References: <1171380610.15471.25.camel@stevo-desktop> Message-ID: > Does this stack indicate that libibverbs is accessing a 1.0 provider? > cxgb3 shouldn't be 1.0 right? > #1 0x00002b832d4d4381 in __ibv_alloc_pd_1_0 (context=0x617830) > at src/compat-1_0.c:572 > #2 0x00002b832cfef04e in rdma_cm_init_pd_cq () > from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so This means that the app (or maybe the RDMA CM library?) is linked against the 1.0 API -- which should work even with cxgb3 actually. But maybe mvapich is built against the 1.1 API and the RDMA CM is built against 1.0 or something? - R. From swise at opengridcomputing.com Tue Feb 13 09:11:26 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 13 Feb 2007 11:11:26 -0600 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: References: <1171380610.15471.25.camel@stevo-desktop> Message-ID: <1171386686.15471.36.camel@stevo-desktop> On Tue, 2007-02-13 at 09:07 -0800, Roland Dreier wrote: > > Does this stack indicate that libibverbs is accessing a 1.0 provider? > > cxgb3 shouldn't be 1.0 right? > > > #1 0x00002b832d4d4381 in __ibv_alloc_pd_1_0 (context=0x617830) > > at src/compat-1_0.c:572 > > #2 0x00002b832cfef04e in rdma_cm_init_pd_cq () > > from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so > > This means that the app (or maybe the RDMA CM library?) is linked > against the 1.0 API -- which should work even with cxgb3 actually. > But maybe mvapich is built against the 1.1 API and the RDMA CM is > built against 1.0 or something? > How do I tell? Can I tell from the .so files? I can build a non-mpi app against the librdmacm and libibverbs that got installed and things work ok. So maybe libmpich is balled up somehow. Interestingly, the mpi example program, cpi, that gets built with the rpm works. 
It's just MPI programs that I build using mpicc, which links to libmpich.so, that fail. Steve. From vlad at dev.mellanox.co.il Tue Feb 13 09:19:27 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 13 Feb 2007 19:19:27 +0200 Subject: [openib-general] new OFED 1.2 package Message-ID: <1171387167.3978.90.camel@vladsk-laptop> New OFED package was uploaded to the OFA server: http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646.tgz Known issues: mvapich2 RPM build fails (will be fixed in alpha1). sdpnetstat compilation fails in RHEL5 -- Vladimir Sokolovsky Mellanox Technologies Ltd. From rdreier at cisco.com Tue Feb 13 09:21:13 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 13 Feb 2007 09:21:13 -0800 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: <1171386686.15471.36.camel@stevo-desktop> (Steve Wise's message of "Tue, 13 Feb 2007 11:11:26 -0600") References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> Message-ID: > How do I tell? Can I tell from the .so files? ldd on the .so and the app would probably give you good info. I'm pretty sure that mpicc must be linking against an libibverbs 1.0 from somewhere. - R. From swise at opengridcomputing.com Tue Feb 13 09:36:36 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 13 Feb 2007 11:36:36 -0600 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> Message-ID: <1171388196.15471.47.camel@stevo-desktop> On Tue, 2007-02-13 at 09:21 -0800, Roland Dreier wrote: > > How do I tell? Can I tell from the .so files? > > ldd on the .so and the app would probably give you good info. > > I'm pretty sure that mpicc must be linking against an libibverbs 1.0 > from somewhere. > > - R. By the way, the problem also happens running over mthca/IB with librdmacm. mpicc has '-libverbs' mpicc.conf has '-libverbs' too. 
ldd output. Looks like they are all linking to libibverbs.so.1. Is that correct? [mpi at vic20 ~]$ ldd IMB_2.3/src/IMB-MPI1 libmpich.so => /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so (0x00002b0d7cefb000) librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b0d7d1b3000) libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b0d7d2b8000) libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b0d7d3c3000) libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e07000000) librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e0ba00000) libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000) libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x0000003e06a00000) libdl.so.2 => /lib64/libdl.so.2 (0x0000003e06300000) libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b0d7d4cf000) /lib64/ld-linux-x86-64.so.2 (0x0000003e06100000) [mpi at vic20 ~]$ ldd /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so libc.so.6 => /lib64/tls/libc.so.6 (0x00002b6a6061d000) /lib64/ld-linux-x86-64.so.2 (0x0000555555554000) [mpi at vic20 ~]$ ldd /usr/local/ofed/lib64/librdmacm.so libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b3ef50de000) libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x00002b3ef51ea000) libc.so.6 => /lib64/tls/libc.so.6 (0x00002b3ef52f6000) libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x00002b3ef552a000) libdl.so.2 => /lib64/libdl.so.2 (0x00002b3ef5640000) /lib64/ld-linux-x86-64.so.2 (0x0000555555554000) [mpi at vic20 ~]$ ldd /usr/local/ofed/lib64/libcxgb3-rdmav2.so libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b83c160e000) libc.so.6 => /lib64/tls/libc.so.6 (0x00002b83c171a000) libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x00002b83c194e000) libdl.so.2 => /lib64/libdl.so.2 (0x00002b83c1a63000) /lib64/ld-linux-x86-64.so.2 (0x0000555555554000) [mpi at vic20 ~]$ ldd /usr/local/ofed/lib64/libcxgb3.so libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002ac8e4920000) libc.so.6 => 
/lib64/tls/libc.so.6 (0x00002ac8e4a2c000) libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x00002ac8e4c60000) libdl.so.2 => /lib64/libdl.so.2 (0x00002ac8e4d75000) /lib64/ld-linux-x86-64.so.2 (0x0000555555554000) From swise at opengridcomputing.com Tue Feb 13 09:51:24 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 13 Feb 2007 11:51:24 -0600 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: <1171388196.15471.47.camel@stevo-desktop> References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> <1171388196.15471.47.camel@stevo-desktop> Message-ID: <1171389084.15471.56.camel@stevo-desktop> So this program doesn't work: > [mpi at vic20 ~]$ ldd IMB_2.3/src/IMB-MPI1 > libmpich.so => /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so (0x00002b0d7cefb000) > librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b0d7d1b3000) > libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b0d7d2b8000) > libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b0d7d3c3000) > libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e07000000) > librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e0ba00000) > libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000) > libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x0000003e06a00000) > libdl.so.2 => /lib64/libdl.so.2 (0x0000003e06300000) > libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b0d7d4cf000) > /lib64/ld-linux-x86-64.so.2 (0x0000003e06100000) > And this one does: [root at vic20 ~]# ldd /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/examples/cpi libm.so.6 => /lib64/tls/libm.so.6 (0x0000003e06800000) librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b1353b65000) libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b1353c6a000) libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b1353d75000) libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e07000000) librt.so.1 => /lib64/tls/librt.so.1 
(0x0000003e0ba00000) libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000) libdl.so.2 => /lib64/libdl.so.2 (0x0000003e06300000) libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x0000003e06a00000) libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b1353e81000) /lib64/ld-linux-x86-64.so.2 (0x0000003e06100000) Note the cpi program doesn't dynamically link with libmpich.so. That appears to be the difference... From sean.hefty at intel.com Tue Feb 13 09:53:10 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 13 Feb 2007 09:53:10 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070213020330.GZ11411@obsidianresearch.com> Message-ID: <000501c74f97$d3497090$8698070a@amr.corp.intel.com> >What your #4 and #5 are talking about is not just that, but also PR >queries that can unambigously identify the LID selections of the >router in advance. That is hugely different! IMHO, just because a >reversible path exists and will be used by the router shouldn't be >taken to mean that the it is the only one or that the SA can tell you >which of many possible choices it will be. Yes - I was trying to define a routed path as reversible with respect to a connection. It makes things easier. :) This is where we've been disconnecting. I was wanting a packet sent from the remote GID to the local GID to come back over the local DLID/SLID path specified in the path record if reversible is true. I give. This was too strong of an assumption, since the response path could travel a different DLID/SLID path and still qualify as reversible. So, it seems that with respect to connections between subnets, path records should be treated as if they were not reversible. Using my model then would require 4 queries... (I need to read back through the discussion and see if the different ideas can be condensed/summarized.) 
>If so then I'd expect also for a SGID=off-subnet query to return the >remote LIDs to make CM work properly with existing conforming >implementations (that use 3 PR queries to get non-reversable paths >;>). I think it makes more sense to push interaction with a remote SA to the end node to give them greater control over the query and avoid the local SA indirection. - Sean From panda at cse.ohio-state.edu Tue Feb 13 09:59:19 2007 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Tue, 13 Feb 2007 12:59:19 -0500 (EST) Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: <1171389084.15471.56.camel@stevo-desktop> from "Steve Wise" at Feb 13, 2007 11:51:24 AM Message-ID: <200702131759.l1DHxJGC027072@xi.cse.ohio-state.edu> Steve - Shaun will send a detailed reply to you on this issue shortly. It looks like the patch sent by Shaun to Vlad (on Saturday) was not applied to the latest OFED install script/build. This might be causing all these problems. Vlad and Shaun have discussed this issue today morning. Shaun has sent another updated patch to Vlad today (during the last hour). Vlad will check and apply it tomorrow. Hopefully, this will solve all the problems. 
Thanks, DK > So this program doesn't work: > > > [mpi at vic20 ~]$ ldd IMB_2.3/src/IMB-MPI1 > > libmpich.so => /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so (0x00002b0d7cefb000) > > librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b0d7d1b3000) > > libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b0d7d2b8000) > > libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b0d7d3c3000) > > libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e07000000) > > librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e0ba00000) > > libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000) > > libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x0000003e06a00000) > > libdl.so.2 => /lib64/libdl.so.2 (0x0000003e06300000) > > libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b0d7d4cf000) > > /lib64/ld-linux-x86-64.so.2 (0x0000003e06100000) > > > > And this one does: > > [root at vic20 ~]# ldd /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/examples/cpi > libm.so.6 => /lib64/tls/libm.so.6 (0x0000003e06800000) > librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b1353b65000) > libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b1353c6a000) > libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b1353d75000) > libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e07000000) > librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e0ba00000) > libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000) > libdl.so.2 => /lib64/libdl.so.2 (0x0000003e06300000) > libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x0000003e06a00000) > libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b1353e81000) > /lib64/ld-linux-x86-64.so.2 (0x0000003e06100000) > > > Note the cpi program doesn't dynamically link with libmpich.so. That > appears to be the difference... 
> > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rowland at cse.ohio-state.edu Tue Feb 13 10:01:47 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Tue, 13 Feb 2007 13:01:47 -0500 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> Message-ID: <45D1FD0B.2080606@cse.ohio-state.edu> Roland Dreier wrote: > > How do I tell? Can I tell from the .so files? > > ldd on the .so and the app would probably give you good info. > > I'm pretty sure that mpicc must be linking against an libibverbs 1.0 > from somewhere. > > - R. When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is built, at least by looking at the .so file result: [rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs libibverbs.a libibverbs.so libibverbs.so.1 libibverbs.so.1.0.0 This seems odd. Is it correct? I have updated the MVAPICH2 SRPM and sent a new patch for the OFED install scripts. This won't be reflected until the alpha1 release. Still, does the above seem strange? I noticed this recently. I see symbols for both versions though: 0000000000005a50 T ibv_detach_mcast@IBVERBS_1.0 00000000000082c0 T ibv_detach_mcast@@IBVERBS_1.1 0000000000000000 A IBVERBS_1.0 0000000000000000 A IBVERBS_1.1 Our code links to these libraries, and by default mpicc should use what's in /usr/local/ofed/lib[64] in the -L path itself directly too. Is this an issue in the library? The libmpich.so file should not be any different when built. We will investigate this. I can provide a patch against the latest OFED tar.gz to use the mvapich2-0.9.8-3.src.rpm once I download the release if that would help, as we have changed some things since the -2 SRPM release. 
Again, this should be reflected in the alpha1 release. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From vatsa at veritas.com Tue Feb 13 10:06:04 2007 From: vatsa at veritas.com (vatsa at veritas.com) Date: Tue, 13 Feb 2007 10:06:04 -0800 Subject: [openib-general] new OFED 1.2 package In-Reply-To: <1171387167.3978.90.camel@vladsk-laptop> References: <1171387167.3978.90.camel@vladsk-laptop> Message-ID: <45D1FE0C.1050203@veritas.com> Hi, Is there a way to get OFED 1.2 binary rpms for RHEL4 Update 4 on x86_64 ? Thanks, Sreevatsa > New OFED package was uploaded to the OFA server: > http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646.tgz > > > > Known issues: > mvapich2 RPM build fails (will be fixed in alpha1). > sdpnetstat compilation fails in RHEL5 > > > From rdreier at cisco.com Tue Feb 13 10:45:58 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 13 Feb 2007 10:45:58 -0800 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: <45D1FD0B.2080606@cse.ohio-state.edu> (Shaun Rowland's message of "Tue, 13 Feb 2007 13:01:47 -0500") References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> <45D1FD0B.2080606@cse.ohio-state.edu> Message-ID: > When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is > built, at least by looking at the .so file result: > > [rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs libibverbs.a > libibverbs.so > libibverbs.so.1 > libibverbs.so.1.0.0 The soname hasn't changed because the library is still compatible. But (I hope at least) OFED has libibverbs 1.1. 
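A note on the soname point above: the file name libibverbs.so.1 only tracks binary compatibility, so it cannot show which API version a program was linked against. The symbol listing quoted in this thread suggests that the library exports each verb under both IBVERBS_1.0 and IBVERBS_1.1, and the __ibv_alloc_pd_1_0 frame in the backtrace looks like the 1.0 compatibility entry point. What follows is a minimal sketch, not part of OFED, of a shell helper that labels each versioned symbol in nm-style text as the default version (marked with "@@") or a compatibility version (marked with a single "@"). The function name symver_summary is hypothetical, and the sample lines assume that the archive's " at " in the quoted listing stands for "@".

```shell
# Hypothetical helper (not part of OFED): summarize versioned symbols from
# nm-style text on stdin. "@@" marks the default version that newly linked
# programs bind to; a single "@" marks a compatibility version.
symver_summary() {
    awk '
        # Only consider lines whose last field carries a version suffix.
        $NF ~ /@/ {
            sym = $NF
            if (index(sym, "@@") > 0) {
                split(sym, parts, "@@")
                printf "%s %s default\n", parts[1], parts[2]
            } else {
                split(sym, parts, "@")
                printf "%s %s compat\n", parts[1], parts[2]
            }
        }
    '
}

# Sample input modeled on the listing quoted in this thread
# (assumption: " at " in the archive stands for "@").
printf '%s\n' \
    '0000000000005a50 T ibv_detach_mcast@IBVERBS_1.0' \
    '00000000000082c0 T ibv_detach_mcast@@IBVERBS_1.1' \
    '0000000000000000 A IBVERBS_1.0' |
    symver_summary
# prints:
#   ibv_detach_mcast IBVERBS_1.0 compat
#   ibv_detach_mcast IBVERBS_1.1 default
```

To apply this to a real library, one could pipe in the output of nm on the shared object, assuming the installed binutils prints version suffixes; the exact flags vary between binutils releases.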
From rdreier at cisco.com Tue Feb 13 11:05:09 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 13 Feb 2007 11:05:09 -0800 Subject: [openib-general] [GIT PULL] please pull infiniband.git In-Reply-To: <45D17418.3000508@dev.mellanox.co.il> (Dotan Barak's message of "Tue, 13 Feb 2007 10:17:28 +0200") References: <45D17418.3000508@dev.mellanox.co.il> Message-ID: > What about the patch that i sent on "Allow the following QP state > transition : reset --> reset"? OK, I'll merge that in the next patch. It's the kind of patch I'm not happy about merging, since it bloats the code to handle a corner case no one is likely to hit in practice, but it is technically correct so I guess we're forced to merge it. - R. From swise at opengridcomputing.com Tue Feb 13 11:52:20 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 13 Feb 2007 13:52:20 -0600 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: <45D1FD0B.2080606@cse.ohio-state.edu> References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> <45D1FD0B.2080606@cse.ohio-state.edu> Message-ID: <1171396340.21471.2.camel@stevo-desktop> On Tue, 2007-02-13 at 13:01 -0500, Shaun Rowland wrote: > Roland Dreier wrote: > > > How do I tell? Can I tell from the .so files? > > > > ldd on the .so and the app would probably give you good info. > > > > I'm pretty sure that mpicc must be linking against an libibverbs 1.0 > > from somewhere. > > > > - R. > > When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is > built, at least by looking at the .so file result: > > [rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs libibverbs.a > libibverbs.so > libibverbs.so.1 > libibverbs.so.1.0.0 > > This seems odd. Is it correct? I have updated the MVAPICH2 SRPM and sent > a new patch for the OFED install scripts. This won't be reflected until > the alpha1 release. Still, does the above seem strange? I noticed this > recently. 
I see symbols for both versions though: > > 0000000000005a50 T ibv_detach_mcast@IBVERBS_1.0 > 00000000000082c0 T ibv_detach_mcast@@IBVERBS_1.1 > 0000000000000000 A IBVERBS_1.0 > 0000000000000000 A IBVERBS_1.1 > > Our code links to these libraries, and by default mpicc > should use what's in /usr/local/ofed/lib[64] in the -L path itself > directly too. Is this an issue in the library? The libmpich.so file > should not be any different when built. We will investigate this. > > I can provide a patch against the latest OFED tar.gz to use the > mvapich2-0.9.8-3.src.rpm once I download the release if that would help, > as we have changed some things since the -2 SRPM release. Again, this > should be reflected in the alpha1 release. I was hoping to sniff-test mvapich2 over OFA/iWARP. So if you can get something that works I'll try it out. Steve. From tziporet at mellanox.co.il Tue Feb 13 12:03:59 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 13 Feb 2007 22:03:59 +0200 Subject: [openib-general] new OFED 1.2 package In-Reply-To: <45D1FE0C.1050203@veritas.com> References: <1171387167.3978.90.camel@vladsk-laptop> <45D1FE0C.1050203@veritas.com> Message-ID: <45D219AF.3090008@mellanox.co.il> vatsa at veritas.com wrote: > Hi, > > Is there a way to get OFED 1.2 binary rpms for RHEL4 Update 4 on x86_64 ? > You should build them on your machines - see the OFED installation guide (you can also access it from git: http://staging.openfabrics.org/git/?p=~tziporet/docs.git;a=blob;f=OFED_Installation_Guide.txt;h=3b832cc14ac53c07e1935f5ca3bee750755c437a;hb=f43a950c36d081c939fbb407c64d1fd6d97c1cd7 Tziporet From swise at opengridcomputing.com Tue Feb 13 12:10:01 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 13 Feb 2007 14:10:01 -0600 Subject: [openib-general] [PATCH] ofed_1_2/iw_cxgb3 - Free any pending mmaps in iwch_dealloc_ucontext(). Message-ID: <1171397401.21471.5.camel@stevo-desktop> Vlad/Michael, This should be pushed into ofed_1_2. 
It can wait until after alpha1, however, if you want. Steve. ----- Free any pending mmaps in iwch_dealloc_ucontext(). Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index dbb3f71..4a46771 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -98,7 +98,11 @@ static int iwch_dealloc_ucontext(struct { struct iwch_dev *rhp = to_iwch_dev(context->device); struct iwch_ucontext *ucontext = to_iwch_ucontext(context); + struct iwch_mm_entry *mm, *tmp; + PDBG("%s context %p\n", __FUNCTION__, context); + list_for_each_entry_safe(mm, tmp, &ucontext->mmaps, entry) + kfree(mm); cxio_release_ucontext(&rhp->rdev, &ucontext->uctx); kfree(ucontext); return 0; From tziporet at mellanox.co.il Tue Feb 13 12:11:49 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 13 Feb 2007 22:11:49 +0200 Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package In-Reply-To: <1171387167.3978.90.camel@vladsk-laptop> References: <1171387167.3978.90.camel@vladsk-laptop> Message-ID: <45D21B85.9070007@mellanox.co.il> Vladimir Sokolovsky wrote: > New OFED package was uploaded to the OFA server: > http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646.tgz > > > > Known issues: > mvapich2 RPM build fails (will be fixed in alpha1). > sdpnetstat compilation fails in RHEL5 > > > Hi All, This is the pre-alpha package for your testing. Please send us feedback today so we can build the first alpha OFED tomorrow. If any show-stopper issue for the alpha is found please let us know. Note that components compilation is blocked on kernels that they do not support. 
Thanks, Tziporet From swise at opengridcomputing.com Tue Feb 13 12:12:02 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 13 Feb 2007 14:12:02 -0600 Subject: [openib-general] OFED 1.2 dapl and dat.conf Message-ID: <1171397522.21471.7.camel@stevo-desktop> Currently, the dapl rpms don't install dat.conf. I think they probably should, eh? Maybe in /etc/dat.conf Steve. From krause at cup.hp.com Tue Feb 13 12:46:41 2007 From: krause at cup.hp.com (Michael Krause) Date: Tue, 13 Feb 2007 12:46:41 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45D0FCC8.4090304@ichips.intel.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com> <45D0FCC8.4090304@ichips.intel.com> Message-ID: <6.2.0.14.2.20070213124447.08e6e600@esmail.cup.hp.com> At 03:48 PM 2/12/2007, Sean Hefty wrote: >>An endnode look up should be to find the address vector to the >>remote. A look up may return multiple vectors. The SLID would >>correspond to each local subnet router port that acts as a first-hop >>destination to the remote subnet. I don't see why the router protocol >>would not simply enable all paths on the local subnet to a given remote >>subnet be acquired. All of the work is kept local to the SA / SM in the >>source subnet when determining a remote path to take. >>Why is there any need to define more than just this? > >For an RC QP, we need at least two sets of LIDs. In the simplest case, we >need the SLID/router DLID for the local subnet, and the router SLID/DLID >for the remote subnet. The problem is in obtaining the SLID/DLID for the >remote subnet. Not quite. 
The router protocol should determine the "next hop" LID to be used to either reach the destination endnode if in its local subnet or for the next router on the path to the remote. CM only needs to be concerned with what is in a local subnet for finding the router or the endnode. It does not need to comprehend the remote subnet(s) LID. That is the router protocol to determine. CM also must understand the GIDs involved which the router will process to figure out its LID mapping to the next hop. Mike From krause at cup.hp.com Tue Feb 13 12:52:35 2007 From: krause at cup.hp.com (Michael Krause) Date: Tue, 13 Feb 2007 12:52:35 -0800 Subject: [openib-general] Immediate data question In-Reply-To: <309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com> <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net> <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com> <309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com> Message-ID: <6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com> At 05:37 AM 2/13/2007, Devesh Sharma wrote: >On 2/12/07, Devesh Sharma wrote: >>On 2/10/07, Tang, Changqing wrote: >> > > > >> > > >Not for the receiver, but the sender will be severely slowed down by >> > > >having to wait for the RNR timeouts. >> > > >> > > RNR = Receiver Not Ready so by definition, the data flow >> > > isn't going to >> > > progress until the receiver is ready to receive data. If a >> > > receive QP >> > > enters RNR for a RC, then it is likely not progressing as >> > > desired. 
RNR >> > > was initially put in place to enable a receiver to create >> > > back pressure to the sender without causing a fatal error >> > > condition. It should rarely be entered and therefore should >> > > have negligible impact on overall performance however when a >> > > RNR occurs, no forward progress will occur so performance is >> > > essentially zero. >> > >> > Mike: >> > I still do not quite understand this issue. I have two >> > situations that have RNR triggered. >> > >> > 1. process A and process B is connected with QP. A first post a send to >> > B, B does not post receive. Then A and B are doing a long time >> > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE >> > message. Finally B will post a receive. Does the first pending send in A >> > block all the later RDMA_WRITE ? >>According to IBTA spec HCA will process WR entries in strict order in >>which they are posted so the send will block all WR posted after this >>send, Until-unless HCA has multiple processing elements, I think even >>then processing order will be maintained by HCA >> If not, since RNR is triggered >> > periodically till B post receive, does it affect the RDMA_WRITE >> > performance between A and B ? >> > >> > 2. extend above to three processes, A connect to B, B connect to C, so B >> > has two QPs, but one CQ.A posts a send to B, B does not post receive, >post ordering accross QP is not guaranteed hence presence of same CQ >or different CQ will not affect any thing. >> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B >If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C, >_may_ affect the performance, since load is on same HCA. In case of >Send/Recv again _may_ affect the performance, with the same reason. Seems orthogonal. Any time h/w is shared, multiple flows will have an impact on one another. That is why we have the different arbitration mechanisms to enable one to control that impact. 
>> > must sends RNR periodically to A, right?. So does the pending message >> > from A affects B's overall performance between B and C ? >But RNR NAK is not for very long time.....possibly this performance >hit you will not be able to observe even. The moment rnr_counter >expires connection will be broken! Keep in mind the timeout can be infinite. RNR NAK are not expected to be frequent so their performance impact was considered reasonable. Mike >> > >> > Thank you. >> > >> > --CQ >> > >> > >> > > >> > > Mike >> > > >> > > >> > > >> > >> > _______________________________________________ >> > openib-general mailing list >> > openib-general at openib.org >> > http://openib.org/mailman/listinfo/openib-general >> > >> > To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > >> > From krause at cup.hp.com Tue Feb 13 12:49:57 2007 From: krause at cup.hp.com (Michael Krause) Date: Tue, 13 Feb 2007 12:49:57 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070213001045.GY11411@obsidianresearch.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com> <20070213001045.GY11411@obsidianresearch.com> Message-ID: <6.2.0.14.2.20070213124803.08ee1208@esmail.cup.hp.com> At 04:10 PM 2/12/2007, Jason Gunthorpe wrote: >On Mon, Feb 12, 2007 at 03:31:15PM -0800, Michael Krause wrote: > > > TClass is intended to communicate the end-to-end QoS desired. TClass is > > then mapped to a SL that is local to each subnet. A flow label is > > intended to much the same as in the IP world and is left, in essence, to > > routers to manage. An endnode look up should be to find the address > > vector to the remote. A look up may return multiple vectors. 
The SLID > > would correspond to each local subnet router port that acts as a first-hop > > destination to the remote subnet. I don't see why the router protocol > > would not simply enable all paths on the local subnet to a given remote > > subnet be acquired. All of the work is kept local to the SA / SM in the > > source subnet when determining a remote path to take. Why is there any > > need to define more than just this? Define a router protocol to > > communicate the each subnet's prefix, TClass, etc. and apply KISS. A > > management entity that wanted to manage out each subnet provides router > > management in terms of route selection, etc. can be constructed by using > > the existing protocols / tools combined with a new router protocol which > > only does DGID to next hop SLID mapping. > >All of this complexity is due to the RC QP requirement that the SLID >of an incoming LRH match the DLID programmed into the QP. > >Translated into a network with routers this means that for a RC flow >to successfully work both the *forward* and *reverse* direction must >traverse the same router *LID* not just *port* on both subnets. That is a given since the LID = path and same path must be used to insure strong ordering is maintained. >Please see the little ascii diagram I drew in a prior email to >understand my concern. > >There is no such restriction in a real IP network. It would be akin to >having a host match the source MAC address in the ethernet frame to >double check that it came from the router port it is sending outgoing >packets to. Which means simple one-sided solutions from IP land don't >work here. > >Things work exactly the way you outline today for UD. They don't work >at all for the general case of RC. Get rid of the QP requirement and >things work the way you outline for RC too. Keep it in and you must >use the FlowLabel to force the flows onto the right router LID. The same path must always be used to maintain strong ordering. 
This is an immutable part of IB technology. >That is why I said previously that the QP matching rules are a >mistake. The best way to solve this is to change C9-54 to only be in >effect if the GRH is not present. I disagree. We were very explicit in how and why we constructed those rules. >CM also introduces the much smaller problem of getting the LIDs to the >passive side - but that cannot be solved without a broad solution to >the RC QP SLID matching problem. Mike From tziporet at mellanox.co.il Tue Feb 13 13:05:22 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 13 Feb 2007 23:05:22 +0200 Subject: [openib-general] OFED 1.2 Feb-12 meeting summary Message-ID: <45D22812.3030904@mellanox.co.il> Hi, This is the OFED 1.2 Feb-12 meeting summary on alpha status: *Abbreviated minutes / summary:* * The alpha release should be done on Wed Feb-14. (A package for testing was already published today) * Not all components must support all OSes for the alpha. * There are going to be 3 weeks for testing the alpha release. * Next milestone is the Beta release - on March-7 *Note:* please post all OFED related mails to EWG mailing list too and not just the general list. *Detailed Minutes:* We reviewed all OFED components and this is the status toward the alpha release: *Kernel* ib_verbs (core) - ready - need to add CMA patch for iWARP to support uDAPL - was done today ib_mthca - ready ib_ipoib - ready ib_ipath - currently works on 2.6.20 only. Backport patches will be available for the beta ib_iser - ready ib_sdp - ready ib_srp - ready ib_ehca - PPC only - ready cxgb3 - ready - backport patch for SLES9 was applied today vnic - ready rds - currently works on kernel 2.6.20 and 2.6.19. RHEL and SLES support will be added for the beta. ib-bonding - ready (will work only on RHEL4UP3 & SLES10) madeye - we forgot to take this module from OFED 1.1. Will be done for the beta. *User libraries* libibverbs - ready; man pages should be checked in by Roland for the beta. 
libibcm - ready libmthca - ready libipathverbs - missing the new mode of libibverbs. Will be done for the beta libcxgb3 - ready libsdp - ready libehca - ready libibcommon - ready libibmad - ready libibumad - ready libopensm - ready libosmcomp - ready libosmvendor - ready librdmacm - ready dapl - ready *User utilities* performance tests - ready (for the beta need to check all tests pass compilation) mstflint - ready ibutils - ready opensm - ready qlvnictools - ready openib-diags - ready srptools - ready ipoibtools - ready tvflash - ready (Roland should open a branch) sdpnetstat - ready (does not pass compilation on RHEL5) open-iscsi - ready *MPI:* mvapich - ready mvapich2 - Build issue - must be resolved for the alpha openmpi - ready mpitests - ready *OFED specific:* ofed_docs - taken from 1.1 - not yet updated for 1.2 ofed_scripts - ready Tziporet From tziporet at mellanox.co.il Tue Feb 13 13:07:19 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 13 Feb 2007 23:07:19 +0200 Subject: [openib-general] [PATCH] ofed_1-2 IWCM - Set iniator depth and responder resources to device max values. In-Reply-To: <1171297207.16167.24.camel@stevo-desktop> References: <1171223899.4027.1.camel@linux-q667.site> <1171297207.16167.24.camel@stevo-desktop> Message-ID: <45D22887.7030003@mellanox.co.il> Steve Wise wrote: > BTW: We need this for the alpha1 build or DAPL applications won't work > over iWARP devices. 
> > Was applied today Tziporet From mshefty at ichips.intel.com Tue Feb 13 13:14:09 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 13 Feb 2007 13:14:09 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <6.2.0.14.2.20070213124447.08e6e600@esmail.cup.hp.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com> <45D0FCC8.4090304@ichips.intel.com> <6.2.0.14.2.20070213124447.08e6e600@esmail.cup.hp.com> Message-ID: <45D22A21.9040708@ichips.intel.com> > It does not need to comprehend the remote subnet(s) LID. > That is the router protocol to determine. CM also must understand the > GIDs involved which the router will process to figure out its LID > mapping to the next hop. The CM REQ carries the remote router LID (primary local port lid - 12.7.11) and remote endpoint LID (primary remote port lid - 12.7.21). - Sean From robert.j.woodruff at intel.com Tue Feb 13 13:36:04 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Tue, 13 Feb 2007 13:36:04 -0800 Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package In-Reply-To: <45D21B85.9070007@mellanox.co.il> Message-ID: I tried to build this on RedHat EL4-U3 and got the following build error. 
make: *** [_module_/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bondi ng] Error 2 make: Leaving directory `/usr/src/kernels/2.6.9-34.EL.root-smp-x86_64' + echo ' Building IB bonding driver failed' Building IB bonding driver failed + exit 1 -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Tuesday, February 13, 2007 12:12 PM To: Vladimir Sokolovsky Cc: EWG; OPENIB Subject: Re: [openfabrics-ewg] new OFED 1.2 package Vladimir Sokolovsky wrote: > New OFED package was uploaded to the OFA server: > http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646. tgz > > > > Known issues: > mvapich2 RPM build fails (will be fixed in alpha1). > sdpnetstat compilation fails in RHEL5 > > > Hi All, This is the pre-alpha package for your testing. Please send us feedback today so we can build the first alpha OFED tomorrow. If any show-stopper issue for the alpha is found please let us know. Note that components compilation is blocked on kernels that they do not support. Thanks, Tziporet _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg From jeremy.brown at qlogic.com Tue Feb 13 14:47:48 2007 From: jeremy.brown at qlogic.com (Jeremy Brown) Date: Tue, 13 Feb 2007 14:47:48 -0800 Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package In-Reply-To: <45D21B85.9070007@mellanox.co.il> References: <1171387167.3978.90.camel@vladsk-laptop> <45D21B85.9070007@mellanox.co.il> Message-ID: <1171406869.17328.16.camel@citrine.pathscale.com> On Tue, 2007-02-13 at 22:11 +0200, Tziporet Koren wrote: > This is the pre-alpha package for your testing. > Please send us feedback today so we can build the first alpha OFED tomorrow. > If any show-stopper issue for the alpha is found please let us know. 
> > Note that components compilation is blocked on kernels that they do not > support. While I understand that Fedora is not officially supported in OFED 1.2, I know that many participants are making an effort to make sure Fedora (at least FC6) will work. I did attempt a build on a Fedora Core 4 system, and encountered an issue related to the sysfs* name changes. ERROR: The libsysfs-devel package is required to build libibverbs_devel RPM I know that the package is named "sysfsutils-devel" in Fedora Core 3-5, and "libsysfs-devel" in Fedora Core 6, similar to the RH 4 vs. RH 5 split. Would it be possible to change the definition and use of $DISTRIBUTION in build_env.sh so the we had "fedora" for FC3-5, and "fedora6" for FC6, similar to the "redhat" and "redhat5" split? I'm not married to those names, of course. Naturally, this shouldn't gate the alpha. :) Thanks for getting the build ready! Jeremy Brown From robert.j.woodruff at intel.com Tue Feb 13 14:49:21 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Tue, 13 Feb 2007 14:49:21 -0800 Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package In-Reply-To: Message-ID: I am also still seeing the issue with the rdma_cm abi_version on RedHat EL4-U3, bug number, 347. The bug report contains the patch that should fix this. I_MPI: [0] set_up_devices(): I_MPI_DAPL_IP_ADDR = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_PORT = NULL librdmacm: couldn't read ABI version. librdmacm: assuming: 4 librdmacm: couldn't read ABI version. 
librdmacm: assuming: 4 I_MPI: [0] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so I_MPI: [0] my_dlopen(): trying to dlopen: libdat.so I_MPI: [1] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Woodruff, Robert J Sent: Tuesday, February 13, 2007 1:36 PM To: Tziporet Koren; Vladimir Sokolovsky Cc: EWG; OPENIB Subject: Re: [openib-general] [openfabrics-ewg] new OFED 1.2 package I tried to build this on RedHat EL4-U3 and got the following build error. make: *** [_module_/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bondi ng] Error 2 make: Leaving directory `/usr/src/kernels/2.6.9-34.EL.root-smp-x86_64' + echo ' Building IB bonding driver failed' Building IB bonding driver failed + exit 1 -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Tuesday, February 13, 2007 12:12 PM To: Vladimir Sokolovsky Cc: EWG; OPENIB Subject: Re: [openfabrics-ewg] new OFED 1.2 package Vladimir Sokolovsky wrote: > New OFED package was uploaded to the OFA server: > http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646. tgz > > > > Known issues: > mvapich2 RPM build fails (will be fixed in alpha1). > sdpnetstat compilation fails in RHEL5 > > > Hi All, This is the pre-alpha package for your testing. Please send us feedback today so we can build the first alpha OFED tomorrow. If any show-stopper issue for the alpha is found please let us know. Note that components compilation is blocked on kernels that they do not support. 
Thanks, Tziporet _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From krause at cup.hp.com Tue Feb 13 14:48:06 2007 From: krause at cup.hp.com (Michael Krause) Date: Tue, 13 Feb 2007 14:48:06 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <45D22A21.9040708@ichips.intel.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com> <45D0FCC8.4090304@ichips.intel.com> <6.2.0.14.2.20070213124447.08e6e600@esmail.cup.hp.com> <45D22A21.9040708@ichips.intel.com> Message-ID: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com> At 01:14 PM 2/13/2007, Sean Hefty wrote: >>It does not need to comprehend the remote subnet(s) LID. >>That is the router protocol to determine. CM also must understand the >>GIDs involved which the router will process to figure out its LID mapping >>to the next hop. > >The CM REQ carries the remote router LID (primary local port lid - >12.7.11) and remote endpoint LID (primary remote port lid - 12.7.21). Let me clarify what the specification is saying which is what I'm saying. A LID is subnet local on that we can all agree. The CM Req contains either the LID of a local subnet CA or the LID a local router which will move the packet to the next hop to the destination. 12.7.11 is basically saying that the remote LID is the router's LID of the local subnet's router Port. 
12.7.21 also refers to the remote LID but in each subnet that is either the router Port's LID or the destination CA.

From an operational flow perspective, CM would:

- Query to see if the destination CA is on the local subnet
- If yes, then obtain the associated records to find the local LID
- If no, then obtain the set of records that contain the local addressing to a router Port that will progress connection establishment to the next hop on the way to the destination.

While there isn't a router specification any longer, the basic operation is very much like that of an IP subnet. The router protocol establishes a set of routes for a given subnet prefix and then communicates that to each SM/SA so that queries will resolve the optimal router Port. Chapter 8 provides clear guidance in this regard. Chapter 12 is basically stating what to plug into various fields, with all LIDs being only local to the subnet where they are managed.

The primary global knowledge that one must have across subnets to establish a connection or communication flow is:

- SGID
- DGID
- P_Key
- Q_Key

There really isn't much more than this to comprehend. The TClass and Flow Labels were expected to be provided via the router protocol, so the management requirements are really query look up.
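
A minimal sketch of the lookup flow described above, in C. It only illustrates the choice between the destination's own LID and a local router Port LID. The struct layouts, the route table, and the name resolve_next_hop_lid are hypothetical; they are not taken from the IBA specification, the SA, or any OFED library. A return value of 0 is used here purely as a "no route known" sentinel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified port address; not the IBA PathRecord layout. */
struct port_addr {
    uint64_t subnet_prefix; /* upper 64 bits of the port GID */
    uint64_t guid;          /* lower 64 bits of the port GID */
    uint16_t lid;           /* meaningful only inside its own subnet */
};

/* Hypothetical route entry, as an SM/SA might learn it from a router protocol. */
struct route_entry {
    uint64_t remote_prefix;   /* subnet prefix reachable through this router Port */
    uint16_t router_port_lid; /* LID of that router Port on the local subnet */
};

/*
 * Pick the DLID for traffic from src to dst.
 * Same subnet prefix: use dst's own LID.
 * Different prefix: use the LID of a local router Port serving dst's prefix.
 * Returns 0 when no matching route is known (sentinel for this sketch only).
 */
uint16_t resolve_next_hop_lid(const struct port_addr *src,
                              const struct port_addr *dst,
                              const struct route_entry *routes,
                              size_t nroutes)
{
    assert(src != NULL && dst != NULL);

    if (src->subnet_prefix == dst->subnet_prefix)
        return dst->lid;

    for (size_t i = 0; i < nroutes; i++) {
        if (routes[i].remote_prefix == dst->subnet_prefix)
            return routes[i].router_port_lid;
    }
    return 0;
}
```

The sketch returns the first matching route. Choosing among several router Ports (multi-path, SL, TClass) is the policy decision discussed in this thread and is deliberately left out.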
Mike From krause at cup.hp.com Tue Feb 13 15:10:27 2007 From: krause at cup.hp.com (Michael Krause) Date: Tue, 13 Feb 2007 15:10:27 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <20070213220255.GA10579@obsidianresearch.com> References: <20070210004820.GS11411@obsidianresearch.com> <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com> <20070211230935.GT11411@obsidianresearch.com> <45D0A27A.2010302@ichips.intel.com> <20070212205634.GW11411@obsidianresearch.com> <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com> <20070213001045.GY11411@obsidianresearch.com> <6.2.0.14.2.20070213124803.08ee1208@esmail.cup.hp.com> <20070213220255.GA10579@obsidianresearch.com> Message-ID: <6.2.0.14.2.20070213144835.093938f8@esmail.cup.hp.com> At 02:02 PM 2/13/2007, Jason Gunthorpe wrote: >On Tue, Feb 13, 2007 at 12:49:57PM -0800, Michael Krause wrote: > > > >Translated into a network with routers this means that for a RC flow > > >to successfully work both the *forward* and *reverse* direction must > > >traverse the same router *LID* not just *port* on both subnets. > > > > That is a given since the LID = path and same path must be used to insure > > strong ordering is maintained. > >I think you are missing what I'm saying. IB within a subnet has the >path selected by the DLID only. The actual path selection is a policy decision outside the scope of the specification - it appears this is your main concern in that the specification does not state "take these N parameters and apply the following algorithm to identify a path". The address vector can be comprised of many fields including a LID range. The actual DLID selected is done above as there can be a variety of policies or constraints imposed for a given data flow. I agree that packet switching within is via a DLID. >So the construction process for a QP is to choose two enport LIDs, reverse >them on one side and then query the SA for the forward and reverse SL. >That gives you a pair of workable QPs. 
SL, LID, etc. are all uploaded into the management database for the SM / SA to access and there can be much more robust information loaded as well that goes well beyond what the IBTA specified in order to provide additional interpretations / information to guide path selection. A query can return multiple records if multi-path has been configured. Policy above is used to construct the CM messages which communicate the preferred path. The CM messages for establishment across subnets should be sufficient in their existing content to work independent of how the actual routing is accomplished. >This same procedure doesn't work for routers. >Consider a case where a router port has LID 1 and an end port has >LIDs 3,4. >The end port establishes two RC QPs: > #1: SLID=3, DLID=1 > #2: SLID=4, DLID=1 >Both have the same DGID - how is the router expected to know that QP >#1 requires one set of LIDs and QP #2 requires a different set? For all intents and purposes, within a local subnet, a router Port is treated the same as CA. If there are multiple paths between a router Port and a given CA Port, i.e. multiple LIDs are configured, then the router is supposed to query the SM / SA database and obtain the appropriate records and make a decision that remains valid for the lifetime of the data flow. The purpose of the TClass is to enable a local mapping to SL which can also be used as input into LID selection. The flow label is left open in its value and was expected to be used much like it is in IP. People considered encoding it or at a minimum, using it as an input parameter to identify the associated LID for the flow but that was not agreed to since the router vendors at the time wanted it left largely opaque. >Section 19.2.4.1 seems to make it explicit to me that this is a valid >situation. Yes, 19.2.4.1 supports multi-path within a given subnet. >To have this work the router must use the flow label to identify the >correct DLID. 
SA/CM must be enhanced in some way to let the two sides >exchange flow labels. That is a policy decision or something for a TBD router protocol specification. It is not required to use the Flow Label. >This problem is worse if you have multiple independent redundent >routers on your subnet, or LMC != 0. Then you now have the problem of >SLID matching as well as DLID matching. It is no worse due to the existence of multi-path. There are many variables involved in creating a viable router protocol specification which is in part, why the IBTA chose to not complete that work. >Strong ordering is maintained in all cases because the routers always >make consistent choices for the LRH.DLID on a session by session >basis. Agreed, The router is responsible for insuring a consistent path is used for a given flow. That does not preclude multi-path nor does it make multi-path more complicated as a result. > > >That is why I said previously that the QP matching rules are a > > >mistake. The best way to solve this is to change C9-54 to only be in > > >effect if the GRH is not present. > > > > I disagree. We were very explicit in how and why we constructed those > > rules. > >Do you know of a solution then? > >If C9-54 is a very deliberate design then it must be that the CM >specification in Chapter 12 is not designed to handle the >ramifications of C9-54. > >I just can't see how to fit both CM and C9-54 together into a workable >solution. You are arguing about a router protocol problem that does not exist or perhaps I just don't get it. We did progress the router specification or at least the operating models behind it sufficiently to validate that both Chapter 9 and Chapter 12 worked as specified (as well as chapters 8 and 19). Yes, there are implementation issues within a router for it to perform the appropriate queries on the SM / SA to identify a preferred flow's path within a given subnet. 
This makes this a local subnet policy issue and is orthogonal to the compliance statements in the volume 1 specification. If you believe the specification is faulty, then it would be best to take this to the IBTA and have an official review done by the workgroup teams involved with these chapters. People are free to implement what they choose which for routers is completely open since there isn't a specification but for the compliance statements, assuming interoperability is desirable in this regard, the validation tree in the specification should be used and any software implemented on top of such hardware should take that into account. For the most part, what you've described is largely an argument about the policy to select a path and that is a router domain problem not a packet validation problem. Within the router domain, that is pure policy just like in the IP world. As long as it results a given flow consistently using the same data path, all is good. At the end of the day, the router implementations will decide their policy for determining the optimal path and I doubt there will be a one-size-fits-all agreement on the formula that is used to construct that decision (albeit, if the SM/SA only returns one path for a given flow, then the decision is rather easy). Mike From sean.hefty at intel.com Tue Feb 13 13:17:57 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 13 Feb 2007 13:17:57 -0800 Subject: [openib-general] IB routing discussion summary Message-ID: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com> Here's a first take at summarizing the IB routing discussion. The following spec references are noted: 9.6.1.5 C9-54. The SLID shall be validated (for connected QPs). 12.7.11. CM REQ Local Port LID - is LID of remote router. 13.5.4: Defines reversible paths. The main discussion point centered on trying to meet 9.6.1.5 C9-54. This requires that the forward and reverse data flows between two QPs traverse the same router LID on both subnets. 
The idea was raised to eliminate this compliance statement for packets carrying a GRH, but this is viewed as going against the spirit of IBA.

Ideas were presented around trying to construct an 'inter-subnet path record' that contained the following:

- Side A GRH.SGID = active side's Port GID
- Side A GRH.DGID = passive side's Port GID
- Side A LRH.SLID = any active side's port LID
- Side A LRH.DLID = A subnet router
- Side A LRH.SL = SL to A subnet router
- Side B GRH.SGID = Side A GRH.DGID
- Side B GRH.DGID = Side A GRH.SGID
- Side B LRH.SLID = any passive side's port LID
- Side B LRH.DLID = B subnet router
- Side B LRH.SL = SL to B subnet router

It is still unclear how such a record can be constructed, but communication with remote SAs might be achieved by using a well-known GID suffix. It's also unclear whether the fields in a path record are relative to the SA's subnet or the SGID. It's anticipated that SAs will need to interact with routers, but in an unspecified manner.

From sean.hefty at intel.com Tue Feb 13 15:55:19 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 13 Feb 2007 15:55:19 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com>
Message-ID: <000001c74fca$6c765170$8698070a@amr.corp.intel.com>

>A LID is subnet local on that we can all agree. The CM Req contains
>either the LID of a local subnet CA or the LID a local router which will
>move the packet to the next hop to the destination. 12.7.11 is basically
>saying that the remote LID is the router's LID of the local subnet's router
>Port. 12.7.21 also refers to the remote LID but in each subnet that is
>either the router Port's LID or the destination CA.

This isn't my interpretation.

12.7.11 Local Port LID: When local and remote ports are on different subnets, this field must be the LID of the router that the *passive* side will target for the return path.
The CM REQ carries the LIDs for the remote (passive side) subnet. This is what the passive side needs to configure the QP, not the active side LID information. (See address vector information for 11.2.4.2 - page 574.) So, the CM REQ is _sent_ to either the LID of the local subnet CA or the LID of a local router port, but _contains_ the LIDs from the remote subnet. - Sean From swise at opengridcomputing.com Tue Feb 13 16:01:26 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 13 Feb 2007 18:01:26 -0600 Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package In-Reply-To: <45D21B85.9070007@mellanox.co.il> References: <1171387167.3978.90.camel@vladsk-laptop> <45D21B85.9070007@mellanox.co.il> Message-ID: <1171411286.28495.12.camel@stevo-desktop> I installed this on RHEL5 beta 2 with that distro's kernel and RHEL4U4 with a kernel.org 2.6.20 kernel. I successfully configured cxgb3 and mthca and could icmp-ping over both interfaces. I successfully ran rping over both IB and IW. I successfully ran dapltest/regress.sh over both IB and IW. I could _not_ get ib_rdma_bw to run in either cma mode or non-cma mode. The server side exits immediately without an error. ??? I'm blocked on mvapich2/iwarp testing due to the known issues with that package. I tried rping over iwarp on the RHEL4U4 distro's kernel and had problems. I'm thinking the SLES9SP3 fix that was pulled in might have problems on other distros (it changed the behavior of xxx_ip_dev_find() on all backports). I don't think this is stop-ship for alpha1, however. That's it for today. Steve. On Tue, 2007-02-13 at 22:11 +0200, Tziporet Koren wrote: > Vladimir Sokolovsky wrote: > > New OFED package was uploaded to the OFA server: > > http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646.tgz > > > > > > > > Known issues: > > mvapich2 RPM build fails (will be fixed in alpha1). 
> > sdpnetstat compilation fails in RHEL5 > > > > > > > Hi All, > > This is the pre-alpha package for your testing. > Please send us feedback today so we can build the first alpha OFED tomorrow. > If any show-stopper issue for the alpha is found please let us know. > > Note that components compilation is blocked on kernels that they do not > support. > > Thanks, > Tziporet > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ogerlitz at voltaire.com Tue Feb 13 22:04:01 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 14 Feb 2007 08:04:01 +0200 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> Message-ID: <45D2A651.6020604@voltaire.com> Roland Dreier wrote: > > How do I tell? Can I tell from the .so files? > > ldd on the .so and the app would probably give you good info. > > I'm pretty sure that mpicc must be linking against an libibverbs 1.0 > from somewhere. To be really sure which dynamic libraries where loaded, do $ info sharedlibrary within the gdb console Or. From dotanb at dev.mellanox.co.il Wed Feb 14 01:12:00 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 14 Feb 2007 11:12:00 +0200 Subject: [openib-general] [GIT PULL] please pull infiniband.git In-Reply-To: References: <45D17418.3000508@dev.mellanox.co.il> Message-ID: <45D2D260.1000306@dev.mellanox.co.il> Roland Dreier wrote: > > What about the patch that i sent on "Allow the following QP state > > transition : reset --> reset"? > > OK, I'll merge that in the next patch. 
It's the kind of patch I'm not > happy about merging, since it bloats the code to handle a corner case > no one is likely to hit in practice, but it is technically correct so > I guess we're forced to merge it. > thanks. Dotan From mst at mellanox.co.il Wed Feb 14 02:01:09 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Feb 2007 12:01:09 +0200 Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree In-Reply-To: References: Message-ID: <20070214100109.GA4434@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree > > OK, I already merged this but now I'm thinking it's somewhat buggy: Hopefully not. > > + if (coherent) > > + ret = mthca_alloc_icm_coherent(&dev->pdev->dev, > > + &chunk->mem[chunk->npages], > > + cur_order, gfp_mask); > > + else > > + ret = mthca_alloc_icm_pages(&chunk->mem[chunk->npages], > > + cur_order, gfp_mask); > > > > - if (++chunk->npages == MTHCA_ICM_CHUNK_LEN) { > > + if (!ret) { > > + ++chunk->npages; > > + > > + if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) { > > chunk->nsg = pci_map_sg(dev->pdev, chunk->mem, > > I don't see anything that ever bumps chunk->nsg if we're allocating a > coherent region and we end up needing more than one allocation to do > it. Yes but this is intentional. > Maybe something like this on top of the patch? 
> > diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c > index 0b9d053..48f7c65 100644 > --- a/drivers/infiniband/hw/mthca/mthca_memfree.c > +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c > @@ -175,7 +175,9 @@ struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages, > if (!ret) { > ++chunk->npages; > > - if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) { > + if (coherent) > + ++chunk->nsg; > + else if (chunk->npages == MTHCA_ICM_CHUNK_LEN) { > chunk->nsg = pci_map_sg(dev->pdev, chunk->mem, > chunk->npages, > PCI_DMA_BIDIRECTIONAL); No, I think the code is fine and this patch will break things: chunk->nsg is needed only for non-coherent memory to call pci_unmap_sg: if (chunk->nsg > 0) pci_unmap_sg(dev->pdev, chunk->mem, chunk->npages, PCI_DMA_BIDIRECTIONAL); and we do *not* want to call pci_unmap_sg on consistent memory. -- MST From mst at mellanox.co.il Wed Feb 14 02:01:51 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Feb 2007 12:01:51 +0200 Subject: [openib-general] [GIT PULL] please pull infiniband.git In-Reply-To: References: <45D17418.3000508@dev.mellanox.co.il> Message-ID: <20070214100151.GB4434@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [GIT PULL] please pull infiniband.git > > > What about the patch that i sent on "Allow the following QP state > > transition : reset --> reset"? > > OK, I'll merge that in the next patch. It's the kind of patch I'm not > happy about merging, since it bloats the code to handle a corner case > no one is likely to hit in practice, but it is technically correct so > I guess we're forced to merge it. I actually think this is a spec bug: IB spec requires the following transitions in modify qp: Any State to Error Any state to Reset If we are in reset state, both of these do not make any sense. 
To Error from reset is especially hard to implement, and I think this will actually create bugs in application (if it did "to error" I think its expecting completions with error). I'll try speaking with our representatives at the IBTA to remove this requirement. -- MST From mst at mellanox.co.il Wed Feb 14 02:16:43 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Feb 2007 12:16:43 +0200 Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree In-Reply-To: References: <20070210211508.GD14903@mellanox.co.il> Message-ID: <20070214101618.GC4434@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree > > Queued for 2.6.21, although I think a further cleanup would be: > > > mdev->mr_table.mpt_table = mthca_alloc_icm_table(mdev, init_hca->mpt_base, > > dev_lim->mpt_entry_sz, > > mdev->limits.num_mpts, > > - mdev->limits.reserved_mrws, 1); > > + mdev->limits.reserved_mrws, > > + 1, 1); > > instead of having use_lowmem and use_coherent be separate parameters, > we should probably convert it to a type parameter, and have > MTHCA_ICM_TABLE_HIGHMEM, _LOWMEM and _COHERENT. That would make these > calls a lot easier to read and get correct. Good idea. -- MST From mst at mellanox.co.il Wed Feb 14 02:22:52 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Feb 2007 12:22:52 +0200 Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree In-Reply-To: References: <20070210211508.GD14903@mellanox.co.il> Message-ID: <20070214102252.GD4434@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree > > > + sg_set_buf(mem, buf, PAGE_SIZE << order); > > + BUG_ON(mem->offset); > > + sg_dma_len(mem) = PAGE_SIZE << order; > > What am I missing? Any reason to set sg_dma_len() again after sg_set_buf()? How do you mean, again? Does sg_set_buf set dma_length? 
In 2.6.20, I see this in include/linux/scatterlist.h:

static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
			      unsigned int buflen)
{
	sg->page = virt_to_page(buf);
	sg->offset = offset_in_page(buf);
	sg->length = buflen;
}

--
MST

From vlad at lists.openfabrics.org Wed Feb 14 02:24:45 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 02:24:45 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070214-0200 daily build status
Message-ID: <20070214102445.A9067E603C3@openfabrics.org>

This email was generated automatically, please do not reply

Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.12
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.14

Failed:

From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 03:49:22 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 03:49:22 -0800 (PST)
Subject: [openib-general] [Bug 347] rdma cm backport to EL4 - U3 broken
In-Reply-To:
Message-ID: <20070214114922.6E1CCE603C3@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=347

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org      |mst at mellanox.co.il

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 03:56:48 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 03:56:48 -0800 (PST)
Subject: [openib-general] [Bug 322] 2.6.17 backport: reading the rdma-cm abi file causes fault.
In-Reply-To:
Message-ID: <20070214115648.BFF8AE602FA@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=322

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org      |sean.hefty at intel.com

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 04:12:17 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Feb 2007 04:12:17 -0800 (PST) Subject: [openib-general] [Bug 318] Registering up to 1.6 GB in one process causes a machine crash In-Reply-To: Message-ID: <20070214121217.54A88E603C3@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=318 dotanb at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Version|1.1 |1.2 -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 04:17:14 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Feb 2007 04:17:14 -0800 (PST) Subject: [openib-general] [Bug 315] enabling the rdma_ucm and restarting the driver several times causes kernel oops In-Reply-To: Message-ID: <20070214121714.85163E60804@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=315 dotanb at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Version|gen2 |1.2 -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 04:18:47 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Feb 2007 04:18:47 -0800 (PST) Subject: [openib-general] [Bug 318] Registering up to 1.6 GB in one process causes a machine crash In-Reply-To: Message-ID: <20070214121847.6701EE60804@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=318 ------- Comment #1 from mst at mellanox.co.il 2007-02-14 04:18 ------- Subject: Re: Registering up to 1.6 GB in one process causes a machine crash > Driver Version : OFED 1.1 Is this still relevant? -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 04:19:57 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Feb 2007 04:19:57 -0800 (PST) Subject: [openib-general] [Bug 314] libibverbs doesn't support static linkage In-Reply-To: Message-ID: <20070214121958.04880E603C3@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=314 dotanb at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from dotanb at mellanox.co.il 2007-02-14 04:19 ------- in this mail: http://openib.org/pipermail/openib-general/2007-January/031009.html it is described that the driver that failed the static linkage is old driver without the bug fixes of the static linkage support. I checked this issue with the new driver, and everything is working now. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 04:21:38 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Feb 2007 04:21:38 -0800 (PST) Subject: [openib-general] [Bug 296] The function ib_init_ah_from_path doesn't fill all of the ib_ah_attr attributes In-Reply-To: Message-ID: <20070214122138.446B5E603C3@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=296 dotanb at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Version|1.1 |1.2 -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 04:22:09 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Feb 2007 04:22:09 -0800 (PST) Subject: [openib-general] [Bug 315] enabling the rdma_ucm and restarting the driver several times causes kernel oops In-Reply-To: Message-ID: <20070214122209.DEA92E603C3@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=315 mst at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |sean.hefty at intel.com ------- Comment #2 from mst at mellanox.co.il 2007-02-14 04:22 ------- Sean is the ucma maintainer in both ofed and kernel.org trees, reassigned to him. This could be a duplicate of bug 322. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 04:26:38 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Feb 2007 04:26:38 -0800 (PST) Subject: [openib-general] [Bug 296] The function ib_init_ah_from_path doesn't fill all of the ib_ah_attr attributes In-Reply-To: Message-ID: <20070214122638.3468CE603C3@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=296 mst at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |sean.hefty at intel.com ------- Comment #1 from mst at mellanox.co.il 2007-02-14 04:26 ------- This is Sean's code, reassign to him. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at mellanox.co.il Wed Feb 14 05:29:25 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Feb 2007 15:29:25 +0200 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <200702121736.35468.ossrosch@linux.vnet.ibm.com> References: <200702121736.35468.ossrosch@linux.vnet.ibm.com> Message-ID: <20070214132925.GG16867@mellanox.co.il> > Quoting Stefan Roscher : > Subject: 32-bit build for ppc64 is required > > Hi, > > after building the latest ofed build package we recognized that on PPC64 only > 64-bit libraries were built. > Because we have customers using older userspace applications which are > certified for 32-bit we think additional 32bit support is a requirement for 64bit builds. > > If OFED 1.2 supports 32 bit on ppc64, we have to change the install > directory. I would suggest to install 32-bit binaries into > /usr/local/ofed/bin32 directory.
So no changes on current naming conventions > have to be done. The libraries are installed in the /usr/local/ofed/lib directory. The standard practice is to install 64 bit libraries under prefix/lib64 and 32 bit libraries under prefix/lib. Why would PPC64 be any different? I do not think we need 32 bit binaries at all, and there's no other package I'm aware of that uses "bin32". Comments? -- MST From ossrosch at linux.vnet.ibm.com Wed Feb 14 06:18:55 2007 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Wed, 14 Feb 2007 15:18:55 +0100 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <20070214132925.GG16867@mellanox.co.il> References: <200702121736.35468.ossrosch@linux.vnet.ibm.com> <20070214132925.GG16867@mellanox.co.il> Message-ID: <200702141518.56138.ossrosch@linux.vnet.ibm.com> On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote: > > Quoting Stefan Roscher : > > Subject: 32-bit build for ppc64 is required > > > > Hi, > > > > after building the latest ofed build package we recognized that on PPC64 only > > 64-bit libraries were built. > > Because we have customers using older userspace applications which are > > certified for 32-bit we think additional 32bit support is a requirement for 64bit builds. > > > > If OFED 1.2 supports 32 bit on ppc64, we have to change the install > > directory. I would suggest to install 32-bit binaries into > > /usr/local/ofed/bin32 directory. So no changes on current naming conventions > > have to be done. The libraries are installed in the /usr/local/ofed/lib directory. > > The standard practice is to install 64 bit libraries under prefix/lib64 > and 32 bit libraries under prefix/lib. Why would PPC64 be any different? I think you misunderstand my post. The directory for 32/64bit libraries should be prefix/lib and prefix/lib64 respectively. But in the current ofed1.2 I saw only the prefix/lib64 directory, i.e. 64bit libs only.
> > I do not think we need 32 bit binaries at all, and there's no other package > I'm aware of that uses "bin32". We have customers that still use 32-bit userspace applications. It would be beneficial for them if they can obtain 32bit libs and execs from ofed1.2 in order to run their applications without recompiling them, because for some 32-bit applications recompiling is not an option. regards Stefan From mst at mellanox.co.il Wed Feb 14 06:29:24 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Feb 2007 16:29:24 +0200 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <200702141518.56138.ossrosch@linux.vnet.ibm.com> References: <200702141518.56138.ossrosch@linux.vnet.ibm.com> Message-ID: <20070214142924.GC20977@mellanox.co.il> > Quoting Stefan Roscher : > Subject: Re: 32-bit build for ppc64 is required > > On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote: > > > Quoting Stefan Roscher : > > > Subject: 32-bit build for ppc64 is required > > > > > > Hi, > > > > > > after building the latest ofed build package we recognized that on PPC64 only > > > 64-bit libraries were built. > > > Because we have customers using older userspace applications which are > > > certified for 32-bit we think additional 32bit support is a requirement for 64bit builds. > > > > > > If OFED 1.2 supports 32 bit on ppc64, we have to change the install > > > directory. I would suggest to install 32-bit binaries into > > > /usr/local/ofed/bin32 directory. So no changes on current naming conventions > > > have to be done. The libraries are installed in the /usr/local/ofed/lib directory. > > > > The standard practice is to install 64 bit libraries under prefix/lib64 > > and 32 bit libraries under prefix/lib. Why would PPC64 be any different? > > I think you misunderstand my post. The directory for 32/64bit libraries > should be prefix/lib and prefix/lib64 respectively. > But in the current ofed1.2 I saw only the prefix/lib64 directory, i.e. 64bit libs only.
Well, this is not by design: AFAIK on x86_64 both types of libraries are installed. > > I do not think we need 32 bit binaries at all, and there's no other package > > I'm aware of that uses "bin32". > > We have customers that still use 32-bit userspace applications. > It would be beneficial for them if they can obtain 32bit libs and execs from > ofed1.2 in order to run their applications without recompiling them, because > for some 32-bit applications recompiling is not an option. 32 bit libraries are needed for users to run 32-bit applications. But I still do not see how installing 32 bit binaries alongside the 64 bit ones is useful, and I do not think other packages provide this option, so maybe we shouldn't, either. -- MST From tziporet at mellanox.co.il Wed Feb 14 06:17:52 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 14 Feb 2007 16:17:52 +0200 Subject: [openib-general] how to handle OFEd 1.2 bugs in bugzilla Message-ID: <45D31A10.8020102@mellanox.co.il> Hi Scott and all, I wish to consult with you on the way we will treat OFED 1.2 bugs in bugzilla. 1. Do we want to have 1.2-alpha, 1.2-beta, 1.2-rcX in version, or just 1.2 as we have now? 2. What do we wish to do with bugs that were opened for 1.1 and are still open? 3. What to do with old bugs that were opened against gen2 in general? 4. What is our methodology for priority and severity setup?
(There are too many blocker bugs still open in OFED 1.1 so they are not actually blockers or they were fixed but not updated) Thanks, Tziporet From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 06:34:35 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Feb 2007 06:34:35 -0800 (PST) Subject: [openib-general] [Bug 289] executing ucmatose on local IPoIB address of IB port 2 in kernel 2.6.16.21-0.8-smp fails In-Reply-To: Message-ID: <20070214143436.0917EE603CE@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=289 dotanb at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from dotanb at mellanox.co.il 2007-02-14 06:34 ------- there was a bug in one of the backports. it was fixed. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 06:35:00 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Feb 2007 06:35:00 -0800 (PST) Subject: [openib-general] [Bug 289] executing ucmatose on local IPoIB address of IB port 2 in kernel 2.6.16.21-0.8-smp fails In-Reply-To: Message-ID: <20070214143500.8A2CAE60802@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=289 dotanb at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From vlad at mellanox.co.il Wed Feb 14 06:37:29 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 14 Feb 2007 16:37:29 +0200 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <200702141518.56138.ossrosch@linux.vnet.ibm.com> References: <200702121736.35468.ossrosch@linux.vnet.ibm.com> <20070214132925.GG16867@mellanox.co.il> <200702141518.56138.ossrosch@linux.vnet.ibm.com> Message-ID: <1171463849.16240.11.camel@vladsk-laptop> On Wed, 2007-02-14 at 15:18 +0100, Stefan Roscher wrote: > On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote: > > > Quoting Stefan Roscher : > > > Subject: 32-bit build for ppc64 is required > > > > > > Hi, > > > > > > after building the latest ofed build package we recognized that on PPC64 only > > > 64-bit libraries were built. > > > Because we have customers using older userspace applications which are > > > certified for 32-bit we think additional 32bit support is a requirement for 64bit builds. > > > > > > If OFED 1.2 supports 32 bit on ppc64, we have to change the install > > > directory. I would suggest to install 32-bit binaries into > > > /usr/local/ofed/bin32 directory. So no changes on current naming conventions > > > have to be done. The libraries are installed in the /usr/local/ofed/lib directory. > > > > The standard practice is to install 64 bit libraries under prefix/lib64 > > and 32 bit libraries under prefix/lib. Why would PPC64 be any different? > > I think you misunderstand my post. The directory for 32/64bit libraries > should be prefix/lib and prefix/lib64 respectively. > But in the current ofed1.2 I saw only the prefix/lib64 directory, i.e. 64bit libs only. > > prefix/lib (32bit libraries) should be created on ppc64 as well. Check that you have sysfsutils 32bit RPM installed. I don't have ppc64 here to check. -- Vladimir Sokolovsky Mellanox Technologies Ltd.
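For anyone who wants to verify what actually landed under prefix/lib versus prefix/lib64 on a ppc64 install, the word size of a library can be read from its ELF identification bytes. The following is only a minimal sketch, not part of OFED: the helper names elf_word_size() and libdir_for_word_size() are hypothetical, and it assumes nothing beyond the standard ELF layout (magic in bytes 0-3, class in byte 4, where 1 means 32-bit and 2 means 64-bit).

```c
#include <stddef.h>
#include <string.h>

/* Offsets and values from the ELF e_ident[] layout. */
enum { ELF_IDENT_CLASS = 4, ELF_CLASS_32 = 1, ELF_CLASS_64 = 2 };

/*
 * Hypothetical helper (not an OFED function): return 32 or 64 for a
 * valid ELF identification block, or 0 if the buffer is too short,
 * is not ELF, or carries an unknown class byte.
 */
static int elf_word_size(const unsigned char *ident, size_t len)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

	if (len <= ELF_IDENT_CLASS || memcmp(ident, magic, sizeof(magic)) != 0)
		return 0;

	switch (ident[ELF_IDENT_CLASS]) {
	case ELF_CLASS_32:
		return 32;
	case ELF_CLASS_64:
		return 64;
	default:
		return 0;
	}
}

/*
 * Hypothetical helper: the conventional directory name discussed in
 * this thread for a given word size (lib for 32-bit, lib64 for 64-bit).
 * Returns NULL for any other value.
 */
static const char *libdir_for_word_size(int bits)
{
	switch (bits) {
	case 32:
		return "lib";
	case 64:
		return "lib64";
	default:
		return NULL;
	}
}
```

To use it, read the first five bytes of an installed shared object (for example one of the libraries under the install prefix) and compare libdir_for_word_size(elf_word_size(buf, n)) with the directory the file was found in; a mismatch would indicate that a library was installed under the wrong directory.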
From bugzilla-daemon at lists.openfabrics.org Wed Feb 14 06:49:45 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Feb 2007 06:49:45 -0800 (PST) Subject: [openib-general] [Bug 318] Registering up to 1.6 GB in one process causes a machine crash In-Reply-To: Message-ID: <20070214144945.1A694E603CE@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=318 mst at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE ------- Comment #2 from mst at mellanox.co.il 2007-02-14 06:49 ------- *** This bug has been marked as a duplicate of bug 333 *** -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From swise at opengridcomputing.com Wed Feb 14 06:57:29 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 14 Feb 2007 08:57:29 -0600 Subject: [openib-general] [Bug 325] RDMA_CM and address translation broken on sles9sp3 In-Reply-To: <20070214115444.E46EAE603C3@openfabrics.org> References: <20070214115444.E46EAE603C3@openfabrics.org> Message-ID: <1171465049.15208.13.camel@stevo-desktop> Tziporet, I didn't think we were going to apply this patch until Michael tested it with SDP/IPoIB on various distros. Michael, did you get a chance to test it (I'm guessing not since you were out sick)? The reason I'm concerned is that it changes the behavior of xxx_ip_dev_find() and _all_ backports, and we needed to test it out and make sure it doesn't regress anything. If it causes problems on other backports, the plan was to just fix the sles9sp3 backport and leave the others alone. With the test build vlad published yesterday which has this patch, rhel4u4 kernel wasn't working for me with iWARP and I'm afraid it might be due to this patch. 
I'm investigating this now. Steve. On Wed, 2007-02-14 at 03:54 -0800, bugzilla-daemon at lists.openfabrics.org wrote: > https://bugs.openfabrics.org/show_bug.cgi?id=325 > > > > > > ------- Comment #2 from tziporet at mellanox.co.il 2007-02-14 03:54 ------- > Patch from Steve was applied. > Please check again on alpha1 package. > > From tziporet at mellanox.co.il Wed Feb 14 07:05:06 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 14 Feb 2007 17:05:06 +0200 Subject: [openib-general] [Bug 325] RDMA_CM and address translation broken on sles9sp3 In-Reply-To: <1171465049.15208.13.camel@stevo-desktop> References: <20070214115444.E46EAE603C3@openfabrics.org> <1171465049.15208.13.camel@stevo-desktop> Message-ID: <45D32522.5080100@mellanox.co.il> Steve Wise wrote: > Tziporet, > > I didn't think we were going to apply this patch until Michael tested it > with SDP/IPoIB on various distros. > > Michael, did you get a chance to test it (I'm guessing not since you > were out sick)? > > The reason I'm concerned is that it changes the behavior of > xxx_ip_dev_find() and _all_ backports, and we needed to test it out and > make sure it doesn't regress anything. If it causes problems on other > backports, the plan was to just fix the sles9sp3 backport and leave the > others alone. > > With the test build vlad published yesterday which has this patch, > rhel4u4 kernel wasn't working for me with iWARP and I'm afraid it might > be due to this patch. I'm investigating this now. > > > We tested this patch with our regression on IB and it worked fine for both SDP and IPoIB. Then we applied it. Please report ASAP if you think there is an issue. Tziporet From rdreier at cisco.com Wed Feb 14 07:32:14 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Feb 2007 07:32:14 -0800 Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree In-Reply-To: <20070214102252.GD4434@mellanox.co.il> (Michael S.
Tsirkin's message of "Wed, 14 Feb 2007 12:22:52 +0200") References: <20070210211508.GD14903@mellanox.co.il> <20070214102252.GD4434@mellanox.co.il> Message-ID: > How do you mean, again? Does sg_set_buf set dma_length? No, you're right, sorry. - R. From rdreier at cisco.com Wed Feb 14 07:34:50 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Feb 2007 07:34:50 -0800 Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree In-Reply-To: <20070214100109.GA4434@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 14 Feb 2007 12:01:09 +0200") References: <20070214100109.GA4434@mellanox.co.il> Message-ID: > > I don't see anything that ever bumps chunk->nsg if we're allocating a > > coherent region and we end up needing more than one allocation to do > > it. > > Yes but this is intentional. > No, I think the code is fine and this patch will break things: > chunk->nsg is needed only for non-coherent memory to call pci_unmap_sg: what about this code in mthca_memfree.h? static inline void mthca_icm_next(struct mthca_icm_iter *iter) { if (++iter->page_idx >= iter->chunk->nsg) { the call to pci_unmap_sg you're worried about is in mthca_free_icm_pages(), which can't be called for coherent memory anyway, so I don't see a problem with that. So I think my patch is correct and needed. - R. From mst at mellanox.co.il Wed Feb 14 07:50:08 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Feb 2007 17:50:08 +0200 Subject: [openib-general] [Bug 325] RDMA_CM and address translation broken on sles9sp3 In-Reply-To: <1171465049.15208.13.camel@stevo-desktop> References: <1171465049.15208.13.camel@stevo-desktop> Message-ID: <20070214155008.GJ16867@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [Bug 325] RDMA_CM and address translation broken on sles9sp3 > > > Tziporet, > > I didn't think we were going to apply this patch until Michael tested it > with SDP/IPoIB on various distros. 
> > Michael, did you get a chance to test it (I'm guessing not since you > were out sick)? Right, I'm not at the lab. I assume Vlad tested this before applying. > The reason I'm concerned is that it changes the behavior of > xxx_ip_dev_find() and _all_ backports, and we needed to test it out and > make sure it doesn't regress anything. If it causes problems on other > backports, the plan was to just fix the sles9sp3 backport and leave the > others alone. > > With the test build vlad published yesterday which has this patch, > rhel4u4 kernel wasn't working for me with iWARP and I'm afraid it might > be due to this patch. I'm investigating this now. In actual fact, xxx_ip_dev_find is not even *needed* on anything except 2.6.14, 2.6.15, 2.6.16 and 2.6.17: these are the kernels which do not export ip_dev_find. -- MST From mst at mellanox.co.il Wed Feb 14 08:04:38 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Feb 2007 18:04:38 +0200 Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree In-Reply-To: References: Message-ID: <20070214160438.GK16867@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree > > > > I don't see anything that ever bumps chunk->nsg if we're allocating a > > > coherent region and we end up needing more than one allocation to do > > > it. > > > > Yes but this is intentional. > > > No, I think the code is fine and this patch will break things: > > chunk->nsg is needed only for non-coherent memory to call pci_unmap_sg: > > what about this code in mthca_memfree.h? > > static inline void mthca_icm_next(struct mthca_icm_iter *iter) > { > if (++iter->page_idx >= iter->chunk->nsg) { Correct. Good catch. > the call to pci_unmap_sg you're worried about is in > mthca_free_icm_pages(), which can't be called for coherent memory > anyway, so I don't see a problem with that. > > So I think my patch is correct and needed. Yes, I agree. 
I'll also put it in OFED. Thanks! -- MST From sashak at voltaire.com Wed Feb 14 08:24:07 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 14 Feb 2007 18:24:07 +0200 Subject: [openib-general] [PATCH] drivers/infiniband: madeye integration Message-ID: <20070214162407.GP22807@sashak.voltaire.com> This integrates madeye debug module into the tree. Signed-off-by: Sasha Khapyorsky --- drivers/infiniband/Kconfig | 2 + drivers/infiniband/Makefile | 1 + drivers/infiniband/util/Kconfig | 6 + drivers/infiniband/util/Makefile | 3 + drivers/infiniband/util/madeye.c | 590 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 602 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/util/Kconfig create mode 100644 drivers/infiniband/util/Makefile create mode 100644 drivers/infiniband/util/madeye.c diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index 712e5e2..de8e39f 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -50,4 +50,6 @@ source "drivers/infiniband/ulp/sdp/Kconfig" source "drivers/infiniband/ulp/vnic/Kconfig" +source "drivers/infiniband/util/Kconfig" + endmenu diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile index 57f2616..a7d1dc2 100644 --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -9,3 +9,4 @@ obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/ obj-$(CONFIG_INFINIBAND_SDP) += ulp/sdp/ obj-$(CONFIG_INFINIBAND_VNIC) += ulp/vnic/ +obj-$(CONFIG_INFINIBAND_MADEYE) += util/ diff --git a/drivers/infiniband/util/Kconfig b/drivers/infiniband/util/Kconfig new file mode 100644 index 0000000..5e98eaa --- /dev/null +++ b/drivers/infiniband/util/Kconfig @@ -0,0 +1,6 @@ +config INFINIBAND_MADEYE + tristate "MAD debug viewer for InfiniBand" + depends on INFINIBAND + ---help--- + Prints sent and received MADs on QP 0/1 for debugging.
+ diff --git a/drivers/infiniband/util/Makefile b/drivers/infiniband/util/Makefile new file mode 100644 index 0000000..caf9471 --- /dev/null +++ b/drivers/infiniband/util/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_INFINIBAND_MADEYE) += ib_madeye.o + +ib_madeye-y := madeye.o diff --git a/drivers/infiniband/util/madeye.c b/drivers/infiniband/util/madeye.c new file mode 100644 index 0000000..2a76d45 --- /dev/null +++ b/drivers/infiniband/util/madeye.c @@ -0,0 +1,590 @@ +/* + * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005, 2006 Voltaire Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE.
+ * + * $Id$ + */ +#include +#include +#include + +#include +#include +#include + +MODULE_AUTHOR("Sean Hefty"); +MODULE_DESCRIPTION("InfiniBand MAD viewer"); +MODULE_LICENSE("Dual BSD/GPL"); + +static void madeye_remove_one(struct ib_device *device); +static void madeye_add_one(struct ib_device *device); + +static struct ib_client madeye_client = { + .name = "madeye", + .add = madeye_add_one, + .remove = madeye_remove_one +}; + +struct madeye_port { + struct ib_mad_agent *smi_agent; + struct ib_mad_agent *gsi_agent; +}; + +static int smp = 1; +static int gmp = 1; +static int mgmt_class = 0; +static int attr_id = 0; +static int data = 0; + +module_param(smp, int, 0444); +module_param(gmp, int, 0444); +module_param(mgmt_class, int, 0444); +module_param(attr_id, int, 0444); +module_param(data, int, 0444); + +MODULE_PARM_DESC(smp, "Display all SMPs (default=1)"); +MODULE_PARM_DESC(gmp, "Display all GMPs (default=1)"); +MODULE_PARM_DESC(mgmt_class, "Display all MADs of specified class (default=0)"); +MODULE_PARM_DESC(attr_id, "Display all MADs of specified attribute ID (default=0)"); +MODULE_PARM_DESC(data, "Display data area of MADs (default=0)"); + +static char * get_class_name(u8 mgmt_class) +{ + switch(mgmt_class) { + case IB_MGMT_CLASS_SUBN_LID_ROUTED: + return "LID routed SMP"; + case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: + return "Directed route SMP"; + case IB_MGMT_CLASS_SUBN_ADM: + return "Subnet admin."; + case IB_MGMT_CLASS_PERF_MGMT: + return "Perf. mgmt."; + case IB_MGMT_CLASS_BM: + return "Baseboard mgmt."; + case IB_MGMT_CLASS_DEVICE_MGMT: + return "Device mgmt."; + case IB_MGMT_CLASS_CM: + return "Comm.
mgmt."; + case IB_MGMT_CLASS_SNMP: + return "SNMP"; + default: + return "Unknown vendor/application"; + } +} + +static char * get_method_name(u8 mgmt_class, u8 method) +{ + switch(method) { + case IB_MGMT_METHOD_GET: + return "Get"; + case IB_MGMT_METHOD_SET: + return "Set"; + case IB_MGMT_METHOD_GET_RESP: + return "Get response"; + case IB_MGMT_METHOD_SEND: + return "Send"; + case IB_MGMT_METHOD_SEND | IB_MGMT_METHOD_RESP: + return "Send response"; + case IB_MGMT_METHOD_TRAP: + return "Trap"; + case IB_MGMT_METHOD_REPORT: + return "Report"; + case IB_MGMT_METHOD_REPORT_RESP: + return "Report response"; + case IB_MGMT_METHOD_TRAP_REPRESS: + return "Trap repress"; + default: + break; + } + + switch (mgmt_class) { + case IB_MGMT_CLASS_SUBN_ADM: + switch (method) { + case IB_SA_METHOD_GET_TABLE: + return "Get table"; + case IB_SA_METHOD_GET_TABLE_RESP: + return "Get table response"; + case IB_SA_METHOD_DELETE: + return "Delete"; + case IB_SA_METHOD_DELETE_RESP: + return "Delete response"; + case IB_SA_METHOD_GET_MULTI: + return "Get Multi"; + case IB_SA_METHOD_GET_MULTI_RESP: + return "Get Multi response"; + case IB_SA_METHOD_GET_TRACE_TBL: + return "Get Trace Table"; + default: + break; + } + default: + break; + } + + return "Unknown"; +} + +static void print_status_details(u16 status) +{ + if (status & 0x0001) + printk(" busy\n"); + if (status & 0x0002) + printk(" redirection required\n"); + switch((status & 0x001C) >> 2) { + case 1: + printk(" bad version\n"); + break; + case 2: + printk(" method not supported\n"); + break; + case 3: + printk(" method/attribute combo not supported\n"); + break; + case 7: + printk(" invalid attribute/modifier value\n"); + break; + } +} + +static char * get_sa_attr(__be16 attr) +{ + switch(attr) { + case IB_SA_ATTR_CLASS_PORTINFO: + return "Class Port Info"; + case IB_SA_ATTR_NOTICE: + return "Notice"; + case IB_SA_ATTR_INFORM_INFO: + return "Inform Info"; + case IB_SA_ATTR_NODE_REC: + return "Node Record"; + case
IB_SA_ATTR_PORT_INFO_REC: + return "PortInfo Record"; + case IB_SA_ATTR_SL2VL_REC: + return "SL to VL Record"; + case IB_SA_ATTR_SWITCH_REC: + return "Switch Record"; + case IB_SA_ATTR_LINEAR_FDB_REC: + return "Linear FDB Record"; + case IB_SA_ATTR_RANDOM_FDB_REC: + return "Random FDB Record"; + case IB_SA_ATTR_MCAST_FDB_REC: + return "Multicast FDB Record"; + case IB_SA_ATTR_SM_INFO_REC: + return "SM Info Record"; + case IB_SA_ATTR_LINK_REC: + return "Link Record"; + case IB_SA_ATTR_GUID_INFO_REC: + return "Guid Info Record"; + case IB_SA_ATTR_SERVICE_REC: + return "Service Record"; + case IB_SA_ATTR_PARTITION_REC: + return "Partition Record"; + case IB_SA_ATTR_PATH_REC: + return "Path Record"; + case IB_SA_ATTR_VL_ARB_REC: + return "VL Arb Record"; + case IB_SA_ATTR_MC_MEMBER_REC: + return "MC Member Record"; + case IB_SA_ATTR_TRACE_REC: + return "Trace Record"; + case IB_SA_ATTR_MULTI_PATH_REC: + return "Multi Path Record"; + case IB_SA_ATTR_SERVICE_ASSOC_REC: + return "Service Assoc Record"; + case IB_SA_ATTR_INFORM_INFO_REC: + return "Inform Info Record"; + default: + return ""; + } +} + +static void print_mad_hdr(struct ib_mad_hdr *mad_hdr) +{ + printk("MAD version....0x%01x\n", mad_hdr->base_version); + printk("Class..........0x%01x (%s)\n", mad_hdr->mgmt_class, + get_class_name(mad_hdr->mgmt_class)); + printk("Class version..0x%01x\n", mad_hdr->class_version); + printk("Method.........0x%01x (%s)\n", mad_hdr->method, + get_method_name(mad_hdr->mgmt_class, mad_hdr->method)); + printk("Status.........0x%02x\n", be16_to_cpu(mad_hdr->status)); + if (mad_hdr->status) + print_status_details(be16_to_cpu(mad_hdr->status)); + printk("Class specific.0x%02x\n", be16_to_cpu(mad_hdr->class_specific)); + printk("Trans ID.......0x%llx\n", mad_hdr->tid); + if (mad_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_ADM) + printk("Attr ID........0x%02x (%s)\n", + be16_to_cpu(mad_hdr->attr_id), + get_sa_attr(be16_to_cpu(mad_hdr->attr_id))); + else + printk("Attr ID........0x%02x\n", + 
be16_to_cpu(mad_hdr->attr_id)); + printk("Attr modifier..0x%04x\n", be32_to_cpu(mad_hdr->attr_mod)); +} + +static char * get_rmpp_type(u8 rmpp_type) +{ + switch (rmpp_type) { + case IB_MGMT_RMPP_TYPE_DATA: + return "Data"; + case IB_MGMT_RMPP_TYPE_ACK: + return "Ack"; + case IB_MGMT_RMPP_TYPE_STOP: + return "Stop"; + case IB_MGMT_RMPP_TYPE_ABORT: + return "Abort"; + default: + return "Unknown"; + } +} + +static char * get_rmpp_flags(u8 rmpp_flags) +{ + if (rmpp_flags & IB_MGMT_RMPP_FLAG_ACTIVE) + if (rmpp_flags & IB_MGMT_RMPP_FLAG_FIRST) + if (rmpp_flags & IB_MGMT_RMPP_FLAG_LAST) + return "Active - First & Last"; + else + return "Active - First"; + else + if (rmpp_flags & IB_MGMT_RMPP_FLAG_LAST) + return "Active - Last"; + else + return "Active"; + else + return "Inactive"; +} + +static void print_rmpp_hdr(struct ib_rmpp_hdr *rmpp_hdr) +{ + printk("RMPP version...0x%01x\n", rmpp_hdr->rmpp_version); + printk("RMPP type......0x%01x (%s)\n", rmpp_hdr->rmpp_type, + get_rmpp_type(rmpp_hdr->rmpp_type)); + printk("RMPP RRespTime.0x%01x\n", ib_get_rmpp_resptime(rmpp_hdr)); + printk("RMPP flags.....0x%01x (%s)\n", ib_get_rmpp_flags(rmpp_hdr), + get_rmpp_flags(ib_get_rmpp_flags(rmpp_hdr))); + printk("RMPP status....0x%01x\n", rmpp_hdr->rmpp_status); + printk("Seg number.....0x%04x\n", be32_to_cpu(rmpp_hdr->seg_num)); + switch (rmpp_hdr->rmpp_type) { + case IB_MGMT_RMPP_TYPE_DATA: + printk("Payload len....0x%04x\n", + be32_to_cpu(rmpp_hdr->paylen_newwin)); + break; + case IB_MGMT_RMPP_TYPE_ACK: + printk("New window.....0x%04x\n", + be32_to_cpu(rmpp_hdr->paylen_newwin)); + break; + default: + printk("Data 2.........0x%04x\n", + be32_to_cpu(rmpp_hdr->paylen_newwin)); + break; + } +} + +static char * get_smp_attr(__be16 attr) +{ + switch (attr) { + case IB_SMP_ATTR_NOTICE: + return "notice"; + case IB_SMP_ATTR_NODE_DESC: + return "node description"; + case IB_SMP_ATTR_NODE_INFO: + return "node info"; + case IB_SMP_ATTR_SWITCH_INFO: + return "switch info"; + case 
IB_SMP_ATTR_GUID_INFO: + return "GUID info"; + case IB_SMP_ATTR_PORT_INFO: + return "port info"; + case IB_SMP_ATTR_PKEY_TABLE: + return "pkey table"; + case IB_SMP_ATTR_SL_TO_VL_TABLE: + return "SL to VL table"; + case IB_SMP_ATTR_VL_ARB_TABLE: + return "VL arbitration table"; + case IB_SMP_ATTR_LINEAR_FORWARD_TABLE: + return "linear forwarding table"; + case IB_SMP_ATTR_RANDOM_FORWARD_TABLE: + return "random forward table"; + case IB_SMP_ATTR_MCAST_FORWARD_TABLE: + return "multicast forward table"; + case IB_SMP_ATTR_SM_INFO: + return "SM info"; + case IB_SMP_ATTR_VENDOR_DIAG: + return "vendor diags"; + case IB_SMP_ATTR_LED_INFO: + return "LED info"; + default: + return ""; + } +} + +static void print_smp(struct ib_smp *smp) +{ + int i; + + printk("MAD version....0x%01x\n", smp->base_version); + printk("Class..........0x%01x (%s)\n", smp->mgmt_class, + get_class_name(smp->mgmt_class)); + printk("Class version..0x%01x\n", smp->class_version); + printk("Method.........0x%01x (%s)\n", smp->method, + get_method_name(smp->mgmt_class, smp->method)); + printk("Status.........0x%02x\n", be16_to_cpu(smp->status)); + if (smp->status) + print_status_details(be16_to_cpu(smp->status)); + printk("Hop pointer...0x%01x\n", smp->hop_ptr); + printk("Hop counter...0x%01x\n", smp->hop_cnt); + printk("Trans ID.......0x%llx\n", smp->tid); + printk("Attr ID........0x%02x (%s)\n", be16_to_cpu(smp->attr_id), + get_smp_attr(smp->attr_id)); + printk("Attr modifier..0x%04x\n", be32_to_cpu(smp->attr_mod)); + + printk("Mkey...........0x%llx\n", be64_to_cpu(smp->mkey)); + printk("DR SLID........0x%02x\n", be16_to_cpu(smp->dr_slid)); + printk("DR DLID........0x%02x", be16_to_cpu(smp->dr_dlid)); + + if (data) { + for (i = 0; i < IB_SMP_DATA_SIZE; i++) { + if (i % 16 == 0) + printk("\nSMP Data......."); + printk("%01x ", smp->data[i]); + } + for (i = 0; i < IB_SMP_MAX_PATH_HOPS; i++) { + if (i % 16 == 0) + printk("\nInitial path..."); + printk("%01x ", smp->initial_path[i]); + } + for (i = 0; i < 
IB_SMP_MAX_PATH_HOPS; i++) { + if (i % 16 == 0) + printk("\nReturn path...."); + printk("%01x ", smp->return_path[i]); + } + } + printk("\n"); +} + +static void snoop_smi_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_send_buf *send_buf, + struct ib_mad_send_wc *mad_send_wc) +{ + struct ib_mad_hdr *hdr = send_buf->mad; + + if (!smp && hdr->mgmt_class != mgmt_class) + return; + if (attr_id && hdr->attr_id != attr_id) + return; + + printk("Madeye:sent SMP\n"); + print_smp(send_buf->mad); +} + +static void recv_smi_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_recv_wc *mad_recv_wc) +{ + if (!smp && mad_recv_wc->recv_buf.mad->mad_hdr.mgmt_class != mgmt_class) + return; + if (attr_id && mad_recv_wc->recv_buf.mad->mad_hdr.attr_id != attr_id) + return; + + printk("Madeye:recv SMP\n"); + print_smp((struct ib_smp *)&mad_recv_wc->recv_buf.mad->mad_hdr); +} + +static int is_rmpp_mad(struct ib_mad_hdr *mad_hdr) +{ + if (mad_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_ADM) { + switch (mad_hdr->method) { + case IB_SA_METHOD_GET_TABLE: + case IB_SA_METHOD_GET_TABLE_RESP: + case IB_SA_METHOD_GET_MULTI_RESP: + return 1; + default: + break; + } + } else if ((mad_hdr->mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) && + (mad_hdr->mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) + return 1; + + return 0; +} + +static void snoop_gsi_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_send_buf *send_buf, + struct ib_mad_send_wc *mad_send_wc) +{ + struct ib_mad_hdr *hdr = send_buf->mad; + + if (!gmp && hdr->mgmt_class != mgmt_class) + return; + if (attr_id && hdr->attr_id != attr_id) + return; + + printk("Madeye:sent GMP\n"); + print_mad_hdr(hdr); + + if (is_rmpp_mad(hdr)) + print_rmpp_hdr(&((struct ib_rmpp_mad *) hdr)->rmpp_hdr); +} + +static void recv_gsi_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_recv_wc *mad_recv_wc) +{ + struct ib_mad_hdr *hdr = &mad_recv_wc->recv_buf.mad->mad_hdr; + struct ib_rmpp_mad *mad = NULL; + struct ib_sa_mad *sa_mad; + struct 
ib_vendor_mad *vendor_mad; + u8 *mad_data; + int i, j; + + if (!gmp && hdr->mgmt_class != mgmt_class) + return; + if (attr_id && mad_recv_wc->recv_buf.mad->mad_hdr.attr_id != attr_id) + return; + + printk("Madeye:recv GMP\n"); + print_mad_hdr(hdr); + + if (is_rmpp_mad(hdr)) { + mad = (struct ib_rmpp_mad *) hdr; + print_rmpp_hdr(&mad->rmpp_hdr); + } + + if (data) { + if (hdr->mgmt_class == IB_MGMT_CLASS_SUBN_ADM) { + j = IB_MGMT_SA_DATA; + /* Display SA header */ + if (is_rmpp_mad(hdr) && + mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA) + return; + sa_mad = (struct ib_sa_mad *) + &mad_recv_wc->recv_buf.mad; + mad_data = sa_mad->data; + } else { + if (is_rmpp_mad(hdr)) { + j = IB_MGMT_VENDOR_DATA; + /* Display OUI */ + vendor_mad = (struct ib_vendor_mad *) + &mad_recv_wc->recv_buf.mad; + printk("Vendor OUI......%01x %01x %01x\n", + vendor_mad->oui[0], + vendor_mad->oui[1], + vendor_mad->oui[2]); + mad_data = vendor_mad->data; + } else { + j = IB_MGMT_MAD_DATA; + mad_data = mad_recv_wc->recv_buf.mad->data; + } + } + for (i = 0; i < j; i++) { + if (i % 16 == 0) + printk("\nData..........."); + printk("%01x ", mad_data[i]); + } + printk("\n"); + } +} + +static void madeye_add_one(struct ib_device *device) +{ + struct madeye_port *port; + int reg_flags; + u8 i, s, e; + + if (device->node_type == RDMA_NODE_IB_SWITCH) { + s = 0; + e = 0; + } else { + s = 1; + e = device->phys_port_cnt; + } + + port = kmalloc(sizeof *port * (e - s + 1), GFP_KERNEL); + if (!port) + goto out; + + reg_flags = IB_MAD_SNOOP_SEND_COMPLETIONS | IB_MAD_SNOOP_RECVS; + for (i = s; i <= e; i++) { + port[i].smi_agent = ib_register_mad_snoop(device, i, + IB_QPT_SMI, + reg_flags, + snoop_smi_handler, + recv_smi_handler, + &port[i]); + port[i].gsi_agent = ib_register_mad_snoop(device, i, + IB_QPT_GSI, + reg_flags, + snoop_gsi_handler, + recv_gsi_handler, + &port[i]); + } + +out: + ib_set_client_data(device, &madeye_client, port); +} + +static void madeye_remove_one(struct ib_device *device) +{ + 
struct madeye_port *port; + int i, s, e; + + port = (struct madeye_port *) + ib_get_client_data(device, &madeye_client); + if (!port) + return; + + if (device->node_type == RDMA_NODE_IB_SWITCH) { + s = 0; + e = 0; + } else { + s = 1; + e = device->phys_port_cnt; + } + + for (i = s; i <= e; i++) { + if (!IS_ERR(port[i].smi_agent)) + ib_unregister_mad_agent(port[i].smi_agent); + if (!IS_ERR(port[i].gsi_agent)) + ib_unregister_mad_agent(port[i].gsi_agent); + } + kfree(port); +} + +static int __init ib_madeye_init(void) +{ + return ib_register_client(&madeye_client); +} + +static void __exit ib_madeye_cleanup(void) +{ + ib_unregister_client(&madeye_client); +} + +module_init(ib_madeye_init); +module_exit(ib_madeye_cleanup); -- 1.5.0.rc2.g73a2

From Ashish.Batwara at lsi.com Wed Feb 14 08:12:35 2007
From: Ashish.Batwara at lsi.com (Batwara, Ashish)
Date: Wed, 14 Feb 2007 09:12:35 -0700
Subject: [openib-general] SM assigned GID addresses
Message-ID: <01B9E81EECACE94DBBD0A556E768FB8A013599EC@NAMAIL2.ad.lsil.com>

Hi,

I am referring to Section 4.1.1 of the IB Spec, which covers "GID Usage and Properties". Does anyone know whether the SM uses item #3 below for address assignment, and which vendors support #3? Can anybody point me to the appropriate driver documentation in this area?

Thanks
Ashish

GID USAGE AND PROPERTIES

1) Each endport shall be assigned at least one unicast GID. The first unicast GID assigned shall be created using the manufacturer assigned EUI-64 identifier. This GID is referred to as GID index 0 and is formed by techniques 3(a) and 3(b) described below.

2) The default GID prefix shall be (0xFE80::0). A packet using the default GID prefix and either a manufacturer assigned or SM assigned EUI-64 must always be accepted by an endnode. A packet containing a GRH with a destination GID with this prefix must never be forwarded by a router, i.e. it is restricted to the local subnet.
3) A unicast GID shall be created using one or more of the following mechanisms:
a) Concatenation of the default GID prefix with the manufacturer assigned EUI-64 identifier associated with an endport. This GID is referred to as the default GID.
b) Concatenation of a subnet manager assigned 64-bit GID prefix and the manufacturer assigned EUI-64 identifier associated with an endport.
c) Assignment of a GID by the subnet manager. The subnet manager creates a GID by concatenating the GID prefix (default or assigned) with a set of locally assigned EUI-64 values (at GID index 1 or above).

Each endport must be assigned at least one unicast GID using (a). Additional GIDs may be assigned using (b) and/or (c).

Note: A subnet shall only have one assigned GID prefix (non default) at any given time.

From tziporet at mellanox.co.il Wed Feb 14 08:25:06 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 14 Feb 2007 18:25:06 +0200
Subject: [openib-general] OFED 1.2 alpha release
Message-ID: <45D337E2.200@mellanox.co.il>

Hi,

After a two-week delay, we have published OFED 1.2-alpha1 at http://www.openfabrics.org/builds/ofed-1.2/
File: OFED-1.2-alpha1.tgz
BUILD_ID contains the source locations of all packages.
Please report any issues in bugzilla: https://bugs.openfabrics.org/

Tziporet & Vlad

OS support:
Novell:
- SLES 9.0 SP3
- SLES10
Redhat:
- Redhat EL4 up4
- Redhat EL5 beta2 (only partially tested)
kernel.org:
- 2.6.20
- 2.6.19

Note: Redhat EL4 up3, Fedora C4, Fedora C6 and SuSE Pro 10 are not part of the official list. We keep the backport patches for these OSes and make sure OFED compiles and loads properly, but will not do a full QA cycle.

Systems:
* x86_64
* x86
* ia64
* ppc64 (user space not tested)

Main changes from OFED-1.1:
1. iWARP is now supported with Chelsio T3
2. New kernel modules: VNIC, RDS, Bonding, SA cache
3. New packages: MVAPICH2
4. IPoIB Connected mode
5. Multicast join from user space
6. libibverbs 1.1
7. OpenSM new routing models: FAT tree routing and Taurus routing
8. GUI tool for network diagnostics
9. New MPI releases: MVAPICH: version 0.9.9, Open MPI: version 1.2, MVAPICH2: version 0.9.8

A detailed list of changes can be found at: https://wiki.openfabrics.org/tiki-index.php?page=OFED+1.2+release+plan+and+features

Limitations and known issues:
1. ipath driver compilation fails on all systems, except for kernel 2.6.20
2. libipathverbs is not working with libibverbs 1.1
3. SDP netstat is not available on RHEL5 (due to compilation errors)
4. Routing table problem in SLES10 when using port #2
5. RDS compiles only on kernel 2.6.18/19/20
6. MVAPICH2 installation fails on SuSE Pro 10
7. mstflint is not working on ppc64
8. RDS was not tested

Missing features that should be completed for the Beta:
1. Add madeye utility
2. RDS to support SLES10 and RHEL

For details on each module's status see: https://wiki.openfabrics.org/tiki-index.php?page=Teleconf+02-12-2007

From hnguyen at linux.vnet.ibm.com Wed Feb 14 08:40:30 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 17:40:30 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 0/5] ehca patch set for 2.6.21-rc1
Message-ID: <200702141740.30637.hnguyen@linux.vnet.ibm.com>

Hello Roland!

Here is a patch set for ehca with the following changes and bug fixes:
* Reworked irq handler to avoid/reduce missed irq events
* Fix race condition bug in find_next_online_cpu() and other potential locking issues in the scaling code
* Allow scaling code to be configurable (en-/disable) via module parameter
* Replace yield() in ehca_destroy_cq() by wait_for_completion()
* ehca_query_port() now returns LINK_UP for phys_state instead of UNKNOWN

Thanks!
Nam From hnguyen at linux.vnet.ibm.com Wed Feb 14 08:40:47 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Wed, 14 Feb 2007 17:40:47 +0100 Subject: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler to avoid/reduce missed irq events Message-ID: <200702141740.48286.hnguyen@linux.vnet.ibm.com> Hi, here is a patch for ehca with the reworked irq handler. Thanks Nam Signed-off-by: Hoang-Nam Nguyen --- ehca_classes.h | 18 +++-- ehca_eq.c | 1 ehca_irq.c | 200 ++++++++++++++++++++++++++++++++++++--------------------- ehca_irq.h | 1 ehca_main.c | 24 +++++- ipz_pt_fn.h | 9 ++ 6 files changed, 172 insertions(+), 81 deletions(-) diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-11 21:31:06.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-14 12:53:41.000000000 +0100 @@ -42,8 +42,6 @@ #ifndef __EHCA_CLASSES_H__ #define __EHCA_CLASSES_H__ -#include "ehca_classes.h" -#include "ipz_pt_fn.h" struct ehca_module; struct ehca_qp; @@ -54,14 +52,22 @@ struct ehca_mw; struct ehca_pd; struct ehca_av; +#include +#include + #ifdef CONFIG_PPC64 #include "ehca_classes_pSeries.h" #endif +#include "ipz_pt_fn.h" +#include "ehca_qes.h" +#include "ehca_irq.h" -#include -#include +#define EHCA_EQE_CACHE_SIZE 20 -#include "ehca_irq.h" +struct ehca_eqe_cache_entry { + struct ehca_eqe *eqe; + struct ehca_cq *cq; +}; struct ehca_eq { u32 length; @@ -74,6 +80,8 @@ struct ehca_eq { spinlock_t spinlock; struct tasklet_struct interrupt_task; u32 ist; + spinlock_t irq_spinlock; + struct ehca_eqe_cache_entry eqe_cache[EHCA_EQE_CACHE_SIZE]; }; struct ehca_sport { diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_eq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_eq.c --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_eq.c 2007-02-11 21:31:06.000000000 +0100 +++ 
infiniband_work/drivers/infiniband/hw/ehca/ehca_eq.c 2007-02-14 12:53:40.000000000 +0100 @@ -61,6 +61,7 @@ int ehca_create_eq(struct ehca_shca *shc struct ib_device *ib_dev = &shca->ib_device; spin_lock_init(&eq->spinlock); + spin_lock_init(&eq->irq_spinlock); eq->is_initialized = 0; if (type != EHCA_EQ && type != EHCA_NEQ) { diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-11 21:36:12.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 13:07:54.000000000 +0100 @@ -401,87 +400,143 @@ irqreturn_t ehca_interrupt_eq(int irq, v return IRQ_HANDLED; } -void ehca_tasklet_eq(unsigned long data) -{ - struct ehca_shca *shca = (struct ehca_shca*)data; - struct ehca_eqe *eqe; - int int_state; - int query_cnt = 0; - do { - eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); +static inline void process_eqe(struct ehca_shca *shca, struct ehca_eqe *eqe) +{ + u64 eqe_value; + u32 token; + unsigned long flags; + struct ehca_cq *cq; + eqe_value = eqe->entry; + ehca_dbg(&shca->ib_device, "eqe_value=%lx", eqe_value); + if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) { + ehca_dbg(&shca->ib_device, "... 
completion event"); + token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq = idr_find(&ehca_cq_idr, token); + if (cq == NULL) { + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + ehca_err(&shca->ib_device, + "Invalid eqe for non-existing cq token=%x", + token); + return; + } + reset_eq_pending(cq); +#ifdef CONFIG_INFINIBAND_EHCA_SCALING + queue_comp_task(cq); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); +#else + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + comp_event_callback(cq); +#endif + } else { + ehca_dbg(&shca->ib_device, + "Got non completion event"); + parse_identifier(shca, eqe_value); + } +} - if ((shca->hw_level >= 2) && eqe) - int_state = 1; - else - int_state = 0; +void ehca_process_eq(struct ehca_shca *shca, int is_irq) +{ + struct ehca_eq *eq = &shca->eq; + struct ehca_eqe_cache_entry *eqe_cache = eq->eqe_cache; + u64 eqe_value; + unsigned long flags; + int eqe_cnt, i; + int eq_empty = 0; - while ((int_state == 1) || eqe) { - while (eqe) { - u64 eqe_value = eqe->entry; - - ehca_dbg(&shca->ib_device, - "eqe_value=%lx", eqe_value); - - /* TODO: better structure */ - if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, - eqe_value)) { - unsigned long flags; - u32 token; - struct ehca_cq *cq; - - ehca_dbg(&shca->ib_device, - "... 
completion event"); - token = - EHCA_BMASK_GET(EQE_CQ_TOKEN, - eqe_value); - spin_lock_irqsave(&ehca_cq_idr_lock, - flags); - cq = idr_find(&ehca_cq_idr, token); - - if (cq == NULL) { - spin_unlock_irqrestore(&ehca_cq_idr_lock, - flags); - break; - } + spin_lock_irqsave(&eq->irq_spinlock, flags); + if (is_irq) { + const int max_query_cnt = 100; + int query_cnt = 0; + int int_state = 1; + do { + int_state = hipz_h_query_int_state( + shca->ipz_hca_handle, eq->ist); + query_cnt++; + iosync(); + } while (int_state && query_cnt < max_query_cnt); + if (unlikely((query_cnt == max_query_cnt))) + ehca_dbg(&shca->ib_device, "int_state=%x query_cnt=%x", + int_state, query_cnt); + } - reset_eq_pending(cq); + /* read out all eqes */ + eqe_cnt = 0; + do { + u32 token; + eqe_cache[eqe_cnt].eqe = + (struct ehca_eqe *)ehca_poll_eq(shca, eq); + if (!eqe_cache[eqe_cnt].eqe) + break; + eqe_value = eqe_cache[eqe_cnt].eqe->entry; + if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) { + token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value); + spin_lock(&ehca_cq_idr_lock); + eqe_cache[eqe_cnt].cq = idr_find(&ehca_cq_idr, token); + if (!eqe_cache[eqe_cnt].cq) { + spin_unlock(&ehca_cq_idr_lock); + ehca_err(&shca->ib_device, + "Invalid eqe for non-existing cq " + "token=%x", token); + continue; + } + spin_unlock(&ehca_cq_idr_lock); + } else + eqe_cache[eqe_cnt].cq = NULL; + eqe_cnt++; + } while (eqe_cnt < EHCA_EQE_CACHE_SIZE); + if (!eqe_cnt) { + if (is_irq) + ehca_dbg(&shca->ib_device, + "No eqe found for irq event"); + goto unlock_irq_spinlock; + } else if (!is_irq) + ehca_dbg(&shca->ib_device, "deadman found %x eqe", eqe_cnt); + if (unlikely(eqe_cnt == EHCA_EQE_CACHE_SIZE)) + ehca_dbg(&shca->ib_device, "too many eqes for one irq event"); + /* enable irq for new packets */ + for (i = 0; i < eqe_cnt; i++) { + if (eq->eqe_cache[i].cq) + reset_eq_pending(eq->eqe_cache[i].cq); + } + /* check eq */ + spin_lock(&eq->spinlock); + eq_empty = (!ipz_eqit_eq_peek_valid(&shca->eq.ipz_queue)); + 
spin_unlock(&eq->spinlock); + /* call completion handler for cached eqes */ + for (i = 0; i < eqe_cnt; i++) + if (eq->eqe_cache[i].cq) { #ifdef CONFIG_INFINIBAND_EHCA_SCALING - queue_comp_task(cq); - spin_unlock_irqrestore(&ehca_cq_idr_lock, - flags); + spin_lock(&ehca_cq_idr_lock); + queue_comp_task(eq->eqe_cache[i].cq); + spin_unlock(&ehca_cq_idr_lock); #else - spin_unlock_irqrestore(&ehca_cq_idr_lock, - flags); - comp_event_callback(cq); + comp_event_callback(eq->eqe_cache[i].cq); #endif - } else { - ehca_dbg(&shca->ib_device, - "... non completion event"); - parse_identifier(shca, eqe_value); - } - eqe = - (struct ehca_eqe *)ehca_poll_eq(shca, - &shca->eq); - } - - - if (shca->hw_level >= 2) { - int_state = - hipz_h_query_int_state(shca->ipz_hca_handle, - shca->eq.ist); - query_cnt++; - iosync(); - if (query_cnt >= 100) { - query_cnt = 0; - int_state = 0; - } - } - eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); - + } else { + ehca_dbg(&shca->ib_device, "Got non completion event"); + parse_identifier(shca, eq->eqe_cache[i].eqe->entry); } - } while (int_state != 0); + /* poll eq if not empty */ + if (eq_empty) + goto unlock_irq_spinlock; + do { + struct ehca_eqe *eqe; + eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); + if (!eqe) + break; + process_eqe(shca, eqe); + eqe_cnt++; + } while (1); - return; + unlock_irq_spinlock: + spin_unlock_irqrestore(&eq->irq_spinlock, flags); +} + +void ehca_tasklet_eq(unsigned long data) +{ + ehca_process_eq((struct ehca_shca*)data, 1); } #ifdef CONFIG_INFINIBAND_EHCA_SCALING diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.h infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.h --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.h 2007-02-11 21:31:06.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.h 2007-02-14 12:53:40.000000000 +0100 @@ -56,6 +56,7 @@ void ehca_tasklet_neq(unsigned long data irqreturn_t ehca_interrupt_eq(int irq, void *dev_id); void 
ehca_tasklet_eq(unsigned long data); +void ehca_process_eq(struct ehca_shca *shca, int is_irq); struct ehca_cpu_comp_task { wait_queue_head_t wait_queue; diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_main.c infiniband_work/drivers/infiniband/hw/ehca/ehca_main.c --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_main.c 2007-02-11 21:31:06.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_main.c 2007-02-14 12:53:41.000000000 +0100 @@ -52,7 +52,7 @@ MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver"); -MODULE_VERSION("SVNEHCA_0020"); +MODULE_VERSION("SVNEHCA_0021"); int ehca_open_aqp1 = 0; int ehca_debug_level = 0; @@ -778,8 +777,24 @@ void ehca_poll_eqs(unsigned long data) spin_lock(&shca_list_lock); list_for_each_entry(shca, &shca_list, shca_list) { - if (shca->eq.is_initialized) - ehca_tasklet_eq((unsigned long)(void*)shca); + if (shca->eq.is_initialized) { + /* call deadman proc only if eq ptr does not change */ + struct ehca_eq *eq = &shca->eq; + int max = 3; + volatile u64 q_ofs, q_ofs2; + u64 flags; + spin_lock_irqsave(&eq->spinlock, flags); + q_ofs = eq->ipz_queue.current_q_offset; + spin_unlock_irqrestore(&eq->spinlock, flags); + do { + spin_lock_irqsave(&eq->spinlock, flags); + q_ofs2 = eq->ipz_queue.current_q_offset; + spin_unlock_irqrestore(&eq->spinlock, flags); + max--; + } while (q_ofs == q_ofs2 && max > 0); + if (q_ofs == q_ofs2) + ehca_process_eq(shca, 0); + } } mod_timer(&poll_eqs_timer, jiffies + HZ); spin_unlock(&shca_list_lock); @@ -790,7 +805,7 @@ int __init ehca_module_init(void) int ret; printk(KERN_INFO "eHCA Infiniband Device Driver " - "(Rel.: SVNEHCA_0020)\n"); + "(Rel.: SVNEHCA_0021)\n"); idr_init(&ehca_qp_idr); idr_init(&ehca_cq_idr); spin_lock_init(&ehca_qp_idr_lock); diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.h infiniband_work/drivers/infiniband/hw/ehca/ipz_pt_fn.h --- 
infiniband_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.h 2007-02-11 21:31:06.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ipz_pt_fn.h 2007-02-14 12:53:40.000000000 +0100 @@ -247,6 +247,15 @@ static inline void *ipz_eqit_eq_get_inc_ return ret; } +static inline void *ipz_eqit_eq_peek_valid(struct ipz_queue *queue) +{ + void *ret = ipz_qeit_get(queue); + u32 qe = *(u8 *) ret; + if ((qe >> 7) != (queue->toggle_state & 1)) + return NULL; + return ret; +} + /* returns address (GX) of first queue entry */ static inline u64 ipz_qpt_get_firstpage(struct ipz_qpt *qpt) {

From hnguyen at linux.vnet.ibm.com Wed Feb 14 08:41:03 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 17:41:03 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 2/5] ehca: fix race condition/locking issues in scaling code
Message-ID: <200702141741.03964.hnguyen@linux.vnet.ibm.com>

Hi, this patch fixes a race condition in find_next_online_cpu() and some other locking issues in the scaling code.
Thanks Nam Signed-off-by: Hoang-Nam Nguyen --- ehca_irq.c | 68 +++++++++++++++++++++++++++++-------------------------------- 1 files changed, 33 insertions(+), 35 deletions(-) diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 14:16:45.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 14:16:35.000000000 +0100 @@ -544,28 +544,30 @@ void ehca_tasklet_eq(unsigned long data) static inline int find_next_online_cpu(struct ehca_comp_pool* pool) { - unsigned long flags_last_cpu; + int cpu; + unsigned long flags; + WARN_ON_ONCE(!in_interrupt()); if (ehca_debug_level) ehca_dmp(&cpu_online_map, sizeof(cpumask_t), ""); - spin_lock_irqsave(&pool->last_cpu_lock, flags_last_cpu); - pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map); - if (pool->last_cpu == NR_CPUS) - pool->last_cpu = first_cpu(cpu_online_map); - spin_unlock_irqrestore(&pool->last_cpu_lock, flags_last_cpu); + spin_lock_irqsave(&pool->last_cpu_lock, flags); + cpu = next_cpu(pool->last_cpu, cpu_online_map); + if (cpu == NR_CPUS) + cpu = first_cpu(cpu_online_map); + pool->last_cpu = cpu; + spin_unlock_irqrestore(&pool->last_cpu_lock, flags); - return pool->last_cpu; + return cpu; } static void __queue_comp_task(struct ehca_cq *__cq, struct ehca_cpu_comp_task *cct) { - unsigned long flags_cct; - unsigned long flags_cq; + unsigned long flags; - spin_lock_irqsave(&cct->task_lock, flags_cct); - spin_lock_irqsave(&__cq->task_lock, flags_cq); + spin_lock_irqsave(&cct->task_lock, flags); + spin_lock(&__cq->task_lock); if (__cq->nr_callbacks == 0) { __cq->nr_callbacks++; @@ -576,8 +578,8 @@ static void __queue_comp_task(struct ehc else __cq->nr_callbacks++; - spin_unlock_irqrestore(&__cq->task_lock, flags_cq); - spin_unlock_irqrestore(&cct->task_lock, flags_cct); + spin_unlock(&__cq->task_lock); + spin_unlock_irqrestore(&cct->task_lock, flags); } 
static void queue_comp_task(struct ehca_cq *__cq) @@ -588,69 +590,69 @@ static void queue_comp_task(struct ehca_ cpu = get_cpu(); cpu_id = find_next_online_cpu(pool); - BUG_ON(!cpu_online(cpu_id)); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); + BUG_ON(!cct); if (cct->cq_jobs > 0) { cpu_id = find_next_online_cpu(pool); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); + BUG_ON(!cct); } __queue_comp_task(__cq, cct); - - put_cpu(); - - return; } static void run_comp_task(struct ehca_cpu_comp_task* cct) { struct ehca_cq *cq; - unsigned long flags_cct; - unsigned long flags_cq; + unsigned long flags; - spin_lock_irqsave(&cct->task_lock, flags_cct); + spin_lock_irqsave(&cct->task_lock, flags); while (!list_empty(&cct->cq_list)) { cq = list_entry(cct->cq_list.next, struct ehca_cq, entry); - spin_unlock_irqrestore(&cct->task_lock, flags_cct); + spin_unlock_irqrestore(&cct->task_lock, flags); comp_event_callback(cq); - spin_lock_irqsave(&cct->task_lock, flags_cct); + spin_lock_irqsave(&cct->task_lock, flags); - spin_lock_irqsave(&cq->task_lock, flags_cq); + spin_lock(&cq->task_lock); cq->nr_callbacks--; if (cq->nr_callbacks == 0) { list_del_init(cct->cq_list.next); cct->cq_jobs--; } - spin_unlock_irqrestore(&cq->task_lock, flags_cq); - + spin_unlock(&cq->task_lock); } - spin_unlock_irqrestore(&cct->task_lock, flags_cct); - - return; + spin_unlock_irqrestore(&cct->task_lock, flags); } static int comp_task(void *__cct) { struct ehca_cpu_comp_task* cct = __cct; + int cql_empty; DECLARE_WAITQUEUE(wait, current); set_current_state(TASK_INTERRUPTIBLE); while(!kthread_should_stop()) { add_wait_queue(&cct->wait_queue, &wait); - if (list_empty(&cct->cq_list)) + spin_lock_irq(&cct->task_lock); + cql_empty = list_empty(&cct->cq_list); + spin_unlock_irq(&cct->task_lock); + if (cql_empty) schedule(); else __set_current_state(TASK_RUNNING); remove_wait_queue(&cct->wait_queue, &wait); - if (!list_empty(&cct->cq_list)) + spin_lock_irq(&cct->task_lock); + cql_empty = 
list_empty(&cct->cq_list); + spin_unlock_irq(&cct->task_lock); + if (!cql_empty) run_comp_task(__cct); set_current_state(TASK_INTERRUPTIBLE); @@ -693,8 +695,6 @@ static void destroy_comp_task(struct ehc if (task) kthread_stop(task); - - return; } static void take_over_work(struct ehca_comp_pool *pool, @@ -815,6 +815,4 @@ void ehca_destroy_comp_pool(void) free_percpu(pool->cpu_comp_tasks); kfree(pool); #endif - - return; } From hnguyen at linux.vnet.ibm.com Wed Feb 14 08:41:21 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Wed, 14 Feb 2007 17:41:21 +0100 Subject: [openib-general] [PATCH 2.6.21-rc1 3/5] ehca: allow en/disabling scaling code via module parameter Message-ID: <200702141741.21722.hnguyen@linux.vnet.ibm.com> Hi, here is a patch for ehca that allows users to en/disable scaling code when loading ib_ehca module. Thanks Nam Signed-off-by: Hoang-Nam Nguyen --- Kconfig | 8 -------- ehca_classes.h | 1 + ehca_irq.c | 47 +++++++++++++++++++++-------------------------- ehca_main.c | 4 ++++ 4 files changed, 26 insertions(+), 34 deletions(-) diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/Kconfig infiniband_work/drivers/infiniband/hw/ehca/Kconfig --- infiniband_orig/drivers/infiniband/hw/ehca/Kconfig 2007-02-14 14:18:16.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/Kconfig 2007-02-14 14:20:52.000000000 +0100 @@ -7,11 +7,3 @@ config INFINIBAND_EHCA To compile the driver as a module, choose M here. The module will be called ib_ehca. -config INFINIBAND_EHCA_SCALING - bool "Scaling support (EXPERIMENTAL)" - depends on IBMEBUS && INFINIBAND_EHCA && HOTPLUG_CPU && EXPERIMENTAL - default y - ---help--- - eHCA scaling support schedules the CQ callbacks to different CPUs. - - To enable this feature choose Y here. 
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-14 14:18:16.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-14 14:20:17.000000000 +0100 @@ -277,6 +277,7 @@ extern struct idr ehca_cq_idr; extern int ehca_static_rate; extern int ehca_port_act_time; extern int ehca_use_hp_mr; +extern int ehca_scaling_code; struct ipzu_queue_resp { u32 qe_size; /* queue entry size */ diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 14:18:16.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 14:20:17.000000000 +0100 @@ -63,15 +63,11 @@ #define ERROR_DATA_LENGTH EHCA_BMASK_IBM(52,63) #define ERROR_DATA_TYPE EHCA_BMASK_IBM(0,7) -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - static void queue_comp_task(struct ehca_cq *__cq); static struct ehca_comp_pool* pool; static struct notifier_block comp_pool_callback_nb; -#endif - static inline void comp_event_callback(struct ehca_cq *cq) { if (!cq->ib_cq.comp_handler) @@ -423,13 +419,13 @@ static inline void process_eqe(struct eh return; } reset_eq_pending(cq); -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - queue_comp_task(cq); - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); -#else - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - comp_event_callback(cq); -#endif + if (ehca_scaling_code) { + queue_comp_task(cq); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + } else { + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + comp_event_callback(cq); + } } else { ehca_dbg(&shca->ib_device, "Got non completion event"); @@ -508,13 +504,12 @@ void ehca_process_eq(struct ehca_shca *s /* call completion handler for cached eqes */ for (i = 0; i < eqe_cnt; i++) if (eq->eqe_cache[i].cq) { -#ifdef 
CONFIG_INFINIBAND_EHCA_SCALING - spin_lock(&ehca_cq_idr_lock); - queue_comp_task(eq->eqe_cache[i].cq); - spin_unlock(&ehca_cq_idr_lock); -#else - comp_event_callback(eq->eqe_cache[i].cq); -#endif + if (ehca_scaling_code) { + spin_lock(&ehca_cq_idr_lock); + queue_comp_task(eq->eqe_cache[i].cq); + spin_unlock(&ehca_cq_idr_lock); + } else + comp_event_callback(eq->eqe_cache[i].cq); } else { ehca_dbg(&shca->ib_device, "Got non completion event"); parse_identifier(shca, eq->eqe_cache[i].eqe->entry); @@ -540,8 +535,6 @@ void ehca_tasklet_eq(unsigned long data) ehca_process_eq((struct ehca_shca*)data, 1); } -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - static inline int find_next_online_cpu(struct ehca_comp_pool* pool) { int cpu; @@ -764,14 +757,14 @@ static int comp_pool_callback(struct not return NOTIFY_OK; } -#endif - int ehca_create_comp_pool(void) { -#ifdef CONFIG_INFINIBAND_EHCA_SCALING int cpu; struct task_struct *task; + if (!ehca_scaling_code) + return 0; + pool = kzalloc(sizeof(struct ehca_comp_pool), GFP_KERNEL); if (pool == NULL) return -ENOMEM; @@ -796,16 +789,19 @@ int ehca_create_comp_pool(void) comp_pool_callback_nb.notifier_call = comp_pool_callback; comp_pool_callback_nb.priority =0; register_cpu_notifier(&comp_pool_callback_nb); -#endif + + printk(KERN_INFO "eHCA scaling code enabled\n"); return 0; } void ehca_destroy_comp_pool(void) { -#ifdef CONFIG_INFINIBAND_EHCA_SCALING int i; + if (!ehca_scaling_code) + return; + unregister_cpu_notifier(&comp_pool_callback_nb); for (i = 0; i < NR_CPUS; i++) { @@ -814,5 +810,4 @@ void ehca_destroy_comp_pool(void) } free_percpu(pool->cpu_comp_tasks); kfree(pool); -#endif } diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_main.c infiniband_work/drivers/infiniband/hw/ehca/ehca_main.c --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_main.c 2007-02-14 14:18:16.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_main.c 2007-02-14 14:20:17.000000000 +0100 @@ -62,6 +62,7 @@ int ehca_use_hp_mr = 0; 
int ehca_port_act_time = 30; int ehca_poll_all_eqs = 1; int ehca_static_rate = -1; +int ehca_scaling_code = 1; module_param_named(open_aqp1, ehca_open_aqp1, int, 0); module_param_named(debug_level, ehca_debug_level, int, 0); @@ -71,6 +72,7 @@ module_param_named(use_hp_mr, ehca_u module_param_named(port_act_time, ehca_port_act_time, int, 0); module_param_named(poll_all_eqs, ehca_poll_all_eqs, int, 0); module_param_named(static_rate, ehca_static_rate, int, 0); +module_param_named(scaling_code, ehca_scaling_code, int, 0); MODULE_PARM_DESC(open_aqp1, "AQP1 on startup (0: no (default), 1: yes)"); @@ -91,6 +93,8 @@ MODULE_PARM_DESC(poll_all_eqs, " (0: no, 1: yes (default))"); MODULE_PARM_DESC(static_rate, "set permanent static rate (default: disabled)"); +MODULE_PARM_DESC(scaling_code, + "set scaling code (0: disabled, 1: enabled/default)"); spinlock_t ehca_qp_idr_lock; spinlock_t ehca_cq_idr_lock; From hnguyen at linux.vnet.ibm.com Wed Feb 14 08:41:35 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Wed, 14 Feb 2007 17:41:35 +0100 Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion() Message-ID: <200702141741.35444.hnguyen@linux.vnet.ibm.com> Hi, this patch removes yield() and uses wait_for_completion() in order to wait for running completion handlers finished before destroying associated completion queue. 
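The wait-for-zero-callbacks pattern can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: pthreads stand in for the kernel's spinlocks and `struct completion`, a single lock replaces the two locks used in the driver, and every `sketch_*` name is invented here rather than taken from the ehca code.

```c
#include <pthread.h>

/* Userspace stand-in for the kernel's struct completion (illustration only). */
struct sketch_completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned int done;          /* number of pending "complete" signals */
};

/* Hypothetical, trimmed-down counterpart of struct ehca_cq. */
struct sketch_cq {
    pthread_mutex_t task_lock;
    unsigned int nr_callbacks;  /* completion handlers still queued/running */
    struct sketch_completion zero_callbacks;
};

void sketch_init_completion(struct sketch_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

/* Record one signal and wake one waiter. */
void sketch_complete(struct sketch_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Sleep until a signal is available, then consume it. */
void sketch_wait_for_completion(struct sketch_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)
        pthread_cond_wait(&c->cond, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}

void sketch_cq_init(struct sketch_cq *cq, unsigned int pending)
{
    pthread_mutex_init(&cq->task_lock, NULL);
    cq->nr_callbacks = pending;
    sketch_init_completion(&cq->zero_callbacks);
}

/* Called after a completion handler has run; returns 1 if it signalled. */
int sketch_callback_done(struct sketch_cq *cq)
{
    int is_complete;

    pthread_mutex_lock(&cq->task_lock);
    cq->nr_callbacks--;
    is_complete = (cq->nr_callbacks == 0);
    pthread_mutex_unlock(&cq->task_lock);

    if (is_complete)            /* wake up a waiting destroy, outside the lock */
        sketch_complete(&cq->zero_callbacks);
    return is_complete;
}

/* Block (instead of spinning on yield()) until no callbacks remain. */
void sketch_destroy_cq(struct sketch_cq *cq)
{
    pthread_mutex_lock(&cq->task_lock);
    while (cq->nr_callbacks) {
        pthread_mutex_unlock(&cq->task_lock);
        sketch_wait_for_completion(&cq->zero_callbacks);
        pthread_mutex_lock(&cq->task_lock);
    }
    pthread_mutex_unlock(&cq->task_lock);
}
```

The sketch evaluates the "last callback" condition while holding the lock and signals only after dropping it, so the waiter is never woken while the counter is still being updated.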
Thanks Nam Signed-off-by: Hoang-Nam Nguyen --- ehca_classes.h | 3 +++ ehca_cq.c | 3 ++- ehca_irq.c | 6 +++++- 3 files changed, 10 insertions(+), 2 deletions(-) diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-14 13:52:49.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-14 13:52:06.000000000 +0100 @@ -52,6 +52,8 @@ struct ehca_mw; struct ehca_pd; struct ehca_av; +#include + #include #include @@ -154,6 +156,7 @@ struct ehca_cq { struct hlist_head qp_hashtab[QP_HASHTAB_LEN]; struct list_head entry; u32 nr_callbacks; + struct completion zero_callbacks; spinlock_t task_lock; u32 ownpid; /* mmap counter for resources mapped into user space */ diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_cq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_cq.c --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_cq.c 2007-02-14 13:52:49.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_cq.c 2007-02-14 13:52:06.000000000 +0100 @@ -147,6 +147,7 @@ struct ib_cq *ehca_create_cq(struct ib_d spin_lock_init(&my_cq->spinlock); spin_lock_init(&my_cq->cb_lock); spin_lock_init(&my_cq->task_lock); + init_completion(&my_cq->zero_callbacks); my_cq->ownpid = current->tgid; cq = &my_cq->ib_cq; @@ -332,7 +333,7 @@ int ehca_destroy_cq(struct ib_cq *cq) spin_lock_irqsave(&ehca_cq_idr_lock, flags); while (my_cq->nr_callbacks) { spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - yield(); + wait_for_completion(&my_cq->zero_callbacks); spin_lock_irqsave(&ehca_cq_idr_lock, flags); } diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 13:52:49.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 13:52:06.000000000 +0100 @@ -605,6 
+605,7 @@ static void run_comp_task(struct ehca_cp spin_lock_irqsave(&cct->task_lock, flags); while (!list_empty(&cct->cq_list)) { + int is_complete = 0; cq = list_entry(cct->cq_list.next, struct ehca_cq, entry); spin_unlock_irqrestore(&cct->task_lock, flags); comp_event_callback(cq); @@ -612,11 +613,14 @@ static void run_comp_task(struct ehca_cp spin_lock(&cq->task_lock); cq->nr_callbacks--; - if (cq->nr_callbacks == 0) { + is_complete = (cq->nr_callbacks == 0); + if (is_complete) { list_del_init(cct->cq_list.next); cct->cq_jobs--; } spin_unlock(&cq->task_lock); + if (is_complete) /* wake up waiting destroy_cq() */ + complete(&cq->zero_callbacks); } spin_unlock_irqrestore(&cct->task_lock, flags); From halr at voltaire.com Wed Feb 14 08:41:01 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Feb 2007 11:41:01 -0500 Subject: [openib-general] SM assigned GID addresses In-Reply-To: <01B9E81EECACE94DBBD0A556E768FB8A013599EC@NAMAIL2.ad.lsil.com> References: <01B9E81EECACE94DBBD0A556E768FB8A013599EC@NAMAIL2.ad.lsil.com> Message-ID: <1171471259.22446.104290.camel@hal.voltaire.com> Hi, On Wed, 2007-02-14 at 11:12, Batwara, Ashish wrote: > Hi, > I am referring to Section 4.1.1 of IB Spec which talks about "GID Usage > AND Properties". Does anyone know whether or not SM uses item # 3 below > for the address assignment and who are all the vendor supports # 3? > Can anybody points me to the appropriate driver documentation in this area? OpenSM supports setting either the default GID prefix or a configured GID prefix and GIDs are comprised of this prefix and the endport EUI-64 (at index 0 of GUIDInfo). OpenSM does not currently support configuring GUIDInfo indices above 0. I'm also not sure how the stack would deal with this either. Is this a requirement for some reason ? If so, can you elaborate/explain ? -- Hal > Thanks > Ashish > > > GID USAGE AND PROPERTIES > 1) Each endport shall be assigned at least one unicast GID. 
The first > unicast GID assigned shall be created using the manufacturer assigned > EUI-64 identifier. This GID is referred to as GID index 0 and is > formed by techniques 3(a) and 3(b) described below. > 2) The default GID prefix shall be (0xFE80::0). A packet using the > default > GID prefix and either a manufacturer assigned or SM assigned > EUI-64 must always be accepted by an endnode. A packet containing > a GRH with a destination GID with this prefix must never be > forwarded by a router, i.e. it is restricted to the local subnet. > 3) A unicast GID shall be created using one or more of the following > mechanisms: > a) Concatenation of the default GID prefix with the manufacturer > assigned > EUI-64 identifier associated with an endport. This GID is > referred to as the default GID. > b) Concatenation of a subnet manager assigned 64-bit GID prefix > and the manufacturer assigned EUI-64 identifier associated with > an endport. > c) Assignment of a GID by the subnet manager. The subnet manager > creates a GID by concatenating the GID prefix (default or assigned) > with a set of locally assigned EUI-64 values (at GID index > 1 or above). Each endport must be assigned at least one unicast GID > using (a). Additional GIDs may be assigned using (b) and/or (c). Note: A > subnet > shall only have one assigned GID prefix (non default) at any given > time. 
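As a rough illustration of mechanisms 3(a) and 3(b) quoted above, a GID can be sketched as a 64-bit prefix followed by the 64-bit EUI-64, both stored most-significant byte first. The names below (`sketch_gid`, `sketch_make_gid`, `SKETCH_DEFAULT_GID_PREFIX` and so on) are invented for this example and are not part of OpenSM or any verbs API; the EUI-64 used in the usage line is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type for illustration: a 128-bit GID stored big-endian. */
struct sketch_gid {
    uint8_t raw[16];
};

/* Default GID prefix 0xFE80::0, expressed as the upper 64 bits of the GID. */
#define SKETCH_DEFAULT_GID_PREFIX 0xFE80000000000000ULL

/* Store a 64-bit value most-significant byte first. */
void sketch_put_be64(uint8_t *dst, uint64_t v)
{
    int i;

    assert(dst != NULL);
    for (i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Techniques 3(a)/3(b): concatenate a GID prefix with an EUI-64. */
struct sketch_gid sketch_make_gid(uint64_t prefix, uint64_t eui64)
{
    struct sketch_gid gid;

    sketch_put_be64(gid.raw, prefix);
    sketch_put_be64(gid.raw + 8, eui64);
    return gid;
}

/* Non-zero if the GID carries the default prefix, i.e. technique 3(a). */
int sketch_gid_has_default_prefix(const struct sketch_gid *gid)
{
    uint8_t prefix[8];

    sketch_put_be64(prefix, SKETCH_DEFAULT_GID_PREFIX);
    return memcmp(gid->raw, prefix, sizeof(prefix)) == 0;
}

/* Format as eight colon-separated 16-bit groups; buf needs at least 40 bytes. */
void sketch_format_gid(const struct sketch_gid *gid, char *buf, size_t len)
{
    size_t off = 0;
    int i;

    for (i = 0; i < 16 && off < len; i += 2) {
        int n = snprintf(buf + off, len - off, "%s%02x%02x",
                         i ? ":" : "",
                         (unsigned int)gid->raw[i],
                         (unsigned int)gid->raw[i + 1]);
        if (n < 0)
            break;
        off += (size_t)n;
    }
}
```

For example, `sketch_make_gid(SKETCH_DEFAULT_GID_PREFIX, 0x0002c90200001234ULL)` yields the default GID of 3(a), while passing an SM-assigned prefix instead gives the 3(b) form with the same lower 64 bits. Technique 3(c) would substitute a locally assigned EUI-64 at GID index 1 or above.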
> > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hnguyen at linux.vnet.ibm.com Wed Feb 14 08:41:44 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Wed, 14 Feb 2007 17:41:44 +0100 Subject: [openib-general] [PATCH 2.6.21-rc1 5/5] ehca: query_port() returns LINK_UP instead UNKNOWN Message-ID: <200702141741.45135.hnguyen@linux.vnet.ibm.com> Hi, this patch sets port phys state as a result of ehca_query_port() to LINK_UP. On pSeries ehca actually represents a logical HCA, whose phys/link state always is LINK_UP. Thanks Nam Signed-off-by: Hoang-Nam Nguyen --- ehca_hca.c | 3 +++ 1 files changed, 3 insertions(+) diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_hca.c infiniband_work/drivers/infiniband/hw/ehca/ehca_hca.c --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_hca.c 2007-02-14 13:11:45.000000000 +0100 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_hca.c 2007-02-14 12:53:52.000000000 +0100 @@ -162,6 +162,9 @@ int ehca_query_port(struct ib_device *ib props->active_width = IB_WIDTH_12X; props->active_speed = 0x1; + /* at the moment (logical) link state is always LINK_UP */ + props->phys_state = 0x5; + query_port1: ehca_free_fw_ctrlblock(rblock); From swise at opengridcomputing.com Wed Feb 14 08:47:05 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 14 Feb 2007 10:47:05 -0600 Subject: [openib-general] [Bug 325] RDMA_CM and address translation broken on sles9sp3 In-Reply-To: <45D32522.5080100@mellanox.co.il> References: <20070214115444.E46EAE603C3@openfabrics.org> <1171465049.15208.13.camel@stevo-desktop> <45D32522.5080100@mellanox.co.il> Message-ID: <1171471625.15208.30.camel@stevo-desktop> On Wed, 2007-02-14 at 17:05 +0200, Tziporet Koren wrote: > Steve Wise wrote: > > Tziporet, > > > > I didn't think we were going 
to apply this patch until Michael tested it > > with SDP/IPoIB on various distros. > > > > Michael, did you get a chance to test it (I'm guessing not since you > > were out sick)? > > > > The reason I'm concerned is that it changes the behavior of > > xxx_ip_dev_find() and _all_ backports, and we needed to test it out and > > make sure it doesn't regress anything. If it causes problems on other > > backports, the plan was to just fix the sles9sp3 backport and leave the > > others alone. > > > > With the test build vlad published yesterday which has this patch, > > rhel4u4 kernel wasn't working for me with iWARP and I'm afraid it might > > be due to this patch. I'm investigating this now. > > > > > > > We tested this patch with our regression on IB and its worked fine for > both SDP and IPoIB. > Then we applied it. > Please report ASAP if you think there is an issue. > I undid that change on RHEL4U4 and still see my iwarp rping problem, so its not related... Thanks, Steve. From HNGUYEN at de.ibm.com Wed Feb 14 08:55:27 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Wed, 14 Feb 2007 17:55:27 +0100 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <20070214142924.GC20977@mellanox.co.il> Message-ID: Hi, > Well, this is not by design: AFAIK on x86_64 both types of libraries > are installed. So, it seems to be an issue with the build script. Will talk to Vlad. > But I still do not see how installing 32 bit binaries alongside the 64 > bit ones is useful, and I do not think other packages provide this option, > so maybe we shouldn't, either. Since we've 32bit libs with ofed-1.2, it is a benefit to have also at least ibutils as 32bit so that we can test if the corresponding 32bit libs work properly, especially the context switch path. Thus, please include also 32bit binaries. 
Thanks Nam From Ashish.Batwara at lsi.com Wed Feb 14 08:56:27 2007 From: Ashish.Batwara at lsi.com (Batwara, Ashish) Date: Wed, 14 Feb 2007 09:56:27 -0700 Subject: [openib-general] SM assigned GID addresses Message-ID: <01B9E81EECACE94DBBD0A556E768FB8A01359A19@NAMAIL2.ad.lsil.com> Thanks for your reply. So do you mean that the current GUIDCap is always configured by the SM to 1 for all HCAs, and that it is safe to assume that an IB port will have one or more IB addresses, all sharing a common GUID portion (based on the manufacturer-assigned EUI-64)? We are trying to define the target functionality for IB for our storage arrays, and are trying to work out how many port addresses we can get from an initiator standpoint. How does this IB port GUID map to the SRP initiator ID? Are they the same, or may an I/O controller have its own GUID and use the SM prefix to derive the initiator port ID in the SRP login request? Thanks Ashish -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Wednesday, February 14, 2007 10:41 AM To: Batwara, Ashish Cc: openib-general at openib.org Subject: Re: [openib-general] SM assigned GID addresses Hi, On Wed, 2007-02-14 at 11:12, Batwara, Ashish wrote: > Hi, > I am referring to Section 4.1.1 of IB Spec which talks about "GID Usage > AND Properties". Does anyone know whether or not SM uses item # 3 below > for the address assignment and who are all the vendor supports # 3? > Can anybody points me to the appropriate driver documentation in this area? OpenSM supports setting either the default GID prefix or a configured GID prefix and GIDs are comprised of this prefix and the endport EUI-64 (at index 0 of GUIDInfo). OpenSM does not currently support configuring GUIDInfo indices above 0. I'm also not sure how the stack would deal with this either. Is this a requirement for some reason ? If so, can you elaborate/explain ?
-- Hal > Thanks > Ashish > > > GID USAGE AND PROPERTIES > 1) Each endport shall be assigned at least one unicast GID. The first > unicast GID assigned shall be created using the manufacturer assigned > EUI-64 identifier. This GID is referred to as GID index 0 and is > formed by techniques 3(a) and 3(b) described below. > 2) The default GID prefix shall be (0xFE80::0). A packet using the > default > GID prefix and either a manufacturer assigned or SM assigned > EUI-64 must always be accepted by an endnode. A packet containing > a GRH with a destination GID with this prefix must never be > forwarded by a router, i.e. it is restricted to the local subnet. > 3) A unicast GID shall be created using one or more of the following > mechanisms: > a) Concatenation of the default GID prefix with the manufacturer > assigned > EUI-64 identifier associated with an endport. This GID is > referred to as the default GID. > b) Concatenation of a subnet manager assigned 64-bit GID prefix > and the manufacturer assigned EUI-64 identifier associated with > an endport. > c) Assignment of a GID by the subnet manager. The subnet manager > creates a GID by concatenating the GID prefix (default or assigned) > with a set of locally assigned EUI-64 values (at GID index > 1 or above). Each endport must be assigned at least one unicast GID > using (a). Additional GIDs may be assigned using (b) and/or (c). Note: A > subnet > shall only have one assigned GID prefix (non default) at any given > time. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at mellanox.co.il Wed Feb 14 09:05:21 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 14 Feb 2007 19:05:21 +0200 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: References: Message-ID: <20070214170520.GM16867@mellanox.co.il> > > Well, this is not by design: AFAIK on x86_64 both types of libraries > > are installed. > So, it seems to be an issue with the build script. Will talk to Vlad. > > > But I still do not see how installing 32 bit binaries alongside the 64 > > bit ones is useful, and I do not think other packages provide this option, > > so maybe we shouldn't, either. > > Since we've 32bit libs with ofed-1.2, it is a benefit to have also at least > ibutils as 32bit so that we can test if the corresponding 32bit libs work > properly, especially the context switch path. > Thus, please include also 32bit binaries. Still, using non-standard hacks like bin32 does not sound like a good idea. Maybe an option to *only* make 32 bit userspace might make sense though. Something like --disable-32bit, --disable-64bit. This would solve your problem, would it not? -- MST From HNGUYEN at de.ibm.com Wed Feb 14 09:02:38 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Wed, 14 Feb 2007 18:02:38 +0100 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <1171463849.16240.11.camel@vladsk-laptop> Message-ID: Hi Vlad, > prefix/lib (32bit libraries) should be created on ppc64 as well. > Check that you have sysfsutils 32bit RPM installed. > I don't have ppc64 here to check. The current ofed-1.2 package does not create them, while ofed-1.1.1 did. It looks like the fix we made for ofed-1.1.1 has been lost. If I remember right, the issue was that the 64bit libs were created first, then copied as a backup. Next, the 32bit libs were created and the 64bit libs were copied back to the same place as the 32bit libs, i.e. they overwrote the 32bit libs. I haven't checked the build script/openib.spec yet...
Regards Nam From halr at voltaire.com Wed Feb 14 09:16:05 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Feb 2007 12:16:05 -0500 Subject: [openib-general] SM assigned GID addresses In-Reply-To: <01B9E81EECACE94DBBD0A556E768FB8A01359A19@NAMAIL2.ad.lsil.com> References: <01B9E81EECACE94DBBD0A556E768FB8A01359A19@NAMAIL2.ad.lsil.com> Message-ID: <1171473359.22446.106282.camel@hal.voltaire.com> On Wed, 2007-02-14 at 11:56, Batwara, Ashish wrote: > Thanks for your reply. > So do you mean to say that current GUIDCap is always configured by SM to > 1 for all the HCAs, and it is safe to assume that one IB port will have > only one or multiple IB address but that will have GUID portion common > in them (Based upon manufacturer's assigned EUI-64 based). GUIDCap is a RO component in terms of the SM and guaranteed to be at least 1 for an endport. This comes from the device SMA, not the SM. -- Hal > We are trying to define the target functionality for IB for our storage > arrays, and are trying to explore howmany port addresses that we can get > from an initiator standpoint. > How does this IB port GUID maps to SRP initiator ID? Are they same or > I/O Controller may have its own GUID and can use SM prefix to derive > initiator port ID in SRP login req? > > Thanks > Ashish > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, February 14, 2007 10:41 AM > To: Batwara, Ashish > Cc: openib-general at openib.org > Subject: Re: [openib-general] SM assigned GID addresses > > Hi, > > On Wed, 2007-02-14 at 11:12, Batwara, Ashish wrote: > > Hi, > > I am referring to Section 4.1.1 of IB Spec which talks about "GID > Usage > > AND Properties". Does anyone know whether or not SM uses item # 3 > below > > for the address assignment and who are all the vendor supports # 3? > > > Can anybody points me to the appropriate driver documentation in this > area? 
> > OpenSM supports setting either the default GID prefix or a configured > GID prefix and GIDs are comprised of this prefix and the endport EUI-64 > (at index 0 of GUIDInfo). > > OpenSM does not currently support configuring GUIDInfo indices above 0. > I'm also not sure how the stack would deal with this either. Is this a > requirement for some reason ? If so, can you elaborate/explain ? > > -- Hal > > > Thanks > > Ashish > > > > > > GID USAGE AND PROPERTIES > > 1) Each endport shall be assigned at least one unicast GID. The first > > unicast GID assigned shall be created using the manufacturer assigned > > EUI-64 identifier. This GID is referred to as GID index 0 and is > > formed by techniques 3(a) and 3(b) described below. > > 2) The default GID prefix shall be (0xFE80::0). A packet using the > > default > > GID prefix and either a manufacturer assigned or SM assigned > > EUI-64 must always be accepted by an endnode. A packet containing > > a GRH with a destination GID with this prefix must never be > > forwarded by a router, i.e. it is restricted to the local subnet. > > 3) A unicast GID shall be created using one or more of the following > > mechanisms: > > a) Concatenation of the default GID prefix with the manufacturer > > assigned > > EUI-64 identifier associated with an endport. This GID is > > referred to as the default GID. > > b) Concatenation of a subnet manager assigned 64-bit GID prefix > > and the manufacturer assigned EUI-64 identifier associated with > > an endport. > > c) Assignment of a GID by the subnet manager. The subnet manager > > creates a GID by concatenating the GID prefix (default or assigned) > > with a set of locally assigned EUI-64 values (at GID index > > 1 or above). Each endport must be assigned at least one unicast GID > > using (a). Additional GIDs may be assigned using (b) and/or (c). Note: > A > > subnet > > shall only have one assigned GID prefix (non default) at any given > > time. 
From HNGUYEN at de.ibm.com Wed Feb 14 09:25:27 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Wed, 14 Feb 2007 18:25:27 +0100 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <20070214170520.GM16867@mellanox.co.il> Message-ID: Hi, > Still, using non-standard hacks like bin32 does not sound like a good idea. I think the actual issue is there is no common approach for this on various platforms. > Maybe an option to *only* make 32 bit userspace might make sense though. > Something like --disable-32bit, --disable-64bit. > This would solve your problem, would it not? Does that mean if I don't specify one of them, I'll get 32- and 64bit execs? If yes, that's fine. Thanks Nam From mst at mellanox.co.il Wed Feb 14 09:39:27 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Feb 2007 19:39:27 +0200 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: References: Message-ID: <20070214173927.GN16867@mellanox.co.il> > > Still, using non-standard hacks like bin32 does not sound like a good idea. > > I think the actual issue is there is no common approach for this on various > platforms. > On platforms I've seen, there are 2 sets of libraries but only 64 bit executables provided. This is what we had for OFED 1.0, OFED 1.1, and I don't see the reason to change that - adding more executables to install on production systems will double QA work. > > > Maybe an option to *only* make 32 bit userspace might make sense though. > > Something like --disable-32bit, --disable-64bit. > > This would solve your problem, would it not? > > Does that mean if I don't specify one of them, I'll get 32- and 64bit > execs? > If yes, that's fine.
No, by default we build 2 sets of libraries, and only 64 bit execs. But for your development purposes (I think you mentioned testing user/kernel context switch) I think we could have a --disable-64bit flag to configure to get only 32 bit userspace. -- MST From dledford at redhat.com Wed Feb 14 10:29:22 2007 From: dledford at redhat.com (Doug Ledford) Date: Wed, 14 Feb 2007 13:29:22 -0500 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <20070214142924.GC20977@mellanox.co.il> References: <200702141518.56138.ossrosch@linux.vnet.ibm.com> <20070214142924.GC20977@mellanox.co.il> Message-ID: <1171477762.3161.105.camel@fc6.xsintricity.com> On Wed, 2007-02-14 at 16:29 +0200, Michael S. Tsirkin wrote: > > Quoting Stefan Roscher : > > Subject: Re: 32-bit build for ppc64 is required > > > > On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote: > > > > Quoting Stefan Roscher : > > > > Subject: 32-bit build for ppc64 is required > > > > > > > > Hi, > > > > > > > > after building the latest ofed build package we recognized that on PPC64 only > > > > 64-bit libaries were build. > > > > Because we have customers using older userpace apllications which are > > > > certified for 32-bit we think additional 32bit support is a requirement for 64bit builds. > > > > > > > > If OFED 1.2 supports 32 bit on ppc64, we have to change the install > > > > directory.I would suggest to install 32-bit binaries into > > > > /usr/local/ofed/bin32 directory. So no changes on current naming conventions > > > > has to be done.The libaries are installed in the /usr/local/ofed/lib directory. > > > > > > The standard practice is to install 64 bit libraries under prefix/lib64 > > > and 32 bit libraries under prefix/lib. Why would PPC64 be any different? > > > > I think you missunderstand my post. The directory for 32/64bit libaries > > shouldbe prefix/lib and prefix/lib64 respectively. > > But current ofed1.2 I saw only prefix/lib64 directory, ie 64bit libs only.
> > Well, this is not by design: AFAIK on x86_64 both types of libraries > are installed. > > > > I do not think we need 32 bit binaries at all, and there's no other package > > > I'm aware of that uses "bin32". > > > > We have customers that still use 32-bit userspace applications. > > It would be beneficial for them if they can obtain 32bit libs and execs from > > ofed1.2 in order to run their applications without recompiling them, because > > for some 32-bit applications recompiling is not an option. > > 32 bit libraries are needed for users to run 32 applications. > > But I still do not see how installing 32 bit binaries alongside the 64 > bit ones is useful, and I do not think other packages provide this option, > so maybe we shouldn't, either. The choice of 32/64 bit default is done on a per arch basis. With x86_64/i386, the increased number of CPU registers in 64bit mode outweighs the increased code bloat that goes along with 64bit mode. On PPC, no such register benefit exists for 64bit mode. As such, 32bit apps on PPC are faster than the equivalent 64bit apps up to the point at which a 4GB address space becomes a problem. Correspondingly, the default binaries on PPC are 32bit, and only those that *need* to be 64bit are. While a customer's application may need >4GB address space, certainly all the ibutils, diags, opensm, etc. do not. As a result, we compile all of those utilities as 32bit by default on PPC. We also ship all the libs as both 32/64bit so users can select the appropriate environment for their particular application (with the exception of dapl, which doesn't support 32bit and for which I filed a bug around the time of OFED 1.1). -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From sashak at voltaire.com Wed Feb 14 10:42:19 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 14 Feb 2007 20:42:19 +0200 Subject: [openib-general] [PATCH RFC] opensm: OpenSM Coding Style doc draft Message-ID: <20070214184219.GB27414@sashak.voltaire.com> Initial writeup about OpenSM Coding Style recommendations. Signed-off-by: Sasha Khapyorsky --- osm/doc/opensm-coding-style.txt | 34 ++++++++++++++++++++++++++++++++++ 1 files changed, 34 insertions(+), 0 deletions(-) create mode 100644 osm/doc/opensm-coding-style.txt diff --git a/osm/doc/opensm-coding-style.txt b/osm/doc/opensm-coding-style.txt new file mode 100644 index 0000000..379042c --- /dev/null +++ b/osm/doc/opensm-coding-style.txt @@ -0,0 +1,34 @@ +This short (hopefully) memo is about to define the coding style +recommended for OpenSM development. + +The goal of this is to make OpenSM code base to be standard in terms of +the rest of OpenIB management software, OpenIB projects and Linux in +general. And in this way to make OpenSM more developer friendly and to +involve more open source programmers to be part of OpenSM development +process. + +The goal of this is not to provide long and boring list of coding style +paradigms, but rather to define general coding style concept and to +suggest a way for such a concept to be implemented in the existing +OpenSM code base. + +The OpenSM project is an OpenIB and Linux centric project, so we think +it is reasonable to use the coding style most popular with OpenIB +projects (linux/Documentation/CodingStyle) as the starting point rather +than reinventing one more coding style rule-set. + +Some things from there in short: tab character for indentation and space +character for alignment, K&R style braces, short local and meanful +global names, please no confused Hungary style, short functions. 
And of +course to be reasonable about all above. + + +Some ideas about existing OpenSM code improvements in terms of the +Coding style: + +* When writing new code, please try to follow the new Coding style. +* Coding style improvement patches are desired and accepted, but please + try to not mix coding style improvement with functional and other + changes in one patch. +* When you are going to improve coding style for existing code, please + try to do it for entire file(s). -- 1.5.0.rc2.g11a3 From dledford at redhat.com Wed Feb 14 10:36:27 2007 From: dledford at redhat.com (Doug Ledford) Date: Wed, 14 Feb 2007 13:36:27 -0500 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: <7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com> References: <1170866522.6223.8.camel@vladsk-laptop> <7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com> Message-ID: <1171478187.3161.107.camel@fc6.xsintricity.com> On Fri, 2007-02-09 at 13:38 -0500, Jeff Squyres wrote: > New SRPM on server that munges the %build section into the %install > section. > > Yuck. :-) Worse than yuck, it's wrong. Your SuSE %build section bug is a result of trying to build against something that isn't installed yet but is required for the build. You guys chose to split things up into modules, and that's fine and the way things should be, but that means you need to install required packages along the way if you want to build against them, not try to build against binaries in temporary directories. Apart from that though, I can assure you that on RHEL and FC, the %build section is a requirement if you want valid -debuginfo packages. I've brought it up at the last two conferences I attended, and I usually get a brick wall when I do, but the OFED packaging process is broken by design.
As Shaun brought up, one of the benefits of proper RPM packaging is reliable, reproducible builds, not to mention that debugging with gdb is nigh impossible without valid debuginfo rpms; all of which are vital to supportability. I'm looking through the alpha1 tarball right now, I'll comment on it later in a separate email. But my first impression is that I'll be ripping everything out and making it sane again. Which brings up another point that I've mentioned before but nothing has happened on: as long as you guys keep making your distribution use an installation hierarchy that violates the rules for distributions shipping code, places like Novell or Red Hat have one of two choices: violate the Linux Filesystem Hierarchy Standard in our distributions or use a different hierarchy than you do. Obviously, we aren't going to forgo LFHS compliance of our entire product for just this, so we use a different hierarchy than you. In the end, this can end up causing confusion for customers, as well as inconsistency between what Red Hat or Novell or you guys choose to use as the file placement. Something needs to be done to standardize installation directories in an acceptable place IMO (/usr/local is verboten for a distribution to use, and theoretically that should include you guys since you are a distribution source, the only real reason people are compiling your code locally is that you don't provide binary RPMs or because they want a custom compiler instead of gcc, not because they are trying out new software they don't necessarily intend to keep/use or which is new enough that no one has formally packaged it up, which is what /usr/local is for). > > On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote: > > > Hi Jeff, > > Please remove %build macro from the RPM spec file. > > On SuSE distros it removes RPM_BUILD_ROOT.
> > > > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343 > > + umask 022 > > + cd /var/tmp/OFEDRPM/BUILD > > + /bin/rm -rf /var/tmp/OFED > > ++ dirname /var/tmp/OFED > > + /bin/mkdir -p /var/tmp > > + /bin/mkdir /var/tmp/OFED > > + cd openmpi-1.2b4ofedr13470 > > + fortify_source=1 > > + test '' '!=' '' > > ... > > > > -- > > Vladimir Sokolovsky > > Mellanox Technologies Ltd. > > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From sweitzen at cisco.com Wed Feb 14 10:44:14 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 14 Feb 2007 10:44:14 -0800 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: <1171478187.3161.107.camel@fc6.xsintricity.com> References: <1170866522.6223.8.camel@vladsk-laptop> <7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com> <1171478187.3161.107.camel@fc6.xsintricity.com> Message-ID: Tziporet and Doug, we can discuss this at the OFED conf call on Feb 26, I suggest we try to improve this area. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Doug Ledford > Sent: Wednesday, February 14, 2007 10:36 AM > To: Jeff Squyres (jsquyres) > Cc: openfabrics-ewg at openib.org; 'Openib-General at Openib.Org' > Subject: Re: [openib-general] Open MPI rpmbuild fails in OFED-1.2 > > On Fri, 2007-02-09 at 13:38 -0500, Jeff Squyres wrote: > > New SRPM on server that munges the %build section into the > %install > > section. > > > > Yuck. :-) > > Worse than yuck, it's wrong. 
Your SuSE %build section bug is a result > of trying to build against something that isn't installed yet but is > required for the build. You guys chose to split things up > into modules, > and that's fine and the way things should be, but that means > you need to > install required packages along the way if you want to build against > them, not try to build against binaries in temporary > directories. Apart > from that though, I can assure you that on RHEL and FC, the %build > section is a requirement if you want valid -debuginfo packages. > > I've brought it up at the last two conferences I attented, > and I usually > get a brick wall when I do, but the OFED packaging process is > broken by > design. As Shaun brought up, one of the benefits of proper RPM > packaging is reliable, reproducible builds, not to mention the whole > issue of debugging with gdb is nigh impossible without valid debuginfo > rpms; all of which are vital to supportability. > > I'm looking through the alpha1 tarball right now, I'll comment on it > later under separate email. But, first glance is that I'll be ripping > everything out and making it sane again. > > Which brings up another point that I've mentioned before but > nothing has > happened on: as long as you guys keep making your distribution use an > installation hierarchy that violates the rules for distributions > shipping code, places like Novell or Red Hat have one of two choices: > violate the Linux File Hierarchy Standard in our > distributions or use a > different hierarchy than you do. Obviously, we aren't going > to fore go > LFHS compliance of our entire product for just this, so we use a > different hierarchy than you. In the end, this can end up causing > confusion for customers, as well as inconsistency between what Red Hat > or Novell or you guys choose to use as the file placement. 
Something > needs to be done to standardize installation directories in an > acceptable place IMO (/usr/local is verboten for a > distribution to use, > and theoretically that should include you guys since you are a > distribution source, the only real reason people are > compiling your code > locally is that you don't provide binary RPMs or because they want a > custom compiler instead of gcc, not because they are trying out new > software they don't necessarily intend to keep/use or which is new > enough that no one has formally packaged it up, which is what > /usr/local > is for). > > > > > On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote: > > > > > Hi Jeff, > > > Please remove %build macro from the RPM spec file. > > > On SuSE distros it removes RPM_BUILD_ROOT. > > > > > > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343 > > > + umask 022 > > > + cd /var/tmp/OFEDRPM/BUILD > > > + /bin/rm -rf /var/tmp/OFED > > > ++ dirname /var/tmp/OFED > > > + /bin/mkdir -p /var/tmp > > > + /bin/mkdir /var/tmp/OFED > > > + cd openmpi-1.2b4ofedr13470 > > > + fortify_source=1 > > > + test '' '!=' '' > > > ... > > > > > > -- > > > Vladimir Sokolovsky > > > Mellanox Technologies Ltd. > > > > > -- > Doug Ledford > GPG KeyID: CFBFF194 > http://people.redhat.com/dledford > > Infiniband specific RPMs available at > http://people.redhat.com/dledford/Infiniband > From jsquyres at cisco.com Wed Feb 14 10:51:29 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 14 Feb 2007 13:51:29 -0500 Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2 In-Reply-To: References: <1170866522.6223.8.camel@vladsk-laptop> <7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com> <1171478187.3161.107.camel@fc6.xsintricity.com> Message-ID: <464B6D9D-FA58-46C1-88AD-5D109E98C16B@cisco.com> On Feb 14, 2007, at 1:44 PM, Scott Weitzenkamp ((sweitzen)) wrote: > Tziporet and Doug, we can discuss this at the OFED conf call on Feb > 26, > I suggest we try to improve this area. 

I strongly agree with this and all of Doug's points (see my prior
e-mails on this subject :-) ).

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems

From hch at infradead.org Wed Feb 14 10:59:07 2007
From: hch at infradead.org (Christoph Hellwig)
Date: Wed, 14 Feb 2007 18:59:07 +0000
Subject: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler to avoid/reduce missed irq events
In-Reply-To: <200702141740.48286.hnguyen@linux.vnet.ibm.com>
References: <200702141740.48286.hnguyen@linux.vnet.ibm.com>
Message-ID: <20070214185907.GA15105@infradead.org>

On Wed, Feb 14, 2007 at 05:40:47PM +0100, Hoang-Nam Nguyen wrote:
> Hi,
> here is a patch for ehca with the reworked irq handler.
> Thanks
> Nam

This looks okay to me (and sorry for not replying earlier to your
private mail).

From jeremy.brown at qlogic.com Wed Feb 14 11:08:38 2007
From: jeremy.brown at qlogic.com (Jeremy Brown)
Date: Wed, 14 Feb 2007 11:08:38 -0800
Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package
In-Reply-To: <1171406869.17328.16.camel@citrine.pathscale.com>
References: <1171387167.3978.90.camel@vladsk-laptop>
	<45D21B85.9070007@mellanox.co.il>
	<1171406869.17328.16.camel@citrine.pathscale.com>
Message-ID: <1171480118.17328.18.camel@citrine.pathscale.com>

On Tue, 2007-02-13 at 14:47 -0800, Jeremy Brown wrote:
> I know that the package is named "sysfsutils-devel" in Fedora Core 3-5,
> and "libsysfs-devel" in Fedora Core 6, similar to the RH 4 vs. RH 5
> split. Would it be possible to change the definition and use of
> $DISTRIBUTION in build_env.sh so that we had "fedora" for FC3-5, and
> "fedora6" for FC6, similar to the "redhat" and "redhat5" split? I'm not
> married to those names, of course.

I apologize for replying to myself, but I wanted to say that this is
working great in the alpha. Thanks for making the change!

Jeremy Brown

From sashak at voltaire.com Wed Feb 14 11:20:08 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 14 Feb 2007 21:20:08 +0200
Subject: [openib-general] ibsim announcement
Message-ID: <20070214192008.GC27414@sashak.voltaire.com>

Hi All,

'ibsim' is the Voltaire Infiniband Fabric Simulator.

The tool was originally developed by Voltairians and has been used with
great success for IB management software development, debugging and
testing. We also found it very useful for various research and for
routing algorithm development.

Based on this successful experience using 'ibsim' for development,
Voltaire decided to make the tool available to everybody and is
contributing the 'ibsim' sources to the OpenIB community.

The ibsim package is now available on the OFA site and can be cloned:

  git clone git://git.openfabrics.org/~sashak/ibsim

There is a README file with build instructions. Kernel support and
recompilation of OpenSM or the diags tools are _not_ required.

Enjoy!

Sasha

From sweitzen at cisco.com Wed Feb 14 11:12:52 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 14 Feb 2007 11:12:52 -0800
Subject: [openib-general] how to handle OFEd 1.2 bugs in bugzilla
In-Reply-To: <45D31A10.8020102@mellanox.co.il>
References: <45D31A10.8020102@mellanox.co.il>
Message-ID: 

Yes, I'd like to add alpha1, etc. version numbers in bugzilla.

For existing bugs, the Reporter and Assignee should try to
communicate/negotiate Priority/Severity.

For bugs in areas that Cisco supports, I review the bugs and try to ask
for desired ones to be fixed. I was happy with the responses I got for
OFED 1.1 from Mellanox and Open MPI.

If you want a bug scrub, I suggest a distributed one, where someone from
each company scrubs the bugs in areas they are responsible for.

Scott

> -----Original Message-----
> From: Tziporet Koren [mailto:tziporet at mellanox.co.il]
> Sent: Wednesday, February 14, 2007 6:18 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: EWG; OPENIB
> Subject: how to handle OFEd 1.2 bugs in bugzilla
>
> Hi Scott and all,
> I wish to consult with you on the way we will treat OFED 1.2 bugs in
> bugzilla.
>
> 1. Do we want to have 1.2-alpha, 1.2-beta, 1.2-rcX in version, or just
>    1.2 as we have now?
> 2. What do we wish to do with bugs that were opened for 1.1 and are
>    still open?
> 3. What to do with old bugs that were opened for gen2 in general?
> 4. What is our methodology for priority and severity setup?
>    (There are too many blocker bugs still open in OFED 1.1, so they
>    are not actually blockers or they were fixed but not updated)
>
> Thanks,
> Tziporet
>

From dledford at redhat.com Wed Feb 14 11:33:24 2007
From: dledford at redhat.com (Doug Ledford)
Date: Wed, 14 Feb 2007 14:33:24 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: 
References: <1170866522.6223.8.camel@vladsk-laptop>
	<7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com>
	<1171478187.3161.107.camel@fc6.xsintricity.com>
Message-ID: <1171481604.3161.110.camel@fc6.xsintricity.com>

On Wed, 2007-02-14 at 10:44 -0800, Scott Weitzenkamp (sweitzen) wrote:
> Tziporet and Doug, we can discuss this at the OFED conf call on Feb 26,
> I suggest we try to improve this area.

OK. I'll make sure to attend the Feb 26 meeting.

-- 
Doug Ledford
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: 

From krause at cup.hp.com Wed Feb 14 11:09:25 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 14 Feb 2007 11:09:25 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <000001c74fca$6c765170$8698070a@amr.corp.intel.com>
References: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com>
	<000001c74fca$6c765170$8698070a@amr.corp.intel.com>
Message-ID: <6.2.0.14.2.20070214105420.09493fc8@esmail.cup.hp.com>

At 03:55 PM 2/13/2007, Sean Hefty wrote:
> >A LID is subnet local on that we can all agree. The CM Req contains
> >either the LID of a local subnet CA or the LID a local router which will
> >move the packet to the next hop to the destination. 12.7.11 is basically
> >saying that the remote LID is the router's LID of the local subnet's router
> >Port. 12.7.21 also refers to the remote LID but in each subnet that is
> >either the router Port's LID or the destination CA.
>
>This isn't my interpretation.
>
>12.7.11 Local Port LID: When local and remote ports are on different
>subnets, this field must be the LID of the router that the *passive*
>side will target for the return path.
>
>The CM REQ carries the LIDs for the remote (passive side) subnet. This
>is what the passive side needs to configure the QP, not the active side
>LID information. (See address vector information for 11.2.4.2 - page 574.)
>
>So, the CM REQ is _sent_ to either the LID of the local subnet CA or the
>LID of a local router port, but _contains_ the LIDs from the remote subnet.

In volume 1, version 1.2, page 574 it states:

[inline image, scrubbed]

12.7.11

[inline image, scrubbed]

Both of these statements refer to the local subnet's LID for the router
port being used by the local CA to communicate to a remote subnet. The
IB architecture is built upon the concept that no subnet local
information knowledge is required beyond the subnet itself to establish
communication across subnets. Perhaps the various wordings are a bit
confusing but the CM protocol should not be concerned with a remote
subnet's LID or any validation of such remote subnet information. All
it needs to do is communicate what is global so that a remote endnode
can respond correctly. It is up to the router and the associated router
protocol to perform any global to subnet local mapping which includes
the LID and LRH generation. The router must work with each subnet's
SM / SA to provide the necessary global to subnet local mappings which
are then queried by the CM agent to find the appropriate router Port.
There is no requirement in the specification to ever communicate across
a subnet anything that is strictly subnet local. LID is a strictly
subnet local value and is not shared.

Again, the passive here refers to the subnet local router LID and not
the remote subnet's LID.

Mike
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a3c35b92.jpg
Type: image/jpeg
Size: 43794 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a3c35c00.jpg
Type: image/jpeg
Size: 45123 bytes
Desc: not available
URL: 

From sweitzen at cisco.com Wed Feb 14 12:24:39 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 14 Feb 2007 12:24:39 -0800
Subject: [openib-general] OFED 1.2 alpha release
In-Reply-To: <45D337E2.200@mellanox.co.il>
References: <45D337E2.200@mellanox.co.il>
Message-ID: 

I don't remember discussing dropping RHEL4 U3, and would like to add it
back to the official list. IPoIB multicast does not work correctly (bug
266) in RHEL4 U4, thus RHEL4 U3 is the most recent working RHEL release
in this area (unless it has been fixed in U4 errata kernels). The new
ib-bonding RPM also says it only supports RHEL4 U3 for Red Hat releases.

We should probably also plan for SLES10 SP1 support in OFED 1.2.

Scott

________________________________

From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren
Sent: Wednesday, February 14, 2007 8:25 AM
To: EWG
Cc: OPENIB
Subject: [openib-general] OFED 1.2 alpha release

Hi,

After a two-week delay we have published OFED 1.2-alpha1 on
http://www.openfabrics.org/builds/ofed-1.2/

File: OFED-1.2-alpha1.tgz
BUILD_ID contains info on all packages sources location.
Please report any issues in bugzilla https://bugs.openfabrics.org/

Tziporet & Vlad

OS support:
  Novell:
    - SLES 9.0 SP3
    - SLES10
  Redhat:
    - Redhat EL4 up4
    - Redhat EL5 beta2 (only partially tested)
  kernel.org:
    - 2.6.20
    - 2.6.19

Note: Redhat EL4 up3, Fedora C4, Fedora C6 and SuSE Pro 10 are not part
of the official list. We keep the backport patches for these OSes and
make sure OFED compiles and loads properly, but will not do a full QA
cycle.

Systems:
  * x86_64
  * x86
  * ia64
  * ppc64 (have not tested user space)

Main changes from OFED-1.1:
  1. iWARP is now supported with Chelsio T3
  2. New kernel modules: VNIC, RDS, Bonding, SA cache
  3. New packages: MVAPICH2
  4. IPoIB Connected mode
  5. Multicast join from user space
  6. libibverbs 1.1
  7. OpenSM new routing models: FAT tree routing and Taurus routing
  8. GUI tool for network diagnostic
  9. New MPI releases: MVAPICH: version 0.9.9, Open MPI: version 1.2,
     MVAPICH2: version 0.9.8

Detailed list of changes can be found in:
https://wiki.openfabrics.org/tiki-index.php?page=OFED+1.2+release+plan+and+features

Limitations and known issues:
  1. ipath driver compilation fails on all systems, except for kernel
     2.6.20
  2. libipathverbs is not working with libibverbs 1.1
  3. SDP netstat is not available on RHEL5 (due to compilation errors)
  4. Routing table problem in SLES10 when using port #2
  5. RDS compiles only on kernel 2.6.18/19/20
  6. MVAPICH2 installation fails on SuSE Pro 10.
  7. mstflint is not working on ppc64
  8. RDS was not tested

Missing features that should be completed for the Beta:
  1. Add madeye utility
  2. RDS to support SLES10 and RHEL

For details on each module status see:
https://wiki.openfabrics.org/tiki-index.php?page=Teleconf+02-12-2007
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ardavis at ichips.intel.com Wed Feb 14 13:26:38 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Wed, 14 Feb 2007 13:26:38 -0800
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <1171397522.21471.7.camel@stevo-desktop>
References: <1171397522.21471.7.camel@stevo-desktop>
Message-ID: <45D37E8E.5050800@ichips.intel.com>

Steve Wise wrote:

>Currently, the dapl rpms don't install dat.conf. I think they probably
>should, eh? Maybe in /etc/dat.conf
>

My specfile is set up to target sysconfdir, which is typically set to
`$(prefix)/etc':

  %{_sysconfdir}/dat.conf

I am not sure how the 1.2 scripts are building the rpms. Maybe Vladimir
can help explain?

-arlin

From mshefty at ichips.intel.com Wed Feb 14 13:36:46 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 14 Feb 2007 13:36:46 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <6.2.0.14.2.20070214105420.09493fc8@esmail.cup.hp.com>
References: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com>
	<000001c74fca$6c765170$8698070a@amr.corp.intel.com>
	<6.2.0.14.2.20070214105420.09493fc8@esmail.cup.hp.com>
Message-ID: <45D380EE.70300@ichips.intel.com>

Assume that the active and passive sides of a connection request are on
different subnets and:

Active side - LID 1
Active side router - LID 2
Passive side - LID 93
Passive side router - LID 94

What values are you suggesting are used for:

Active side QP - DLID
Passive side QP - DLID
CM REQ Primary Local Port LID

- Sean

From sean.hefty at intel.com Wed Feb 14 13:45:40 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 14 Feb 2007 13:45:40 -0800
Subject: [openib-general] GetTable path record query not returning DGID=SGID paths
Message-ID: <000501c75081$7898edc0$ff0da8c0@amr.corp.intel.com>

We're seeing a situation where it appears that the response to a
GetTable path record query is not returning paths where the DGID is the
same as the SGID. The query is setting the SGID and number of paths.

We're still investigating if this is indeed the case, but does anyone
know if such a query should return paths where DGID=SGID?

- Sean

From krause at cup.hp.com Wed Feb 14 13:49:26 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 14 Feb 2007 13:49:26 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
Message-ID: <6.2.0.14.2.20070214134413.0944fcf8@esmail.cup.hp.com>

I do not see the need for any of this. The router protocol should be
designed to work with each subnet's SM / SA to provide information on
what GID prefix is on each router Port.
This is used to look up the subnet local LRH fields. The only
cross-subnet challenges are global based, e.g. what is the P_Key to use
and how to manage those across subnets or how should TClass be
interpreted to achieve a consistent behavior independent of how the
TClass is subnet local mapped to a SL. These were the types of
challenges remaining when we stopped development of the router
specification.

If the IBTA decides to develop a router specification then it might be
best to join that effort and work it out in detail before attempting to
develop the management infrastructure. May be able to slightly lag in
order to validate the technical directions that the spec will take
without having to wait until 1.0 to say, yep, this looks good or here is
where you need to change the spec. Not clear what can be developed
until there is a router specification to execute to in the industry.

Mike

At 01:17 PM 2/13/2007, Sean Hefty wrote:
>Here's a first take at summarizing the IB routing discussion.
>
>The following spec references are noted:
>
>9.6.1.5 C9-54. The SLID shall be validated (for connected QPs).
>12.7.11. CM REQ Local Port LID - is LID of remote router.
>13.5.4: Defines reversible paths.
>
>The main discussion point centered on trying to meet 9.6.1.5 C9-54. This
>requires that the forward and reverse data flows between two QPs traverse the
>same router LID on both subnets. The idea was made to try to eliminate this
>compliance statement for packets carrying a GRH, but this is viewed as going
>against the spirit of IBA.
>
>Ideas were presented around trying to construct an 'inter-subnet path record'
>that contained the following:
>
> - Side A GRH.SGID = active side's Port GID
> - Side A GRH.DGID = passive side's Port GID
> - Side A LRH.SLID = any active side's port LID
> - Side A LRH.DLID = A subnet router
> - Side A LRH.SL = SL to A subnet router
>
> - Side B GRH.SGID = Side A GRH.DGID
> - Side B GRH.DGID = Side A GRH.SGID
> - Side B LRH.SLID = any passive side's port LID
> - Side B LRH.DLID = B subnet router
> - Side B LRH.SL = SL to B subnet router
>
>It is still unclear how such a record can be constructed. But communication
>with remote SAs might be achieved by using a well-known GID suffix. It's also
>unclear whether the fields in a path record are relative to the SA's subnet or
>the SGID.
>
>It's anticipated that SAs will need to interact with routers, but in an
>unspecified manner.

From mshefty at ichips.intel.com Wed Feb 14 14:02:38 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 14 Feb 2007 14:02:38 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <6.2.0.14.2.20070214134413.0944fcf8@esmail.cup.hp.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
	<6.2.0.14.2.20070214134413.0944fcf8@esmail.cup.hp.com>
Message-ID: <45D386FE.5080202@ichips.intel.com>

Mike, are you expecting that routers will modify CM messages as they
flow between subnets?

- Sean

From rdreier at cisco.com Wed Feb 14 14:08:03 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 14 Feb 2007 14:08:03 -0800
Subject: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler to avoid/reduce missed irq events
In-Reply-To: <200702141740.48286.hnguyen@linux.vnet.ibm.com> (Hoang-Nam Nguyen's message of "Wed, 14 Feb 2007 17:40:47 +0100")
References: <200702141740.48286.hnguyen@linux.vnet.ibm.com>
Message-ID: 

Looks fine but this patch at least has serious whitespace damage...
please resend a fixed version.

 - R.

From rdreier at cisco.com Wed Feb 14 14:16:28 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 14 Feb 2007 14:16:28 -0800
Subject: [openib-general] [PATCH] IPoIB: Only allow root to change between datagram and connected mode
Message-ID: 

Change the permissions of the "mode" sysfs attribute to be S_IWUSR
instead of S_IWUGO.

Signed-off-by: Roland Dreier
---
FYI -- I'm planning to merge this for 2.6.21. It doesn't seem
appropriate to allow ordinary users to mess with this sort of config.

 drivers/infiniband/ulp/ipoib/ipoib_cm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 2d48387..8881a71 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1138,7 +1138,7 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 	return -EINVAL;
 }
 
-static DEVICE_ATTR(mode, S_IWUGO | S_IRUGO, show_mode, set_mode);
+static DEVICE_ATTR(mode, S_IWUSR | S_IRUGO, show_mode, set_mode);
 
 int ipoib_cm_add_mode_attr(struct net_device *dev)
 {
-- 
1.4.4.4

From krause at cup.hp.com Wed Feb 14 14:15:50 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 14 Feb 2007 14:15:50 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45D380EE.70300@ichips.intel.com>
References: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com>
	<000001c74fca$6c765170$8698070a@amr.corp.intel.com>
	<6.2.0.14.2.20070214105420.09493fc8@esmail.cup.hp.com>
	<45D380EE.70300@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070214135826.096b7638@esmail.cup.hp.com>

At 01:36 PM 2/14/2007, Sean Hefty wrote:
>Assume that the active and passive sides of a connection request are on
>different subnets and:
>
>Active side - LID 1
>Active side router - LID 2
>Passive side - LID 93
>Passive side router - LID 94
>
>What values are you suggesting are used for:
>
>Active side QP - DLID
>Passive side QP - DLID
>CM REQ Primary Local Port LID

Subnet A is: QP Port LID 1
             Router A Port LID 2

Subnet B is: QP Port LID 93
             Router B Port LID 94

Process steps:

- Router A populates SM / SA A with the GID prefix it can route. SM / SA
  A will have configured the router Port with the appropriate local
  route information and hence have assigned it LID 2.

- CM associated with Port LID 1 queries the SM / SA to identify a path
  to a GID Prefix. SM / SA returns a path record indicating a global
  route, i.e. one that requires a GRH, is available and provides the CM
  with the information targeting router Port LID 2.

- CM creates a REQ and populates the global information to identify the
  remote endnode. The LRH generated targets Port LID 2. The GRH is
  generated to target the remote subnet so the router will comprehend
  how to process the packet.

- Router A receives the packet and examines the GRH. Via its router
  protocol, it has previously identified what router Port will lead to
  the next hop on the path to the destination endnode.

- If the endnode is subnet local, say subnet B, then the router
  generates a LRH with QP LID 93 and emits that on router Port LID 94.

- QP in subnet B receives the CM REQ and validates the LRH. Given these
  messages are via UD service and not RC / UC, the validation rules for
  the LRH are different. The CM agent processes the request and returns
  an appropriate response by filling in a GRH that replaces the SGID
  with the DGID and so forth so the addresses are basically reflected
  back. The response uses QP port LID 93 and targets router Port 94.

- Router B Port 94 receives the response. It parses the GRH and
  determines the next hop port. In this example, the response goes out
  router A Port 2 and targets QP Port LID 1. The LRH is generated using
  these fields. Again, since CM is targeting a UD QP, the LRH validation
  rules are different.

- Once the connection is established, the QP on subnet A will send
  packets to QP on subnet B using a GRH that is processed by the router
  with each QP using a LRH that targets the router port locally attached
  to its subnet. The router is responsible for generating a LRH to
  forward to the next hop. These packets are now in a RC / UC data flow
  so the LRH validation is per the sections cited in this e-mail string.

In all cases, the router protocol is responsible for generation of a LRH
that will work within each subnet. There is no exchange of subnet local
information between the subnets. Each subnet's SM/SA only tracks what
is local to it as well as what GID prefix can be routed via a given LID.
If multiple LIDs can route to a given GID prefix, multiple path records
are returned. Which to choose is not specified by the specifications so
it can be any policy one desires. If the router protocol communicates a
"cost" to a given path in order to give an indication of appropriateness
for a given workload, then this should be communicated to the CM agent.

Mike

From hch at infradead.org Wed Feb 14 14:28:24 2007
From: hch at infradead.org (Christoph Hellwig)
Date: Wed, 14 Feb 2007 22:28:24 +0000
Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion()
In-Reply-To: <200702141741.35444.hnguyen@linux.vnet.ibm.com>
References: <200702141741.35444.hnguyen@linux.vnet.ibm.com>
Message-ID: <20070214222824.GA11579@infradead.org>

> @@ -332,7 +333,7 @@ int ehca_destroy_cq(struct ib_cq *cq)
>  	spin_lock_irqsave(&ehca_cq_idr_lock, flags);
>  	while (my_cq->nr_callbacks) {
>  		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
> -		yield();
> +		wait_for_completion(&my_cq->zero_callbacks);
>  		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
>  	}

A while loop around wait_for_completion doesn't make all that much
sense. I suspect a simple

	if (my_cq->nr_callbacks)
		wait_for_completion(&my_cq->zero_callbacks);

is what you need.

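The reasoning behind that suggestion can be tried outside the kernel.
What follows is only a userspace sketch, not ehca code: it assumes a
pthread-based stand-in for struct completion, a made-up struct demo_cq,
and that no new callbacks are started once the destroy path has begun.
Under those assumptions the completion fires exactly once, when the
count reaches zero, so one check and at most one wait are enough.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_CB 8

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	int             done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)	/* only guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical miniature of the CQ state discussed in the thread. */
struct demo_cq {
	pthread_mutex_t   lock;		/* plays the role of ehca_cq_idr_lock */
	int               nr_callbacks;
	struct completion zero_callbacks;
};

/* Callback side: drop the count and signal once it reaches zero. */
static void *callback_done(void *arg)
{
	struct demo_cq *cq = arg;
	int last;

	pthread_mutex_lock(&cq->lock);
	last = (--cq->nr_callbacks == 0);
	pthread_mutex_unlock(&cq->lock);
	if (last)
		complete(&cq->zero_callbacks);
	return NULL;
}

/* Destroy side: one check, at most one wait. Returns the final count. */
static int destroy_demo(int callbacks)
{
	struct demo_cq cq;
	pthread_t tid[MAX_CB];
	int started[MAX_CB];
	int i, pending;

	if (callbacks < 0 || callbacks > MAX_CB)
		return -1;

	pthread_mutex_init(&cq.lock, NULL);
	cq.nr_callbacks = callbacks;
	init_completion(&cq.zero_callbacks);

	for (i = 0; i < callbacks; i++) {
		started[i] = (pthread_create(&tid[i], NULL,
					     callback_done, &cq) == 0);
		if (!started[i])
			callback_done(&cq);	/* no thread: finish inline */
	}

	pthread_mutex_lock(&cq.lock);
	pending = cq.nr_callbacks;
	pthread_mutex_unlock(&cq.lock);	/* never sleep with the lock held */

	if (pending)
		wait_for_completion(&cq.zero_callbacks);

	for (i = 0; i < callbacks; i++)
		if (started[i])
			pthread_join(tid[i], NULL);

	return cq.nr_callbacks;
}
```

Calling destroy_demo(4) from a small main() should return 0 once all
four callback threads have finished. If callbacks could still be queued
after the check, this single-wait form would no longer be sufficient.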
From halr at voltaire.com Wed Feb 14 14:28:09 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Feb 2007 17:28:09 -0500
Subject: [openib-general] GetTable path record query not returning DGID=SGID paths
In-Reply-To: <000501c75081$7898edc0$ff0da8c0@amr.corp.intel.com>
References: <000501c75081$7898edc0$ff0da8c0@amr.corp.intel.com>
Message-ID: <1171492082.22446.123948.camel@hal.voltaire.com>

On Wed, 2007-02-14 at 16:45, Sean Hefty wrote:
> We're seeing a situation where it appears that the response to a GetTable path
> record query is not returning paths where the DGID is the same as the SGID.

Is this OpenSM or a vendor SM ?

> The query is setting the SGID and number of paths.

Yes, that's the minimum required for a GetTable request.

> We're still investigating if this is indeed the case, but does anyone know if
> such a query should return paths where DGID=SGID?

I believe it should but I'm not sure there's specific compliance. Such
loopback paths are mentioned though.

-- Hal

> - Sean

From krause at cup.hp.com Wed Feb 14 14:39:59 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 14 Feb 2007 14:39:59 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <45D386FE.5080202@ichips.intel.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com> <6.2.0.14.2.20070214134413.0944fcf8@esmail.cup.hp.com> <45D386FE.5080202@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070214143823.09695658@esmail.cup.hp.com>

At 02:02 PM 2/14/2007, Sean Hefty wrote:
>Mike, are you expecting that routers will modify CM messages as they flow
>between subnets?

The router parses the GRH, strips the LRH, and attaches a new LRH to the next hop, with the contents of the LRH filled in per its internal policies. Nothing more for the main packet processing. The router interacts with each subnet's SM/SA to ensure the path records can be provided to the CM to fill in the right information.
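The forwarding step described here can be sketched in a few lines of C. This is only an illustration under stated assumptions: the header structs are reduced to the fields discussed in this thread, the names `struct route` and `router_forward()` are hypothetical, the two subnet prefixes are made-up values, and the table is keyed on the GID prefix alone, whereas a real router would resolve the destination LID from the full DGID. The LIDs are the ones from the example (router B port 94, endnode 93).

```c
#include <stddef.h>
#include <stdint.h>

/* Made-up subnet prefixes, for illustration only. */
#define SUBNET_A_PREFIX 0xfec0000000000001ULL
#define SUBNET_B_PREFIX 0xfec0000000000002ULL

/* Reduced, hypothetical headers: only the fields this thread talks about. */
struct lrh {
    uint16_t slid;
    uint16_t dlid;
};

struct grh {
    uint64_t sgid_prefix;
    uint64_t sgid_guid;
    uint64_t dgid_prefix;
    uint64_t dgid_guid;
};

struct packet {
    struct lrh lrh;
    struct grh grh;
};

/* One forwarding entry (simplified: one next hop per GID prefix). */
struct route {
    uint64_t gid_prefix;       /* destination subnet */
    uint16_t egress_port_lid;  /* LID of the router port on that subnet */
    uint16_t next_hop_lid;     /* LID of the endnode or of the next router */
};

/*
 * Parse the GRH, drop the old LRH contents and write a new LRH for the
 * next hop. The GRH is left untouched. Returns 0 on success, -1 if the
 * router has no route for the destination prefix.
 */
static int router_forward(struct packet *pkt,
                          const struct route *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].gid_prefix == pkt->grh.dgid_prefix) {
            pkt->lrh.slid = table[i].egress_port_lid;
            pkt->lrh.dlid = table[i].next_hop_lid;
            return 0;
        }
    }
    return -1;
}
```

With a table entry `{ SUBNET_B_PREFIX, 94, 93 }`, a packet that arrived with LRH (SLID 1, DLID 2) leaves with LRH (SLID 94, DLID 93) and an unchanged GRH, which matches the subnet B step in the walkthrough.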
Mike From mshefty at ichips.intel.com Wed Feb 14 15:01:55 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 14 Feb 2007 15:01:55 -0800 Subject: [openib-general] GetTable path record query not returning DGID=SGID paths In-Reply-To: <1171492082.22446.123948.camel@hal.voltaire.com> References: <000501c75081$7898edc0$ff0da8c0@amr.corp.intel.com> <1171492082.22446.123948.camel@hal.voltaire.com> Message-ID: <45D394E3.5080805@ichips.intel.com> >>We're seeing a situation where it appears that the response to a GetTable path >>record query is not returning paths where the DGID is the same as the SGID. > > Is this OpenSM or a vendor SM ? This is with opensm. When we're running with the local SA cache, we're seeing route resolution (path record lookup) retries, but only for loopback connections. This suggests that that we're not getting path records for DGID=SGID. - Sean From mshefty at ichips.intel.com Wed Feb 14 15:16:41 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 14 Feb 2007 15:16:41 -0800 Subject: [openib-general] Problem is routing CM REQ In-Reply-To: <6.2.0.14.2.20070214135826.096b7638@esmail.cup.hp.com> References: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com> <000001c74fca$6c765170$8698070a@amr.corp.intel.com> <6.2.0.14.2.20070214105420.09493fc8@esmail.cup.hp.com> <45D380EE.70300@ichips.intel.com> <6.2.0.14.2.20070214135826.096b7638@esmail.cup.hp.com> Message-ID: <45D39859.4070700@ichips.intel.com> I agree with what was in your response, however, this is how I interpret your answers: >> Active side QP - DLID 2 >> Passive side QP - DLID 94 >> CM REQ Primary Local Port LID no answer given > - CM creates a REQ and populates the global information to identify the > remote endnode. The LRH generated targets Port LID 2. The GRH is > generated to target the remote subnet so the router will comprehend how > to process the packet. What is carried in the Primary REQ Local Port LID and Primary Remote Port LID fields in the REQ? 
My claim is that in this example the values are 94 and 93, respectively. The passive side uses these values to configure its QP. This means that the active side requires knowledge of the LIDs that are used on the passive side subnet. If you believe that other values are carried in the REQ, what are they, and how are they used? - Sean From halr at voltaire.com Wed Feb 14 15:20:20 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Feb 2007 18:20:20 -0500 Subject: [openib-general] GetTable path record query not returning DGID=SGID paths In-Reply-To: <45D394E3.5080805@ichips.intel.com> References: <000501c75081$7898edc0$ff0da8c0@amr.corp.intel.com> <1171492082.22446.123948.camel@hal.voltaire.com> <45D394E3.5080805@ichips.intel.com> Message-ID: <1171495184.22446.126481.camel@hal.voltaire.com> On Wed, 2007-02-14 at 18:01, Sean Hefty wrote: > >>We're seeing a situation where it appears that the response to a GetTable path > >>record query is not returning paths where the DGID is the same as the SGID. > > > > Is this OpenSM or a vendor SM ? > > This is with opensm. When we're running with the local SA cache, we're seeing > route resolution (path record lookup) retries, but only for loopback > connections. This suggests that that we're not getting path records for DGID=SGID. What is the value of NumbPath and how large a subnet is this ? I'm pretty sure this works; at least it did the last I checked. -- Hal > - Sean From sean.hefty at intel.com Wed Feb 14 15:42:20 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 14 Feb 2007 15:42:20 -0800 Subject: [openib-general] GetTable path record query not returningDGID=SGID paths In-Reply-To: <1171495184.22446.126481.camel@hal.voltaire.com> Message-ID: <000701c75091$c4f59fa0$ff0da8c0@amr.corp.intel.com> >What is the value of NumbPath and how large a subnet is this ? I'm >pretty sure this works; at least it did the last I checked. 
By default, NumbPath should be 127, but I would have expected a path record even with it set to 1. (I don't think we were using different PKeys or anything like that.) We haven't looked into this in more detail yet. This was our observation while testing on a larger (64 node) cluster this morning that we don't have access to at the moment. With the local SA cache running, we were surprised to see any retries, and when we looked into it more, retries were always for loopback connections. Let me look into this more on the host stack side. - Sean From rdreier at cisco.com Wed Feb 14 16:50:06 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Feb 2007 16:50:06 -0800 Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion() In-Reply-To: <200702141741.35444.hnguyen@linux.vnet.ibm.com> (Hoang-Nam Nguyen's message of "Wed, 14 Feb 2007 17:41:35 +0100") References: <200702141741.35444.hnguyen@linux.vnet.ibm.com> Message-ID: I agree with Christoph -- the use of wait_for_completion() in a loop makes no sense. When you send a new copy of this patch without whitespace damage, please fix that up too... 
From michael.arndt at informatik.tu-chemnitz.de Wed Feb 14 17:34:32 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Thu, 15 Feb 2007 02:34:32 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> <1171051141.2767.7.camel@localhost> <001001c74c87$8b653470$21606d86@one7> <1171122546.31538.251673.camel@hal.voltaire.com>
Message-ID: <000601c750a1$71746a40$21606d86@one7>

Hi,

I used your changes and they help in some cases, but there are still situations where umad_send returns with that error. Let me try to describe the situation:

(Node 1) -> (Node 2) -> (Node 3)

Node 1: sends 100 SubnGets to Node 3 (Dr [0][1][1])
Node 2: forwards 100 SubnGets to Node 3 and also forwards 100 SubnGetResps to Node 1
Node 3: responds 100 times

That works fine!! Don't be surprised that Node 2 gets the packets; that's because I changed the SMI.

But if I now start the sender on Node 1 again, so that it sends another 100 SubnGets, Node 2 produces umad_send errors. The error doesn't occur every time. The receives are always OK, and so are the packets.

Below I attach the main code from the router tool on Node 2. I also tried allocating a packet for every single receive and send, but that didn't work either.

What about the size of the packet; could there be an error there?

Thanks
Michael

    while(run){
        bcopy((char*)&fd_ports,(char*)&fd_ports_tmp,sizeof(fd_ports));
        activ = select(highest_fd+1, (fd_set*)&fd_ports_tmp, (fd_set*)0, (fd_set*)0,(struct timeval*)0);
        if (activ < 0 ){
            if (run) printf("Error: select : %i\n",activ);
            run = 0;
        }
        else if (activ == 0) printf("Nothing to do\n");
        else {
            // ++ Alloc MAD ++
            //printf("... Alloc UMAD .......................");
            if (!(umad = umad_alloc(Port_ID_cnt, umad_size() + IB_MAD_SIZE))){
                printf("Error: umad_alloc\n");
                goto Exit;
            }
            //printf("done\n");

            // ++ Alloc SMP Pointer ++
            //printf("... Alloc SMP ........................");
            smp = (struct drsmp**) malloc(Port_ID_cnt * sizeof(struct drsmp*));
            for (i = 0; i < Port_ID_cnt; i++)
                smp[i] = (struct drsmp*) umad_get_mad(umad + (i * (umad_size() + IB_MAD_SIZE)));
            //printf("done\n");

            // ++ Check All Ports where something is to do ++
            for (i = 0; i < Port_ID_cnt; i++) {
                if ( (Port_ID[i] >= 0) && (Agent_ID[i] >= 0) &&
                     (FD_ISSET(umad_get_fd(Port_ID[i]),(fd_set*)&fd_ports_tmp))) {
                    smplength = IB_MAD_SIZE;
                    packet_size = umad_size() + IB_MAD_SIZE;
                    printf("... Recv Mad (Port: %i (ID:%i).....",i+1,Port_ID[i]);

                    // ++ Receive ++
                    if ((ret = umad_recv(Port_ID[i], umad + (i * packet_size), &smplength, timeout_ms_r)) != Agent_ID[i]){
                        printf("Error: umad_recv: %s ,Nr: %i\n", drmad_status_str(smp[i]),ret);
                        if (optExitRecvFail) run = 0;
                    }
                    else {
                        // ++ Drop Echo ++
                        if (smp[i]->initial_path[1] != 0) {
                            // ++ Keep TID in Mind with supporting turning algorithm ++
                            if ( !( (smp[i]->initial_path[smp[i]->hop_ptr] == i+1) &&
                                    (smp[i]->status & DIRECTION) &&
                                    (smp[i]->hop_cnt == smp[i]->hop_ptr) &&
                                    (smp[i]->initial_path[smp[i]->hop_ptr] != smp[i]->initial_path[smp[i]->hop_ptr - 1]) ) &&
                                 ( (Agent_TIDs[i] == -1) || (Agent_TIDs[i] != (own_ntoh64(smp[i]->tid) >> 32)) ) )
                                Agent_TIDs[i] = smp[i]->tid;
                            printf("TID: 0x%lx\n",own_ntoh64(Agent_TIDs[i]));

                            // ++ Message Logging ++
                            if (optMsgLog) {
                                fprintf(MsgLogFile,"...............................................................................................\n");
                                fprintf(MsgLogFile,"... Recv Mad (Port: %i (ID:%i)...............\n",i+1,Port_ID[i]);
                                fprintf(MsgLogFile,"... Recv TID: 0x%lx \n",own_ntoh64(Agent_TIDs[i]));
                                dump_dr_smp(smp[i], MsgLogFile);
                            }

                            // ++ Looking up the Out-Port ++
                            Out_Port_index = routing(smp[i],Devices_Info,Devices_cnt);
                            if ((Out_Port_index >= 0) && (Port_ID[Out_Port_index] >=0)){
                                printf("... Send Mad (Port: %i (ID:%i).....",Out_Port_index+1,Port_ID[Out_Port_index]);

                                // ++ Replace TID
                                if (Agent_TIDs[Out_Port_index] != -1)
                                    smp[i]->tid = (uint64_t) Agent_TIDs[Out_Port_index];

                                // ++ Sending ++
                                //printf("%i\n",timeout_ms_s); //= (smp[i]->status & DIRECTION)? 0 : 200;
                                if ((ret = umad_send(Port_ID[Out_Port_index], Agent_ID[Out_Port_index],
                                                     umad + (i * packet_size), smplength,
                                                     (smp[i]->status & DIRECTION)? 0 : timeout_ms_s, 3)) < 0){
                                    printf("Error: umad_send Nr: %i \n",ret);
                                    if (optExitSendFail) run = 0;
                                }
                                else printf("TID: 0x%lx \n",own_ntoh64(Agent_TIDs[Out_Port_index]));

                                if (optMsgLog) {
                                    fprintf(MsgLogFile,"... Send TID: 0x%lx \n",own_ntoh64(Agent_TIDs[Out_Port_index]));
                                    fprintf(MsgLogFile,"... Send Mad (Port: %i (ID:%i)(%s)(%i)...............\n",Out_Port_index+1,Port_ID[Out_Port_index],(ret >= 0)?"OK":"Fail",(smp[i]->status & DIRECTION)? 0 : timeout_ms_s);
                                    fprintf(MsgLogFile,"...............................................................................................\n");
                                    fflush(MsgLogFile);
                                }
                                traversed++;
                            }
                        }
                        else {
                            printf("dropped, probably there is missing a response mad\n");
                            dropped++;
                        }
                    }
                }
            }
            if (umad) umad_free(umad);
        }
        printf("... Traversed Packets (%i)(%i) .............................\n",traversed,dropped);
    }

From michael.arndt at informatik.tu-chemnitz.de Wed Feb 14 18:12:54 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Thu, 15 Feb 2007 03:12:54 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> <1171051141.2767.7.camel@localhost> <001001c74c87$8b653470$21606d86@one7> <1171122546.31538.251673.camel@hal.voltaire.com>
Message-ID: <000801c750a6$cd925120$21606d86@one7>

Hi,

what I forgot to mention is that the write function in umad_send returns -1 when the error occurs. Maybe that helps.

Thanks
Michael

From nimrodg at mellanox.com Wed Feb 14 20:16:20 2007
From: nimrodg at mellanox.com (Nimrod Gindi)
Date: Wed, 14 Feb 2007 20:16:20 -0800
Subject: [openib-general] OFED release testing Task force meeting minutes
Message-ID: <1E3DCD1C63492545881FACB6063A57C1C8275A@mtiexch01.mti.com>

Meeting took place on Wednesday - Feb. 7th, 2007 8:30AM (PST)

Agenda:
1. Review report summary suggestion 1 (sent by Amit K.- Mellanox).
2. Review report summary suggestion 2 (sent by Moni L.- Voltaire).
3. Review testing matrix report (sent by Jeremy B.- Qlogic).

Attending companies: Mellanox, IBM, Qlogic, Voltaire, SystemFabricWorks

Discussion Items and Action Items:
1) Reviewed the different reports.
2) Minor changes to the structure were suggested and adopted (see attached combined suggestion for review).
3) Agreed to review and close on the structure; the content will be done later.
4) Agreed Action Items:
   a. AI 1: Amit K (Mellanox) - send update/fixed spread-sheet.
   b. AI 2: Jeremy B (Qlogic) - send update/fixed spread-sheet.
   c. AI 3: Nimrod G. (Mellanox) - send combined suggestion for review.

We agreed to review the above via e-mails before the next meeting.

Follow-up meeting scheduled for 21st February 2007 8:30am PST=11:30am EST=6:30pm Israel.

Nimrod Gindi
Mellanox Technologies Ltd.
mail  : nimrodg at mellanox.com
Cell  : +1-408-750-4801
Office: +1-347-342-0011
Fax   : +1-212-987-0275
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OFED testing report format rev1.xls
Type: application/vnd.ms-excel
Size: 45056 bytes
Desc: OFED testing report format rev1.xls
URL:

From rowland at cse.ohio-state.edu Wed Feb 14 20:31:32 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Wed, 14 Feb 2007 23:31:32 -0500
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To:
References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> <45D1FD0B.2080606@cse.ohio-state.edu>
Message-ID: <45D3E224.9060306@cse.ohio-state.edu>

Roland Dreier wrote:
> > When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is
> > built, at least by looking at the .so file result:
> >
> > [rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs
> > libibverbs.a
> > libibverbs.so
> > libibverbs.so.1
> > libibverbs.so.1.0.0
>
> The soname hasn't changed because the library is still compatible.
> But (I hope at least) OFED has libibverbs 1.1.

The soname is libibverbs.so.1, so I guess the longer name would not matter anyway. Clearly, what I posted shows the IBVERBS 1.1 ABI is there. I think I have figured out why our code has this problem. The problem below is similar to the original one posted about.

I did some experimentation with the srq_pingpong libibverbs example code. First I built it directly with:

    gcc -g -c pingpong.c -I/usr/local/ofed/include
    gcc -g -c -D_GNU_SOURCE srq_pingpong.c -I/usr/local/ofed/include
    gcc -g -o srq_pingpong srq_pingpong.o pingpong.o -L/usr/local/ofed/lib64 -libverbs

This works. Next I copied srq_pingpong.c to two files:

    srq_pingpong_rowland.c - just has a main function that calls lib_start().
    srq_pingpong_lib_rowland.c - main() changed to lib_start().

This moves all of the SRQ pingpong code into a shared library. If I build this shared library in this way, it works:

    gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include
    gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c -I/usr/local/ofed/include
    gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so srq_pingpong_lib_rowland.o pingpong.o -L/usr/local/ofed/lib64 -libverbs
    gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -lsrqtest

Above I am linking libibverbs directly into my libsrqtest.so library. This works and the IBVERBS 1.1 ABI is clearly in the libsrqtest.so file:

    [rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
                     U ibv_ack_cq_events@@IBVERBS_1.1
                     U ibv_alloc_pd@@IBVERBS_1.1
                     U ibv_close_device@@IBVERBS_1.1
                     U ibv_create_comp_channel@@IBVERBS_1.0
                     U ibv_create_cq@@IBVERBS_1.1
                     U ibv_create_qp@@IBVERBS_1.1
                     U ibv_create_srq@@IBVERBS_1.1
                     U ibv_dealloc_pd@@IBVERBS_1.1
                     U ibv_dereg_mr@@IBVERBS_1.1
                     U ibv_destroy_comp_channel@@IBVERBS_1.0

However, if I build in a similar way to MVAPICH2, the resulting program fails:

    gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include
    gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c -I/usr/local/ofed/include
    gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so srq_pingpong_lib_rowland.o pingpong.o
    gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -L/usr/local/ofed/lib64 -lsrqtest -libverbs

Above I am not linking libibverbs into libsrqtest.so, thus it is required on the last gcc line. This is how MVAPICH2's libmpich.so file works, and from past experience, I've seen this before. Running shows:

    [rowland at z1 ibverbs-examples]$ gdb ./srq_pingpong_rowland
    GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

    (gdb) r
    Starting program: /home/7/rowland/z1-test/ibverbs-examples/srq_pingpong_rowland
    [Thread debugging using libthread_db enabled]
    [New Thread 182896403968 (LWP 29858)]

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 182896403968 (LWP 29858)]
    post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0, bad_wr=0x7fbfff88c8) at src/compat-1_0.c:312
    312 src/compat-1_0.c: No such file or directory.
            in src/compat-1_0.c
    (gdb) bt
    #0 post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0, bad_wr=0x7fbfff88c8) at src/compat-1_0.c:312
    #1 0x0000002a95559e12 in ibv_post_srq_recv (srq=0x5075b0, recv_wr=0x7fbfff88d0, bad_recv_wr=0x7fbfff88c8) at /usr/local/ofed/include/infiniband/verbs.h:915
    #2 0x0000002a95559dcf in pp_post_recv (ctx=0x5023d0, n=500) at srq_pingpong_lib_rowland.c:496
    #3 0x0000002a9555a614 in lib_start (argc=1, argv=0x7fbffff7f8) at srq_pingpong_lib_rowland.c:696
    #4 0x0000000000400608 in main (argc=1, argv=0x7fbffff7f8) at srq_pingpong_rowland.c:36
    (gdb) quit

It is not clear to me why the difference of either linking libibverbs into libsrqtest.so or not doing so causes the IBVERBS 1.1 ABI to be used or not. I looked at the libibverbs code, and the 1.1 ABI is the default. The libsrqtest.so file in the above case seems to have lost this information:

    [rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
                     U ibv_ack_cq_events
                     U ibv_alloc_pd
                     U ibv_close_device
                     U ibv_create_comp_channel
                     U ibv_create_cq
                     U ibv_create_qp
                     U ibv_create_srq
                     U ibv_dealloc_pd
                     U ibv_dereg_mr
                     U ibv_destroy_comp_channel

I've never had to deal with an ABI issue like this in shared library linking/usage. Does it make sense for this to be the case? I think perhaps it does, but I wanted to ask.
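One way to picture the difference between the two builds is the toy model below. It is not the dynamic linker's code and rests on a loudly stated assumption: that a reference which carries no version is resolved to the oldest (base) version of a symbol, while a versioned reference is resolved to exactly the version it names. The backtrace ending in compat-1_0.c is consistent with that assumption but does not prove it. `struct sym_def`, `resolve()` and the table contents are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One exported definition of a symbol (hypothetical model). */
struct sym_def {
    const char *name;
    const char *version;  /* version node the definition belongs to */
    int         index;    /* order of the version node; lower = older */
};

/* Illustrative export table for a library with two ABI versions. */
static const struct sym_def libibverbs_defs[] = {
    { "ibv_create_srq", "IBVERBS_1.1", 3 },  /* current default */
    { "ibv_create_srq", "IBVERBS_1.0", 2 },  /* compat wrapper */
};

/*
 * Resolve a reference to a definition.
 *   want != NULL: the reference names a version, so match it exactly.
 *   want == NULL: the reference is unversioned; ASSUMPTION: the oldest
 *                 version (lowest index) is chosen.
 * Returns NULL when nothing matches.
 */
static const struct sym_def *resolve(const struct sym_def *defs, size_t n,
                                     const char *name, const char *want)
{
    const struct sym_def *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(defs[i].name, name) != 0)
            continue;
        if (want != NULL) {
            if (strcmp(defs[i].version, want) == 0)
                return &defs[i];
        } else if (best == NULL || defs[i].index < best->index) {
            best = &defs[i];
        }
    }
    return best;
}
```

In this model, the first build (libsrqtest.so linked with -libverbs, references recorded with IBVERBS_1.1) resolves to the 1.1 definition, while the second build (unversioned references) ends up on the 1.0 compat wrapper even though the code was compiled against the 1.1 headers.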
I've placed my test code here if it helps: http://www.cse.ohio-state.edu/~rowland/ibverbs-examples.tar.gz I have a fix for our code that I am testing now. It seems to work and solve the observed problems, but more testing will be required to be sure there are no issues. This will require a new SRPM if the fix is required, which it seems at this point. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From halr at voltaire.com Wed Feb 14 20:47:20 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Feb 2007 23:47:20 -0500 Subject: [openib-general] GetTable path record query not returningDGID=SGID paths In-Reply-To: <000701c75091$c4f59fa0$ff0da8c0@amr.corp.intel.com> References: <000701c75091$c4f59fa0$ff0da8c0@amr.corp.intel.com> Message-ID: <1171514817.22446.145890.camel@hal.voltaire.com> On Wed, 2007-02-14 at 18:42, Sean Hefty wrote: > >What is the value of NumbPath and how large a subnet is this ? I'm > >pretty sure this works; at least it did the last I checked. > > By default, NumbPath should be 127, but I would have expected a path record even > with it set to 1. Yes, you should be getting a PathRecord or more. Are you getting some error returned instead ? > (I don't think we were using different PKeys or anything like that.) Were partitions (other than full default) being used ? > We haven't looked into this in more detail yet. This was our observation while > testing on a larger (64 node) cluster this morning that we don't have access to > at the moment. With the local SA cache running, we were surprised to see any > retries, and when we looked into it more, retries were always for loopback > connections. > > Let me look into this more on the host stack side. OK; thanks. 
-- Hal > - Sean From krkumar2 at in.ibm.com Wed Feb 14 20:51:01 2007 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Thu, 15 Feb 2007 10:21:01 +0530 Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()] In-Reply-To: Message-ID: Steve/Tom, I tested with rdma_bw and also introduced some failures "randomly" in handlers, and the tests ran without any problems. Acked-by: Krishna Kumar thanks, - KK > From: Steve Wise > To: Tom Tucker > Cc: Roland Dreier , openib-general at openib.org > Subject: Re: [openib-general] [PATCH] RDMA/iwcm: Bugs in > cm_conn_req_handler() > Date: Sat, 10 Feb 2007 15:26:35 -0600 > > On Sat, 2007-02-10 at 14:36 -0600, Steve Wise wrote: > > ugh. > > > > There is at least one bug in this patch. I cannot call iw_cm_reject() > > inside destroy_cm_id() because both functions grab the iw_cm lock... > > > > > > This patch puts the iw_cm_reject() calls back in > cm_conn_req_handler()... > > > --- > > iw_cm_id destruction race condition fixes. > > From: Steve Wise > > Several changes: > > - iwcm_deref_id() always wakes up if there's another reference. > > - clean up race condition in cm_work_handler(). > > - create static void free_cm_id() which deallocs the work entries and then > kfrees the cm_id memory. This reduces code replication. > > - rem_ref() if this is the last reference -and- the IWCM owns freeing the > cm_id, then free it. 
> > Signed-off-by: Steve Wise > Signed-off-by: Tom Tucker > --- > > drivers/infiniband/core/iwcm.c | 47 > +++++++++++++++++++++------------------- > 1 files changed, 25 insertions(+), 22 deletions(-) > > diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c > index 1039ad5..891d1fa 100644 > --- a/drivers/infiniband/core/iwcm.c > +++ b/drivers/infiniband/core/iwcm.c > @@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c > return 0; > } > > +static void free_cm_id(struct iwcm_id_private *cm_id_priv) > +{ > + dealloc_work_entries(cm_id_priv); > + kfree(cm_id_priv); > +} > + > /* > * Release a reference on cm_id. If the last reference is being > * released, enable the waiting thread (in iw_destroy_cm_id) to > @@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c > */ > static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv) > { > - int ret = 0; > - > BUG_ON(atomic_read(&cm_id_priv->refcount)==0); > if (atomic_dec_and_test(&cm_id_priv->refcount)) { > BUG_ON(!list_empty(&cm_id_priv->work_list)); > - if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) { > - BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING); > - BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, > - &cm_id_priv->flags)); > - ret = 1; > - } > complete(&cm_id_priv->destroy_comp); > + return 1; > } > > - return ret; > + return 0; > } > > static void add_ref(struct iw_cm_id *cm_id) > @@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_ > { > struct iwcm_id_private *cm_id_priv; > cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); > - iwcm_deref_id(cm_id_priv); > + if (iwcm_deref_id(cm_id_priv) && > + test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { > + BUG_ON(!list_empty(&cm_id_priv->work_list)); > + free_cm_id(cm_id_priv); > + } > } > > static int cm_event_handler(struct iw_cm_id *cm_id, struct > iw_cm_event *event); > @@ -355,7 +358,9 @@ static void destroy_cm_id(struct iw_cm_i > case IW_CM_STATE_CONN_RECV: > /* > * App called destroy 
before/without calling accept after > - * receiving connection request event notification. > + * receiving connection request event notification or > + * returned non zero from the event callback function. > + * In either case, must tell the provider to reject. > */ > cm_id_priv->state = IW_CM_STATE_DESTROYING; > break; > @@ -391,9 +396,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c > > wait_for_completion(&cm_id_priv->destroy_comp); > > - dealloc_work_entries(cm_id_priv); > - > - kfree(cm_id_priv); > + free_cm_id(cm_id_priv); > } > EXPORT_SYMBOL(iw_destroy_cm_id); > > @@ -647,10 +650,11 @@ static void cm_conn_req_handler(struct i > /* Call the client CM handler */ > ret = cm_id->cm_handler(cm_id, iw_event); > if (ret) { > + iw_cm_reject(cm_id, NULL, 0); > set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); > destroy_cm_id(cm_id); > if (atomic_read(&cm_id_priv->refcount)==0) > - kfree(cm_id); > + free_cm_id(cm_id_priv); > } > > out: > @@ -854,13 +858,12 @@ static void cm_work_handler(struct work_ > destroy_cm_id(&cm_id_priv->id); > } > BUG_ON(atomic_read(&cm_id_priv->refcount)==0); > - if (iwcm_deref_id(cm_id_priv)) > - return; > - > - if (atomic_read(&cm_id_priv->refcount)==0 && > - test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { > - dealloc_work_entries(cm_id_priv); > - kfree(cm_id_priv); > + if (iwcm_deref_id(cm_id_priv)) { > + if (test_bit(IWCM_F_CALLBACK_DESTROY, > + &cm_id_priv->flags)) { > + BUG_ON(!list_empty(&cm_id_priv->work_list)); > + free_cm_id(cm_id_priv); > + } > return; > } > spin_lock_irqsave(&cm_id_priv->lock, flags); > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From devesh28 at gmail.com Wed Feb 14 21:37:04 2007 From: devesh28 at gmail.com (Devesh Sharma) Date: Thu, 15 Feb 2007 11:07:04 +0530 Subject: [openib-general] Immediate 
data question In-Reply-To: <6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com> <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net> <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com> <309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com> <6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com> Message-ID: <309a667c0702142137p724172f5va93a0ef046a60483@mail.gmail.com> On 2/14/07, Michael Krause wrote: > At 05:37 AM 2/13/2007, Devesh Sharma wrote: > >On 2/12/07, Devesh Sharma wrote: > >>On 2/10/07, Tang, Changqing wrote: > >> > > > > >> > > >Not for the receiver, but the sender will be severely slowed down by > >> > > >having to wait for the RNR timeouts. > >> > > > >> > > RNR = Receiver Not Ready so by definition, the data flow > >> > > isn't going to > >> > > progress until the receiver is ready to receive data. If a > >> > > receive QP > >> > > enters RNR for a RC, then it is likely not progressing as > >> > > desired. RNR > >> > > was initially put in place to enable a receiver to create > >> > > back pressure to the sender without causing a fatal error > >> > > condition. It should rarely be entered and therefore should > >> > > have negligible impact on overall performance however when a > >> > > RNR occurs, no forward progress will occur so performance is > >> > > essentially zero. > >> > > >> > Mike: > >> > I still do not quite understand this issue. I have two > >> > situations that have RNR triggered. > >> > > >> > 1. process A and process B is connected with QP. A first post a send to > >> > B, B does not post receive. Then A and B are doing a long time > >> > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE > >> > message. 
Finally B will post a receive. Does the first pending send in A > >> > block all the later RDMA_WRITE ? > >>According to IBTA spec HCA will process WR entries in strict order in > >>which they are posted so the send will block all WR posted after this > >>send, Until-unless HCA has multiple processing elements, I think even > >>then processing order will be maintained by HCA > >> If not, since RNR is triggered > >> > periodically till B post receive, does it affect the RDMA_WRITE > >> > performance between A and B ? > >> > > >> > 2. extend above to three processes, A connect to B, B connect to C, so B > >> > has two QPs, but one CQ.A posts a send to B, B does not post receive, > >post ordering accross QP is not guaranteed hence presence of same CQ > >or different CQ will not affect any thing. > >> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B > >If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C, I am sorry I have missed that in both cases same DMA channel is in use. > >_may_ affect the performance, since load is on same HCA. In case of > >Send/Recv again _may_ affect the performance, with the same reason. > > Seems orthogonal. Any time h/w is shared, multiple flows will have an > impact on one another. That is why we have the different arbitration > mechanisms to enable one to control that impact. Please, can you explain it more clearly? > > >> > must sends RNR periodically to A, right?. So does the pending message > >> > from A affects B's overall performance between B and C ? > >But RNR NAK is not for very long time.....possibly this performance > >hit you will not be able to observe even. The moment rnr_counter > >expires connection will be broken! > > Keep in mind the timeout can be infinite. RNR NAK are not expected to be > frequent so their performance impact was considered reasonable. Thanks I missed that. > > Mike > > >> > > >> > Thank you. 
> >> > > >> > --CQ > >> > > >> > > >> > > > >> > > Mike > >> > > > >> > > > >> > > > >> > > >> > _______________________________________________ > >> > openib-general mailing list > >> > openib-general at openib.org > >> > http://openib.org/mailman/listinfo/openib-general > >> > > >> > To unsubscribe, please visit > >> http://openib.org/mailman/listinfo/openib-general > >> > > >> > > > > From mst at mellanox.co.il Wed Feb 14 21:57:51 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Feb 2007 07:57:51 +0200 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <1171477762.3161.105.camel@fc6.xsintricity.com> References: <1171477762.3161.105.camel@fc6.xsintricity.com> Message-ID: <20070215055751.GA11866@mellanox.co.il> > Quoting Doug Ledford : > Subject: Re: [openib-general] 32-bit build for ppc64 is required > > On Wed, 2007-02-14 at 16:29 +0200, Michael S. Tsirkin wrote: > > > Quoting Stefan Roscher : > > > Subject: Re: 32-bit build for ppc64 is required > > > > > > On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote: > > > > > Quoting Stefan Roscher : > > > > > Subject: 32-bit build for ppc64 is required > > > > > > > > > > Hi, > > > > > > > > > > after building the latest ofed build package we recognized that on PPC64 only > > > > > 64-bit libaries were build. > > > > > Because we have customers using older userpace apllications which are > > > > > certified for 32-bit we think additional 32bit support is a requirement for 64bit builds. > > > > > > > > > > If OFED 1.2 supports 32 bit on ppc64, we have to change the install > > > > > directory.I would suggest to install 32-bit binaries into > > > > > /usr/local/ofed/bin32 directory. So no changes on current naming conventions > > > > > has to be done.The libaries are installed in the /usr/local/ofed/lib directory. > > > > > > > > The standard practice is to install 64 bit libraries under prefix/lib64 > > > > and 32 bit libraries under prefix/lib. 
Why would PPC64 be any different? > > > > > > I think you misunderstand my post. The directory for 32/64bit libraries > > > should be prefix/lib and prefix/lib64 respectively. > > > But in the current ofed1.2 I saw only the prefix/lib64 directory, i.e. 64bit libs only. > > > > Well, this is not by design: AFAIK on x86_64 both types of libraries > > are installed. > > > > > > I do not think we need 32 bit binaries at all, and there's no other package > > > > I'm aware of that uses "bin32". > > > > > > We have customers that still use 32-bit userspace applications. > > > It would be beneficial for them if they can obtain 32bit libs and execs from > > > ofed1.2 in order to run their applications without recompiling them, because > > > for some 32-bit applications recompiling is not an option. > > > > 32 bit libraries are needed for users to run 32 bit applications. > > > > But I still do not see how installing 32 bit binaries alongside the 64 > > bit ones is useful, and I do not think other packages provide this option, > > so maybe we shouldn't, either. > > The choice of 32/64 bit default is done on a per arch basis. With > x86_64/i386, the increased number of CPU registers in 64bit mode > outweighs the increased code bloat that goes along with 64bit mode. On > PPC, no such register benefit exists for 64bit mode. As such, 32bit > apps on PPC are faster than the equivalent 64bit apps up to the point at > which a 4GB address space becomes a problem. Correspondingly, the > default binaries on PPC are 32bit, and only those that *need* to be > 64bit are. While a customer's application may need >4GB address space, > certainly all the ibutils, diags, opensm, etc. do not. As a result, we > compile all of those utilities as 32bit by default on PPC. We also ship > all the libs as both 32/64bit so users can select the appropriate > environment for their particular application (with the exception of > dapl, which doesn't support 32bit and for which I filed a bug around the > time of OFED 1.1). 
So, what you suggest is - build 2 types of libraries, but on PPC make binaries 32 bit? That's easy - do others agree to this approach? Another option is to build binaries with whatever type of binary gcc without extra flags generates by default. -- MST From mst at mellanox.co.il Wed Feb 14 21:58:58 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Feb 2007 07:58:58 +0200 Subject: [openib-general] [PATCH] IPoIB: Only allow root to change between datagram and connected mode In-Reply-To: References: Message-ID: <20070215055858.GB11866@mellanox.co.il> > Quoting Roland Dreier : > Subject: [PATCH] IPoIB: Only allow root to change between datagram and connected mode > > Change the permissions of the "mode" sysfs attribute to be S_IWUSR > instead of S_IWUGO. > > Signed-off-by: Roland Dreier > --- > FYI -- I'm planning to merge this for 2.6.21. It doesn't seem > appropriate to allow ordinary users to mess with this sort of config. Acked-by: Michael S. Tsirkin -- MST From erezz at voltaire.com Wed Feb 14 22:33:13 2007 From: erezz at voltaire.com (Erez Zilber) Date: Thu, 15 Feb 2007 08:33:13 +0200 Subject: [openib-general] OFED 1.2 alpha release In-Reply-To: <45D337E2.200@mellanox.co.il> References: <45D337E2.200@mellanox.co.il> Message-ID: <45D3FEA9.9020802@voltaire.com> Tziporet Koren wrote: > Hi, > > In two weeks delay we publish OFED 1.2-alpha1 on > http://www.openfabrics.org/builds/ofed-1.2/ > File: OFED-1.2-alpha1.tgz > BUILD_ID contains info on all packages sources location. > > Please report any issues in bugzilla https://bugs.openfabrics.org/ > > Tziporet & Vlad > > *_OS support:_* > Novell: > - SLES 9.0 SP3 > - SLES10 > Redhat: > - Redhat EL4 up4 > - Redhat EL5 beta2 (only partially tested) > kernel.org: > - 2.6.20 > - 2.6.19 > > Note: Redhat EL4 up3, Fedora C4, Fedora C6 and SuSE Pro 10 are not > part of the official list. > We keep the backport patches for these OSes and make sure OFED compile > and loaded properly but will not do full QA cycle. 
> > _*Systems:*_ > * x86_64 > * x86 > * ia64 > * ppc64 (have not tested user space) > > _*Main changes from OFED-1.1:*_ > > 1. iWARP is now supported with Chelsio T3 > 2. New kernel modules: VNIC, RDS, Bonding, SA cache > 3. New packages: MVAPICH2 > 4. IPoIB Connected mode > 5. Multicast join from user space > 6. libibverbs 1.1 > 7. OpenSM new routing models: FAT tree routing and Taurus routing > 8. GUI tool for network diagnostic > 9. New MPI releases: MVAPICH: version 0.9.9, Open MPI: version 1.2, > MVAPICH2: version 0.9.8 > > Detailed list of changes can be found in: > https://wiki.openfabrics.org/tiki-index.php?page=OFED+1.2+release+plan+and+features > > _*Limitations and known issues:*_ > > 1. ipath driver compilation fails on all systems, except for kernel > 2.6.20 > 2. libipathverbs is not working with libibverbs 1.1 > 3. SDP netstat is not available on RHEL5 (due to compilation errors) > 4. Routing table problem in SLES10 when using port #2 > 5. RDS compiles only on kernel 2.6.18/19/20 > 6. MVAPICH2 installation fails on SuSE Pro 10. > 7. mstflint is not working on ppc64 > 8. RDS was not tested > One more limitation - open-iscsi over iSER is currently supported on SLES 10, RHEL5 beta 2 & 2.6.20 (naturally). Other distros/kernels are not supported. 
Erez From ogerlitz at voltaire.com Wed Feb 14 23:03:41 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 15 Feb 2007 09:03:41 +0200 Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support In-Reply-To: <45CEFCA8.4000008@voltaire.com> References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com> <45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com> <45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com> <45CEFCA8.4000008@voltaire.com> Message-ID: <45D405CD.9020606@voltaire.com> Or Gerlitz wrote: > Roland Dreier wrote: >> I merged the "increment port number" and "remove redundant '_wq'" >> patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland >> >> I plan to review to multicast stuff next week and I hope to merge it >> for 2.6.21. Or, have you or anyone else at Voltaire read over the >> code in addition to using it? Do you see anything that should be >> cleaned up? > > OK, I spent some time today on reviewing and playing with the ib_sa: > track multicast join/leave requests patch - and have no special > comments. I think the two patches are ready for merge, let me know if > you have any specific question. Roland - any progress here? Or. From mst at mellanox.co.il Wed Feb 14 23:15:37 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Feb 2007 09:15:37 +0200 Subject: [openib-general] Fwd: [ANNOUNCE] GIT 1.5.0 Message-ID: <20070215071537.GD11866@mellanox.co.il> FYI. I suggest we update git on the openfabrics server to 1.5.0: "Detached HEAD" feature will be useful for nightly build scripts. Sasha? 
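As a rough illustration of why a detached HEAD helps a nightly build, the sketch below checks out an arbitrary tag or commit without creating or moving any local branch. It is only a sketch under stated assumptions: the function name `checkout_detached` and its two arguments are hypothetical and are not taken from the existing OFED build scripts, and it assumes git 1.5.0 or later and an already existing clone.

```shell
#!/bin/sh
# Hypothetical helper for a nightly build script (not part of OFED or git).
# Checks out a tag or commit without touching any local branch, relying on
# the detached HEAD support introduced in git 1.5.0.
checkout_detached() {
    repo_dir=$1   # path to an existing clone (assumed to exist)
    rev=$2        # tag or commit id to build
    (
        cd "$repo_dir" || exit 1
        # Checking out a tag or commit id (not a branch name) detaches HEAD.
        git checkout -q "$rev" || exit 1
        # Print the commit actually checked out, e.g. for the build log.
        git rev-parse HEAD
    )
}
```

A build script could then call `checkout_detached /path/to/clone some-tag` before compiling, and would not need a throwaway branch per build.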
----- Forwarded message from Junio C Hamano ----- Subject: [ANNOUNCE] GIT 1.5.0 Date: Wed, 14 Feb 2007 05:14:16 +0200 From: Junio C Hamano The latest feature release GIT 1.5.0 is available at the usual places: http://www.kernel.org/pub/software/scm/git/ git-1.5.0.tar.{gz,bz2} (tarball) git-htmldocs-1.5.0.tar.{gz,bz2} (preformatted docs) git-manpages-1.5.0.tar.{gz,bz2} (preformatted docs) RPMS/$arch/git-*-1.5.0-1.$arch.rpm (RPM) ---------------------------------------------------------------- GIT v1.5.0 Release Notes ======================== Old news -------- This section is for people who are upgrading from ancient versions of git. Although all of the changes in this section happened before the current v1.4.4 release, they are summarized here in the v1.5.0 release notes for people who skipped earlier versions. As of git v1.5.0 there are some optional features that changes the repository to allow data to be stored and transferred more efficiently. These features are not enabled by default, as they will make the repository unusable with older versions of git. Specifically, the available options are: - There is a configuration variable core.legacyheaders that changes the format of loose objects so that they are more efficient to pack and to send out of the repository over git native protocol, since v1.4.2. However, loose objects written in the new format cannot be read by git older than that version; people fetching from your repository using older clients over dumb transports (e.g. http) using older versions of git will also be affected. - Since v1.4.3, configuration repack.usedeltabaseoffset allows packfile to be created in more space efficient format, which cannot be read by git older than that version. The above two are not enabled by default and you explicitly have to ask for them, because these two features make repositories unreadable by older versions of git, and in v1.5.0 we still do not enable them by default for the same reason. 
We will change this default probably 1 year after 1.4.2's release, when it is reasonable to expect everybody to have new enough version of git. - 'git pack-refs' appeared in v1.4.4; this command allows tags to be accessed much more efficiently than the traditional 'one-file-per-tag' format. Older git-native clients can still fetch from a repository that packed and pruned refs (the server side needs to run the up-to-date version of git), but older dumb transports cannot. Packing of refs is done by an explicit user action, either by use of "git pack-refs --prune" command or by use of "git gc" command. - 'git -p' to paginate anything -- many commands do pagination by default on a tty. Introduced between v1.4.1 and v1.4.2; this may surprise old timers. - 'git archive' superseded 'git tar-tree' in v1.4.3; - 'git cvsserver' was new invention in v1.3.0; - 'git repo-config', 'git grep', 'git rebase' and 'gitk' were seriously enhanced during v1.4.0 timeperiod. - 'gitweb' became part of git.git during v1.4.0 timeperiod and seriously modified since then. - reflog is an v1.4.0 invention. This allows you to name a revision that a branch used to be at (e.g. "git diff master@{yesterday} master" allows you to see changes since yesterday's tip of the branch). Updates in v1.5.0 since v1.4.4 series ------------------------------------- * Index manipulation - git-add is to add contents to the index (aka "staging area" for the next commit), whether the file the contents happen to be is an existing one or a newly created one. - git-add without any argument does not add everything anymore. Use 'git-add .' instead. Also you can add otherwise ignored files with an -f option. - git-add tries to be more friendly to users by offering an interactive mode ("git-add -i"). - git-commit used to refuse to commit if was different between HEAD and the index (i.e. update-index was used on it earlier). This check was removed. - git-rm is much saner and safer. 
It is used to remove paths from both the index file and the working tree, and makes sure you are not losing any local modification before doing so. - git-reset ... can be used to revert index entries for selected paths. - git-update-index is much less visible. Many suggestions to use the command in git output and documentation have now been replaced by simpler commands such as "git add" or "git rm". * Repository layout and objects transfer - The data for origin repository is stored in the configuration file $GIT_DIR/config, not in $GIT_DIR/remotes/, for newly created clones. The latter is still supported and there is no need to convert your existing repository if you are already comfortable with your workflow with the layout. - git-clone always uses what is known as "separate remote" layout for a newly created repository with a working tree. A repository with the separate remote layout starts with only one default branch, 'master', to be used for your own development. Unlike the traditional layout that copied all the upstream branches into your branch namespace (while renaming their 'master' to your 'origin'), the new layout puts upstream branches into local "remote-tracking branches" with their own namespace. These can be referenced with names such as "origin/$upstream_branch_name" and are stored in .git/refs/remotes rather than .git/refs/heads where normal branches are stored. This layout keeps your own branch namespace less cluttered, avoids name collision with your upstream, makes it possible to automatically track new branches created at the remote after you clone from it, and makes it easier to interact with more than one remote repository (you can use "git remote" to add other repositories to track). There might be some surprises: * 'git branch' does not show the remote tracking branches. It only lists your own branches. Use '-r' option to view the tracking branches. 
* If you are forking off of a branch obtained from the upstream, you would have done something like 'git branch my-next next', because traditional layout dropped the tracking branch 'next' into your own branch namespace. With the separate remote layout, you say 'git branch next origin/next', which allows you to use the matching name 'next' for your own branch. It also allows you to track a remote other than 'origin' (i.e. where you initially cloned from) and fork off of a branch from there the same way (e.g. "git branch mingw j6t/master"). Repositories initialized with the traditional layout continue to work. - New branches that appear on the origin side after a clone is made are also tracked automatically. This is done with an wildcard refspec "refs/heads/*:refs/remotes/origin/*", which older git does not understand, so if you clone with 1.5.0, you would need to downgrade remote.*.fetch in the configuration file to specify each branch you are interested in individually if you plan to fetch into the repository with older versions of git (but why would you?). - Similarly, wildcard refspec "refs/heads/*:refs/remotes/me/*" can be given to "git-push" command to update the tracking branches that is used to track the repository you are pushing from on the remote side. - git-branch and git-show-branch know remote tracking branches (use the command line switch "-r" to list only tracked branches). - git-push can now be used to delete a remote branch or a tag. This requires the updated git on the remote side (use "git push :refs/heads/" to delete "branch"). - git-push more aggressively keeps the transferred objects packed. Earlier we recommended to monitor amount of loose objects and repack regularly, but you should repack when you accumulated too many small packs this way as well. Updated git-count-objects helps you with this. - git-fetch also more aggressively keeps the transferred objects packed. 
This behavior of git-push and git-fetch can be tweaked with a single configuration transfer.unpacklimit (but usually there should not be any need for a user to tweak it). - A new command, git-remote, can help you manage your remote tracking branch definitions. - You may need to specify explicit paths for upload-pack and/or receive-pack due to your ssh daemon configuration on the other end. This can now be done via remote.*.uploadpack and remote.*.receivepack configuration. * Bare repositories - Certain commands change their behavior in a bare repository (i.e. a repository without associated working tree). We use a fairly conservative heuristic (if $GIT_DIR is ".git", or ends with "/.git", the repository is not bare) to decide if a repository is bare, but "core.bare" configuration variable can be used to override the heuristic when it misidentifies your repository. - git-fetch used to complain about updating the current branch but this is now allowed for a bare repository. So is the use of 'git-branch -f' to update the current branch. - Porcelain-ish commands that require a working tree refuse to work in a bare repository. * Reflog - Reflog records the history from the view point of the local repository. In other words, regardless of the real history, the reflog shows the history as seen by one particular repository (this enables you to ask "what was the current revision in _this_ repository, yesterday at 1pm?"). This facility is enabled by default for repositories with working trees, and can be accessed with the "branch@{time}" and "branch@{Nth}" notation. - "git show-branch" learned showing the reflog data with the new -g option. "git log" has -g option to view reflog entries in a more verbose manner. - git-branch knows how to rename branches and moves existing reflog data from the old branch to the new one. - In addition to the reflog support in v1.4.4 series, HEAD reference maintains its own log. 
"HEAD@{5.minutes.ago}" means the commit you were at 5 minutes ago, which takes branch switching into account. If you want to know where the tip of your current branch was at 5 minutes ago, you need to explicitly say its name (e.g. "master@{5.minutes.ago}") or omit the refname altogether i.e. "@{5.minutes.ago}". - The commits referred to by reflog entries are now protected against pruning. The new command "git reflog expire" can be used to truncate older reflog entries and entries that refer to commits that have been pruned away previously with older versions of git. Existing repositories that have been using reflog may get complaints from fsck-objects and may not be able to run git-repack, if you had run git-prune from older git; please run "git reflog expire --stale-fix --all" first to remove reflog entries that refer to commits that are no longer in the repository when that happens. * Crufts removal - We used to say "old commits are retrievable using reflog and 'master@{yesterday}' syntax as long as you haven't run git-prune". We no longer have to say the latter half of the above sentence, as git-prune does not remove things reachable from reflog entries. - 'git-prune' by default does not remove _everything_ unreachable, as there is a one-day grace period built-in. - There is a toplevel garbage collector script, 'git-gc', that runs periodic cleanup functions, including 'git-repack -a -d', 'git-reflog expire', 'git-pack-refs --prune', and 'git-rerere gc'. - The output from fsck ("fsck-objects" is called just "fsck" now, but the old name continues to work) was needlessly alarming in that it warned missing objects that are reachable only from dangling objects. This has been corrected and the output is much more useful. * Detached HEAD - You can use 'git-checkout' to check out an arbitrary revision or a tag as well, instead of named branches. This will dissociate your HEAD from the branch you are currently on. A typical use of this feature is to "look around". E.g. 
$ git checkout v2.6.16 ... compile, test, etc. $ git checkout v2.6.17 ... compile, test, etc. - After detaching your HEAD, you can go back to an existing branch with the usual "git checkout $branch". Also you can use "git checkout -b $newbranch" to start a new branch at that commit. - You can even pull from other repositories, make merges and commits while your HEAD is detached. Also you can use "git reset" to jump to an arbitrary commit, while still keeping your HEAD detached. Going back to attached state (i.e. on a particular branch) by "git checkout $branch" can lose the current state you arrived at in these ways, and "git checkout" refuses when the detached HEAD is not pointed at by any existing ref (an existing branch, a remote tracking branch or a tag). This safety can be overridden with "git checkout -f $branch". * Packed refs - Repositories with hundreds of tags have been paying large overhead, both in storage and in runtime, due to the traditional one-ref-per-file format. A new command, git-pack-refs, can be used to "pack" them in more efficient representation (you can let git-gc do this for you). - Clones and fetches over dumb transports are now aware of packed refs and can download from repositories that use them. * Configuration - configuration related to color setting are consolidated under color.* namespace (older diff.color.*, status.color.* are still supported). - 'git-repo-config' command is accessible as 'git-config' now. * Updated features - git-describe uses better criteria to pick a base ref. It used to pick the one with the newest timestamp, but now it picks the one that is topologically the closest (that is, among ancestors of commit C, the ref T that has the shortest output from "git-rev-list T..C" is chosen). - git-describe gives the number of commits since the base ref between the refname and the hash suffix. E.g. 
the commit one before v2.6.20-rc6 in the kernel repository is: v2.6.20-rc5-306-ga21b069 which tells you that its object name begins with a21b069, v2.6.20-rc5 is an ancestor of it (meaning, the commit contains everything -rc5 has), and there are 306 commits since v2.6.20-rc5. - git-describe with --abbrev=0 can be used to show only the name of the base ref. - git-blame learned a new option, --incremental, that tells it to output the blames as they are assigned. A sample script to use it is also included as contrib/blameview. - git-blame starts annotating from the working tree by default. * Less external dependency - We no longer require the "merge" program from the RCS suite. All 3-way file-level merges are now done internally. - The original implementation of git-merge-recursive which was in Python has been removed; we have a C implementation of it now. - git-shortlog is no longer a Perl script. It no longer requires output piped from git-log; it can accept revision parameters directly on the command line. * I18n - We have always encouraged the commit message to be encoded in UTF-8, but the users are allowed to use legacy encoding as appropriate for their projects. This will continue to be the case. However, a non UTF-8 commit encoding _must_ be explicitly set with i18n.commitencoding in the repository where a commit is made; otherwise git-commit-tree will complain if the log message does not look like a valid UTF-8 string. - The value of i18n.commitencoding in the originating repository is recorded in the commit object on the "encoding" header, if it is not UTF-8. git-log and friends notice this, and reencodes the message to the log output encoding when displaying, if they are different. The log output encoding is determined by "git log --encoding=", i18n.logoutputencoding configuration, or i18n.commitencoding configuration, in the decreasing order of preference, and defaults to UTF-8. - Tools for e-mailed patch application now default to -u behavior; i.e. 
it always re-codes from the e-mailed encoding to the encoding specified with i18n.commitencoding. This unfortunately forces projects that have happily been using a legacy encoding without setting i18n.commitencoding to set the configuration, but taken with other improvement, please excuse us for this very minor one-time inconvenience. * e-mailed patches - See the above I18n section. - git-format-patch now enables --binary without being asked. git-am does _not_ default to it, as sending binary patch via e-mail is unusual and is harder to review than textual patches and it is prudent to require the person who is applying the patch to explicitly ask for it. - The default suffix for git-format-patch output is now ".patch", not ".txt". This can be changed with --suffix=.txt option, or setting the config variable "format.suffix" to ".txt". * Foreign SCM interfaces - git-svn now requires the Perl SVN:: libraries, the command-line backend was too slow and limited. - the 'commit' subcommand of git-svn has been renamed to 'set-tree', and 'dcommit' is the recommended replacement for day-to-day work. - git fast-import backend. * User support - Quite a lot of documentation updates. - Bash completion scripts have been updated heavily. - Better error messages for often used Porcelainish commands. - Git GUI. This is a simple Tk based graphical interface for common Git operations. * Sliding mmap - We used to assume that we can mmap the whole packfile while in use, but with a large project this consumes huge virtual memory space and truly huge ones would not fit in the userland address space on 32-bit platforms. We now mmap huge packfile in pieces to avoid this problem. * Shallow clones - There is a partial support for 'shallow' repositories that keeps only recent history. A 'shallow clone' is created by specifying how deep that truncated history should be (e.g. "git clone --depth=5 git://some.where/repo.git"). 
Currently a shallow repository has a number of limitations: - Cloning and fetching _from_ a shallow clone are not supported (nor tested -- so they might work by accident but they are not expected to). - Pushing from or into a shallow clone is not expected to work. - Merging inside a shallow repository would work as long as a merge base is found in the recent history, but otherwise it will be like merging unrelated histories and may result in huge conflicts. but this would be more than adequate for people who want to look at near the tip of a big project with a deep history and send patches in e-mail format. ----- End forwarded message ----- -- MST From HNGUYEN at de.ibm.com Wed Feb 14 23:40:09 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 15 Feb 2007 08:40:09 +0100 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <20070215055751.GA11866@mellanox.co.il> Message-ID: > So, what you suggest is - build 2 types of libraries, but on PPC make > binaries 32 bit? That's easy - do others agree to this approach? No, for execs please create 32bit and 64bit on PPC. > Another option is to build binaries with whatever type of binary > gcc without extra flags generates by default. On PPC we really need the ability to build both versions. The reason is simply that there are customers who want them. Why not offer both options, and each component owner can decide her/his default? And the customers can pick the one(s) they like. I see your point regarding QA effort. Is it really twice? My assumption might be wrong: I guess we have to assure/test the 32/64bit compatibility anyway, e.g. 32bit client talks to 64bit server. If we have 32bit execs only for development resp. 
testing, why don't we also give them to customers in order to do basic test or diagnosis of their setup? Regards Nam From mst at mellanox.co.il Thu Feb 15 00:15:53 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Feb 2007 10:15:53 +0200 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: References: <20070215055751.GA11866@mellanox.co.il> Message-ID: <20070215081529.GG11866@mellanox.co.il> > Quoting Hoang-Nam Nguyen : > Subject: Re: 32-bit build for ppc64 is required > > > So, what you suggest is - build 2 types of libraries, but on PPC make > > binaries 32 bit? That's easy - do others agree to this approach? > No, for execs please create 32bit and 64bit on PPC. > > > Another option is to build binaries with whatever type of binary > > gcc without extra flags generates by default. > On PPC we really need the ability to build both versions. The reason is > simply that there're customers who want them. Why don't offer both > options, and each component owner can decide her/his default? I don't think this can be elegantly dealt with on a per-component basis. But if some component owner has an opinion on this, do speak up - note this affects binaries only, not libraries such as libehca. > And the > customers can pick the one(s) they like. > I see your point regarding QA effort. Is it really twice? Probably more - I'm reasonably sure most scripts written so far assume stuff is installed in prefix/bin, so testing harness etc would need to be changed. And how to make sure the *correct* set of binaries was actually QA'd? > My assumption > might be wrong: I guess we have to assure/test the 32/64bit compatibility > anyway eg. 32bit client talks to 64bit server. > If we have 32bit execs only for development resp. testing, why don't we > also give them to customers in order to do basic test or diagnosis of > their setup? Because of the confusion this would create. 
For shared libraries, the 32/64 bit issues seem to be automatically figured out by ld.so, but there's no such solution for binaries. -- MST From ogerlitz at voltaire.com Thu Feb 15 01:25:13 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 15 Feb 2007 11:25:13 +0200 Subject: [openib-general] IPv6 multicast address per NIC Message-ID: <45D426F9.6060807@voltaire.com> Hi, I see that when IPv6 is enabled in the kernel, the stack joins a --dedicated-- multicast group for each interface. Can anyone here supply me with a pointer to where this is defined? A quick look at RFC 3307 did not provide an answer. Or. Below is the maddr show output on a node with two partitions on ib0. Note that the --pkey-- is not present in the link addresses, since IPoIB fills that in in its own copy (I don't mind sending a patch to fix that if anyone here thinks it is helpful).
$ ip maddr show
> 41: ib0
>     link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     inet 224.0.0.1
>     inet6 ff02::1:ff98:6d
>     inet6 ff02::1
> 45: ib0.8001
>     link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     inet 224.0.0.1
>     inet6 ff02::1:ff98:6d
>     inet6 ff02::1
> 46: ib0.8003
>     link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
>     link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
>     inet 224.0.0.1
>     inet6 ff02::1:ff98:6d
>     inet6 ff02::1
>
From tziporet at mellanox.co.il Thu Feb 15 01:43:02 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 15 Feb 2007 11:43:02 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.2 alpha release In-Reply-To: References: <45D337E2.200@mellanox.co.il> Message-ID: 
<45D42B26.10709@mellanox.co.il> Scott Weitzenkamp (sweitzen) wrote: > I don't remember discussing dropping RHEL4 U3, and would like to add > it back to the official list. IPoIB multicast does not work correctly > (bug 266) in RHEL4 U4, thus RHEL4 U3 is the most recent working RHEL > release in this area (unless it has been fixed in U4 errata kernels). > The new ib-bonding RPM also says it only supports RHEL4 U3 for Red Hat > releases. Its OK with me to add RHEL4 U3 for the official list. All other partners - please approve Regarding RHEL4 U4 and IPoIB bug - Or just prepared a patch that should fix it. We will merge it and test for the beta. > > We should probably also plan for SLES10 SP1 support in OFED 1.2. > This is the plan. We still don't have the backport patches for this kernel but they should be added for the beta. Tziporet From vlad at lists.openfabrics.org Thu Feb 15 02:24:24 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Thu, 15 Feb 2007 02:24:24 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070215-0200 daily build status Message-ID: <20070215102425.2BAB0E6080F@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.13 Passed on powerpc with linux-2.6.19 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.18 Passed on x86_64 
with linux-2.6.15 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.14 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Failed: From HNGUYEN at de.ibm.com Thu Feb 15 03:41:51 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 15 Feb 2007 12:41:51 +0100 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <20070215081529.GG11866@mellanox.co.il> Message-ID: > > And the > > customers can pick the one(s) they like. > > I see your point regarding QA effort. Is it really twice? > Probably more - I'm reasonably sure most scripts written so far > assume stuff is installed in prefix/bin, so testing harness etc > would need to be changed. > And how to make sure the *correct* set of binaries was actually > QA'd? As far as I understood each component owner is responsible for QA of her/his component and supported platforms. For PPC we cover that. > > If we have 32bit execs only for development resp. testing, why don't we > > also give them to customers in order to do basic test or diagnosis of > > their setup? > Because of the confusion this would create. > For shared libraries, the 32/64 bit issues seem to be automatically > figured out > by ld.so, but there's no such solution for binaries. This is true as we have discussed at ofed-1.1. 
See also http://openib.org/pipermail/openfabrics-ewg/2006-October/001831.html I agree with you in that if there is a standard for binaries dir struct let's go for it. If there is no such one, let's agree on one approach: either bin32 resp bin or appl resp appl64 or... To me it's worse if customers have to fix or write build scripts by themselves in order to build 32bit binaries. Nam From halr at voltaire.com Thu Feb 15 04:02:20 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Feb 2007 07:02:20 -0500 Subject: [openib-general] IPv6 multicast address per NIC In-Reply-To: <45D426F9.6060807@voltaire.com> References: <45D426F9.6060807@voltaire.com> Message-ID: <1171540918.22446.171829.camel@hal.voltaire.com> Or, On Thu, 2007-02-15 at 04:25, Or Gerlitz wrote: > Hi, > > I see that when IPv6 is enabled in the kernel, the stack joins for a > --dedicated-- multicast group per each interface. Can anyone here supply > me with a pointer to where this is defined, doing a quick look on rfc > 3307 did not provide an answer. You are referring to the solicited-node multicast address (see RFC 4291). There have been several different threads on issues relating to this on this list over time. -- Hal > Or. > > Below is the maddr show on a node with two partitions on ib0, note that > the --pkey-- is not presented in the link addresses since IPoIB fill > that in its own copy (i don't mind send a patch to fix that if anyone > here think it is helpful). 
> > $ ip maddr show > > > 41: ib0 > > link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01 > > link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d > > link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01 > > inet 224.0.0.1 > > inet6 ff02::1:ff98:6d > > inet6 ff02::1 > > 45: ib0.8001 > > link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01 > > link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d > > link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01 > > inet 224.0.0.1 > > inet6 ff02::1:ff98:6d > > inet6 ff02::1 > > 46: ib0.8003 > > link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01 > > link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d > > link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01 > > inet 224.0.0.1 > > inet6 ff02::1:ff98:6d > > inet6 ff02::1 > > > From ogerlitz at voltaire.com Thu Feb 15 05:08:28 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 15 Feb 2007 15:08:28 +0200 Subject: [openib-general] IPv6 multicast address per NIC In-Reply-To: <1171540918.22446.171829.camel@hal.voltaire.com> References: <45D426F9.6060807@voltaire.com> <1171540918.22446.171829.camel@hal.voltaire.com> Message-ID: <45D45B4C.10702@voltaire.com> Hal Rosenstock wrote: > Or, > > On Thu, 2007-02-15 at 04:25, Or Gerlitz wrote: >> Hi, >> >> I see that when IPv6 is enabled in the kernel, the stack joins for a >> --dedicated-- multicast group per each interface. Can anyone here supply >> me with a pointer to where this is defined, doing a quick look on rfc >> 3307 did not provide an answer. > > You are referring to the solicited-node multicast address (see RFC > 4291). There have been several different threads on issues relating to > this on this list over time. thanks for the pointer, i will look into that. Or. 
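Note on the pointer above: RFC 4291 (section 2.7.1) forms the solicited-node multicast address by appending the low-order 24 bits of a unicast or anycast address to the prefix ff02:0:0:0:0:1:ff00::/104. That matches the ff02::1:ff98:6d entries in the ip maddr output quoted above. What follows is only a minimal user-space sketch of that computation, assuming a POSIX system that provides inet_pton() and inet_ntop(). The helper name solicited_node() is hypothetical and is not part of the kernel, IPoIB, or any library.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch: derive the solicited-node multicast address (ff02::1:ffXX:XXXX)
 * for a textual IPv6 unicast/anycast address.
 * Returns 0 on success, -1 if the input is not a valid IPv6 address or
 * the output buffer is too small.
 * solicited_node() is a hypothetical helper used for illustration only.
 */
static int solicited_node(const char *addr_text, char *out, size_t out_len)
{
	struct in6_addr addr, group;

	assert(addr_text != NULL && out != NULL);

	if (inet_pton(AF_INET6, addr_text, &addr) != 1)
		return -1;

	/* Fixed 104-bit prefix ff02:0:0:0:0:1:ff00::/104 */
	memset(&group, 0, sizeof(group));
	group.s6_addr[0] = 0xff;
	group.s6_addr[1] = 0x02;
	group.s6_addr[11] = 0x01;
	group.s6_addr[12] = 0xff;

	/* Append the low-order 24 bits (last three bytes) of the address */
	memcpy(&group.s6_addr[13], &addr.s6_addr[13], 3);

	if (inet_ntop(AF_INET6, &group, out, (socklen_t)out_len) == NULL)
		return -1;
	return 0;
}
```

Called with a buffer of INET6_ADDRSTRLEN bytes, the helper writes the group address as text. Addresses that differ only in their high-order bits map to the same group, which is why the same ff02::1:ff98:6d entry can show up on several interfaces.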
From swise at opengridcomputing.com Thu Feb 15 06:09:36 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 15 Feb 2007 08:09:36 -0600 Subject: [openib-general] [PATCH] 2.6.21 iwcm - iw_cm_id destruction race condition fixes. Message-ID: <1171548576.12187.2.camel@stevo-desktop> From: Steve Wise iwcm iw_cm_id destruction race condition fixes. Several changes: - iwcm_deref_id() always wakes up if there's another reference. - clean up race condition in cm_work_handler(). - create static void free_cm_id() which deallocs the work entries and then kfrees the cm_id memory. This reduces code replication. - rem_ref() if this is the last reference -and- the IWCM owns freeing the cm_id, then free it. Signed-off-by: Steve Wise Signed-off-by: Tom Tucker Acked-by: Krishna Kumar --- drivers/infiniband/core/iwcm.c | 47 +++++++++++++++++++++------------------- 1 files changed, 25 insertions(+), 22 deletions(-) diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c index 1039ad5..891d1fa 100644 --- a/drivers/infiniband/core/iwcm.c +++ b/drivers/infiniband/core/iwcm.c @@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c return 0; } +static void free_cm_id(struct iwcm_id_private *cm_id_priv) +{ + dealloc_work_entries(cm_id_priv); + kfree(cm_id_priv); +} + /* * Release a reference on cm_id. 
If the last reference is being * released, enable the waiting thread (in iw_destroy_cm_id) to @@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c */ static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv) { - int ret = 0; - BUG_ON(atomic_read(&cm_id_priv->refcount)==0); if (atomic_dec_and_test(&cm_id_priv->refcount)) { BUG_ON(!list_empty(&cm_id_priv->work_list)); - if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) { - BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING); - BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, - &cm_id_priv->flags)); - ret = 1; - } complete(&cm_id_priv->destroy_comp); + return 1; } - return ret; + return 0; } static void add_ref(struct iw_cm_id *cm_id) @@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_ { struct iwcm_id_private *cm_id_priv; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); - iwcm_deref_id(cm_id_priv); + if (iwcm_deref_id(cm_id_priv) && + test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { + BUG_ON(!list_empty(&cm_id_priv->work_list)); + free_cm_id(cm_id_priv); + } } static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event); @@ -355,7 +358,9 @@ static void destroy_cm_id(struct iw_cm_i case IW_CM_STATE_CONN_RECV: /* * App called destroy before/without calling accept after - * receiving connection request event notification. + * receiving connection request event notification or + * returned non zero from the event callback function. + * In either case, must tell the provider to reject. 
*/ cm_id_priv->state = IW_CM_STATE_DESTROYING; break; @@ -391,9 +396,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c wait_for_completion(&cm_id_priv->destroy_comp); - dealloc_work_entries(cm_id_priv); - - kfree(cm_id_priv); + free_cm_id(cm_id_priv); } EXPORT_SYMBOL(iw_destroy_cm_id); @@ -647,10 +650,11 @@ static void cm_conn_req_handler(struct i /* Call the client CM handler */ ret = cm_id->cm_handler(cm_id, iw_event); if (ret) { + iw_cm_reject(cm_id, NULL, 0); set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); destroy_cm_id(cm_id); if (atomic_read(&cm_id_priv->refcount)==0) - kfree(cm_id); + free_cm_id(cm_id_priv); } out: @@ -854,13 +858,12 @@ static void cm_work_handler(struct work_ destroy_cm_id(&cm_id_priv->id); } BUG_ON(atomic_read(&cm_id_priv->refcount)==0); - if (iwcm_deref_id(cm_id_priv)) - return; - - if (atomic_read(&cm_id_priv->refcount)==0 && - test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { - dealloc_work_entries(cm_id_priv); - kfree(cm_id_priv); + if (iwcm_deref_id(cm_id_priv)) { + if (test_bit(IWCM_F_CALLBACK_DESTROY, + &cm_id_priv->flags)) { + BUG_ON(!list_empty(&cm_id_priv->work_list)); + free_cm_id(cm_id_priv); + } return; } spin_lock_irqsave(&cm_id_priv->lock, flags); From swise at opengridcomputing.com Thu Feb 15 06:49:02 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 15 Feb 2007 08:49:02 -0600 Subject: [openib-general] [PATCH] 2.6.21 iw_cxgb3 Fail posts synchronously when in TERMINATE state. Message-ID: <1171550942.13282.5.camel@stevo-desktop> From: Steve Wise Fail posts synchronously when in TERMINATE state. For T3B devices, mark user qp in error once we transition to TERMINATE. 
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_qp.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index e066727..da13a38 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -846,6 +846,8 @@ int iwch_modify_qp(struct iwch_dev *rhp, break; case IWCH_QP_STATE_TERMINATE: qhp->attr.state = IWCH_QP_STATE_TERMINATE; + if (t3b_device(qhp->rhp)) + cxio_set_wq_in_error(&qhp->wq); if (!internal) terminate = 1; break; From swise at opengridcomputing.com Thu Feb 15 06:50:38 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 15 Feb 2007 08:50:38 -0600 Subject: [openib-general] [PATCH] ofed_1_2 iw_cxgb3 Fail posts synchronously when in TERMINATE state. Message-ID: <1171551038.13282.6.camel@stevo-desktop> Fail posts synchronously when in TERMINATE state. For T3B devices, mark user qp in error once we transition to TERMINATE. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_qp.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index ad044bd..9cc8b5e 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -846,6 +846,8 @@ int iwch_modify_qp(struct iwch_dev *rhp, break; case IWCH_QP_STATE_TERMINATE: qhp->attr.state = IWCH_QP_STATE_TERMINATE; + if (t3b_device(qhp->rhp)) + cxio_set_wq_in_error(&qhp->wq); if (!internal) terminate = 1; break; From todd.rimmer at qlogic.com Thu Feb 15 07:20:59 2007 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Thu, 15 Feb 2007 09:20:59 -0600 Subject: [openib-general] IPv6 multicast address per NIC In-Reply-To: <45D426F9.6060807@voltaire.com> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE061191A5FAF@EPEXCH2.qlogic.org> > From: Or Gerlitz > Sent: Thursday, February 15, 2007 4:25 AM > To: Roland Dreier; Hal Rosenstock; openib > 
Subject: [openib-general] IPv6 multicast address per NIC > > Hi, > > I see that when IPv6 is enabled in the kernel, the stack joins for a > --dedicated-- multicast group per each interface. Can anyone here supply > me with a pointer to where this is defined, doing a quick look on rfc > 3307 did not provide an answer. > RFC 2373 defined an IPv6 Solicited Node multicast address which is based on the IPv6 address of the Node. Each node supports a unique multicast (in addition to the assorted multicast addresses for all nodes, all routers, etc). From RFC 2373: Solicited-Node Address: FF02:0:0:0:0:1:FFXX:XXXX The above multicast address is computed as a function of a node's unicast and anycast addresses. The solicited-node multicast address is formed by taking the low-order 24 bits of the address (unicast or anycast) and appending those bits to the prefix FF02:0:0:0:0:1:FF00::/104 resulting in a multicast address in the range FF02:0:0:0:0:1:FF00:0000 to FF02:0:0:0:0:1:FFFF:FFFF For example, the solicited node multicast address corresponding to the IPv6 address 4037::01:800:200E:8C6C is FF02::1:FF0E:8C6C. IPv6 addresses that differ only in the high-order bits, e.g. due to multiple high-order prefixes associated with different aggregations, will map to the same solicited-node address thereby reducing the number of multicast addresses a node must join. A node is required to compute and join the associated Solicited-Node multicast addresses for every unicast and anycast address it is assigned. Todd Rimmer From hnguyen at linux.vnet.ibm.com Thu Feb 15 07:28:35 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Thu, 15 Feb 2007 16:28:35 +0100 Subject: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler to avoid/reduce missed irq events In-Reply-To: References: <200702141740.48286.hnguyen@linux.vnet.ibm.com> Message-ID: <200702151628.35483.hnguyen@linux.vnet.ibm.com> > Looks fine but this patch at least has serious whitespace > damage...
please resend a fixed version. Sorry for this. Resending the patches 1-5. Nam From dledford at redhat.com Thu Feb 15 07:12:15 2007 From: dledford at redhat.com (Doug Ledford) Date: Thu, 15 Feb 2007 10:12:15 -0500 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <20070215055751.GA11866@mellanox.co.il> References: <1171477762.3161.105.camel@fc6.xsintricity.com> <20070215055751.GA11866@mellanox.co.il> Message-ID: <1171552335.3161.128.camel@fc6.xsintricity.com> On Thu, 2007-02-15 at 07:57 +0200, Michael S. Tsirkin wrote: > > The choice of 32/64 bit default is done on a per arch basis. With > > x86_64/i386, the increased number of CPU registers in 64bit mode > > outweighs the increased code bloat that goes along with 64bit mode. On > > PPC, no such register benefit exists for 64bit mode. As such, 32bit > > apps on PPC are faster than the equivalent 64bit apps up to the point at > > which a 4GB address space becomes a problem. Correspondingly, the > > default binaries on PPC are 32bit, and only those that *need* to be > > 64bit are. While a customer's application may need >4GB address space, > > certainly all the ibutils, diags, opensm, etc. do not. As a result, we > > compile all of those utilities as 32bit by default on PPC. We also ship > > all the libs as both 32/64bit so users can select the appropriate > > environment for their particular application (with the exception of > > dapl, which doesn't support 32bit and for which I filed a bug around the > > time of OFED 1.1). > > So, what you suggest is - build 2 types of libraries, but on PPC make > binaries 32 bit? That's easy - do others agree to this approach? Yep, that's what we do. > Another option is to build binaries with whatever type of binary > gcc without extra flags generates by default. 
Usually this should work, but I don't rely on that since we also support s390/s390x (although not with Infiniband, but the OpenMPI alternative that we shipped with RHEL4, lam, gets compiled on s390/s390x) and that pair is a bit of an odd mix and I don't have one setting here at my house where I work, so it's hard for me to confirm that just leaving things to happen by default works as anticipated. If they would ever make an s390 that uses less than a gigawatt of power and heats less than a large sized convention center, that could change... ;-) -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rdreier at cisco.com Thu Feb 15 07:42:21 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 15 Feb 2007 07:42:21 -0800 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <1171552335.3161.128.camel@fc6.xsintricity.com> (Doug Ledford's message of "Thu, 15 Feb 2007 10:12:15 -0500") References: <1171477762.3161.105.camel@fc6.xsintricity.com> <20070215055751.GA11866@mellanox.co.il> <1171552335.3161.128.camel@fc6.xsintricity.com> Message-ID: > Usually this should work, but I don't rely on that since we also support > s390/s390x (although not with Infiniband, but the OpenMPI alternative > that we shipped with RHEL4, lam, gets compiled on s390/s390x) and that > pair is a bit of an odd mix and I don't have one setting here at my > house where I work, so it's hard for me to confirm that just leaving > things to happen by default works as anticipated. If they would ever > make an s390 that uses less than a gigawatt of power and heats less than > a large sized convention center, that could change... 
;-) http://www.conmicro.cx/hercules/ From dledford at redhat.com Thu Feb 15 07:35:39 2007 From: dledford at redhat.com (Doug Ledford) Date: Thu, 15 Feb 2007 10:35:39 -0500 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: References: Message-ID: <1171553739.3161.141.camel@fc6.xsintricity.com> On Thu, 2007-02-15 at 08:40 +0100, Hoang-Nam Nguyen wrote: > > So, what you suggest is - build 2 types of libraries, but on PPC make > > binaries 32 bit? That's easy - do others agree to this approach? > No, for execs please create 32bit and 64bit on PPC. > > Another option is to build binaries with whatever type of binary > > gcc without extra flags generates by default. > On PPC we really need the ability to build both versions. The reason is > simply that there're customers who want them. Customers ask for all sorts of silly things here and there. Sometimes you just need to say "No". > Why don't offer both > options, and each component owner can decide her/his default? And the > customers can pick the one(s) they like. Generally speaking, because doing that costs money. So, there needs to be a valid reason for the customer to pick one or the other in order to justify the extra spending. If there isn't, then it's time to educate the customer as to *why* there's no reason to do both sets of binaries. In this case, it's that generally speaking, no fabric is large enough that the provided utilities have any need of a >4GB address space. Additionally, the utilities need not be the same bit size as the customers applications since they are separate processes. A 64bit customer app can happily call a 32bit utility and the return code from that utility will still be valid. Now, last I knew, we don't ship anything that is a general RDMA application for use with custom applications other than opensm, and that follows a standard packet format that prevents 32/64bit issues from arising (modulo bugs). 
Things like rping aren't intended to be used on one side of a connection while the customer's application sits on the other. > I see your point regarding QA effort. Is it really twice? My assumption > might be wrong: I guess we have to assure/test the 32/64bit compatibility > anyway eg. 32bit client talks to 64bit server. > If we have 32bit execs only for development resp. testing, why don't we > also give them to customers in order to do basic test or diagnosis of > their setup? 32/64 bit mpitests would suffice for testing that I think (and is generally a good test anyway). -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From swise at opengridcomputing.com Thu Feb 15 07:55:19 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 15 Feb 2007 09:55:19 -0600 Subject: [openib-general] remap_page_range() in older kernels Message-ID: <1171554919.13282.17.camel@stevo-desktop> Roland, Do you remember any issues with using remap_page_range() in older kernels for mapping memory allocated in the kernel back to a user process? I'm testing cxgb3 in ofed 1.2 on rhel4u4 which uses a 2.6.9 based kernel. And cxgb3 kernel-bypass isn't working because my WQ and CQ memory isn't getting correctly mapped into the user process. I've confirmed that the mapping is wrong by scribbling in the memory just after it's allocated in the kernel (via dma_alloc_coherent()), then reading in the library after mapping it. The process isn't reading the correct scribbles...
For the ofed 1.2 backport, we've redefined remap_pfn_range() to: static inline int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t prot) { return remap_page_range(vma, addr, pfn << PAGE_SHIFT, size, prot); } Any of this ring a bell? Any ideas? Thanks, Steve. From yipeeyipeeyipeeyipee at yahoo.com Thu Feb 15 07:53:11 2007 From: yipeeyipeeyipeeyipee at yahoo.com (yipeeyipeeyipeeyipee) Date: Thu, 15 Feb 2007 15:53:11 +0000 (UTC) Subject: [openib-general] bad port physstate Message-ID: Hi, It seems like I've stumbled into some sort of bug in the port info MAD query. I have several PCs connected to an IB switch. On one of the machines I have an OpenIB installation, and on one PC I continuously run a management utility that sweeps the fabric (using ibnetdiscover from management/diags/ibnetdiscover/). At one point in time after another slow-booting PC boots, ibnetdiscover fails during its fabric sweep and the IB_ATTR_PORT_INFO query to the sweeping node's IB port fails returning a physstate == 6 (LinkErrorRecovery). When I check the /sys/class/infiniband/mthca0/ports/1/state I get "4: ACTIVE". Is there some known issue with port info MAD queries? Could this be somehow related to mixed SDR/DDR switch and HCAs? Maybe someone here knows how to work around this issue?
Thanks From hnguyen at linux.vnet.ibm.com Thu Feb 15 08:06:33 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Thu, 15 Feb 2007 17:06:33 +0100 Subject: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler to avoid/reduce missed irq events Message-ID: <200702151706.33773.hnguyen@linux.vnet.ibm.com> reworked irq handler to avoid/reduce missed irq events Signed-off-by: Hoang-Nam Nguyen --- ehca_classes.h | 18 +++- ehca_eq.c | 1 ehca_irq.c | 214 +++++++++++++++++++++++++++++++++++---------------------- ehca_irq.h | 1 ehca_main.c | 28 +++++-- ipz_pt_fn.h | 11 ++ 6 files changed, 182 insertions(+), 91 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index cf95ee4..f08ad6f 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -42,8 +42,6 @@ #ifndef __EHCA_CLASSES_H__ #define __EHCA_CLASSES_H__ -#include "ehca_classes.h" -#include "ipz_pt_fn.h" struct ehca_module; struct ehca_qp; @@ -54,14 +52,22 @@ struct ehca_mw; struct ehca_pd; struct ehca_av; +#include +#include + #ifdef CONFIG_PPC64 #include "ehca_classes_pSeries.h" #endif +#include "ipz_pt_fn.h" +#include "ehca_qes.h" +#include "ehca_irq.h" -#include -#include +#define EHCA_EQE_CACHE_SIZE 20 -#include "ehca_irq.h" +struct ehca_eqe_cache_entry { + struct ehca_eqe *eqe; + struct ehca_cq *cq; +}; struct ehca_eq { u32 length; @@ -74,6 +80,8 @@ struct ehca_eq { spinlock_t spinlock; struct tasklet_struct interrupt_task; u32 ist; + spinlock_t irq_spinlock; + struct ehca_eqe_cache_entry eqe_cache[EHCA_EQE_CACHE_SIZE]; }; struct ehca_sport { diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c b/drivers/infiniband/hw/ehca/ehca_eq.c index 5281dec..33c822e 100644 --- a/drivers/infiniband/hw/ehca/ehca_eq.c +++ b/drivers/infiniband/hw/ehca/ehca_eq.c @@ -61,6 +61,7 @@ int ehca_create_eq(struct ehca_shca *shc struct ib_device *ib_dev = &shca->ib_device; spin_lock_init(&eq->spinlock); + 
spin_lock_init(&eq->irq_spinlock); eq->is_initialized = 0; if (type != EHCA_EQ && type != EHCA_NEQ) { diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index 6c4f9f9..b923b5d 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -206,7 +206,7 @@ static void qp_event_callback(struct ehc } static void cq_event_callback(struct ehca_shca *shca, - u64 eqe) + u64 eqe) { struct ehca_cq *cq; unsigned long flags; @@ -318,7 +318,7 @@ static void parse_ec(struct ehca_shca *s "disruptive port %x configuration change", port); ehca_info(&shca->ib_device, - "port %x is inactive.", port); + "port %x is inactive.", port); event.device = &shca->ib_device; event.event = IB_EVENT_PORT_ERR; event.element.port_num = port; @@ -326,7 +326,7 @@ static void parse_ec(struct ehca_shca *s ib_dispatch_event(&event); ehca_info(&shca->ib_device, - "port %x is active.", port); + "port %x is active.", port); event.device = &shca->ib_device; event.event = IB_EVENT_PORT_ACTIVE; event.element.port_num = port; @@ -401,87 +401,143 @@ irqreturn_t ehca_interrupt_eq(int irq, v return IRQ_HANDLED; } -void ehca_tasklet_eq(unsigned long data) -{ - struct ehca_shca *shca = (struct ehca_shca*)data; - struct ehca_eqe *eqe; - int int_state; - int query_cnt = 0; - do { - eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); +static inline void process_eqe(struct ehca_shca *shca, struct ehca_eqe *eqe) +{ + u64 eqe_value; + u32 token; + unsigned long flags; + struct ehca_cq *cq; + eqe_value = eqe->entry; + ehca_dbg(&shca->ib_device, "eqe_value=%lx", eqe_value); + if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) { + ehca_dbg(&shca->ib_device, "... 
completion event"); + token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq = idr_find(&ehca_cq_idr, token); + if (cq == NULL) { + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + ehca_err(&shca->ib_device, + "Invalid eqe for non-existing cq token=%x", + token); + return; + } + reset_eq_pending(cq); +#ifdef CONFIG_INFINIBAND_EHCA_SCALING + queue_comp_task(cq); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); +#else + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + comp_event_callback(cq); +#endif + } else { + ehca_dbg(&shca->ib_device, + "Got non completion event"); + parse_identifier(shca, eqe_value); + } +} - if ((shca->hw_level >= 2) && eqe) - int_state = 1; - else - int_state = 0; +void ehca_process_eq(struct ehca_shca *shca, int is_irq) +{ + struct ehca_eq *eq = &shca->eq; + struct ehca_eqe_cache_entry *eqe_cache = eq->eqe_cache; + u64 eqe_value; + unsigned long flags; + int eqe_cnt, i; + int eq_empty = 0; - while ((int_state == 1) || eqe) { - while (eqe) { - u64 eqe_value = eqe->entry; - - ehca_dbg(&shca->ib_device, - "eqe_value=%lx", eqe_value); - - /* TODO: better structure */ - if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, - eqe_value)) { - unsigned long flags; - u32 token; - struct ehca_cq *cq; - - ehca_dbg(&shca->ib_device, - "... 
completion event"); - token = - EHCA_BMASK_GET(EQE_CQ_TOKEN, - eqe_value); - spin_lock_irqsave(&ehca_cq_idr_lock, - flags); - cq = idr_find(&ehca_cq_idr, token); - - if (cq == NULL) { - spin_unlock_irqrestore(&ehca_cq_idr_lock, - flags); - break; - } + spin_lock_irqsave(&eq->irq_spinlock, flags); + if (is_irq) { + const int max_query_cnt = 100; + int query_cnt = 0; + int int_state = 1; + do { + int_state = hipz_h_query_int_state( + shca->ipz_hca_handle, eq->ist); + query_cnt++; + iosync(); + } while (int_state && query_cnt < max_query_cnt); + if (unlikely((query_cnt == max_query_cnt))) + ehca_dbg(&shca->ib_device, "int_state=%x query_cnt=%x", + int_state, query_cnt); + } - reset_eq_pending(cq); + /* read out all eqes */ + eqe_cnt = 0; + do { + u32 token; + eqe_cache[eqe_cnt].eqe = + (struct ehca_eqe *)ehca_poll_eq(shca, eq); + if (!eqe_cache[eqe_cnt].eqe) + break; + eqe_value = eqe_cache[eqe_cnt].eqe->entry; + if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) { + token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value); + spin_lock(&ehca_cq_idr_lock); + eqe_cache[eqe_cnt].cq = idr_find(&ehca_cq_idr, token); + if (!eqe_cache[eqe_cnt].cq) { + spin_unlock(&ehca_cq_idr_lock); + ehca_err(&shca->ib_device, + "Invalid eqe for non-existing cq " + "token=%x", token); + continue; + } + spin_unlock(&ehca_cq_idr_lock); + } else + eqe_cache[eqe_cnt].cq = NULL; + eqe_cnt++; + } while (eqe_cnt < EHCA_EQE_CACHE_SIZE); + if (!eqe_cnt) { + if (is_irq) + ehca_dbg(&shca->ib_device, + "No eqe found for irq event"); + goto unlock_irq_spinlock; + } else if (!is_irq) + ehca_dbg(&shca->ib_device, "deadman found %x eqe", eqe_cnt); + if (unlikely(eqe_cnt == EHCA_EQE_CACHE_SIZE)) + ehca_dbg(&shca->ib_device, "too many eqes for one irq event"); + /* enable irq for new packets */ + for (i = 0; i < eqe_cnt; i++) { + if (eq->eqe_cache[i].cq) + reset_eq_pending(eq->eqe_cache[i].cq); + } + /* check eq */ + spin_lock(&eq->spinlock); + eq_empty = (!ipz_eqit_eq_peek_valid(&shca->eq.ipz_queue)); + 
spin_unlock(&eq->spinlock); + /* call completion handler for cached eqes */ + for (i = 0; i < eqe_cnt; i++) + if (eq->eqe_cache[i].cq) { #ifdef CONFIG_INFINIBAND_EHCA_SCALING - queue_comp_task(cq); - spin_unlock_irqrestore(&ehca_cq_idr_lock, - flags); + spin_lock(&ehca_cq_idr_lock); + queue_comp_task(eq->eqe_cache[i].cq); + spin_unlock(&ehca_cq_idr_lock); #else - spin_unlock_irqrestore(&ehca_cq_idr_lock, - flags); - comp_event_callback(cq); + comp_event_callback(eq->eqe_cache[i].cq); #endif - } else { - ehca_dbg(&shca->ib_device, - "... non completion event"); - parse_identifier(shca, eqe_value); - } - eqe = - (struct ehca_eqe *)ehca_poll_eq(shca, - &shca->eq); - } - - if (shca->hw_level >= 2) { - int_state = - hipz_h_query_int_state(shca->ipz_hca_handle, - shca->eq.ist); - query_cnt++; - iosync(); - if (query_cnt >= 100) { - query_cnt = 0; - int_state = 0; - } - } - eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); - + } else { + ehca_dbg(&shca->ib_device, "Got non completion event"); + parse_identifier(shca, eq->eqe_cache[i].eqe->entry); } - } while (int_state != 0); + /* poll eq if not empty */ + if (eq_empty) + goto unlock_irq_spinlock; + do { + struct ehca_eqe *eqe; + eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); + if (!eqe) + break; + process_eqe(shca, eqe); + eqe_cnt++; + } while (1); - return; +unlock_irq_spinlock: + spin_unlock_irqrestore(&eq->irq_spinlock, flags); +} + +void ehca_tasklet_eq(unsigned long data) +{ + ehca_process_eq((struct ehca_shca*)data, 1); } #ifdef CONFIG_INFINIBAND_EHCA_SCALING @@ -654,11 +710,11 @@ static void take_over_work(struct ehca_c list_splice_init(&cct->cq_list, &list); while(!list_empty(&list)) { - cq = list_entry(cct->cq_list.next, struct ehca_cq, entry); + cq = list_entry(cct->cq_list.next, struct ehca_cq, entry); - list_del(&cq->entry); - __queue_comp_task(cq, per_cpu_ptr(pool->cpu_comp_tasks, - smp_processor_id())); + list_del(&cq->entry); + __queue_comp_task(cq, per_cpu_ptr(pool->cpu_comp_tasks, + 
smp_processor_id())); } spin_unlock_irqrestore(&cct->task_lock, flags_cct); diff --git a/drivers/infiniband/hw/ehca/ehca_irq.h b/drivers/infiniband/hw/ehca/ehca_irq.h index be579cc..6ed06ee 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.h +++ b/drivers/infiniband/hw/ehca/ehca_irq.h @@ -56,6 +56,7 @@ void ehca_tasklet_neq(unsigned long data irqreturn_t ehca_interrupt_eq(int irq, void *dev_id); void ehca_tasklet_eq(unsigned long data); +void ehca_process_eq(struct ehca_shca *shca, int is_irq); struct ehca_cpu_comp_task { wait_queue_head_t wait_queue; diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 1155bcf..5790534 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -52,7 +52,7 @@ MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver"); -MODULE_VERSION("SVNEHCA_0020"); +MODULE_VERSION("SVNEHCA_0021"); int ehca_open_aqp1 = 0; int ehca_debug_level = 0; @@ -432,8 +432,8 @@ static int ehca_destroy_aqp1(struct ehca static ssize_t ehca_show_debug_level(struct device_driver *ddp, char *buf) { - return snprintf(buf, PAGE_SIZE, "%d\n", - ehca_debug_level); + return snprintf(buf, PAGE_SIZE, "%d\n", + ehca_debug_level); } static ssize_t ehca_store_debug_level(struct device_driver *ddp, @@ -778,8 +778,24 @@ void ehca_poll_eqs(unsigned long data) spin_lock(&shca_list_lock); list_for_each_entry(shca, &shca_list, shca_list) { - if (shca->eq.is_initialized) - ehca_tasklet_eq((unsigned long)(void*)shca); + if (shca->eq.is_initialized) { + /* call deadman proc only if eq ptr does not change */ + struct ehca_eq *eq = &shca->eq; + int max = 3; + volatile u64 q_ofs, q_ofs2; + u64 flags; + spin_lock_irqsave(&eq->spinlock, flags); + q_ofs = eq->ipz_queue.current_q_offset; + spin_unlock_irqrestore(&eq->spinlock, flags); + do { + spin_lock_irqsave(&eq->spinlock, flags); + q_ofs2 = eq->ipz_queue.current_q_offset; + 
spin_unlock_irqrestore(&eq->spinlock, flags); + max--; + } while (q_ofs == q_ofs2 && max > 0); + if (q_ofs == q_ofs2) + ehca_process_eq(shca, 0); + } } mod_timer(&poll_eqs_timer, jiffies + HZ); spin_unlock(&shca_list_lock); @@ -790,7 +806,7 @@ int __init ehca_module_init(void) int ret; printk(KERN_INFO "eHCA Infiniband Device Driver " - "(Rel.: SVNEHCA_0020)\n"); + "(Rel.: SVNEHCA_0021)\n"); idr_init(&ehca_qp_idr); idr_init(&ehca_cq_idr); spin_lock_init(&ehca_qp_idr_lock); diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h index dc3bda2..8199c45 100644 --- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h +++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h @@ -79,7 +79,7 @@ static inline void *ipz_qeit_calc(struct if (q_offset >= queue->queue_length) return NULL; current_page = (queue->queue_pages)[q_offset >> EHCA_PAGESHIFT]; - return &current_page->entries[q_offset & (EHCA_PAGESIZE - 1)]; + return &current_page->entries[q_offset & (EHCA_PAGESIZE - 1)]; } /* @@ -247,6 +247,15 @@ static inline void *ipz_eqit_eq_get_inc_ return ret; } +static inline void *ipz_eqit_eq_peek_valid(struct ipz_queue *queue) +{ + void *ret = ipz_qeit_get(queue); + u32 qe = *(u8 *) ret; + if ((qe >> 7) != (queue->toggle_state & 1)) + return NULL; + return ret; +} + /* returns address (GX) of first queue entry */ static inline u64 ipz_qpt_get_firstpage(struct ipz_qpt *qpt) { From hnguyen at linux.vnet.ibm.com Thu Feb 15 08:07:30 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Thu, 15 Feb 2007 17:07:30 +0100 Subject: [openib-general] [PATCH 2.6.21-rc1 2/5] ehca: fix race condition/locking issues in scaling code Message-ID: <200702151707.31021.hnguyen@linux.vnet.ibm.com> fix a race condition in find_next_online_cpu() and some other locking issues in scaling code Signed-off-by: Hoang-Nam Nguyen --- ehca_irq.c | 68 +++++++++++++++++++++++++++++-------------------------------- 1 files changed, 33 insertions(+), 35 deletions(-) diff --git 
a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index b923b5d..9679b07 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -544,28 +544,30 @@ void ehca_tasklet_eq(unsigned long data) static inline int find_next_online_cpu(struct ehca_comp_pool* pool) { - unsigned long flags_last_cpu; + int cpu; + unsigned long flags; + WARN_ON_ONCE(!in_interrupt()); if (ehca_debug_level) ehca_dmp(&cpu_online_map, sizeof(cpumask_t), ""); - spin_lock_irqsave(&pool->last_cpu_lock, flags_last_cpu); - pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map); - if (pool->last_cpu == NR_CPUS) - pool->last_cpu = first_cpu(cpu_online_map); - spin_unlock_irqrestore(&pool->last_cpu_lock, flags_last_cpu); + spin_lock_irqsave(&pool->last_cpu_lock, flags); + cpu = next_cpu(pool->last_cpu, cpu_online_map); + if (cpu == NR_CPUS) + cpu = first_cpu(cpu_online_map); + pool->last_cpu = cpu; + spin_unlock_irqrestore(&pool->last_cpu_lock, flags); - return pool->last_cpu; + return cpu; } static void __queue_comp_task(struct ehca_cq *__cq, struct ehca_cpu_comp_task *cct) { - unsigned long flags_cct; - unsigned long flags_cq; + unsigned long flags; - spin_lock_irqsave(&cct->task_lock, flags_cct); - spin_lock_irqsave(&__cq->task_lock, flags_cq); + spin_lock_irqsave(&cct->task_lock, flags); + spin_lock(&__cq->task_lock); if (__cq->nr_callbacks == 0) { __cq->nr_callbacks++; @@ -576,8 +578,8 @@ static void __queue_comp_task(struct ehc else __cq->nr_callbacks++; - spin_unlock_irqrestore(&__cq->task_lock, flags_cq); - spin_unlock_irqrestore(&cct->task_lock, flags_cct); + spin_unlock(&__cq->task_lock); + spin_unlock_irqrestore(&cct->task_lock, flags); } static void queue_comp_task(struct ehca_cq *__cq) @@ -588,69 +590,69 @@ static void queue_comp_task(struct ehca_ cpu = get_cpu(); cpu_id = find_next_online_cpu(pool); - BUG_ON(!cpu_online(cpu_id)); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); + BUG_ON(!cct); if (cct->cq_jobs > 0) { 
cpu_id = find_next_online_cpu(pool); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); + BUG_ON(!cct); } __queue_comp_task(__cq, cct); - - put_cpu(); - - return; } static void run_comp_task(struct ehca_cpu_comp_task* cct) { struct ehca_cq *cq; - unsigned long flags_cct; - unsigned long flags_cq; + unsigned long flags; - spin_lock_irqsave(&cct->task_lock, flags_cct); + spin_lock_irqsave(&cct->task_lock, flags); while (!list_empty(&cct->cq_list)) { cq = list_entry(cct->cq_list.next, struct ehca_cq, entry); - spin_unlock_irqrestore(&cct->task_lock, flags_cct); + spin_unlock_irqrestore(&cct->task_lock, flags); comp_event_callback(cq); - spin_lock_irqsave(&cct->task_lock, flags_cct); + spin_lock_irqsave(&cct->task_lock, flags); - spin_lock_irqsave(&cq->task_lock, flags_cq); + spin_lock(&cq->task_lock); cq->nr_callbacks--; if (cq->nr_callbacks == 0) { list_del_init(cct->cq_list.next); cct->cq_jobs--; } - spin_unlock_irqrestore(&cq->task_lock, flags_cq); - + spin_unlock(&cq->task_lock); } - spin_unlock_irqrestore(&cct->task_lock, flags_cct); - - return; + spin_unlock_irqrestore(&cct->task_lock, flags); } static int comp_task(void *__cct) { struct ehca_cpu_comp_task* cct = __cct; + int cql_empty; DECLARE_WAITQUEUE(wait, current); set_current_state(TASK_INTERRUPTIBLE); while(!kthread_should_stop()) { add_wait_queue(&cct->wait_queue, &wait); - if (list_empty(&cct->cq_list)) + spin_lock_irq(&cct->task_lock); + cql_empty = list_empty(&cct->cq_list); + spin_unlock_irq(&cct->task_lock); + if (cql_empty) schedule(); else __set_current_state(TASK_RUNNING); remove_wait_queue(&cct->wait_queue, &wait); - if (!list_empty(&cct->cq_list)) + spin_lock_irq(&cct->task_lock); + cql_empty = list_empty(&cct->cq_list); + spin_unlock_irq(&cct->task_lock); + if (!cql_empty) run_comp_task(__cct); set_current_state(TASK_INTERRUPTIBLE); @@ -693,8 +695,6 @@ static void destroy_comp_task(struct ehc if (task) kthread_stop(task); - - return; } static void take_over_work(struct ehca_comp_pool *pool, @@ 
-815,6 +815,4 @@ void ehca_destroy_comp_pool(void) free_percpu(pool->cpu_comp_tasks); kfree(pool); #endif - - return; } From hnguyen at linux.vnet.ibm.com Thu Feb 15 08:08:33 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Thu, 15 Feb 2007 17:08:33 +0100 Subject: [openib-general] [PATCH 2.6.21-rc1 3/5] ehca: allow en/disabling scaling code via module parameter Message-ID: <200702151708.33781.hnguyen@linux.vnet.ibm.com> allow users to en/disable scaling code when loading ib_ehca module Signed-off-by: Hoang-Nam Nguyen --- Kconfig | 8 -------- ehca_classes.h | 1 + ehca_irq.c | 47 +++++++++++++++++++++-------------------------- ehca_main.c | 4 ++++ 4 files changed, 26 insertions(+), 34 deletions(-) diff --git a/drivers/infiniband/hw/ehca/Kconfig b/drivers/infiniband/hw/ehca/Kconfig index 727b10d..1a85459 100644 --- a/drivers/infiniband/hw/ehca/Kconfig +++ b/drivers/infiniband/hw/ehca/Kconfig @@ -7,11 +7,3 @@ config INFINIBAND_EHCA To compile the driver as a module, choose M here. The module will be called ib_ehca. -config INFINIBAND_EHCA_SCALING - bool "Scaling support (EXPERIMENTAL)" - depends on IBMEBUS && INFINIBAND_EHCA && HOTPLUG_CPU && EXPERIMENTAL - default y - ---help--- - eHCA scaling support schedules the CQ callbacks to different CPUs. - - To enable this feature choose Y here. 
diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index f08ad6f..40404c9 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -277,6 +277,7 @@ extern struct idr ehca_cq_idr; extern int ehca_static_rate; extern int ehca_port_act_time; extern int ehca_use_hp_mr; +extern int ehca_scaling_code; struct ipzu_queue_resp { u32 qe_size; /* queue entry size */ diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index 9679b07..3ec53c6 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -63,15 +63,11 @@ #define ERROR_DATA_LENGTH EHCA_BMASK_IBM(52,63) #define ERROR_DATA_TYPE EHCA_BMASK_IBM(0,7) -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - static void queue_comp_task(struct ehca_cq *__cq); static struct ehca_comp_pool* pool; static struct notifier_block comp_pool_callback_nb; -#endif - static inline void comp_event_callback(struct ehca_cq *cq) { if (!cq->ib_cq.comp_handler) @@ -423,13 +419,13 @@ static inline void process_eqe(struct eh return; } reset_eq_pending(cq); -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - queue_comp_task(cq); - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); -#else - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - comp_event_callback(cq); -#endif + if (ehca_scaling_code) { + queue_comp_task(cq); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + } else { + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + comp_event_callback(cq); + } } else { ehca_dbg(&shca->ib_device, "Got non completion event"); @@ -508,13 +504,12 @@ void ehca_process_eq(struct ehca_shca *s /* call completion handler for cached eqes */ for (i = 0; i < eqe_cnt; i++) if (eq->eqe_cache[i].cq) { -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - spin_lock(&ehca_cq_idr_lock); - queue_comp_task(eq->eqe_cache[i].cq); - spin_unlock(&ehca_cq_idr_lock); -#else - comp_event_callback(eq->eqe_cache[i].cq); -#endif + if 
(ehca_scaling_code) { + spin_lock(&ehca_cq_idr_lock); + queue_comp_task(eq->eqe_cache[i].cq); + spin_unlock(&ehca_cq_idr_lock); + } else + comp_event_callback(eq->eqe_cache[i].cq); } else { ehca_dbg(&shca->ib_device, "Got non completion event"); parse_identifier(shca, eq->eqe_cache[i].eqe->entry); @@ -540,8 +535,6 @@ void ehca_tasklet_eq(unsigned long data) ehca_process_eq((struct ehca_shca*)data, 1); } -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - static inline int find_next_online_cpu(struct ehca_comp_pool* pool) { int cpu; @@ -764,14 +757,14 @@ static int comp_pool_callback(struct not return NOTIFY_OK; } -#endif - int ehca_create_comp_pool(void) { -#ifdef CONFIG_INFINIBAND_EHCA_SCALING int cpu; struct task_struct *task; + if (!ehca_scaling_code) + return 0; + pool = kzalloc(sizeof(struct ehca_comp_pool), GFP_KERNEL); if (pool == NULL) return -ENOMEM; @@ -796,16 +789,19 @@ int ehca_create_comp_pool(void) comp_pool_callback_nb.notifier_call = comp_pool_callback; comp_pool_callback_nb.priority =0; register_cpu_notifier(&comp_pool_callback_nb); -#endif + + printk(KERN_INFO "eHCA scaling code enabled\n"); return 0; } void ehca_destroy_comp_pool(void) { -#ifdef CONFIG_INFINIBAND_EHCA_SCALING int i; + if (!ehca_scaling_code) + return; + unregister_cpu_notifier(&comp_pool_callback_nb); for (i = 0; i < NR_CPUS; i++) { @@ -814,5 +810,4 @@ void ehca_destroy_comp_pool(void) } free_percpu(pool->cpu_comp_tasks); kfree(pool); -#endif } diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 5790534..c183512 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -62,6 +62,7 @@ int ehca_use_hp_mr = 0; int ehca_port_act_time = 30; int ehca_poll_all_eqs = 1; int ehca_static_rate = -1; +int ehca_scaling_code = 1; module_param_named(open_aqp1, ehca_open_aqp1, int, 0); module_param_named(debug_level, ehca_debug_level, int, 0); @@ -71,6 +72,7 @@ module_param_named(use_hp_mr, ehca_u 
module_param_named(port_act_time, ehca_port_act_time, int, 0); module_param_named(poll_all_eqs, ehca_poll_all_eqs, int, 0); module_param_named(static_rate, ehca_static_rate, int, 0); +module_param_named(scaling_code, ehca_scaling_code, int, 0); MODULE_PARM_DESC(open_aqp1, "AQP1 on startup (0: no (default), 1: yes)"); @@ -91,6 +93,8 @@ MODULE_PARM_DESC(poll_all_eqs, " (0: no, 1: yes (default))"); MODULE_PARM_DESC(static_rate, "set permanent static rate (default: disabled)"); +MODULE_PARM_DESC(scaling_code, + "set scaling code (0: disabled, 1: enabled/default)"); spinlock_t ehca_qp_idr_lock; spinlock_t ehca_cq_idr_lock; From hnguyen at linux.vnet.ibm.com Thu Feb 15 08:09:44 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Thu, 15 Feb 2007 17:09:44 +0100 Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion() Message-ID: <200702151709.45323.hnguyen@linux.vnet.ibm.com> remove yield() and use wait_for_completion() to wait for running completion handlers to finish before destroying the associated completion queue Signed-off-by: Hoang-Nam Nguyen --- ehca_classes.h | 3 +++ ehca_cq.c | 5 +++-- ehca_irq.c | 6 +++++- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 40404c9..d8ce0c8 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -52,6 +52,8 @@ struct ehca_mw; struct ehca_pd; struct ehca_av; +#include + #include #include @@ -154,6 +156,7 @@ struct ehca_cq { struct hlist_head qp_hashtab[QP_HASHTAB_LEN]; struct list_head entry; u32 nr_callbacks; + struct completion zero_callbacks; spinlock_t task_lock; u32 ownpid; /* mmap counter for resources mapped into user space */ diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index 9291a86..906bd5b 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ 
b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -147,6 +147,7 @@ struct ib_cq *ehca_create_cq(struct ib_d spin_lock_init(&my_cq->spinlock); spin_lock_init(&my_cq->cb_lock); spin_lock_init(&my_cq->task_lock); + init_completion(&my_cq->zero_callbacks); my_cq->ownpid = current->tgid; cq = &my_cq->ib_cq; @@ -330,9 +331,9 @@ int ehca_destroy_cq(struct ib_cq *cq) } spin_lock_irqsave(&ehca_cq_idr_lock, flags); - while (my_cq->nr_callbacks) { + if (my_cq->nr_callbacks) { spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - yield(); + wait_for_completion(&my_cq->zero_callbacks); spin_lock_irqsave(&ehca_cq_idr_lock, flags); } diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index 3ec53c6..7db39b7 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -605,6 +605,7 @@ static void run_comp_task(struct ehca_cp spin_lock_irqsave(&cct->task_lock, flags); while (!list_empty(&cct->cq_list)) { + int is_complete = 0; cq = list_entry(cct->cq_list.next, struct ehca_cq, entry); spin_unlock_irqrestore(&cct->task_lock, flags); comp_event_callback(cq); @@ -612,11 +613,14 @@ static void run_comp_task(struct ehca_cp spin_lock(&cq->task_lock); cq->nr_callbacks--; - if (cq->nr_callbacks == 0) { + is_complete = (cq->nr_callbacks == 0); + if (is_complete) { list_del_init(cct->cq_list.next); cct->cq_jobs--; } spin_unlock(&cq->task_lock); + if (is_complete) /* wake up waiting destroy_cq() */ + complete(&cq->zero_callbacks); } spin_unlock_irqrestore(&cct->task_lock, flags); From hnguyen at linux.vnet.ibm.com Thu Feb 15 08:10:06 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Thu, 15 Feb 2007 17:10:06 +0100 Subject: [openib-general] [PATCH 2.6.21-rc1 5/5] ehca: query_port() returns LINK_UP instead UNKNOWN Message-ID: <200702151710.06432.hnguyen@linux.vnet.ibm.com> set port phys state as a result of ehca_query_port() to LINK_UP. 
On pSeries ehca actually represents a logical HCA, whose phys/link state always is LINK_UP. Signed-off-by: Hoang-Nam Nguyen --- ehca_hca.c | 3 +++ 1 files changed, 3 insertions(+) diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c index b7be950..30eb45d 100644 --- a/drivers/infiniband/hw/ehca/ehca_hca.c +++ b/drivers/infiniband/hw/ehca/ehca_hca.c @@ -162,6 +162,9 @@ int ehca_query_port(struct ib_device *ib props->active_width = IB_WIDTH_12X; props->active_speed = 0x1; + /* at the moment (logical) link state is always LINK_UP */ + props->phys_state = 0x5; + query_port1: ehca_free_fw_ctrlblock(rblock); From changquing.tang at hp.com Thu Feb 15 08:13:21 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Thu, 15 Feb 2007 16:13:21 -0000 Subject: [openib-general] How heavy to resize a CQ ? In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com><349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net><349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net><349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net><349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84037050CE@G3W0634.americas.hpqcorp.net> Roland or other driver developers: In dynamic process application, we don't know how many connections a process will make when we create the CQ, so we don't know the CQ size, what we do is to increase the CQ size when a new connection is made, and decrease the CQ size when a connection is destroyed. My question is, is ibv_resize_cq() a lightweight function call ? Do we have to drain the CQ before we resize the CQ ? Thanks --CQ > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Wednesday, February 07, 2007 5:42 PM > To: Tang, Changqing > Cc: Michael S. 
Tsirkin; openib-general at openib.org > Subject: Re: Immediate data question > > Changqing> What I mean is that, is there any performance penalty > Changqing> for receiver's overall performance if RNR happens > Changqing> continuously on one of the QP ? > > Not for the receiver, but the sender will be severely slowed > down by having to wait for the RNR timeouts. > From bclements at SBSPlanet.com Thu Feb 15 08:16:55 2007 From: bclements at SBSPlanet.com (Clements, Brent) Date: Thu, 15 Feb 2007 11:16:55 -0500 Subject: [openib-general] What is the expected performance of IPoIB using DDR equipment? Message-ID: I've searched the web but I cannot find the answer to the following question: What is the expected (not theoretical) IPoIB throughput performance when using DDR switches and DDR HCA's? Thanks! From halr at voltaire.com Thu Feb 15 08:14:34 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Feb 2007 11:14:34 -0500 Subject: [openib-general] bad port physstate In-Reply-To: References: Message-ID: <1171556073.22446.185292.camel@hal.voltaire.com> On Thu, 2007-02-15 at 10:53, yipeeyipeeyipeeyipee wrote: > Hi, > > It seems like I've stumbled into some sort of bug in the port info mad query. > I have several pc's connected to an IB switch. 
> On one of the machines I have an OpenIB installation, and on one pc I > continuously run a management utility that sweeps the fabric (using > ibnetdiscover from management/diags/ibnetdiscover/). At one point in time after > another slow-booting pc boots, ibnetdiscover fails during its fabric sweep and > the IB_ATTR_PORT_INFO query to the sweeping node's ib port fails returning a > physstate == 6 (LinkErrorRecovery). > When I check the /sys/class/infiniband/mthca0/ports/1/state I get "4: ACTIVE". That's because the initial smpquery (by ibnetdiscover) sees the LinkErrorRecovery PortPhysicalState. The port then comes up at the physical level, and the SM moves it through the port states to active. By the time you look again locally (via /sys/class/infiniband/mthca0/ports/1/state), it has been made active, and I would expect an smpquery of portinfo on this port, or ibnetdiscover, to show this now. > Is there some known issue with port info mad queries? Could this be somehow > related to mixed SDR/DDR switch and hcas? Maybe someone here knows how to > workaround this issue? Sounds like the way it is supposed to work to me. -- Hal > Thanks > From mst at mellanox.co.il Thu Feb 15 08:29:42 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Feb 2007 18:29:42 +0200 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <1171552335.3161.128.camel@fc6.xsintricity.com> References: <1171477762.3161.105.camel@fc6.xsintricity.com> <20070215055751.GA11866@mellanox.co.il> <1171552335.3161.128.camel@fc6.xsintricity.com> Message-ID: <20070215162942.GB15185@mellanox.co.il> > Quoting Doug Ledford : > Subject: Re: 32-bit build for ppc64 is required > > On Thu, 2007-02-15 at 07:57 +0200, Michael S. 
Tsirkin wrote: > > > > The choice of 32/64 bit default is done on a per arch basis. With > > > x86_64/i386, the increased number of CPU registers in 64bit mode > > > outweighs the increased code bloat that goes along with 64bit mode. On > > > PPC, no such register benefit exists for 64bit mode. As such, 32bit > > > apps on PPC are faster than the equivalent 64bit apps up to the point at > > > which a 4GB address space becomes a problem. Correspondingly, the > > > default binaries on PPC are 32bit, and only those that *need* to be > > > 64bit are. While a customer's application may need >4GB address space, > > > certainly all the ibutils, diags, opensm, etc. do not. As a result, we > > > compile all of those utilities as 32bit by default on PPC. We also ship > > > all the libs as both 32/64bit so users can select the appropriate > > > environment for their particular application (with the exception of > > > dapl, which doesn't support 32bit and for which I filed a bug around the > > > time of OFED 1.1). > > > > So, what you suggest is - build 2 types of libraries, but on PPC make > > binaries 32 bit? That's easy - do others agree to this approach? > > Yep, that's what we do. Care to post a patch to Vlad's scripts? -- MST From vlad at mellanox.co.il Thu Feb 15 08:29:30 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 15 Feb 2007 18:29:30 +0200 Subject: [openib-general] [PATCH] ofed_1_2 iw_cxgb3 Fail posts synchronously when in TERMINATE state. In-Reply-To: <1171551038.13282.6.camel@stevo-desktop> References: <1171551038.13282.6.camel@stevo-desktop> Message-ID: <1171556970.16477.0.camel@vladsk-laptop> On Thu, 2007-02-15 at 08:50 -0600, Steve Wise wrote: > Fail posts synchronously when in TERMINATE state. > > For T3B devices, mark user qp in error once we transition > to TERMINATE. > > Signed-off-by: Steve Wise > --- Applied. -- Vladimir Sokolovsky Mellanox Technologies Ltd. 
From swise at opengridcomputing.com Thu Feb 15 08:59:45 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 15 Feb 2007 10:59:45 -0600 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: <45D3E224.9060306@cse.ohio-state.edu> References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> <45D1FD0B.2080606@cse.ohio-state.edu> <45D3E224.9060306@cse.ohio-state.edu> Message-ID: <1171558785.13282.29.camel@stevo-desktop> Shaun, Lemme know if you have an mvapich2 kit that I can test with iwarp... Thanks, Steve. On Wed, 2007-02-14 at 23:31 -0500, Shaun Rowland wrote: > Roland Dreier wrote: > > > When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is > > > built, at least by looking at the .so file result: > > > > > > [rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs libibverbs.a > > > libibverbs.so > > > libibverbs.so.1 > > > libibverbs.so.1.0.0 > > > > The soname hasn't changed because the library is still compatible. > > But (I hope at least) OFED has libibverbs 1.1. > > The soname is libibverbs.so.1, so I guess the longer name would not > matter anyway. Clearly, what I posted shows the IBVERBS 1.1 ABI is > there. I think I have figured out why our code has this problem. The > problem below is similar to the original one posted about. > > I did some experimentation with the srq_pingpong libibverbs example > code. First I built it directly with: > > > gcc -g -c pingpong.c -I/usr/local/ofed/include > > gcc -g -c -D_GNU_SOURCE srq_pingpong.c -I/usr/local/ofed/include > > gcc -g -o srq_pingpong srq_pingpong.o pingpong.o -L/usr/local/ofed/lib64 > -libverbs > > > This works. Next I copied srq_pingpong.c to two files: > > srq_pingpong_rowland.c > - just has a main function that calls lib_start(). > > srq_pingpong_lib_rowland.c > - main() changed to lib_start(). > > This moves all of the SRQ pingpong code into a shared library. 
If I > build this shared library in this way, it works: > > > gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include > > gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c > -I/usr/local/ofed/include > > gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so > srq_pingpong_lib_rowland.o pingpong.o -L/usr/local/ofed/lib64 -libverbs > > gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -lsrqtest > > > Above I am linking libibverbs directly into my libsrqtest.so > library. This works and the IBVERBS 1.1 ABI is clearly in the > libsrqtest.so file: > > [rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head > U ibv_ack_cq_events@@IBVERBS_1.1 > U ibv_alloc_pd@@IBVERBS_1.1 > U ibv_close_device@@IBVERBS_1.1 > U ibv_create_comp_channel@@IBVERBS_1.0 > U ibv_create_cq@@IBVERBS_1.1 > U ibv_create_qp@@IBVERBS_1.1 > U ibv_create_srq@@IBVERBS_1.1 > U ibv_dealloc_pd@@IBVERBS_1.1 > U ibv_dereg_mr@@IBVERBS_1.1 > U ibv_destroy_comp_channel@@IBVERBS_1.0 > > However, if I build in a similar way to MVAPICH2, the resulting program > fails: > > > gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include > > gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c > -I/usr/local/ofed/include > > gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so > srq_pingpong_lib_rowland.o pingpong.o > > gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD > -L/usr/local/ofed/lib64 -lsrqtest -libverbs > > > Above I am not linking libibverbs into libsrqtest.so, thus it is > required on the last gcc line. This is how MVAPICH2's libmpich.so file > works, and from past experience, I've seen this before. Running shows: > > [rowland at z1 ibverbs-examples]$ gdb ./srq_pingpong_rowland > GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh) > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain > conditions. 
> Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "x86_64-redhat-linux-gnu"...Using host > libthread_db library "/lib64/tls/libthread_db.so.1". > > (gdb) r > Starting program: > /home/7/rowland/z1-test/ibverbs-examples/srq_pingpong_rowland > [Thread debugging using libthread_db enabled] > [New Thread 182896403968 (LWP 29858)] > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread 182896403968 (LWP 29858)] > post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0, > bad_wr=0x7fbfff88c8) > at src/compat-1_0.c:312 > 312 src/compat-1_0.c: No such file or directory. > in src/compat-1_0.c > (gdb) bt > #0 post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0, > bad_wr=0x7fbfff88c8) at src/compat-1_0.c:312 > #1 0x0000002a95559e12 in ibv_post_srq_recv (srq=0x5075b0, > recv_wr=0x7fbfff88d0, bad_recv_wr=0x7fbfff88c8) > at /usr/local/ofed/include/infiniband/verbs.h:915 > #2 0x0000002a95559dcf in pp_post_recv (ctx=0x5023d0, n=500) > at srq_pingpong_lib_rowland.c:496 > #3 0x0000002a9555a614 in lib_start (argc=1, argv=0x7fbffff7f8) > at srq_pingpong_lib_rowland.c:696 > #4 0x0000000000400608 in main (argc=1, argv=0x7fbffff7f8) > at srq_pingpong_rowland.c:36 > (gdb) quit > > It is not clear to me why the difference of either linking libibverbs > into libsrqtest.so or not doing so causes the IBVERBS 1.1 ABI to be used > or not. I looked at the libibverbs code, and the 1.1 ABI is the default. > The libsrqtest.so file in the above case seems to have lost this > information: > > [rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head > U ibv_ack_cq_events > U ibv_alloc_pd > U ibv_close_device > U ibv_create_comp_channel > U ibv_create_cq > U ibv_create_qp > U ibv_create_srq > U ibv_dealloc_pd > U ibv_dereg_mr > U ibv_destroy_comp_channel > > I've never had to deal with an ABI issue like this in shared library > linking/usage. 
Does it make sense for this to be the case? I think > perhaps it does, but I wanted to ask. > > I've placed my test code here if it helps: > > http://www.cse.ohio-state.edu/~rowland/ibverbs-examples.tar.gz > > I have a fix for our code that I am testing now. It seems to work and > solve the observed problems, but more testing will be required to be > sure there are no issues. This will require a new SRPM if the fix is > required, which it seems at this point. From swise at opengridcomputing.com Thu Feb 15 09:12:06 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 15 Feb 2007 11:12:06 -0600 Subject: [openib-general] bug 355 - problems building modules that depend on the ofed 1.2 modules Message-ID: <1171559526.13282.41.camel@stevo-desktop> All, I've run into the following problem. Bug 335 opened to track this... I install the alpha1 ofed 1.2 rpms on a RHEL5b2 system with its 2.6.18-1.2747.el5 kernel. Then I build a module outside of the kernel that uses the IB verbs and RDMA CM kernel interface (krping). This module builds and loads ok on a stock 2.6.20 system with ofed1.2 installed, but it fails to load on the rhel5b2 system with a symbol version problem. Here is a snippet of the errors: rdma_krping: disagrees about version of symbol ib_create_cq rdma_krping: Unknown symbol ib_create_cq rdma_krping: disagrees about version of symbol rdma_resolve_addr rdma_krping: Unknown symbol rdma_resolve_addr rdma_krping: disagrees about version of symbol ib_dereg_mr rdma_krping: Unknown symbol ib_dereg_mr I'm wondering if maybe the ofed modules are _not_ being built with src versioning even if the kernel has it turned on? We see similar problems with NFS-RDMA trying to use OFED 1.2 modules. And the NFS-RDMA works with OFED 1.1 modules, so I _think_ something is whacked with the OFED 1.2 build process. 
Here is the Makefile I'm using to build rdma_krping (borrowed from Intel's e1000 kit): [root at vic12 krping]# cat Makefile KSRC=/lib/modules/`uname -r`/source KOBJ=/lib/modules/`uname -r`/build CFLAGS += -DLINUX -D__KERNEL__ -DMODULE -O2 -pipe -Wall CFLAGS += -I/usr/local/ofed/src/ofa_kernel-1.2/include -I$(KSRC)/include -I. CFLAGS += $(shell [ -f $(KSRC)/include/linux/modversions.h ] && \ echo "-DMODVERSIONS -DEXPORT_SYMTAB \ -include $(KSRC)/include/linux/modversions.h") CFLAGS += $(CFLAGS_EXTRA) obj-m += rdma_krping.o rdma_krping-y := getopt.o krping.o default: make -C $(KSRC) O=$(KOBJ) SUBDIRS=$(shell pwd) modules clean: rm -f *.o rm -f *.ko rm -f rdma_krping.mod.c rm -f Module.symvers [root at vic12 krping]# From dledford at redhat.com Thu Feb 15 09:11:17 2007 From: dledford at redhat.com (Doug Ledford) Date: Thu, 15 Feb 2007 12:11:17 -0500 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <20070215162942.GB15185@mellanox.co.il> References: <1171477762.3161.105.camel@fc6.xsintricity.com> <20070215055751.GA11866@mellanox.co.il> <1171552335.3161.128.camel@fc6.xsintricity.com> <20070215162942.GB15185@mellanox.co.il> Message-ID: <1171559478.3161.155.camel@fc6.xsintricity.com> On Thu, 2007-02-15 at 18:29 +0200, Michael S. Tsirkin wrote: > > Quoting Doug Ledford : > > Subject: Re: 32-bit build for ppc64 is required > > > > On Thu, 2007-02-15 at 07:57 +0200, Michael S. Tsirkin wrote: > > > > > > The choice of 32/64 bit default is done on a per arch basis. With > > > > x86_64/i386, the increased number of CPU registers in 64bit mode > > > > outweighs the increased code bloat that goes along with 64bit mode. On > > > > PPC, no such register benefit exists for 64bit mode. As such, 32bit > > > > apps on PPC are faster than the equivalent 64bit apps up to the point at > > > > which a 4GB address space becomes a problem. Correspondingly, the > > > > default binaries on PPC are 32bit, and only those that *need* to be > > > > 64bit are. 
While a customer's application may need >4GB address space, > > > > certainly all the ibutils, diags, opensm, etc. do not. As a result, we > > > > compile all of those utilities as 32bit by default on PPC. We also ship > > > > all the libs as both 32/64bit so users can select the appropriate > > > > environment for their particular application (with the exception of > > > > dapl, which doesn't support 32bit and for which I filed a bug around the > > > > time of OFED 1.1). > > > > > > So, what you suggest is - build 2 types of libraries, but on PPC make > > > binaries 32 bit? That's easy - do others agree to this approach? > > > > Yep, that's what we do. > > Care to post a patch to Vlad's scripts? Yuk. I suppose I could write one, but I don't (and can't) use any of the OFED supplied build scripts in our build system, so it's hard for me to test since our build system is the only way I have to access ppc/ppc64 hardware. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mst at mellanox.co.il Thu Feb 15 09:43:09 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Feb 2007 19:43:09 +0200 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <1171559478.3161.155.camel@fc6.xsintricity.com> References: <1171559478.3161.155.camel@fc6.xsintricity.com> Message-ID: <20070215174309.GD15185@mellanox.co.il> > Quoting Doug Ledford : > Subject: Re: 32-bit build for ppc64 is required > > On Thu, 2007-02-15 at 18:29 +0200, Michael S. Tsirkin wrote: > > > Quoting Doug Ledford : > > > Subject: Re: 32-bit build for ppc64 is required > > > > > > On Thu, 2007-02-15 at 07:57 +0200, Michael S. 
Tsirkin wrote: > > > > > > > > The choice of 32/64 bit default is done on a per arch basis. With > > > > > x86_64/i386, the increased number of CPU registers in 64bit mode > > > > > outweighs the increased code bloat that goes along with 64bit mode. On > > > > > PPC, no such register benefit exists for 64bit mode. As such, 32bit > > > > > apps on PPC are faster than the equivalent 64bit apps up to the point at > > > > > which a 4GB address space becomes a problem. Correspondingly, the > > > > > default binaries on PPC are 32bit, and only those that *need* to be > > > > > 64bit are. While a customer's application may need >4GB address space, > > > > > certainly all the ibutils, diags, opensm, etc. do not. As a result, we > > > > > compile all of those utilities as 32bit by default on PPC. We also ship > > > > > all the libs as both 32/64bit so users can select the appropriate > > > > > environment for their particular application (with the exception of > > > > > dapl, which doesn't support 32bit and for which I filed a bug around the > > > > > time of OFED 1.1). > > > > > > > > So, what you suggest is - build 2 types of libraries, but on PPC make > > > > binaries 32 bit? That's easy - do others agree to this approach? > > > > > > Yep, that's what we do. > > > > Care to post a patch to Vlad's scripts? > > Yuk. I suppose I could write one, but I don't (and can't) use any of > the OFED supplied build scripts in our build system, so it's hard for me > to test since our build system is the only way I have to access > ppc/ppc64 hardware. Oh, well. Other takers? 
-- MST From krause at cup.hp.com Thu Feb 15 09:42:37 2007 From: krause at cup.hp.com (Michael Krause) Date: Thu, 15 Feb 2007 09:42:37 -0800 Subject: [openib-general] Immediate data question In-Reply-To: <309a667c0702142137p724172f5va93a0ef046a60483@mail.gmail.co m> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com> <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net> <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com> <309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com> <6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com> <309a667c0702142137p724172f5va93a0ef046a60483@mail.gmail.com> Message-ID: <6.2.0.14.2.20070215071309.0979bed8@esmail.cup.hp.com> At 09:37 PM 2/14/2007, Devesh Sharma wrote: >On 2/14/07, Michael Krause wrote: >>At 05:37 AM 2/13/2007, Devesh Sharma wrote: >> >On 2/12/07, Devesh Sharma wrote: >> >>On 2/10/07, Tang, Changqing wrote: >> >> > > > >> >> > > >Not for the receiver, but the sender will be severely slowed down by >> >> > > >having to wait for the RNR timeouts. >> >> > > >> >> > > RNR = Receiver Not Ready so by definition, the data flow >> >> > > isn't going to >> >> > > progress until the receiver is ready to receive data. If a >> >> > > receive QP >> >> > > enters RNR for a RC, then it is likely not progressing as >> >> > > desired. RNR >> >> > > was initially put in place to enable a receiver to create >> >> > > back pressure to the sender without causing a fatal error >> >> > > condition. It should rarely be entered and therefore should >> >> > > have negligible impact on overall performance however when a >> >> > > RNR occurs, no forward progress will occur so performance is >> >> > > essentially zero. >> >> > >> >> > Mike: >> >> > I still do not quite understand this issue. 
I have two >> >> > situations that have RNR triggered. >> >> > >> >> > 1. process A and process B is connected with QP. A first post a send to >> >> > B, B does not post receive. Then A and B are doing a long time >> >> > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE >> >> > message. Finally B will post a receive. Does the first pending send >> in A >> >> > block all the later RDMA_WRITE ? >> >>According to IBTA spec HCA will process WR entries in strict order in >> >>which they are posted so the send will block all WR posted after this >> >>send, Until-unless HCA has multiple processing elements, I think even >> >>then processing order will be maintained by HCA >> >> If not, since RNR is triggered >> >> > periodically till B post receive, does it affect the RDMA_WRITE >> >> > performance between A and B ? >> >> > >> >> > 2. extend above to three processes, A connect to B, B connect to C, >> so B >> >> > has two QPs, but one CQ.A posts a send to B, B does not post receive, >> >post ordering accross QP is not guaranteed hence presence of same CQ >> >or different CQ will not affect any thing. >> >> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B >> >If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C, >I am sorry I have missed that in both cases same DMA channel is in use. >> >_may_ affect the performance, since load is on same HCA. In case of >> >Send/Recv again _may_ affect the performance, with the same reason. >> >>Seems orthogonal. Any time h/w is shared, multiple flows will have an >>impact on one another. That is why we have the different arbitration >>mechanisms to enable one to control that impact. >Please, can you explain it more clearly? Most I/O devices are shared by multiple applications / kernel subsystems. Hence, the device acts as a serialization point for what goes on the wire / link. 
Sharing = resource contention and in order to add any structure to that contention, a number of technologies provide arbitration options. In the case of IB, the arbitration is confined to VL arbitration where a given data flow is assigned to a VL and that VL is services at some particular rate. A number of years ago I wrote up how one might also provide QP arbitration (not part of the IBTA specifications) and I understand some implementations have incorporated that or a variation of the mechanisms into their products. In addition to IB link contention, there is also PCI link / bus contention. For PCIe, given most designs did not want to waste resources on multiple VC, there really isn't any standard arbitration mechanism. However, many devices, especially a device like a HCA or a RNIC, already have the concept of separate resource domains, e.g. QP, and they provide a mechanism to associate how the QP's DMA requests or interrupts requests are scheduled to the PCIe link. >> >> > must sends RNR periodically to A, right?. So does the pending message >> >> > from A affects B's overall performance between B and C ? >> >But RNR NAK is not for very long time.....possibly this performance >> >hit you will not be able to observe even. The moment rnr_counter >> >expires connection will be broken! >> >>Keep in mind the timeout can be infinite. RNR NAK are not expected to be >>frequent so their performance impact was considered reasonable. >Thanks I missed that. It is a subtlety within the specification that is easy to miss. 
Mike

From rdreier at cisco.com Thu Feb 15 09:48:14 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 09:48:14 -0800
Subject: [openib-general] [PATCH 2.6.21-rc1 5/5] ehca: query_port() returns LINK_UP instead UNKNOWN
In-Reply-To: <200702151710.06432.hnguyen@linux.vnet.ibm.com> (Hoang-Nam Nguyen's message of "Thu, 15 Feb 2007 17:10:06 +0100")
References: <200702151710.06432.hnguyen@linux.vnet.ibm.com>
Message-ID:

Thanks, queued 1, 2, 3 and 5 for 2.6.21.

From dledford at redhat.com Thu Feb 15 09:49:43 2007
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 15 Feb 2007 12:49:43 -0500
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <45D37E8E.5050800@ichips.intel.com>
References: <1171397522.21471.7.camel@stevo-desktop> <45D37E8E.5050800@ichips.intel.com>
Message-ID: <1171561783.3161.165.camel@fc6.xsintricity.com>

On Wed, 2007-02-14 at 13:26 -0800, Arlin Davis wrote:
> Steve Wise wrote:
>
> >Currently, the dapl rpms don't install dat.conf. I think they probably
> >should, eh? Maybe in /etc/dat.conf
> >
> >
> my specfile is setup to target sysconfdir which is typically set to
> `$(prefix)/etc'
>
> %{_sysconfdir}/dat.conf
>
> I am not sure how the 1.2 scripts are building the rpms. Maybe Vladimir
> can help explain?

Note that this setup is problematic on multilib arches. Since the dat.conf file hard codes a library path that's different for 32bit/64bit arches, installing both a 32bit and 64bit dapl library is impossible without munging things. For RHEL4U5/RHEL5 I changed the dat library to read dat.conf and have two separate conf files. A probably better approach would be to change the library to use a relative library name that it looks for starting from the library's own directory. Hence if the dapl library is in /usr/lib, it looks in /usr/lib. Doing that would allow the 32bit/64bit libraries to share the same config file.
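
A rough sketch of the relative-lookup idea above, in plain C. This is only an illustration and is not taken from the dat or dapl sources: the function name resolve_provider() and the file names in the comments are hypothetical, and it assumes the library already knows its own absolute path (on glibc that could plausibly come from dladdr(), which is not shown here).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the path of a provider library that sits in the same directory
 * as the calling library. All names here are hypothetical.
 *
 * self_path: absolute path of the already-loaded library,
 *            e.g. "/usr/lib64/libdat.so"
 * provider:  bare file name as it might appear in a shared dat.conf,
 *            e.g. "libdaplcma.so"
 *
 * Returns 0 on success, -1 if self_path has no directory part or the
 * result does not fit in out.
 */
int resolve_provider(const char *self_path, const char *provider,
                     char *out, size_t outlen)
{
    assert(self_path != NULL && provider != NULL && out != NULL);

    const char *slash = strrchr(self_path, '/');
    if (slash == NULL)
        return -1;

    /* Copy everything before the last '/', then append the provider name. */
    int dirlen = (int)(slash - self_path);
    int n = snprintf(out, outlen, "%.*s/%s", dirlen, self_path, provider);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this shape, a 32bit library loaded from /usr/lib and a 64bit one loaded from /usr/lib64 would each resolve the same bare provider name to their own directory, so a single config file could serve both.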
--
Doug Ledford
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband

From rdreier at cisco.com Thu Feb 15 09:57:48 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 09:57:48 -0800
Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion()
In-Reply-To: <200702151709.45323.hnguyen@linux.vnet.ibm.com> (Hoang-Nam Nguyen's message of "Thu, 15 Feb 2007 17:09:44 +0100")
References: <200702151709.45323.hnguyen@linux.vnet.ibm.com>
Message-ID:

Looking at this one more time, I think it actually may be buggy:

> @@ -147,6 +147,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
>      spin_lock_init(&my_cq->spinlock);
>      spin_lock_init(&my_cq->cb_lock);
>      spin_lock_init(&my_cq->task_lock);
> +    init_completion(&my_cq->zero_callbacks);

So you initialize the zero_callbacks completion once, at ehca_create_cq(). But then

> @@ -612,11 +613,14 @@ static void run_comp_task(struct ehca_cp
>
>          spin_lock(&cq->task_lock);
>          cq->nr_callbacks--;
> -        if (cq->nr_callbacks == 0) {
> +        is_complete = (cq->nr_callbacks == 0);
> +        if (is_complete) {
>              list_del_init(cct->cq_list.next);
>              cct->cq_jobs--;
>          }
>          spin_unlock(&cq->task_lock);
> +        if (is_complete) /* wake up waiting destroy_cq() */
> +            complete(&cq->zero_callbacks);
>      }

every time nr_callbacks drops to 0, you complete the zero_callbacks completion. So the first time a callback runs, you will complete zero_callbacks, which will let wait_for_completion() finish even if you later increment nr_callbacks again.
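
The sequence described above can be replayed in a small single-threaded userspace model. This is a sketch only: the model_* names are made up for illustration, the "completion" here is just a counter that mimics how complete() leaves a completion signalled until a waiter consumes it, and none of this is the ehca code or the kernel API itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for a kernel completion: complete() bumps a
 * counter, and a waiter may proceed once the counter is non-zero. */
struct model_completion {
    unsigned int done;
};

struct model_cq {
    int nr_callbacks;
    struct model_completion zero_callbacks;
};

static void model_complete(struct model_completion *c)
{
    c->done++;
}

/* True if a wait on this completion would return without blocking. */
static bool model_wait_would_return(const struct model_completion *c)
{
    return c->done > 0;
}

/* Mirrors the patched run_comp_task(): signal whenever the count hits 0. */
static void model_callback_finished(struct model_cq *cq)
{
    assert(cq->nr_callbacks > 0);
    cq->nr_callbacks--;
    if (cq->nr_callbacks == 0)
        model_complete(&cq->zero_callbacks);
}

/* Replays: callback queued, callback finished, callback queued again.
 * Returns true when the destroy path's wait would return although a
 * callback is still outstanding. */
bool stale_completion_race(void)
{
    struct model_cq cq = { 0, { 0 } };

    cq.nr_callbacks++;             /* first event queued                  */
    model_callback_finished(&cq);  /* count hits 0, completion signalled  */
    cq.nr_callbacks++;             /* second event queued                 */

    /* Destroy path: nr_callbacks != 0, so it decides to wait, but the
     * completion was already signalled by the first callback. */
    return cq.nr_callbacks != 0 &&
           model_wait_would_return(&cq.zero_callbacks);
}
```

In this model the stale signal from the first callback is what lets the wait fall through; the real driver would additionally have to consider concurrency, which the model deliberately leaves out.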
Also this

> -    while (my_cq->nr_callbacks) {
> +    if (my_cq->nr_callbacks) {
>          spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
> -        yield();
> +        wait_for_completion(&my_cq->zero_callbacks);
>          spin_lock_irqsave(&ehca_cq_idr_lock, flags);
>      }

looks rather unsafe -- I don't see any common locking protecting both this test of nr_callbacks and the setting of nr_callbacks in the ehca irq handling... so I don't see anything protecting you from seeing nr_callbacks==0 and not going into the if() (or while() -- the old code has the same problem I think) but then doing ++nr_callbacks somewhere else. In fact since you do the idr_remove() and hipz_h_destroy_cq() *after* you make sure no callbacks are running, this seems like it could happen easily.

So I'm holding off on applying this for now. Please think it over and either tell me the current patch is OK, or fix it up. There's not really too much urgency because a change like this is something I would be comfortable merging between 2.6.21-rc1 and -rc2.

- R.

From rdreier at cisco.com Thu Feb 15 09:59:29 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 09:59:29 -0800
Subject: [openib-general] How heavy to resize a CQ ?
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84037050CE@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Thu, 15 Feb 2007 16:13:21 -0000") References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84037050CE@G3W0634.americas.hpqcorp.net> Message-ID: > In dynamic process application, we don't know how many > connections a process will make when we create the CQ, so we don't know > the CQ size, what we do is to increase the CQ size when a new connection > is made, and decrease the CQ size when a connection is destroyed. My > question is, is ibv_resize_cq() a lightweight function call ? Do we > have to drain the CQ before we resize the CQ ? I would say that resizing a CQ is not lightweight -- I've never benchmarked it but it's probably comparable to creating a CQ or something like that. There is no requirement to drain the CQ or anything like that before resizing it -- you can resize it any time, even if it is currently getting completions or being polled. - R. From dledford at redhat.com Thu Feb 15 09:56:34 2007 From: dledford at redhat.com (Doug Ledford) Date: Thu, 15 Feb 2007 12:56:34 -0500 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: <45D3E224.9060306@cse.ohio-state.edu> References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> <45D1FD0B.2080606@cse.ohio-state.edu> <45D3E224.9060306@cse.ohio-state.edu> Message-ID: <1171562194.3161.169.camel@fc6.xsintricity.com> On Wed, 2007-02-14 at 23:31 -0500, Shaun Rowland wrote: > It is not clear to me why the difference of either linking libibverbs > into libsrqtest.so or not doing so causes the IBVERBS 1.1 ABI to be used > or not. 
I looked at the libibverbs code, and the 1.1 ABI is the default. > The libsrqtest.so file in the above case seems to have lost this > information: > > [rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head > U ibv_ack_cq_events > U ibv_alloc_pd > U ibv_close_device > U ibv_create_comp_channel > U ibv_create_cq > U ibv_create_qp > U ibv_create_srq > U ibv_dealloc_pd > U ibv_dereg_mr > U ibv_destroy_comp_channel It didn't loose the information, it never had it. When you link both libs against the application binary, the linker is resolving linkups and writing that into the resulting application binary output, but unless it's allowed to write into the libsrqtest.so binary and modify *it's* link table, that particular versioning information can't be written. Obviously, if every gcc compile that touched a shared library as a source object file also attempted to write back to that source object file, people would be very surprised when their attempt to link an application failed due to permission problems on the shared library. > I've never had to deal with an ABI issue like this in shared library > linking/usage. Does it make sense for this to be the case? I think > perhaps it does, but I wanted to ask. Yes. If you want symbol information in a shared lib that uses other shared libs, then they have to be linked at .so creation time, not at application creation time. > I've placed my test code here if it helps: > > http://www.cse.ohio-state.edu/~rowland/ibverbs-examples.tar.gz > > I have a fix for our code that I am testing now. It seems to work and > solve the observed problems, but more testing will be required to be > sure there are no issues. This will require a new SRPM if the fix is > required, which it seems at this point. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rowland at cse.ohio-state.edu Thu Feb 15 10:09:26 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Thu, 15 Feb 2007 13:09:26 -0500 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: <1171562194.3161.169.camel@fc6.xsintricity.com> References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> <45D1FD0B.2080606@cse.ohio-state.edu> <45D3E224.9060306@cse.ohio-state.edu> <1171562194.3161.169.camel@fc6.xsintricity.com> Message-ID: <45D4A1D6.2030409@cse.ohio-state.edu> Doug Ledford wrote: > It didn't loose the information, it never had it. When you link both > libs against the application binary, the linker is resolving linkups and > writing that into the resulting application binary output, but unless > it's allowed to write into the libsrqtest.so binary and modify *it's* > link table, that particular versioning information can't be written. I thought that this might be the case, but I had never run into this before. Thanks for clearing that up. > Obviously, if every gcc compile that touched a shared library as a > source object file also attempted to write back to that source object > file, people would be very surprised when their attempt to link an > application failed due to permission problems on the shared library. Yes. I thought perhaps it would use the default ABI when the symbols were resolved when making the binary, but as I said, I've never seen this issue before. Clearly, it is working as you've described, and that one thought I had seems not to make sense. Even when I tried making my shared library the way I thought it should be done the first time, I linked libibverbs into it at shared library creation time. Only when I saw the difference did I try waiting until building the application. > Yes. 
If you want symbol information in a shared lib that uses other > shared libs, then they have to be linked at .so creation time, not at > application creation time. I can make this happen. I am testing it now. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From Kapil.Dukle at med.ge.com Thu Feb 15 10:22:56 2007 From: Kapil.Dukle at med.ge.com (Dukle, Kapil (GE Healthcare)) Date: Thu, 15 Feb 2007 13:22:56 -0500 Subject: [openib-general] IB diagnostic tool : ibping Message-ID: Hi all, I came across a list of tools for displaying information IB nodes and testing connectivity/performance between nodes. (ex. ibping, ibstat..etc). The list can be found here: https://wiki.openfabrics.org/tiki-index.php?page=Diagnostics Is there any link online to the manual pages for these commands? The link on the page points to a server that is no longer maintained. I'm trying to ping self using ibping and it fails without showing the reason. What could be the problem? [xxx at xxx ~]$ ibstat CA 'mthca0' CA type: MT25208 (MT23108 compat mode) Number of ports: 2 Firmware version: 4.7.400 Hardware version: a0 Node GUID: 0x0003ba00010027e4 System image GUID: 0x0003ba00010027e7 Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 2 LMC: 0 SM lid: 1 Capability mask: 0x02510a68 Port GUID: 0x0003ba00010027e5 Port 2: State: Down Physical state: Polling Rate: 10 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x02510a68 Port GUID: 0x0003ba00010027e6 [xxx at xxx ~]$ su Password: [root at xxx]# ibping -v -G 0x0003ba00010027e5 ibwarn: [6207] ibping: Ping.. ibwarn: [6207] main: ibping to Lid 0x2 failed ibwarn: [6207] ibping: Ping.. ibwarn: [6207] main: ibping to Lid 0x2 failed ibwarn: [6207] ibping: Ping.. ibwarn: [6207] report: out due signal 2 -------------- next part -------------- An HTML attachment was scrubbed... 
From changquing.tang at hp.com Thu Feb 15 10:29:01 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Thu, 15 Feb 2007 18:29:01 -0000
Subject: [openib-general] How heavy to resize a CQ ?
In-Reply-To:
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84037050CE@G3W0634.americas.hpqcorp.net>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403705365@G3W0634.americas.hpqcorp.net>

Thanks for your good point.

--CQ

> -----Original Message-----
> From: Roland Dreier [mailto:rdreier at cisco.com]
> Sent: Thursday, February 15, 2007 11:59 AM
> To: Tang, Changqing
> Cc: Michael S. Tsirkin; openib-general at openib.org
> Subject: Re: How heavy to resize a CQ ?
>
> > In dynamic process application, we don't know how many
> > connections a process will make when we create the CQ, so we don't know
> > the CQ size, what we do is to increase the CQ size when a new connection
> > is made, and decrease the CQ size when a connection is destroyed. My
> > question is, is ibv_resize_cq() a lightweight function call ? Do we
> > have to drain the CQ before we resize the CQ ?
>
> I would say that resizing a CQ is not lightweight -- I've
> never benchmarked it but it's probably comparable to creating
> a CQ or something like that. There is no requirement to
> drain the CQ or anything like that before resizing it -- you
> can resize it any time, even if it is currently getting
> completions or being polled.
>
> - R.
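
If resizing really costs about as much as creating a CQ, one possible way to live with it (not something proposed in this thread, just a sketch) is to resize less often: grow the capacity by doubling instead of once per connection, and leave it alone when connections go away. The helper name next_cq_capacity() is hypothetical; the caller would hand the result to ibv_resize_cq(cq, cqe) only when it differs from the current capacity.

```c
#include <assert.h>
#include <limits.h>

/*
 * Pick a CQ capacity that can hold `needed` entries, given the current
 * capacity. Doubles until it fits and never shrinks, so a run of new
 * connections triggers only a few resizes. Hypothetical helper, not
 * part of libibverbs.
 *
 * Returns `current` when no resize is required.
 */
int next_cq_capacity(int current, int needed)
{
    assert(current > 0 && needed >= 0);

    int cap = current;
    while (cap < needed) {
        if (cap > INT_MAX / 2)
            return needed;  /* doubling would overflow; use exact size */
        cap *= 2;
    }
    return cap;
}
```

Whether the extra CQ entries are an acceptable price depends on the HCA's limits and on how many processes share it, so this is a trade-off to measure rather than a recommendation.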
> From boris at mellanox.com Thu Feb 15 10:35:38 2007 From: boris at mellanox.com (Boris Shpolyansky) Date: Thu, 15 Feb 2007 10:35:38 -0800 Subject: [openib-general] IB diagnostic tool : ibping Message-ID: <1E3DCD1C63492545881FACB6063A57C16E449E@mtiexch01.mti.com> Try 'man ibping' on the machine where you have OFED installed. Also 'ibping -h' will list all available flags (without explanation). Particularly for ibping command you need to start a Server first: ibping -S and then to run the client side. Hope this helps. Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Dukle, Kapil (GE Healthcare) Sent: Thursday, February 15, 2007 10:23 AM To: openib-general at openib.org Subject: [openib-general] IB diagnostic tool : ibping Hi all, I came across a list of tools for displaying information IB nodes and testing connectivity/performance between nodes. (ex. ibping, ibstat..etc). The list can be found here: https://wiki.openfabrics.org/tiki-index.php?page=Diagnostics Is there any link online to the manual pages for these commands? The link on the page points to a server that is no longer maintained. I'm trying to ping self using ibping and it fails without showing the reason. What could be the problem? 
[xxx at xxx ~]$ ibstat CA 'mthca0' CA type: MT25208 (MT23108 compat mode) Number of ports: 2 Firmware version: 4.7.400 Hardware version: a0 Node GUID: 0x0003ba00010027e4 System image GUID: 0x0003ba00010027e7 Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 2 LMC: 0 SM lid: 1 Capability mask: 0x02510a68 Port GUID: 0x0003ba00010027e5 Port 2: State: Down Physical state: Polling Rate: 10 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x02510a68 Port GUID: 0x0003ba00010027e6 [xxx at xxx ~]$ su Password: [root at xxx]# ibping -v -G 0x0003ba00010027e5 ibwarn: [6207] ibping: Ping.. ibwarn: [6207] main: ibping to Lid 0x2 failed ibwarn: [6207] ibping: Ping.. ibwarn: [6207] main: ibping to Lid 0x2 failed ibwarn: [6207] ibping: Ping.. ibwarn: [6207] report: out due signal 2 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kapil.Dukle at med.ge.com Thu Feb 15 11:14:36 2007 From: Kapil.Dukle at med.ge.com (Dukle, Kapil (GE Healthcare)) Date: Thu, 15 Feb 2007 14:14:36 -0500 Subject: [openib-general] IB diagnostic tool : ibping In-Reply-To: <1E3DCD1C63492545881FACB6063A57C16E449E@mtiexch01.mti.com> Message-ID: Hi, There is no manual page for ibping on the system. [root at xxx]# man ibping No manual entry for ibping [root at xxx]# ibping -h Usage: ibping [-d(ebug) -e(rr_show) -v(erbose) -G(uid) -s smlid -V(ersion) -C ca_name -P ca_port -t(imeout) timeout_ms -c ping_count -f(lood) -o oui -S(erver)] [root at xxx] I will try using the "-S" option to start the server as Boris suggested. Thanks, Kapil ________________________________ From: Boris Shpolyansky [mailto:boris at mellanox.com] Sent: Thursday, February 15, 2007 12:36 PM To: Dukle, Kapil (GE Healthcare); openib-general at openib.org Subject: RE: [openib-general] IB diagnostic tool : ibping Try 'man ibping' on the machine where you have OFED installed. Also 'ibping -h' will list all available flags (without explanation). 
Particularly for ibping command you need to start a Server first: ibping -S and then to run the client side. Hope this helps. Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Dukle, Kapil (GE Healthcare) Sent: Thursday, February 15, 2007 10:23 AM To: openib-general at openib.org Subject: [openib-general] IB diagnostic tool : ibping Hi all, I came across a list of tools for displaying information IB nodes and testing connectivity/performance between nodes. (ex. ibping, ibstat..etc). The list can be found here: https://wiki.openfabrics.org/tiki-index.php?page=Diagnostics Is there any link online to the manual pages for these commands? The link on the page points to a server that is no longer maintained. I'm trying to ping self using ibping and it fails without showing the reason. What could be the problem? [xxx at xxx ~]$ ibstat CA 'mthca0' CA type: MT25208 (MT23108 compat mode) Number of ports: 2 Firmware version: 4.7.400 Hardware version: a0 Node GUID: 0x0003ba00010027e4 System image GUID: 0x0003ba00010027e7 Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 2 LMC: 0 SM lid: 1 Capability mask: 0x02510a68 Port GUID: 0x0003ba00010027e5 Port 2: State: Down Physical state: Polling Rate: 10 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x02510a68 Port GUID: 0x0003ba00010027e6 [xxx at xxx ~]$ su Password: [root at xxx]# ibping -v -G 0x0003ba00010027e5 ibwarn: [6207] ibping: Ping.. ibwarn: [6207] main: ibping to Lid 0x2 failed ibwarn: [6207] ibping: Ping.. ibwarn: [6207] main: ibping to Lid 0x2 failed ibwarn: [6207] ibping: Ping.. ibwarn: [6207] report: out due signal 2 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From HNGUYEN at de.ibm.com Thu Feb 15 11:30:13 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 15 Feb 2007 20:30:13 +0100 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: <20070215174309.GD15185@mellanox.co.il> Message-ID: > > Yuk. I suppose I could write one, but I don't (and can't) use any of > > the OFED supplied build scripts in our build system, so it's hard for me > > to test since our build system is the only way I have to access > > ppc/ppc64 hardware. > Oh, well. > Other takers? OK, I've no choice to say no. Haven't look at the scripts yet. But will do in next couple of days! Nam From mshefty at ichips.intel.com Thu Feb 15 11:39:49 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 15 Feb 2007 11:39:49 -0800 Subject: [openib-general] IB routing discussion summary In-Reply-To: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com> References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com> Message-ID: <45D4B705.5020805@ichips.intel.com> > Ideas were presented around trying to construct an 'inter-subnet path record' > that contained the following: > > - Side A GRH.SGID = active side's Port GID > - Side A GRH.DGID = passive side's Port GID > - Side A LRH.SLID = any active side's port LID > - Side A LRH.DLID = A subnet router > - Side A LRH.SL = SL to A subnet router > > - Side B GRH.SGID = Side A GRH.DGID > - Side B GRH.DGID = Side A GRH.SGID > - Side B LRH.SLID = any passive side's port LID > - Side B LRH.DLID = B subnet router > - Side B LRH.SL = SL to B subnet router Until I can become convinced that the above isn't needed, I've been trying to brainstorm of ways to obtain this information. 0. Have the SA return pairs of PathRecords for inter-subnet queries. But, since this simply punts the problem to the SA, my other thought is to define the following: 1. 
Inter-subnet PathRecord/MultiPathRecord Get/GetTable requests require both an SGID and DGID, one of which must be subnet local to the processing SA. 2. PathRecord/MultiPathRecord Get/GetTable request fields are relative to the subnet specified by the SGID. 3. PathRecord GetResp/GetTableResp response fields are relative to the subnet local to the processing SA. 4. SAs are addressable by a well-known GID suffix. I think this may allow establishing inter-subnet connections. As an example of its usage: a. Active side issues a PathRecord query to the local SA with SGID=local, DGID=remote. b. SA responds with PathRecord(s). c. Active side selects local PathRecord P1. d. Active side issues a PathRecord query to the remote SA using PathRecord P1 to format the request: SGID, DGID, SLID, DLID, TC, FL, SL, etc. e. The remote SA responds with PathRecord(s). The SA must ensure that packets injected into the internetwork using P1 will route to the returned records. f. Active side selects remote PathRecord P2. g. Active side validates that remote packets injected using P2 route to P1. At this point, the active side should have path information that can be used to configure the QPs for a connection. Assuming that this will work, what I don't like about it is the validation at step g. This adds a third query that I don't see a way to eliminate. If the check fails, the client restarts at step c. - Sean From halr at voltaire.com Thu Feb 15 11:46:39 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Feb 2007 14:46:39 -0500 Subject: [openib-general] IB diagnostic tool : ibping In-Reply-To: References: Message-ID: <1171568778.22446.195441.camel@hal.voltaire.com> On Thu, 2007-02-15 at 14:14, Dukle, Kapil (GE Healthcare) wrote: > Hi, > > There is no manual page for ibping on the system. 
>
> [root at xxx]# man ibping
> No manual entry for ibping
> [root at xxx]# ibping -h
> Usage: ibping [-d(ebug) -e(rr_show) -v(erbose) -G(uid) -s smlid
> -V(ersion) -C ca_name -P ca_port -t(imeout) timeout_ms -c ping_count
> -f(lood) -o oui -S(erver)]
> [root at xxx]
>
> I will try using the "-S" option to start the server as Boris
> suggested.

What version of OFED are you running? 1.0? 1.1 had them, as does 1.2.
Attached is the latest man page, but nothing has changed.

-- Hal

> Thanks,
> Kapil
>
>
> ______________________________________________________________________
> From: Boris Shpolyansky [mailto:boris at mellanox.com]
> Sent: Thursday, February 15, 2007 12:36 PM
> To: Dukle, Kapil (GE Healthcare); openib-general at openib.org
> Subject: RE: [openib-general] IB diagnostic tool : ibping
>
>
> Try 'man ibping' on the machine where you have OFED installed.
> Also 'ibping -h' will list all available flags (without explanation).
>
> Particularly for ibping command you need to start a Server first:
> ibping -S
> and then to run the client side.
>
> Hope this helps.
>
> Boris Shpolyansky
> Sr. Member of Technical Staff
> Applications
> Mellanox Technologies Inc.
> 2900 Stender Way
> Santa Clara, CA 95054
> Tel.: (408) 916 0014
> Fax: (408) 970 3403
> Cell: (408) 834 9365
> www.mellanox.com
>
>
> ______________________________________________________________________
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Dukle, Kapil
> (GE Healthcare)
> Sent: Thursday, February 15, 2007 10:23 AM
> To: openib-general at openib.org
> Subject: [openib-general] IB diagnostic tool : ibping
>
>
> Hi all,
>
> I came across a list of tools for displaying information about IB nodes and
> testing connectivity/performance between nodes. (ex. ibping,
> ibstat..etc).
> The list can be found here:
> https://wiki.openfabrics.org/tiki-index.php?page=Diagnostics
>
> Is there any link online to the manual pages for these commands?
> The link on the page points to a server that is no longer maintained.
>
> I'm trying to ping self using ibping and it fails without showing the
> reason. What could be the problem?
>
> [xxx at xxx ~]$ ibstat
> CA 'mthca0'
>         CA type: MT25208 (MT23108 compat mode)
>         Number of ports: 2
>         Firmware version: 4.7.400
>         Hardware version: a0
>         Node GUID: 0x0003ba00010027e4
>         System image GUID: 0x0003ba00010027e7
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 10
>                 Base lid: 2
>                 LMC: 0
>                 SM lid: 1
>                 Capability mask: 0x02510a68
>                 Port GUID: 0x0003ba00010027e5
>         Port 2:
>                 State: Down
>                 Physical state: Polling
>                 Rate: 10
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x02510a68
>                 Port GUID: 0x0003ba00010027e6
> [xxx at xxx ~]$ su
> Password:
> [root at xxx]# ibping -v -G 0x0003ba00010027e5
> ibwarn: [6207] ibping: Ping..
> ibwarn: [6207] main: ibping to Lid 0x2 failed
> ibwarn: [6207] ibping: Ping..
> ibwarn: [6207] main: ibping to Lid 0x2 failed
> ibwarn: [6207] ibping: Ping..
> ibwarn: [6207] report: out due signal 2

-------------- next part --------------
.TH IBPING 8 "August 11, 2006" "OpenIB" "OpenIB Diagnostics"

.SH NAME
ibping \- ping an InfiniBand address

.SH SYNOPSIS
.B ibping
[\-d(ebug)] [\-e(rr_show)] [\-v(erbose)] [\-G(uid)] [\-C ca_name]
[\-P ca_port] [\-s smlid] [\-t(imeout) timeout_ms] [\-V(ersion)]
[\-c ping_count] [\-f(lood)] [\-o oui] [\-S(erver)] [\-h(elp)]

.SH DESCRIPTION
.PP
ibping uses vendor mads to validate connectivity between IB nodes.
On exit, (IP) ping like output is shown. ibping is run as client/server.
Default is to run as client. Note also that a default ping server is
implemented within the kernel.

.SH OPTIONS

.PP
.TP
\fB\-c\fR
stop after count packets
.TP
\fB\-f\fR, \fB\-\-flood\fR
flood destination: send packets back to back without delay
.TP
\fB\-o\fR, \fB\-\-oui\fR
use specified OUI number to multiplex vendor mads
.TP
\fB\-S\fR, \fB\-\-Server\fR
start in server mode (do not return)

.SH COMMON OPTIONS

Most OpenIB diagnostics take the following common flags. The exact list of
supported flags per utility can be found in the usage message and can be shown
using the util_name -h syntax.

# Debugging flags
.PP
\-d raise the IB debugging level.
 May be used several times (-ddd or -d -d -d).
.PP
\-e show send and receive errors (timeouts and others)
.PP
\-h show the usage message
.PP
\-v increase the application verbosity level.
 May be used several times (-vv or -v -v -v)
.PP
\-V show the version info.

# Addressing flags
.PP
\-G use GUID address argument. In most cases, it is the Port GUID.
 Example: "0x08f1040023"
.PP
\-s use 'smlid' as the target lid for SM/SA queries.

# Other common flags:
.PP
\-C use the specified ca_name.
.PP
\-P use the specified ca_port.
.PP
\-t override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected
by the following criteria:
.PP
1. the first port that is ACTIVE.
.PP
2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is
attempted to be fulfilled, and will fail if it is not possible.

.SH AUTHOR
.TP
Hal Rosenstock
.RI < halr at voltaire.com >

From swise at opengridcomputing.com Thu Feb 15 11:54:22 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 13:54:22 -0600
Subject: [openib-general] [PATCH] iw_cxgb3 Fix copyrights in the iw_cxgb3 driver.
Message-ID: <1171569262.13282.59.camel@stevo-desktop>

Fix copyrights in the iw_cxgb3 driver.

Remove the Open Grid Computing copyright. It shouldn't be there.

Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/cxio_dbg.c      |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.c      |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.h      |    1 -
 drivers/infiniband/hw/cxgb3/cxio_resource.c |    1 -
 drivers/infiniband/hw/cxgb3/cxio_resource.h |    1 -
 drivers/infiniband/hw/cxgb3/cxio_wr.h       |    1 -
 drivers/infiniband/hw/cxgb3/iwch.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch.h          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.c       |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.h       |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cq.c       |    1 -
 drivers/infiniband/hw/cxgb3/iwch_ev.c       |    1 -
 drivers/infiniband/hw/cxgb3/iwch_mem.c      |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.c |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h |    1 -
 drivers/infiniband/hw/cxgb3/iwch_qp.c       |    1 -
 drivers/infiniband/hw/cxgb3/iwch_user.h     |    1 -
 17 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
index 5a7306f..75f7b16 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_dbg.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 82fa720..114ac3b 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h
index 1b97e80..8ab04a7 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/cxio_resource.c b/drivers/infiniband/hw/cxgb3/cxio_resource.c
index 997aa32..65bf577 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_resource.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_resource.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/cxio_resource.h b/drivers/infiniband/hw/cxgb3/cxio_resource.h
index a6bbe83..a2703a3 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_resource.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_resource.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h
index 103fc42..90d7b89 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_wr.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
index 4611afa..0315c9d 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index 6517ef8..caf4e60 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index a522b1b..e5442e3 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h
index 7c810d9..0c6f281 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
index 98b3bdb..d7624c1 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cq.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c
index a6efa8f..54362af 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_ev.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 2b6cd53..a6c2c4b 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 6861087..2aef122 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index 61e3278..2af3e93 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index da13a38..4dda2f6 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h
index c4e7fbe..cb7086f 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_user.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_user.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU

From HNGUYEN at de.ibm.com Thu Feb 15 11:54:43 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 20:54:43 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion()
In-Reply-To:
Message-ID:

Hi,
> So I'm holding off on applying this for now.
> Please think it over and
> either tell me the current patch is OK, or fix it up. There's not
> really too much urgency because a change like this is something I
> would be comfortable merging between 2.6.21-rc1 and -rc2.

You're absolutely right. Let's target for rc2.
Thanks for this good catch!
Nam

From swise at opengridcomputing.com Thu Feb 15 11:59:55 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 13:59:55 -0600
Subject: [openib-general] [PATCH 1/2] ofed_1_2 Fix copyrights in the cxgb3 driver.
Message-ID: <1171569595.13282.60.camel@stevo-desktop>

Fix copyrights in the cxgb3 driver.

Remove the Open Grid Computing copyright. It shouldn't be there.

Signed-off-by: Steve Wise
---

 drivers/net/cxgb3/cxgb3_defs.h    |    1 -
 drivers/net/cxgb3/cxgb3_offload.c |    1 -
 drivers/net/cxgb3/cxgb3_offload.h |    1 -
 drivers/net/cxgb3/l2t.c           |    1 -
 drivers/net/cxgb3/l2t.h           |    1 -
 drivers/net/cxgb3/t3cdev.h        |    1 -
 6 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_defs.h b/drivers/net/cxgb3/cxgb3_defs.h
old mode 100755
new mode 100644
index 16e0049..e14862b
--- a/drivers/net/cxgb3/cxgb3_defs.h
+++ b/drivers/net/cxgb3/cxgb3_defs.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/net/cxgb3/cxgb3_offload.c b/drivers/net/cxgb3/cxgb3_offload.c
old mode 100755
new mode 100644
index c3a02d6..46e9068
--- a/drivers/net/cxgb3/cxgb3_offload.c
+++ b/drivers/net/cxgb3/cxgb3_offload.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/net/cxgb3/cxgb3_offload.h b/drivers/net/cxgb3/cxgb3_offload.h
old mode 100755
new mode 100644
index 0e6beb6..f15446a
--- a/drivers/net/cxgb3/cxgb3_offload.h
+++ b/drivers/net/cxgb3/cxgb3_offload.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/net/cxgb3/l2t.c b/drivers/net/cxgb3/l2t.c
old mode 100755
new mode 100644
index 3c0cb85..d660af7
--- a/drivers/net/cxgb3/l2t.c
+++ b/drivers/net/cxgb3/l2t.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2003-2007 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/net/cxgb3/l2t.h b/drivers/net/cxgb3/l2t.h
old mode 100755
new mode 100644
index ba5d2cb..d790013
--- a/drivers/net/cxgb3/l2t.h
+++ b/drivers/net/cxgb3/l2t.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2003-2007 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/net/cxgb3/t3cdev.h b/drivers/net/cxgb3/t3cdev.h
old mode 100755
new mode 100644
index 9af3bcd..fa4099b
--- a/drivers/net/cxgb3/t3cdev.h
+++ b/drivers/net/cxgb3/t3cdev.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (C) 2006-2007 Chelsio Communications. All rights reserved.
- * Copyright (C) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU

From swise at opengridcomputing.com Thu Feb 15 12:00:21 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 14:00:21 -0600
Subject: [openib-general] [PATCH 2/2] ofed_1_2 Fix copyrights in the iw_cxgb3 driver.
Message-ID: <1171569621.13282.62.camel@stevo-desktop>

Fix copyrights in the iw_cxgb3 driver.

Remove the Open Grid Computing copyright. It shouldn't be there.

Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/core/cxio_dbg.c      |    1 -
 drivers/infiniband/hw/cxgb3/core/cxio_hal.c      |    1 -
 drivers/infiniband/hw/cxgb3/core/cxio_hal.h      |    1 -
 drivers/infiniband/hw/cxgb3/core/cxio_resource.c |    1 -
 drivers/infiniband/hw/cxgb3/core/cxio_resource.h |    1 -
 drivers/infiniband/hw/cxgb3/core/cxio_wr.h       |    1 -
 drivers/infiniband/hw/cxgb3/iwch.c               |    1 -
 drivers/infiniband/hw/cxgb3/iwch.h               |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.c            |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.h            |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cq.c            |    1 -
 drivers/infiniband/hw/cxgb3/iwch_ev.c            |    1 -
 drivers/infiniband/hw/cxgb3/iwch_mem.c           |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.c      |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h      |    1 -
 drivers/infiniband/hw/cxgb3/iwch_qp.c            |    1 -
 drivers/infiniband/hw/cxgb3/iwch_user.h          |    1 -
 17 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
index dfaa704..d6b6c97 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
index 5e31816..229edd5 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
index e5e702d..1553bda 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
index d1d8722..cf78050 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
index a6bbe83..a2703a3 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
index 234a084..6c7ac55 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
index 0c95f2c..de44c57 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index 8b11198..8d9390b 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index 3237fc8..21fadbe 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h
index 893f9d0..855f1ef 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
index ff09509..225fcfa 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cq.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c
index 646f612..f4cd5ec 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_ev.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 5909ec5..335e9a4 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 4a46771..3f64dbf 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index d9d94e3..7322773 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 9cc8b5e..e1e35d9 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h
index e8ff061..bf0a2f6 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_user.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_user.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU

From rdreier at cisco.com Thu Feb 15 12:08:32 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 12:08:32 -0800
Subject: [openib-general] remap_page_range() in older kernels
In-Reply-To: <1171554919.13282.17.camel@stevo-desktop> (Steve Wise's message of "Thu, 15 Feb 2007 09:55:19 -0600")
References: <1171554919.13282.17.camel@stevo-desktop>
Message-ID:

 > Do you remember any issues with using remap_page_range() in older
 > kernels for mapping memory allocated in the kernel back to a user
 > process?

No, I would have thought it should work just like remap_pfn_range() in
later kernels.

 - R.

From swise at opengridcomputing.com Thu Feb 15 12:19:48 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 14:19:48 -0600
Subject: [openib-general] remap_page_range() in older kernels
In-Reply-To:
References: <1171554919.13282.17.camel@stevo-desktop>
Message-ID: <1171570788.13282.69.camel@stevo-desktop>

On Thu, 2007-02-15 at 12:08 -0800, Roland Dreier wrote:
> > Do you remember any issues with using remap_page_range() in older
> > kernels for mapping memory allocated in the kernel back to a user
> > process?
>
> No, I would have thought it should work just like remap_pfn_range() in
> later kernels.
>
> - R.

Me too. But it definitely isn't working for cxgb3. Sigh...
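
A note on the remap_page_range() question above: one difference between the
two calls is the third argument. remap_page_range() took a physical address,
while remap_pfn_range() takes a page frame number, so code carried between
the two has to shift by PAGE_SHIFT. The user-space sketch below only
illustrates that conversion. The sketch_ names and the 4 KiB page size are
assumptions made for illustration, not kernel API, and nothing in this thread
shows that this is what goes wrong for cxgb3.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (shift of 12). A real kernel takes PAGE_SHIFT
 * from its own headers, and other page sizes are possible. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical helper: the value an older remap_page_range() caller
 * would pass, i.e. a physical address built from a page frame number. */
static inline uint64_t sketch_pfn_to_phys(uint64_t pfn)
{
    return pfn << SKETCH_PAGE_SHIFT;
}

/* Hypothetical helper: the value a remap_pfn_range() caller would pass,
 * i.e. a page frame number derived from a page-aligned physical address. */
static inline uint64_t sketch_phys_to_pfn(uint64_t phys)
{
    /* A mapping starts on a page boundary, so reject unaligned input. */
    assert((phys & (SKETCH_PAGE_SIZE - 1)) == 0);
    return phys >> SKETCH_PAGE_SHIFT;
}
```

Passing a page frame number where a physical address is expected (or the
reverse) would map the wrong memory without any compile-time error, since
both are plain integers, so it may be worth ruling out in a backport.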

From krause at cup.hp.com Thu Feb 15 12:44:02 2007
From: krause at cup.hp.com (Michael Krause)
Date: Thu, 15 Feb 2007 12:44:02 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <45D4B705.5020805@ichips.intel.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com> <45D4B705.5020805@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070215123631.09692088@esmail.cup.hp.com>

At 11:39 AM 2/15/2007, Sean Hefty wrote:
>>Ideas were presented around trying to construct an 'inter-subnet path record'
>>that contained the following:
>> - Side A GRH.SGID = active side's Port GID
>> - Side A GRH.DGID = passive side's Port GID
>> - Side A LRH.SLID = any active side's port LID
>> - Side A LRH.DLID = A subnet router
>> - Side A LRH.SL = SL to A subnet router
>> - Side B GRH.SGID = Side A GRH.DGID
>> - Side B GRH.DGID = Side A GRH.SGID
>> - Side B LRH.SLID = any passive side's port LID
>> - Side B LRH.DLID = B subnet router
>> - Side B LRH.SL = SL to B subnet router
>
>Until I can become convinced that the above isn't needed, I've been trying
>to brainstorm of ways to obtain this information.

Is this first an IBTA problem to solve if you believe there is a problem?
I believe the track you are on is incorrect and any attempt to surface
subnet local information across subnets will create unnecessary complexity
and therefore make such solutions less practical to execute within the
industry.

I've tried to illustrate the role of the router, how the flows work, etc.
I believe these to be correct and are reflected not only in the existing
specifications but also the prior router specification work and thinking.
They also parallel the IP world quite nicely which should also lend credence
that subnet-local information does not need to be exchanged between subnets.
I contend CM does not require anything that is subnet local other than to
target a given router port which should be derived from local SM/SA only
information.
I will further state that SA-SA communication sans perhaps a P_Key / Q_Key service lookup should be avoided wherever possible. I strongly urge you to take this problem to the IBTA where any issues regarding specification interpretation can be sorted out and an official position taken. This will yield a faster and more successful investigation into whether there is a problem and if so, how best to solve it. Mike >0. Have the SA return pairs of PathRecords for inter-subnet queries. > >But, since this simply punts the problem to the SA, my other thought is to >define the following: > >1. Inter-subnet PathRecord/MultiPathRecord Get/GetTable requests require >both an SGID and DGID, one of which must be subnet local to the processing SA. >2. PathRecord/MultiPathRecord Get/GetTable request fields are relative to >the subnet specified by the SGID. >3. PathRecord GetResp/GetTableResp response fields are relative to the >subnet local to the processing SA. >4. SAs are addressable by a well-known GID suffix. > >I think this may allow establishing inter-subnet connections. As an >example of >its usage: > >a. Active side issues a PathRecord query to the local SA with SGID=local, >DGID=remote. >b. SA responds with PathRecord(s). >c. Active side selects local PathRecord P1. >d. Active side issues a PathRecord query to the remote SA using PathRecord >P1 to >format the request: SGID, DGID, SLID, DLID, TC, FL, SL, etc. >e. The remote SA responds with PathRecord(s). The SA must ensure that >packets injected into the internetwork using P1 will route to the returned >records. >f. Active side selects remote PathRecord P2. >g. Active side validates that remote packets injected using P2 route to P1. > >At this point, the active side should have path information that can be >used to >configure the QPs for a connection. > >Assuming that this will work, what I don't like about it is the validation >at step g. This adds a third query that I don't see a way to >eliminate. 
If the check fails, the client restarts at step c. > >- Sean From rdreier at cisco.com Thu Feb 15 13:10:39 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 15 Feb 2007 13:10:39 -0800 Subject: [openib-general] SA multicast patches In-Reply-To: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> (Sean Hefty's message of "Tue, 6 Feb 2007 12:00:22 -0800") References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> Message-ID: So I'm reading this over, and the following code looks kind of odd to me: > +int ib_sa_get_mcmember_rec(struct ib_device *device, u8 port_num, > + union ib_gid *mgid, struct ib_sa_mcmember_rec *rec) > > ... > > + } else { > + memset(rec, 0, sizeof *rec); > + ib_get_cached_gid(device, port_num, 0, &rec->port_gid); > + rec->pkey = 0xFFFF; > + get_random_bytes(&rec->qkey, sizeof rec->qkey); > + rec->join_state = 1; > + } Where is this particular hard-coded P_Key value coming from? And how about the Q_Key -- why is a random one being chosen? Does it matter that this is setting the privileged bit of the Q_Key at random? The only place this code seems to be used is in cma_join_ib_multicast(), which overwrites all the values that get set here anyway. (Except it leaves the Q_Key if the portspace is not UDP??) Would it be more sensible to leave the P_Key and Q_Key initialized to 0 here, and let the caller handle it? I don't see how the multicast tracking module can pick a sensible default here. Also, should we check the return value of ib_get_cached_gid()? - R. From rdreier at cisco.com Thu Feb 15 13:47:12 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 15 Feb 2007 13:47:12 -0800 Subject: [openib-general] [PATCH] 2.6.21 iwcm - iw_cm_id destruction race condition fixes. 
In-Reply-To: <1171548576.12187.2.camel@stevo-desktop> (Steve Wise's message of "Thu, 15 Feb 2007 08:09:36 -0600") References: <1171548576.12187.2.camel@stevo-desktop> Message-ID: thanks, applied From rdreier at cisco.com Thu Feb 15 13:48:57 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 15 Feb 2007 13:48:57 -0800 Subject: [openib-general] [PATCH] 2.6.21 iw_cxgb3 Fail posts synchronously when in TERMINATE state. In-Reply-To: <1171550942.13282.5.camel@stevo-desktop> (Steve Wise's message of "Thu, 15 Feb 2007 08:49:02 -0600") References: <1171550942.13282.5.camel@stevo-desktop> Message-ID: thanks, applied. From rdreier at cisco.com Thu Feb 15 13:50:30 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 15 Feb 2007 13:50:30 -0800 Subject: [openib-general] [PATCH] iw_cxgb3 Fix copyrights in the iw_cxgb3 driver. In-Reply-To: <1171569262.13282.59.camel@stevo-desktop> (Steve Wise's message of "Thu, 15 Feb 2007 13:54:22 -0600") References: <1171569262.13282.59.camel@stevo-desktop> Message-ID: thanks, applied From mshefty at ichips.intel.com Thu Feb 15 14:05:24 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 15 Feb 2007 14:05:24 -0800 Subject: [openib-general] IB routing discussion summary In-Reply-To: <6.2.0.14.2.20070215123631.09692088@esmail.cup.hp.com> References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com> <45D4B705.5020805@ichips.intel.com> <6.2.0.14.2.20070215123631.09692088@esmail.cup.hp.com> Message-ID: <45D4D924.8070507@ichips.intel.com> > Is this first an IBTA problem to solve if you believe there is a problem? Based on my interpretation, I do not believe that there's an error in the architecture. It seems consistent. Additional clarification of what PathRecord fields mean when the GIDs are on different subnets may be needed, and a change to the architecture may make things easier to implement, but that's a separate matter. 
> I contend CM does not require anything that is subnet local other than to > target a given router port which should be derived from local SM/SA only Then please state how the passive side obtains the information (e.g. SLID/DLID) it needs in order to configure its QP. I claim that information is carried in the CM REQ. The alternatives that I see are: 1. The passive side extracts the data from the LRH that carries the CM REQ. 2. The passive side issues its own local path record query. Will you please clarify where this information comes from? > I will further state that SA-SA communication sans perhaps a > P_Key / Q_Key service lookup should be avoided wherever possible. I agree - which is why my proposal avoided SA-SA communication. I see nothing in the architecture that prohibits a node from querying an SA that is not on its local subnet. - Sean From purdy at sgi.com Thu Feb 15 14:08:24 2007 From: purdy at sgi.com (Dale Purdy) Date: Thu, 15 Feb 2007 16:08:24 -0600 Subject: [openib-general] sl2vl tables Message-ID: We are experimenting with OFED 1.2 (alpha1) and have dumped the SL2VL tables for both a switch port and an HCA port using the smpqueury command: switch: # SL2VL table: Lid 103 # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| ports: in 0, out 1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| ports: in 1, out 1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| ... HCA: # SL2VL table: Lid 37 # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| ports: in 0, out 0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| I would have expected the behavior that the switch describes - a one to one mapping mod the supported number of VLs. But I can't explain why the HCA VLs are in reverse order to the SL. If this were a host endian issue I would have expected both to behave the same. Can someone explain what is going on? Dale From mst at mellanox.co.il Thu Feb 15 14:16:13 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Fri, 16 Feb 2007 00:16:13 +0200 Subject: [openib-general] [PATCH for-2.6.21] IB/ipoib: error handling thinko fix Message-ID: <20070215221613.GB26227@mellanox.co.il> ipoib_cm_alloc_rx_skb might be called from IRQ context, so it must use dev_kfree_skb_any, not kfree_skb Signed-off-by: Michael S. Tsirkin --- This one's obvious, isn't it? diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 8ee6f06..e388a41 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -148,7 +148,7 @@ partial_error: for (; i >= 0; --i) ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); - kfree_skb(skb); + dev_kfree_skb_any(skb); return -ENOMEM; } -- MST From mshefty at ichips.intel.com Thu Feb 15 14:39:47 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 15 Feb 2007 14:39:47 -0800 Subject: [openib-general] SA multicast patches In-Reply-To: References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> Message-ID: <45D4E133.3000302@ichips.intel.com> > > + memset(rec, 0, sizeof *rec); > > + ib_get_cached_gid(device, port_num, 0, &rec->port_gid); > > + rec->pkey = 0xFFFF; > > + get_random_bytes(&rec->qkey, sizeof rec->qkey); > > + rec->join_state = 1; > > + } > > Where is this particular hard-coded P_Key value coming from? And how > about the Q_Key -- why is a random one being chosen? Does it matter > that this is setting the privileged bit of the Q_Key at random? The idea behind this part of the call was to return the user an MCMemberRecord that they can use to create a new multicast group. Maybe it would be better to just drop this functionality and fail any lookups for mgid 0, but to answer your questions: The pkey is the default partition, full membership pkey. I believe all nodes will have either 0xffff or 0x7fff as their pkey. We could probably call ib_get_cached_pkey() instead and just use the first entry in the table. 
We don't want to set the privileged bit of the q_key, so that's wrong. Good catch. > The only place this code seems to be used is in > cma_join_ib_multicast(), which overwrites all the values that get set > here anyway. (Except it leaves the Q_Key if the portspace is not UDP??) > Would it be more sensible to leave the P_Key and Q_Key initialized to > 0 here, and let the caller handle it? I don't see how the multicast > tracking module can pick a sensible default here. The user can overwrite any of the values that they don't like as defaults before sending the actual join. > Also, should we check the return value of ib_get_cached_gid()? That is probably best. There shouldn't be much harm if the call fails; the MCMemberRecord will be invalid, and the future join request will fail. - Sean From halr at voltaire.com Thu Feb 15 14:39:36 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Feb 2007 17:39:36 -0500 Subject: [openib-general] sl2vl tables In-Reply-To: References: Message-ID: <1171579140.22446.204899.camel@hal.voltaire.com> On Thu, 2007-02-15 at 17:08, Dale Purdy wrote: > We are experimenting with OFED 1.2 (alpha1) and have dumped the SL2VL > tables for both a switch port and an HCA port using the smpqueury > command: > > switch: > # SL2VL table: Lid 103 > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| > ports: in 0, out 1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| > ports: in 1, out 1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| > ... > > HCA: > # SL2VL table: Lid 37 > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| > ports: in 0, out 0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| > > I would have expected the behavior that the switch describes - a one > to one mapping mod the supported number of VLs. But I can't explain > why the HCA VLs are in reverse order to the SL. If this were a host > endian issue I would have expected both to behave the same. Can > someone explain what is going on?
Is this on powerup of HCA node or after some SM potentially programs the HCA (for this) ? I typically see (for HCAs): # SL2VL table: Lid 10 # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| ports: in 0, out 0: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| -- Hal > Dale > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From purdy at sgi.com Thu Feb 15 14:48:31 2007 From: purdy at sgi.com (Dale Purdy) Date: Thu, 15 Feb 2007 16:48:31 -0600 Subject: [openib-general] sl2vl tables In-Reply-To: <1171579140.22446.204899.camel@hal.voltaire.com> References: <1171579140.22446.204899.camel@hal.voltaire.com> Message-ID: We are experimenting with LASH. It appears that the SL2VL tables don't get initialized unless QoS is enabled on the opensm command line (-Q). Enabling this seems to rectify the problem. So it would appear that LASH needs to enable this also. Dale On Thu, 15 Feb 2007, Hal Rosenstock wrote: > On Thu, 2007-02-15 at 17:08, Dale Purdy wrote: > > We are experimenting with OFED 1.2 (alpha1) and have dumped the SL2VL > > tables for both a switch port and an HCA port using the smpqueury > > command: > > > > switch: > > # SL2VL table: Lid 103 > > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| > > ports: in 0, out 1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| > > ports: in 1, out 1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| > > ... > > > > HCA: > > # SL2VL table: Lid 37 > > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| > > ports: in 0, out 0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| > > > > I would have expected the behavior that the switch describes - a one > > to one mapping mod the supported number of VLs. But I can't explain > > why the HCA VLs are in reverse order to the SL. 
If this were a host > > endian issue I would have expected both to behave the same. Can > > someone explain what is going on? > > Is this on powerup of HCA node or after some SM potentially programs the > HCA (for this) ? > > I typically see (for HCAs): > # SL2VL table: Lid 10 > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| > ports: in 0, out 0: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > > -- Hal > > > Dale > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > From halr at voltaire.com Thu Feb 15 15:17:56 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Feb 2007 18:17:56 -0500 Subject: [openib-general] sl2vl tables In-Reply-To: References: <1171579140.22446.204899.camel@hal.voltaire.com> Message-ID: <1171581435.22446.206644.camel@hal.voltaire.com> On Thu, 2007-02-15 at 17:48, Dale Purdy wrote: > We are experimenting with LASH. It appears that the SL2VL tables > don't get initialized unless QoS is enabled on the opensm command line > (-Q). Enabling this seems to rectify the problem. So it would appear > that LASH needs to enable this also. You can run with -Q or set no_qos to FALSE in the opensm.opts file. I wonder whether we should tie LASH to this so this isn't needed. -- Hal > Dale > > On Thu, 15 Feb 2007, Hal Rosenstock wrote: > > > On Thu, 2007-02-15 at 17:08, Dale Purdy wrote: > > > We are experimenting with OFED 1.2 (alpha1) and have dumped the SL2VL > > > tables for both a switch port and an HCA port using the smpqueury > > > command: > > > > > > switch: > > > # SL2VL table: Lid 103 > > > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| > > > ports: in 0, out 1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| > > > ports: in 1, out 1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| > > > ... 
> > > > > > HCA: > > > # SL2VL table: Lid 37 > > > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| > > > ports: in 0, out 0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| > > > > > > I would have expected the behavior that the switch describes - a one > > > to one mapping mod the supported number of VLs. But I can't explain > > > why the HCA VLs are in reverse order to the SL. If this were a host > > > endian issue I would have expected both to behave the same. Can > > > someone explain what is going on? > > > > Is this on powerup of HCA node or after some SM potentially programs the > > HCA (for this) ? > > > > I typically see (for HCAs): > > # SL2VL table: Lid 10 > > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15| > > ports: in 0, out 0: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7| > > > > -- Hal > > > > > Dale > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > > > From purdy at sgi.com Thu Feb 15 15:24:01 2007 From: purdy at sgi.com (Dale Purdy) Date: Thu, 15 Feb 2007 17:24:01 -0600 Subject: [openib-general] sl2vl tables In-Reply-To: <1171581435.22446.206644.camel@hal.voltaire.com> References: <1171579140.22446.204899.camel@hal.voltaire.com> <1171581435.22446.206644.camel@hal.voltaire.com> Message-ID: On Thu, 15 Feb 2007, Hal Rosenstock wrote: > On Thu, 2007-02-15 at 17:48, Dale Purdy wrote: > > We are experimenting with LASH. It appears that the SL2VL tables > > don't get initialized unless QoS is enabled on the opensm command line > > (-Q). Enabling this seems to rectify the problem. So it would appear > > that LASH needs to enable this also. > > You can run with -Q or set no_qos to FALSE in the opensm.opts file. > > I wonder whether we should tie LASH to this so this isn't needed. 
> I think that would be a good idea. Dale From lareliquia.angulo at gmail.com Thu Feb 15 18:37:35 2007 From: lareliquia.angulo at gmail.com (lareliquia.angulo at gmail.com) Date: Fri, 16 Feb 2007 03:37:35 +0100 Subject: [openib-general] Propuesta Message-ID: <41149-22007251623734640@nanuk-1806bbde9> First of all, I apologize for the inconvenience. I want to thank you for giving me some of your time. I am asking for your help: if you could publish an article, some kind of reference, or simply mention it among your acquaintances, I would be very grateful. Francisco Angulo (Madrid, 1976) studied computer science; he is an inventor and a great enthusiast of technological advances, although with some well-considered reservations. He says he has a strong commitment to environmental conservation, which drives him to look for approaches that contribute in a practical way to the sustainable development of our societies. He has recently designed and patented several ecological engines that he hopes will be useful toward this end. On the artistic side, Francisco Angulo is a writer and digital designer, and his work starts out strongly, showing singular aspects full of suggestive vitality, humor, intrigue, and imagination. http://www.sirlebert.com/xalfdm/pol40.mp3 Here is the beginning of my novel LA RELIQUIA: "Brown Eyes" would spend long hours watching me; I don't know what she saw in me, but she loved to sit on the grass in front of me and look at me closely; the truth is that I loved gazing at her.
She was short, under a meter and a half tall, and slender, with dark skin that she usually kept covered with animal hides to protect herself from the cold. She also wore different ornaments in her hair depending on the time of year: in spring she would braid in a few flowers, and in winter some colored ribbons. She also used to hang some ornament from her neck as a necklace, usually a thin strip of leather and, as a jewel, a shell or a small clay figurine that she shaped with her own hands. She belonged to a tribe that had settled near my position, in some shallow caves that they used as a home. "Brown Eyes" had an intense gaze and observed everything with curiosity, trying to understand the world around her, as if everything were part of a magical world; she noticed the movement the wind caused in the treetops, held insects in her hand, careful not to hurt them, and after contemplating them and trying to understand what they were, returned them to the ground. She also loved watching the birds and imitating them; she liked to amuse herself by running in circles around me, stretching out her arms and moving them up and down as if she were a bird. In spring, tall grass grew in the small meadow to the left, a meadow of tall green grass full of dandelions. "Brown Eyes" loved to jump on the green, and her jumps filled everything with dandelion seeds, which were carried off by the gentle spring breeze. That beautiful creature was tireless and could spend hours jumping and playing at catching the seeds that fluttered in the wind; when they rose, "Brown Eyes" would stop jumping and stand still, face turned upward, eyes closed, waiting in silence. Then some would begin to descend gently and fall on her face, caressing it.
I would have liked to feel that sensation, to feel the soft seeds falling on me like feathers. Sometimes one got into her nose and made her sneeze; I found that very funny, because "Brown Eyes" would look very surprised, as if wondering what had just happened. Except on rainy days, she always came to see me; it was something I looked forward to, and when the day dawned sunny I waited until I saw her appear, climbing the slope that led up to my position; she usually came up humming some melody and skipping as she walked. Please sign my guest book! www.lareliquia.es The article: I wrote this article several years ago, but only now are we beginning to see its repercussions. The corn tortilla, "food for North American vehicles". Replacing diesel fuel with biodiesel is completely unfeasible, since current consumption would require "sunflower or corn" crops covering more than [illegible fraction] of the national territory, and it is even less possible in countries such as Japan. Obtaining fuel from organic waste by means of bacteria has the advantage of allowing one harvest every 3 days, whereas a "sunflower or corn" crop takes a year. Bacteria feed mainly on water and need no care, which is not the case for "sunflower or corn" crops. Moreover, growing "sunflower or corn" uses many substances that are harmful to the environment, from nitrates and fertilizers to all kinds of chemical sprays, which ends up seriously damaging our environment. Using organic waste, by contrast, gives us one more advantage, which is disposing of garbage in a completely ecological way, since we use no chemicals at all in the whole process of obtaining the new biodiesel.
www.lareliquia.es Best regards, Francisco Angulo From rowland at cse.ohio-state.edu Thu Feb 15 19:50:46 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Thu, 15 Feb 2007 22:50:46 -0500 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: <1171558785.13282.29.camel@stevo-desktop> References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> <45D1FD0B.2080606@cse.ohio-state.edu> <45D3E224.9060306@cse.ohio-state.edu> <1171558785.13282.29.camel@stevo-desktop> Message-ID: <45D52A16.1080803@cse.ohio-state.edu> Steve Wise wrote: > Shaun, > > Lemme know if you have an mvapich2 kit that I can test with iwarp... Hi Steve. I've updated our SRPM: https://www.openfabrics.org/~rowland/ofed_1_2/ The latest is mvapich2-0.9.8-4.src.rpm. This version should solve the shared library linking issues. This can be built outside of the OFED 1.2 alpha1 release with the information in the README file or can replace the previous SRPM in the OFED-1.2-alpha1/SRPMS/ directory. To use iWARP, use the OFA build of the SRPM and set MV2_ENABLE_IWARP_MODE=1 in your environment. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From xma at us.ibm.com Thu Feb 15 20:41:11 2007 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 15 Feb 2007 20:41:11 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: Message-ID: Hello, Roland, We have a customer issue regarding IPv6oIB. A subnet supports only a limited number of MCGs. When multiple IPv6 addresses are assigned to one interface, each IPv6 address has its own solicited-node address (depending on its group ID), so a large subnet ends up with a very large number of MCGs. If the number of IPv6 solicited-node addresses exceeds the number of MCGs supported in the subnet, IPv6 neighbour discovery breaks. This does not happen on Ethernet, since a send-only sender is not required to join any MCG.
The IPoIB RFCs only cover MCGs outside the subnet: for an MCG that is not in this subnet, the packet is directed to all routers or to broadcast (for IPv6 this should be the all-hosts address). They do not cover MCG overflow within the subnet. (Currently this is not implemented in OpenFabrics.) I have written an initial patch that addresses the MCG overflow problem by redirecting the solicited-node address to the all-hosts address, so that IPv6 neighbour discovery works no matter how many IPv6 addresses are in the subnet. The patch is only triggered when IPv6 is enabled and the MCGs overflow, so there is almost no performance penalty. The patch seems to work, although it is still being validated and tested. I would like to hear your opinion on how to address this problem, and whether I am heading in the right direction ... Thanks Shirley Ma From jgunthorpe at obsidianresearch.com Thu Feb 15 21:42:27 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Thu, 15 Feb 2007 22:42:27 -0700 Subject: [openib-general] IB routing discussion summary In-Reply-To: <45D4B705.5020805@ichips.intel.com> References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com> <45D4B705.5020805@ichips.intel.com> Message-ID: <20070216054227.GB10942@obsidianresearch.com> On Thu, Feb 15, 2007 at 11:39:49AM -0800, Sean Hefty wrote: > I think this may allow establishing inter-subnet connections. As an > example of its usage: I think you are right, this does contain enough information. > a. Active side issues a PathRecord query to the local SA with SGID=local, > DGID=remote. > b. SA responds with PathRecord(s). > c. Active side selects local PathRecord P1. > d. Active side issues a PathRecord query to the remote SA using PathRecord > P1 to > format the request: SGID, DGID, SLID, DLID, TC, FL, SL, etc. > e. The remote SA responds with PathRecord(s).
The SA must ensure that > packets injected into the internetwork using P1 will route to the returned > records. > f. Active side selects remote PathRecord P2. > g. Active side validates that remote packets injected using P2 route to P1. Let me add something: - Side A GRH.SGID = P2.DGID - Side A GRH.DGID = P1.DGID - Side A GRH.TC/FL = P2.TC/FL - Side A LRH.SLID = P1.SLID - Side A LRH.DLID = P1.DLID - Side A LRH.SL = P1.SL - Side B GRH.SGID = P1.DGID - Side B GRH.DGID = P2.DGID - Side B GRH.TC/FL = P1.TC/FL - Side B LRH.SLID = P2.SLID - Side B LRH.DLID = P2.DLID - Side B LRH.SL = P2.SL [Side A information programs the active side's QP. Side B information goes into the REQ and is sent to the passive side, which then uses it directly to program the QP. The passive side never does a PR.] Inverting the TC/FL source can avoid requirement g. When P1 is produced it also generates a TC/FL that ensures the LIDs match for the reverse direction. When P2 is created it does the same. Here is the interesting bit: If the SAs support 'multiple router paths' then they must have a small bit of global information which is that packets entering router LID x,y,z on subnet Y appear on local subnet router port A. Enough information is in the P2 request to look up in this table to learn the input port. In forming P2 the SA will use that bit of global information to restrict the router LIDs to one that ends up on port A. Once it selects a router LID and CA LID it produces a TC/FL that ensures those LIDs are selected by the local router port A. Choosing the DGIDs as I did can allow for some multipathing via GID if an implementation goes that way. [Though IMHO, doing this creates new ugly problems, and doesn't solve the SLID selection issue.] This changes the definition of a PR: the returned GRH fields are the fields that the *remote* must send to produce the LIDs in the PR.
If you define TC/FL like this then I don't know what happens when UD makes a PR query and uses the returned TC/FL in the local QP configuration... It may actually be better to have a new query type and have SA take care of it. In this solution new semantics for PR are defined, it requires the SA magic GID, and it co-opts the flowlabel/tc to solve the QP LID matching issue. Michael is probably right and we should find a way to ask IBTA for implementation guidance. Hopefully that can be done swiftly.. Jason From krkumar2 at in.ibm.com Thu Feb 15 22:29:02 2007 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Fri, 16 Feb 2007 11:59:02 +0530 Subject: [openib-general] [RFC] [PATCH] iWARP Connection parameters negotiation Message-ID: <20070216062902.10631.98327.sendpatchset@K50wks273871wss.in.ibm.com> Hi all, In the Nov 16-17th 2006 OpenFabrics Developer Summit, the following presentation by Tom talked of negotiating parameters at the time of establishing a connection: http://www.openfabrics.org/conference/nov2006sc/OFA-Newstuff-SC06.ppt See "IRD/ORD Negotiation, Option #1" This is an RFC patch that implements the same (after interacting with Tom, Steve and Arkady). The working of this functionality is described by Tom below (in one of his mails to me): "Yes, basically what has to happen is that the IRD/ORD information needs to be exchanged in the private data as part of connect/accept. The parameters should be prepended, followed by the user specified private data. The IWCM strips the ORD/IRD header off before delivering the private data to the consumer. BTW, if you look at the IB code you can see how they do this already for port number, etc... "The options specified are really whether to be permissive when the remote peer won't honor your request.
My opinion is that the policy should be permissive, which is to say that if you ask for 8 outgoing, but the peer responds with 4 incoming, you set up the connection and the local node adjusts his ORD down to 4 to match the peers IRD response. Does this make sense?" Please provide your comments/suggestions/(flames). Patch applies to 2.6.20. Thanks, - KK diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2007-02-16 11:01:08.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2007-02-16 11:01:12.000000000 +0530 @@ -2059,19 +2059,81 @@ out: return ret; } +/* + * set_connp_fields() : Set various fields of iw_param based on connnect or + * accept parameters (connp). Also, allocates a private header if + * config_privhdr is true. + * + * Returns a private header on success and -errno converted-to-pointer on + * failure. A null private header indicates that config_privhdr was false. + */ +static struct iwcm_priv_hdr *set_connp_fields(struct iw_cm_conn_param *iw_param, + struct rdma_conn_param *connp, + int config_privhdr) +{ + struct iwcm_priv_hdr *priv_hdr; + + iw_param->ord = connp->initiator_depth; + iw_param->ird = connp->responder_resources; + + if (!config_privhdr) { + /* + * We are not configured to send private header (for active), + * or the peer didn't send a private header (for passive); in + * both cases, do not use private header. + */ + priv_hdr = NULL; /* Success */ + iw_param->private_data = connp->private_data; + iw_param->private_data_len = connp->private_data_len; + goto out; + } + + if ((typeof (connp->private_data_len)) + (connp->private_data_len + sizeof *priv_hdr) < + connp->private_data_len) { + /* Overflow - private_data_len + priv_hdr is too large */ + /* xxx.KK - Help - there is a better way to check overflow + * than this - some macro that is already inbuilt. 
+ */ + priv_hdr = ERR_PTR(-EOVERFLOW); + goto out; + } + + /* Allocate memory for both private data and iwcm_priv_hdr */ + iw_param->private_data_len = connp->private_data_len + sizeof *priv_hdr; + iw_param->private_data = kmalloc(iw_param->private_data_len, + GFP_KERNEL); + if (!iw_param->private_data) { + priv_hdr = ERR_PTR(-ENOMEM); + goto out; + } + + /* Prepend iwcm_priv_hdr header before actual private data */ + priv_hdr = (struct iwcm_priv_hdr *) iw_param->private_data; + priv_hdr->ord = cpu_to_be32(iw_param->ord); + priv_hdr->ird = cpu_to_be32(iw_param->ird); + if (connp->private_data_len) + memcpy(priv_hdr->private_data, connp->private_data, + connp->private_data_len); + +out: + return priv_hdr; +} + +extern int iw_send_private_header; + static int cma_connect_iw(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct iw_cm_id *cm_id; struct sockaddr_in* sin; - int ret; struct iw_cm_conn_param iw_param; + struct iwcm_priv_hdr *priv_hdr = NULL; + int ret; cm_id = iw_create_cm_id(id_priv->id.device, cma_iw_handler, id_priv); - if (IS_ERR(cm_id)) { - ret = PTR_ERR(cm_id); - goto out; - } + if (IS_ERR(cm_id)) + return PTR_ERR(cm_id); id_priv->cm_id.iw = cm_id; @@ -2085,17 +2147,28 @@ static int cma_connect_iw(struct rdma_id if (ret) goto out; - iw_param.ord = conn_param->initiator_depth; - iw_param.ird = conn_param->responder_resources; - iw_param.private_data = conn_param->private_data; - iw_param.private_data_len = conn_param->private_data_len; + /* Initialize iw_param fields */ + priv_hdr = set_connp_fields(&iw_param, conn_param, + iw_send_private_header); + if (IS_ERR(priv_hdr)) { + ret = PTR_ERR(priv_hdr); + goto out; + } + + /* + * Save iwcm_priv_hdr till we get a connect response to negotiate. + * priv_hdr can be NULL, indicating that negotiation is disabled. 
+ */ + cm_id->priv_hdr = priv_hdr; + if (id_priv->id.qp) iw_param.qpn = id_priv->qp_num; else iw_param.qpn = conn_param->qp_num; ret = iw_cm_connect(cm_id, &iw_param); out: - if (ret && !IS_ERR(cm_id)) { + if (ret) { + /* cm_id->priv_hdr is freed up in iwcm_deref_id() */ iw_destroy_cm_id(cm_id); id_priv->cm_id.iw = NULL; } @@ -2185,23 +2258,51 @@ out: static int cma_accept_iw(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { + struct iw_cm_id *cm_id; + struct iwcm_priv_hdr *priv_hdr; struct iw_cm_conn_param iw_param; int ret; + cm_id = id_priv->cm_id.iw; + + /* Initialize iw_param fields */ + priv_hdr = set_connp_fields(&iw_param, conn_param, + (cm_id->priv_hdr ? 1 : 0) & + iw_send_private_header); + if (IS_ERR(priv_hdr)) { + ret = PTR_ERR(priv_hdr); + goto out; + } + ret = cma_modify_qp_rtr(&id_priv->id); if (ret) - return ret; + goto out; - iw_param.ord = conn_param->initiator_depth; - iw_param.ird = conn_param->responder_resources; - iw_param.private_data = conn_param->private_data; - iw_param.private_data_len = conn_param->private_data_len; - if (id_priv->id.qp) { + if (id_priv->id.qp) iw_param.qpn = id_priv->qp_num; - } else + else iw_param.qpn = conn_param->qp_num; - return iw_cm_accept(id_priv->cm_id.iw, &iw_param); + ret = iw_cm_accept(cm_id, &iw_param); + +out: + /* + * Free the just allocated priv_hdr+private data. priv_hdr could + * be NULL if negotiation is not configured, or if the active side + * didn't use private header. + */ + if (!IS_ERR(priv_hdr)) + kfree(priv_hdr); + + /* + * Free priv_hdr that was allocated in parse_connection_params(). 
+ */ + if (cm_id->priv_hdr) { + kfree(cm_id->priv_hdr); + cm_id->priv_hdr = NULL; + } + + return ret; } static int cma_send_sidr_rep(struct rdma_id_private *id_priv, diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c --- org/drivers/infiniband/core/iwcm.c 2007-02-16 11:01:08.000000000 +0530 +++ new/drivers/infiniband/core/iwcm.c 2007-02-16 11:01:12.000000000 +0530 @@ -54,6 +54,12 @@ MODULE_AUTHOR("Tom Tucker"); MODULE_DESCRIPTION("iWARP CM"); MODULE_LICENSE("Dual BSD/GPL"); +int iw_send_private_header __read_mostly = 0; +module_param_named(iw_send_private_header, iw_send_private_header, int, 0644); +MODULE_PARM_DESC(iw_send_private_header, + "Enable private iwcm connection negotiation header"); +EXPORT_SYMBOL(iw_send_private_header); + static struct workqueue_struct *iwcm_wq; struct iwcm_work { struct work_struct work; @@ -158,6 +164,7 @@ static int iwcm_deref_id(struct iwcm_id_ BUG_ON(atomic_read(&cm_id_priv->refcount)==0); if (atomic_dec_and_test(&cm_id_priv->refcount)) { BUG_ON(!list_empty(&cm_id_priv->work_list)); + kfree(cm_id_priv->id.priv_hdr); if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) { BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING); BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, @@ -470,6 +477,72 @@ int iw_cm_reject(struct iw_cm_id *cm_id, } EXPORT_SYMBOL(iw_cm_reject); +/* + * iw_negotiate_qp_conn_params() : gets incoming and outgoing ird/ord + * parameters and modifies incoming ird/ord if required as per Sequences + * #7 and #8 noted in Section 6.6.1.1; and Sequences #9 and 10 noted in + * 6.6.1.2 of the hilland draft. + * + * This routine also modifies the caller's ird/ord so that accept is + * provided with this smaller ird/ord. 
+ *
+ * This routine is called by :
+ *	- iw_cm_accept() for accepting an incoming connection, and
+ *	- parse_connection_params() IW_CM_EVENT_CONNECT_REPLY case where
+ *	  the client gets response to the connect request;
+ * and since both these callers have process context, it is OK to call
+ * ib_modify_qp (which can sleep).
+ *
+ * Returns 0 if negotiation was done successfully; and < 0 if no negotiation
+ * was required or if modify_qp() failed (which is not a catastrophic error).
+ */
+static int iw_negotiate_qp_conn_params(struct ib_qp *qp, int *l_ird, int *l_ord,
+				       int r_ird, int r_ord)
+{
+	int qp_mask = 0;	/* mask of attributes to be modified */
+	int ret = -1;		/* no negotiation required */
+
+	if (*l_ord > r_ird) {
+		/*
+		 * Local Outgoing is bigger than Peer Incoming, reduce
+		 * my Outgoing.
+		 */
+		pr_debug("%s: Reducing outgoing from %d to %d\n", __FUNCTION__,
+			 *l_ord, r_ird);
+		*l_ord = r_ird;
+		qp_mask = IB_QP_MAX_QP_RD_ATOMIC;
+	}
+
+	if (*l_ird > r_ord) {
+		/*
+		 * Local Incoming is greater than Peer Outgoing, reduce
+		 * my Incoming.
+		 */
+		pr_debug("%s: Reducing incoming from %d to %d\n", __FUNCTION__,
+			 *l_ird, r_ord);
+		*l_ird = r_ord;
+		qp_mask |= IB_QP_MAX_DEST_RD_ATOMIC;
+	}
+
+	if (qp_mask) {
+		struct ib_qp_attr qp_attr;
+
+		qp_attr.max_rd_atomic = *l_ord;
+		qp_attr.max_dest_rd_atomic = *l_ird;
+		ret = ib_modify_qp(qp, &qp_attr, qp_mask);
+		pr_debug("%s: modify qp with qp_mask:%x returns %d\n",
+			 __FUNCTION__, qp_mask, ret);
+		/* xxx.KK : This does NOTHING in amso driver, as
+		 * c2_qp_set_read_limits() is used to set rdma limits, this
+		 * seems to be a limitation of driver as modify_qp is
+		 * supposed to do the same. See mthca_modify_qp which
+		 * modifies the limits. Maybe chelsio driver supports this.
+ */ + } + + return ret; +} + /* * CM_ID <-- ESTABLISHED * @@ -483,6 +556,7 @@ int iw_cm_accept(struct iw_cm_id *cm_id, struct iwcm_id_private *cm_id_priv; struct ib_qp *qp; unsigned long flags; + int r_ird, r_ord; int ret; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); @@ -505,6 +579,44 @@ int iw_cm_accept(struct iw_cm_id *cm_id, cm_id_priv->qp = qp; spin_unlock_irqrestore(&cm_id_priv->lock, flags); + if (!iw_send_private_header || !cm_id->priv_hdr) { + /* + * Not configured to send extra header, or peer didn't use + * private header. + */ + goto accept; + } + + /* + * Retrieve client's connection parameters that were earlier saved + * from an incoming client connect request, and negotiate with the + * accept parameters. iw_param contains local connection values + * while cm_id->priv_hdr contains peer connection values. + */ + + r_ord = be32_to_cpu(cm_id->priv_hdr->ord); + r_ird = be32_to_cpu(cm_id->priv_hdr->ird); + + kfree(cm_id->priv_hdr); + cm_id->priv_hdr = NULL; + + ret = iw_negotiate_qp_conn_params(qp, &iw_param->ird, + &iw_param->ord, r_ird, r_ord); + if (!ret) { + struct iwcm_priv_hdr *priv_hdr; + + /* + * iw_param's ird/ord now contains new values, update the + * prepended header's ird/ord fields to reflect this change + * so as to let the other side know of these updated values. + */ + priv_hdr = (struct iwcm_priv_hdr *) iw_param->private_data; + + priv_hdr->ord = cpu_to_be32(iw_param->ord); + priv_hdr->ird = cpu_to_be32(iw_param->ird); + } + +accept: ret = cm_id->device->iwcm->accept(cm_id, iw_param); if (ret) { /* An error on accept precludes provider events */ @@ -584,6 +696,126 @@ int iw_cm_connect(struct iw_cm_id *cm_id EXPORT_SYMBOL(iw_cm_connect); /* + * parse_connection_params() : parse connection parameters for both + * active side requests and passive side responses, and negotiate + * suitable values for ird/ord. + * + * Returns 0 on success and -errno on failure (via modify_qp). 
Also + * returns the actual iw_event on success by stripping the extra + * header that was added by the remote peer. + */ +static int parse_connection_params(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event, struct iw_cm_event *new_iw_event) +{ + int ret = 0; + int l_ird, l_ord, r_ird, r_ord; + struct iw_cm_id *cm_id; + struct iwcm_priv_hdr *local_priv_hdr, *remote_priv_hdr; + + cm_id = &cm_id_priv->id; + local_priv_hdr = cm_id->priv_hdr; + *new_iw_event = *iw_event; + + if (!iw_send_private_header) { + /* Not configured to send extra header */ + BUG_ON(local_priv_hdr); + goto out; + } + + remote_priv_hdr = iw_event->private_data; + if (!iw_has_private_header(remote_priv_hdr, iw_event)) { + /* + * Remote side has not sent a iw private header, we use the + * old protocol. + */ + if (local_priv_hdr) { + /* + * This is the active side code path. We had already + * sent a private header when doing the connect, but + * the server does not implement private header, so + * we should fail the connect response otherwise we + * end up having passed wrong private data to the + * application on the server. + */ + kfree(local_priv_hdr); + cm_id->priv_hdr = NULL; + ret = -EAGAIN; + } + goto out; + } + + switch (iw_event->event) { + case IW_CM_EVENT_CONNECT_REQUEST: + /* + * This is Server code - a connect request was received. + * Allocate a iwcm_priv_hdr that can be used later for + * negotiation when accept() is performed. + */ + BUG_ON(local_priv_hdr); + + /* + * Save only the header (and not the private data passed + * in the connect()). By the time an accept is done, we + * will have the local params which we can then compare + * with the remote params that we are just saving (the + * peer's real private data is passed to the app later in + * the flow when we call id->cm_handler(new_iw_event)). 
+ */ + cm_id->priv_hdr = kmalloc(sizeof *cm_id->priv_hdr, GFP_KERNEL); + if (!cm_id->priv_hdr) { + ret = -ENOMEM; + goto out; + } + + /* + * Save all the remote connection parameters. These will be + * later used for negotiation when an accept is performed. + * Save in Big-Endian format. + */ + cm_id->priv_hdr->ord = remote_priv_hdr->ord; + cm_id->priv_hdr->ird = remote_priv_hdr->ird; + break; + + case IW_CM_EVENT_CONNECT_REPLY: + /* + * This is Client code - a connect response was received, + * use local connection parameters saved earlier and the + * remote connection parameters to negotiate. + */ + + BUG_ON(!local_priv_hdr); + + + l_ord = be32_to_cpu(local_priv_hdr->ord); + l_ird = be32_to_cpu(local_priv_hdr->ird); + r_ord = be32_to_cpu(remote_priv_hdr->ord); + r_ird = be32_to_cpu(remote_priv_hdr->ird); + + kfree(local_priv_hdr); + cm_id->priv_hdr = NULL; + + BUG_ON(!cm_id_priv->qp); + (void) iw_negotiate_qp_conn_params(cm_id_priv->qp, &l_ird, + &l_ord, r_ird, r_ord); + break; + + default: + /* Should never get here */ + BUG(); + break; + } + + /* + * Reset the new event values to point to the real private_data/len. + */ + new_iw_event->private_data_len -= sizeof(struct iwcm_priv_hdr); + new_iw_event->private_data = remote_priv_hdr->private_data; + +out: + return ret; +} + +/* * Passive Side: new CM_ID <-- CONN_RECV * * Handles an inbound connect request. 
The function creates a new @@ -604,6 +836,7 @@ static void cm_conn_req_handler(struct i unsigned long flags; struct iw_cm_id *cm_id; struct iwcm_id_private *cm_id_priv; + struct iw_cm_event new_iw_event; int ret; /* @@ -644,8 +877,13 @@ static void cm_conn_req_handler(struct i goto out; } - /* Call the client CM handler */ - ret = cm_id->cm_handler(cm_id, iw_event); + /* Save the incoming connection parameters in cm_id->priv_hdr */ + ret = parse_connection_params(cm_id_priv, iw_event, &new_iw_event); + if (!ret) { + /* Call the client CM handler */ + ret = cm_id->cm_handler(cm_id, &new_iw_event); + } + if (ret) { set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); destroy_cm_id(cm_id); @@ -704,6 +942,7 @@ static int cm_conn_rep_handler(struct iw struct iw_cm_event *iw_event) { unsigned long flags; + struct iw_cm_event new_iw_event; int ret; spin_lock_irqsave(&cm_id_priv->lock, flags); @@ -724,7 +963,15 @@ static int cm_conn_rep_handler(struct iw cm_id_priv->state = IW_CM_STATE_IDLE; } spin_unlock_irqrestore(&cm_id_priv->lock, flags); - ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event); + + /* + * iw_event contains peer connection parameters, while + * cm_id->priv_hdr has local connection parameters. Use these + * to negotiate mutually agreeable connection parameters. + */ + ret = parse_connection_params(cm_id_priv, iw_event, &new_iw_event); + if (!ret) + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, &new_iw_event); if (iw_event->private_data_len) kfree(iw_event->private_data); diff -ruNp org/include/rdma/iw_cm.h new/include/rdma/iw_cm.h --- org/include/rdma/iw_cm.h 2007-02-16 11:01:08.000000000 +0530 +++ new/include/rdma/iw_cm.h 2007-02-16 11:01:12.000000000 +0530 @@ -86,6 +86,32 @@ typedef int (*iw_cm_handler)(struct iw_c typedef int (*iw_event_handler)(struct iw_cm_id *cm_id, struct iw_cm_event *event); +/* + * Header prepended before the actual private data passed during + * connection establishment in connect/accept calls. 
+ */ +struct iwcm_priv_hdr { + u32 ord; /* Outbound RDMA Read Queue Depth */ + u32 ird; /* Inbound RDMA Read Queue Depth */ + /* Other negotiation params come here */ + char private_data[0]; /* copy the real private data here */ +} __attribute__ ((packed)); + +/* + * Returns true if priv_hdr seemingly points to a real private header + * structure. Conditions to determine that a private header is present + * are : + * - non NULL pointer. + * - event's private data is atleast equal to private header size. + * Note : This can return false positives. + */ +static inline int iw_has_private_header(struct iwcm_priv_hdr *priv_hdr, + struct iw_cm_event *event) +{ + return (priv_hdr && + event->private_data_len >= sizeof (struct iwcm_priv_hdr)); +} + struct iw_cm_id { iw_cm_handler cm_handler; /* client callback function */ void *context; /* client cb context */ @@ -93,6 +119,7 @@ struct iw_cm_id { struct sockaddr_in local_addr; struct sockaddr_in remote_addr; void *provider_data; /* provider private data */ + struct iwcm_priv_hdr *priv_hdr; /* Extra header added */ iw_event_handler event_handler; /* cb for provider events */ /* Used by provider to add and remove refs on IW cm_id */ From vlad at lists.openfabrics.org Fri Feb 16 02:23:37 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Fri, 16 Feb 2007 02:23:37 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070216-0200 daily build status Message-ID: <20070216102337.F18D4E603C0@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 
with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.12

Failed:

From dy.manju at gmail.com Fri Feb 16 05:02:04 2007
From: dy.manju at gmail.com (manju y)
Date: Fri, 16 Feb 2007 18:32:04 +0530
Subject: [openib-general] SRP-FMR
Message-ID:

Hi All,

I am a newbie to SRP. Can anyone please explain or provide some links to understand the same?

Thanks
Manju

From dy.manju at gmail.com Fri Feb 16 05:29:00 2007
From: dy.manju at gmail.com (manju y)
Date: Fri, 16 Feb 2007 18:59:00 +0530
Subject: [openib-general] fast memory registration
Message-ID:

Hi All,

I am a newbie to SRP. Can anyone please explain or provide some links to understand the FMR (fast memory registration) concept used in ib_srp.c?

Thanks
Manju

From swise at opengridcomputing.com Fri Feb 16 07:06:19 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 16 Feb 2007 09:06:19 -0600
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <45D52A16.1080803@cse.ohio-state.edu>
References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> <45D1FD0B.2080606@cse.ohio-state.edu> <45D3E224.9060306@cse.ohio-state.edu> <1171558785.13282.29.camel@stevo-desktop> <45D52A16.1080803@cse.ohio-state.edu>
Message-ID: <1171638379.1066.6.camel@stevo-desktop>

Ok I'll try it out today!

Thanks,

Steve.

On Thu, 2007-02-15 at 22:50 -0500, Shaun Rowland wrote:
> Steve Wise wrote:
> > Shaun,
> >
> > Lemme know if you have an mvapich2 kit that I can test with iwarp...
>
> Hi Steve. I've updated our SRPM:
>
> https://www.openfabrics.org/~rowland/ofed_1_2/
>
> The latest is mvapich2-0.9.8-4.src.rpm. This version should solve the
> shared library linking issues. This can be built outside of the OFED 1.2
> alpha1 release with the information in the README file or can replace
> the previous SRPM in the OFED-1.2-alpha1/SRPMS/ directory. To use iWARP,
> use the OFA build of the SRPM and set MV2_ENABLE_IWARP_MODE=1 in your
> environment.

From brettmcmillian at swbell.net Fri Feb 16 08:48:34 2007
From: brettmcmillian at swbell.net (Brett McMillian)
Date: Fri, 16 Feb 2007 08:48:34 -0800 (PST)
Subject: [openib-general] krping.c changes
Message-ID: <568620.47287.qm@web81513.mail.mud.yahoo.com>

I wasn't sure who I should email about this, but I recently got krping to work between an Opteron and a PPC G5.
However, in order for krping to work I had to make the following changes to krping.c to ensure the address, key, and length were being sent across the network as big endian; otherwise they were in machine-dependent byte order.

 static void krping_format_send(struct krping_cb *cb, u64 buf, struct ib_mr *mr)
 {
 	struct krping_rdma_info *info = &cb->send_buf;

-	info->buf = buf;
-	info->rkey = mr->rkey;
-	info->size = cb->size;
+	info->buf = cpu_to_be64(buf);
+	info->rkey = cpu_to_be32(mr->rkey);
+	info->size = cpu_to_be32(cb->size);

 	DEBUG_LOG("RDMA addr %llx rkey %x len %d\n",
 		  info->buf, info->rkey, info->size);
 }

 static int server_recv(struct krping_cb *cb, struct ib_wc *wc)
 {
 	if (wc->byte_len != sizeof(cb->recv_buf)) {
 		printk(KERN_ERR PFX "Received bogus data, size %d\n",
 		       wc->byte_len);
 		return -1;
 	}

-	cb->remote_rkey = cb->recv_buf.rkey;
-	cb->remote_addr = cb->recv_buf.buf;
-	cb->remote_len = cb->recv_buf.size;
+	cb->remote_rkey = be32_to_cpu(cb->recv_buf.rkey);
+	cb->remote_addr = be64_to_cpu(cb->recv_buf.buf);
+	cb->remote_len = be32_to_cpu(cb->recv_buf.size);
 	DEBUG_LOG("Received rkey %x addr %llx len %d from peer\n",
 		  cb->remote_rkey, cb->remote_addr, cb->remote_len);

 	if (cb->state <= CONNECTED || cb->state == RDMA_WRITE_COMPLETE)
 		cb->state = RDMA_READ_ADV;
 	else
 		cb->state = RDMA_WRITE_ADV;

 	return 0;
 }

Brett McMillian

From rdreier at cisco.com Fri Feb 16 09:00:48 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:00:48 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
In-Reply-To: (Shirley Ma's message of "Thu, 15 Feb 2007 20:41:11 -0800")
References:
Message-ID:

> We have a customer issue regarding IPv6oIB. In the subnet, there are
> limited number of MCGs supported. So when there are multiple IPv6 addresses
> are assigned to one interface, each IPv6 address will have one unique
> solicited-node address (depends on their groupID).
Then in a large subnet,
> we will have tons of MCGs. If IPv6 solicited node addresses exceed the
> number of MCGs in this subnet, then IPv6 neighbour discovery will be
> broken, this won't happen in Ethernet since sendonly doesn't require sender
> to be joined any MCG.

> I have done an initial patch to address MCG overflow problem and redirect
> the solicited-node address to all hosts node address, thus IPv6 neighbour
> discovery will work no matter how many IPv6 addresses in this subnet. This
> patch is only triggered with IPv6 enabled and MCG overflows, so there is
> almost no performance penalty.

I really don't like this approach, since it can break things in very subtle ways (e.g. suppose one node fails to join its solicited node group, but then a later node wants to talk to it and succeeds in joining the solicited node group as a send-only member -- since the first node is not a member then it will never see the ND messages).

I much prefer to fix the SM not to impose too-low limits on the number of MCGs. Supporting O(# nodes) MCGs is really not a very onerous requirement on the SM.

- R.

From rdreier at cisco.com Fri Feb 16 09:06:33 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:06:33 -0800
Subject: [openib-general] [PATCH for-2.6.21] IB/ipoib: error handling thinko fix
In-Reply-To: <20070215221613.GB26227@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 16 Feb 2007 00:16:13 +0200")
References: <20070215221613.GB26227@mellanox.co.il>
Message-ID:

Thanks, queued for 2.6.21.
From swise at opengridcomputing.com Fri Feb 16 09:15:46 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 16 Feb 2007 11:15:46 -0600 Subject: [openib-general] mvapich2 ofed 1.2 problem In-Reply-To: <45D52A16.1080803@cse.ohio-state.edu> References: <1171380610.15471.25.camel@stevo-desktop> <1171386686.15471.36.camel@stevo-desktop> <45D1FD0B.2080606@cse.ohio-state.edu> <45D3E224.9060306@cse.ohio-state.edu> <1171558785.13282.29.camel@stevo-desktop> <45D52A16.1080803@cse.ohio-state.edu> Message-ID: <1171646146.10345.3.camel@stevo-desktop> Good news! This SRPM works with alpha1. I'm able to run the IMB benchmarks on a 4 node iwarp cluster! Thanks, Steve. On Thu, 2007-02-15 at 22:50 -0500, Shaun Rowland wrote: > Steve Wise wrote: > > Shaun, > > > > Lemme know if you have an mvapich2 kit that I can test with iwarp... > > Hi Steve. I've updated our SRPM: > > https://www.openfabrics.org/~rowland/ofed_1_2/ > > The latest is mvapich2-0.9.8-4.src.rpm. This version should solve the > shared library linking issues. This can be built outside of the OFED 1.2 > alpha1 release with the information in the README file or can replace > the previous SRPM in the OFED-1.2-alpha1/SRPMS/ directory. To use iWARP, > use the OFA build of the SRPM and set MV2_ENABLE_IWARP_MODE=1 in your > environment. From halr at voltaire.com Fri Feb 16 09:19:43 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Feb 2007 12:19:43 -0500 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: References: Message-ID: <1171646341.22446.266964.camel@hal.voltaire.com> On Fri, 2007-02-16 at 12:00, Roland Dreier wrote: > > We have a customer issue regarding IPv6oIB. In the subnet, there are > > limited number of MCGs supported. So when there are multiple IPv6 addresses > > are assigned to one interface, each IPv6 address will have one unique > > solicited-node address (depends on their groupID). Then in a large subnet, > > we will have tons of MCGs. 
If IPv6 solicited node addresses exceed the
> > number of MCGs in this subnet, then IPv6 neighbour discovery will be
> > broken, this won't happen in Ethernet since sendonly doesn't require sender
> > to be joined any MCG.
>
> > I have done an initial patch to address MCG overflow problem and redirect
> > the solicited-node address to all hosts node address, thus IPv6 neighbour
> > discovery will work no matter how many IPv6 addresses in this subnet. This
> > patch is only triggered with IPv6 enabled and MCG overflows, so there is
> > almost no performance penalty.
>
> I really don't like this approach, since it can break things in very
> subtle ways (eg suppose one node fails to join its solicited node
> group, but then a later node wants to talk to it and succeeds in
> joining the solicited node group as a send-only member -- since the
> first node is not a member then it will never see the ND messages).
>
> I much prefer to fix the SM not to impose too-low limits on the number
> of MCGs. Supporting O(# nodes) MCGs is really not a very onerous
> requirement on the SM.

Is this a MFT size issue or SM issue or both ?

-- Hal

> - R.

From xma at us.ibm.com Fri Feb 16 09:22:15 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Fri, 16 Feb 2007 09:22:15 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
In-Reply-To:
Message-ID:

Roland,

Thanks for your quick response.

Even if the SM supports 1000 MCGs, it's still not sufficient for a 250-node cluster where each node has 4 links for IPv6 without any scope/global IPv6 address configured (250*4 + a few default MCGs). There will be an MCG overflow problem anyway in IPv6oIB.
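For reference, a minimal sketch of where the one-MCG-per-address count comes from. It assumes the standard RFC 4291 solicited-node mapping (ff02::1:ffXX:XXXX, taking the low 24 bits of the unicast address) and that each such address ends up in its own MCG; solicited_node_addr() and groups_needed() are made-up names for illustration, not IPoIB code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build the solicited-node multicast address
 * ff02:0:0:0:0:1:ffXX:XXXX from the low 24 bits of a unicast address.
 */
static void solicited_node_addr(const uint8_t unicast[16], uint8_t out[16])
{
	assert(unicast && out);
	memset(out, 0, 16);
	out[0] = 0xff;				/* ff02::/16 prefix */
	out[1] = 0x02;
	out[11] = 0x01;				/* ...:0001:... */
	out[12] = 0xff;				/* ...:ffXX:XXXX */
	memcpy(&out[13], &unicast[13], 3);	/* low 24 bits */
}

/*
 * Hypothetical helper: groups needed if every link carries one address
 * whose low 24 bits are distinct from all others in the subnet.
 */
static unsigned groups_needed(unsigned nodes, unsigned links_per_node,
			      unsigned default_groups)
{
	return nodes * links_per_node + default_groups;
}
```

With nodes = 250, links_per_node = 4 and a couple of default groups, groups_needed() already exceeds a 1000-MCG limit, which is the arithmetic in the message above.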
Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

Roland Dreier
02/16/2007 09:00 AM
To: Shirley Ma/Beaverton/IBM at IBMUS
cc: "Michael S. Tsirkin", openib-general at openib.org
Subject: Re: IPv6oIB neighbour discover broken when MCGs overflow

> We have a customer issue regarding IPv6oIB. In the subnet, there are
> limited number of MCGs supported. So when there are multiple IPv6 addresses
> are assigned to one interface, each IPv6 address will have one unique
> solicited-node address (depends on their groupID). Then in a large subnet,
> we will have tons of MCGs. If IPv6 solicited node addresses exceed the
> number of MCGs in this subnet, then IPv6 neighbour discovery will be
> broken, this won't happen in Ethernet since sendonly doesn't require sender
> to be joined any MCG.

> I have done an initial patch to address MCG overflow problem and redirect
> the solicited-node address to all hosts node address, thus IPv6 neighbour
> discovery will work no matter how many IPv6 addresses in this subnet. This
> patch is only triggered with IPv6 enabled and MCG overflows, so there is
> almost no performance penalty.

I really don't like this approach, since it can break things in very subtle ways (eg suppose one node fails to join its solicited node group, but then a later node wants to talk to it and succeeds in joining the solicited node group as a send-only member -- since the first node is not a member then it will never see the ND messages).

I much prefer to fix the SM not to impose too-low limits on the number of MCGs. Supporting O(# nodes) MCGs is really not a very onerous requirement on the SM.

- R.

From rdreier at cisco.com Fri Feb 16 09:25:30 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:25:30 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
In-Reply-To: (Shirley Ma's message of "Fri, 16 Feb 2007 09:22:15 -0800")
References:
Message-ID:

> Even SM supports 1000 MCGs, it's still not sufficient for 250 nodes
> cluster, each node have 4 links for IPv6 without any scope/global IPv6
> address configured.(250*4+ a few default MCGs) There will be a MCG overflow
> problem anyway in IPv6oIB.

But what's the problem with supporting 1000 or even 10000 MCGs?

- R.

From rdreier at cisco.com Fri Feb 16 09:27:14 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:27:14 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
In-Reply-To: <1171646341.22446.266964.camel@hal.voltaire.com> (Hal Rosenstock's message of "16 Feb 2007 12:19:43 -0500")
References: <1171646341.22446.266964.camel@hal.voltaire.com>
Message-ID:

> > I much prefer to fix the SM not to impose too-low limits on the number
> > of MCGs. Supporting O(# nodes) MCGs is really not a very onerous
> > requirement on the SM.
>
> Is this a MFT size issue or SM issue or both ?

Well, as we discussed before, the size of the MFT is really independent of the # of MCGs supported. It's up to the SM how to allocate MLIDs, and as long as all the switches in the fabric support at least one MLID, then any number of MCGs can be managed by the SM.

So I would say this is entirely an SM issue.

- R.
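A rough sketch of the point above, assuming a hypothetical SM that folds MGIDs onto a fixed pool of MLIDs by hashing. mgid_to_mlid() and the choice of an FNV-1a hash are illustrative only, not OpenSM code; the one fact relied on is that multicast LIDs occupy 0xC000 to 0xFFFE:

```c
#include <assert.h>
#include <stdint.h>

#define MLID_FIRST 0xC000u	/* first multicast LID */
#define MLID_LAST  0xFFFEu	/* last multicast LID */

/*
 * Hypothetical mapping: hash a 16-byte MGID (FNV-1a) and fold it onto
 * pool_size MLIDs, where pool_size is the number of MFT entries the
 * smallest switch in the fabric supports.
 */
static uint16_t mgid_to_mlid(const uint8_t mgid[16], unsigned pool_size)
{
	uint32_t h = 2166136261u;	/* FNV-1a offset basis */
	int i;

	assert(pool_size > 0 && pool_size <= MLID_LAST - MLID_FIRST + 1);

	for (i = 0; i < 16; i++) {
		h ^= mgid[i];
		h *= 16777619u;		/* FNV-1a prime */
	}

	return (uint16_t)(MLID_FIRST + h % pool_size);
}
```

With pool_size = 1 every group lands on MLID 0xC000, so the number of MCGs is no longer bounded by the MFT size; the cost is that switches forward each group's packets toward every member of every group sharing that MLID. Whether groups with different parameters may share an MLID is a separate question.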
From halr at voltaire.com Fri Feb 16 09:32:37 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Feb 2007 12:32:37 -0500 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: References: <1171646341.22446.266964.camel@hal.voltaire.com> Message-ID: <1171647108.22446.267617.camel@hal.voltaire.com> On Fri, 2007-02-16 at 12:27, Roland Dreier wrote: > > > I much prefer to fix the SM not to impose too-low limits on the number > > > of MCGs. Supporting O(# nodes) MCGs is really not a very onerous > > > requirement on the SM. > > > > Is this a MFT size issue or SM issue or both ? > > Well as we discussed before, the size of the MFT is really independent > of the # of MCGs supported. It's up to the SM how to allocate MLIDs, > and as long as all the switches in the fabric support at least one > MLID, then any number of MCGs can be managed by the SM. Almost but not quite. > So I would say this is entirely an SM issue. I thought that mapping multiple MCGs to the same MLID requires that a set of the (group) parameters are the same. Is that the case for these IPv6 groups ? Is the only variable in those parameters the PKey ? I certainly agree that the SM can do a better job than simple 1:1 mapping. -- Hal > - R. From xma at us.ibm.com Fri Feb 16 09:38:14 2007 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 16 Feb 2007 09:38:14 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: Message-ID: Roland, >I really don't like this approach, since it can break things in very >subtle ways (eg suppose one node fails to join its solicited node >group, but then a later node wants to talk to it and succeeds in >joining the solicited node group as a send-only member -- since the >first node is not a member then it will never see the ND messages). For the successful join, ND sends to the node directly, for the failure join, ND sends to all hosts addr. 
So ND will work no matter whether the join is OK or not; that's what the patch does. Thanks Shirley Ma From rdreier at cisco.com Fri Feb 16 09:43:17 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 09:43:17 -0800 Subject: [openib-general] SA multicast patches In-Reply-To: <45D4E133.3000302@ichips.intel.com> (Sean Hefty's message of "Thu, 15 Feb 2007 14:39:47 -0800") References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> <45D4E133.3000302@ichips.intel.com> Message-ID: > The pkey is the default partition, full membership pkey. I believe > all nodes will have either 0xffff or 0x7fff as their pkey. We could > probably call ib_get_cached_pkey() instead and just use the first > entry in the table. Well the consumer has to know what P_Key to use since it must match the QP that will be used to send/receive. So I would suggest not trying to guess in the low-level multicast.c code, and rely on the consumer to set it properly. > We don't want to set the privileged bit of the q_key, so that's > wrong. Good catch. OK, I'll replace the code with something like random32() & 0x7fffffff One other question about the PS_IPOIB stuff: > +static int cma_set_qkey(struct ib_device *device, u8 port_num, > + enum rdma_port_space ps, > + struct rdma_dev_addr *dev_addr, u32 *qkey) > +{ > + struct ib_sa_mcmember_rec rec; > + int ret = 0; > + > + switch (ps) { > + case RDMA_PS_UDP: > + *qkey = RDMA_UDP_QKEY; > + break; > + case RDMA_PS_IPOIB: > + ib_addr_get_mgid(dev_addr, &rec.mgid); > + ret = ib_sa_get_mcmember_rec(device, port_num, &rec.mgid, &rec); > + *qkey = be32_to_cpu(rec.qkey); > + break; Does this work if userspace tries to join a new IPoIB MCG that the kernel driver hasn't joined yet? From reading the code it seems that ib_sa_get_mcmember_rec() would fail with -EADDRNOTAVAIL and so the whole join request would fail. Am I reading this correctly? Is it supposed to work? 
I would think that it would be nice to be able to receive on IPoIB MCGs not also being received by the kernel. - R. From mshefty at ichips.intel.com Fri Feb 16 09:44:09 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 16 Feb 2007 09:44:09 -0800 Subject: [openib-general] krping.c changes In-Reply-To: <568620.47287.qm@web81513.mail.mud.yahoo.com> References: <568620.47287.qm@web81513.mail.mud.yahoo.com> Message-ID: <45D5ED69.6090909@ichips.intel.com> Brett McMillian wrote: > I wasn't sure who I should email about this, but I recently got krping > to work between an Opteron and a PPC G5. However, in order for krping > to work I had to make the following changes to krping.c to ensure the > address, key, and length were being sent across the network as big > endian, otherwise they were in machine dependent byte order. I maintain a copy of this in the test-apps branch of my rdma-dev.git tree. I'll update my tree with this change. Can you add a signed-off-by line to the patch? - Sean From rdreier at cisco.com Fri Feb 16 09:47:51 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 09:47:51 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: <1171647108.22446.267617.camel@hal.voltaire.com> (Hal Rosenstock's message of "16 Feb 2007 12:32:37 -0500") References: <1171646341.22446.266964.camel@hal.voltaire.com> <1171647108.22446.267617.camel@hal.voltaire.com> Message-ID: > I thought that mapping multiple MCGs to the same MLID requires that a > set of the (group) parameters are the same. Is that the case for these > IPv6 groups ? Is the only variable in those parameters the PKey ? I don't see why any group parameters need to be the same -- I'm probably missing something, but which parameters in particular did you have in mind? - R. 
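[Editor's sketch] On the krping byte-order fix discussed above (sending the address, key, and length as big endian so an Opteron and a PPC G5 agree on the wire format): the following is a minimal userspace illustration, not the actual krping patch. The struct name rdma_info, its field widths, and the 16-byte layout are assumptions made for this example; in kernel code one would normally use cpu_to_be64()/cpu_to_be32() and their be*_to_cpu() counterparts rather than hand-written shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical payload: 8-byte address, 4-byte rkey, 4-byte length. */
struct rdma_info {
	uint64_t addr;
	uint32_t rkey;
	uint32_t len;
};

/* Store v at p, most significant byte first, on any host. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static void put_be64(uint8_t *p, uint64_t v)
{
	put_be32(p, (uint32_t)(v >> 32));
	put_be32(p + 4, (uint32_t)v);
}

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Serialize in big-endian order regardless of host byte order. */
static void rdma_info_pack(uint8_t out[16], const struct rdma_info *in)
{
	assert(out && in);
	put_be64(out, in->addr);
	put_be32(out + 8, in->rkey);
	put_be32(out + 12, in->len);
}

/* Inverse of rdma_info_pack(). */
static void rdma_info_unpack(struct rdma_info *out, const uint8_t in[16])
{
	assert(out && in);
	out->addr = get_be64(in);
	out->rkey = get_be32(in + 8);
	out->len  = get_be32(in + 12);
}
```

Because both peers pack and unpack through the same fixed byte order, the values survive a round trip between hosts of different endianness, which is what raw struct copies in machine byte order fail to guarantee.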
From rdreier at cisco.com Fri Feb 16 09:49:24 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 09:49:24 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: (Shirley Ma's message of "Fri, 16 Feb 2007 09:38:14 -0800") References: Message-ID: > For the successful join, ND sends to the node directly, for the failure > join, ND sends to all hosts addr. So ND will work no matter whether the > join OK or not, that's the patch does. But what if the full-member join fails on node A for node A's solicited node group, but then node B succeeds in joining that group as a send-only member (perhaps because some other nodes have dropped off the fabric in the meantime). Then node B will send the ND message on a MCG that A is not a member of. - R. From rdreier at cisco.com Fri Feb 16 09:53:49 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 09:53:49 -0800 Subject: [openib-general] SA multicast patches In-Reply-To: (Roland Dreier's message of "Fri, 16 Feb 2007 09:43:17 -0800") References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> <45D4E133.3000302@ichips.intel.com> Message-ID: OK, another question about the multicast.c code: > +static struct mcast_group *mcast_find(struct mcast_port *port, > + union ib_gid *mgid) > +{ > + struct rb_node *node = port->table.rb_node; > + struct mcast_group *group; > + int ret; > + > + while (node) { > + group = rb_entry(node, struct mcast_group, node); > + ret = memcmp(mgid->raw, group->rec.mgid.raw, sizeof *mgid); > + if (!ret) > + return group; > + > + if (ret < 0) > + node = node->rb_left; > + else > + node = node->rb_right; > + } > + return NULL; > +} > + > +static struct mcast_group *mcast_insert(struct mcast_port *port, > + struct mcast_group *group, > + int allow_duplicates) > +{ > + struct rb_node **link = &port->table.rb_node; > + struct rb_node *parent = NULL; > + struct mcast_group *cur_group; > + int ret; > + > + while (*link) { > + parent = 
*link; > + cur_group = rb_entry(parent, struct mcast_group, node); > + > + ret = memcmp(group->rec.mgid.raw, cur_group->rec.mgid.raw, > + sizeof group->rec.mgid); > + if (ret < 0) > + link = &(*link)->rb_left; > + else if (ret > 0) > + link = &(*link)->rb_right; > + else if (allow_duplicates) > + link = &(*link)->rb_left; > + else > + return cur_group; > + } > + rb_link_node(&group->node, parent, link); > + rb_insert_color(&group->node, &port->table); > + return NULL; > +} How does it work to put duplicates into the RB tree? It seems especially strange that the lookup code does: > + if (ret < 0) > + node = node->rb_left; > + else > + node = node->rb_right; so if ret == 0 (ie the two GIDs being tested are the same) then it continues to traverse to the right, while the insert code does: > + else if (allow_duplicates) > + link = &(*link)->rb_left; which seems to put duplicates to the left always. Also I'd be really worried that the rebalancing code freaks out when duplicate keys are inserted in the tree. - R. From halr at voltaire.com Fri Feb 16 09:54:18 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Feb 2007 12:54:18 -0500 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: References: <1171646341.22446.266964.camel@hal.voltaire.com> <1171647108.22446.267617.camel@hal.voltaire.com> Message-ID: <1171648438.22446.268637.camel@hal.voltaire.com> On Fri, 2007-02-16 at 12:47, Roland Dreier wrote: > > I thought that mapping multiple MCGs to the same MLID requires that a > > set of the (group) parameters are the same. Is that the case for these > > IPv6 groups ? Is the only variable in those parameters the PKey ? > > I don't see why any group parameters need to be the same -- I'm > probably missing something, but which parameters in particular did you > have in mind? For starters, I think that rate, MTU, and SL (and maybe PKey too) need to be the same. There may be others too if I stare at the spec for a while... 
-- Hal > - R. From sean.hefty at intel.com Fri Feb 16 10:02:10 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 16 Feb 2007 10:02:10 -0800 Subject: [openib-general] SA multicast patches In-Reply-To: Message-ID: <000001c751f4$943af3b0$e598070a@amr.corp.intel.com> >Well the consumer has to know what P_Key to use since it must match >the QP that will be used to send/receive. So I would suggest not >trying to guess in the low-level multicast.c code, and rely on the >consumer to set it properly. I'm fine leaving it at 0. For now, I think the safest thing to do is just remove the entire 'else' portion from the function and return an error if the MGID is 0. Neither of the places that call into ib_sa_get_mcmember_rec() should pass in an MGID of 0. (I'm testing this now to verify.) See below for its use: > > + case RDMA_PS_IPOIB: > > + ib_addr_get_mgid(dev_addr, &rec.mgid); > > + ret = ib_sa_get_mcmember_rec(device, port_num, &rec.mgid, &rec); > > + *qkey = be32_to_cpu(rec.qkey); > > + break; > >Does this work if userspace tries to join a new IPoIB MCG that the >kernel driver hasn't joined yet? From reading the code it seems that >ib_sa_get_mcmember_rec() would fail with -EADDRNOTAVAIL and so the >whole join request would fail. In short, yes. ib_addr_get_mgid() is returning the MGID for the ipoib broadcast group, so ipoib must have joined that group. The code then looks up the MCMemberRecord for the broadcast group, and extracts the qkey for that group to use when joining the new group. - Sean From xma at us.ibm.com Fri Feb 16 10:05:32 2007 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 16 Feb 2007 10:05:32 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: Message-ID: Roland Dreier wrote on 02/16/2007 09:49:24 AM: > > For the successful join, ND sends to the node directly, for the failure > > join, ND sends to all hosts addr. So ND will work no matter whether the > > join OK or not, that's the patch does. 
> > But what if the full-member join fails on node A for node A's > solicited node group, but then node B succeeds in joining that group > as a send-only member (perhaps because some other nodes have dropped > off the fabric in the meantime). Then node B will send the ND message > on a MCG that A is not a member of. > > - R. Yes. B can send ND to A, and A responds without being a member so IPv6 ND works. Is there any security or other problems here? Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Fri Feb 16 10:07:43 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 10:07:43 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: <1171648438.22446.268637.camel@hal.voltaire.com> (Hal Rosenstock's message of "16 Feb 2007 12:54:18 -0500") References: <1171646341.22446.266964.camel@hal.voltaire.com> <1171647108.22446.267617.camel@hal.voltaire.com> <1171648438.22446.268637.camel@hal.voltaire.com> Message-ID: > For starters, I think that rate, MTU, and SL (and maybe PKey too) need > to be the same. There may be others too if I stare at the spec for a > while... Can you expand on why? For example I definitely can send to the same MLID with different SLs. Of course MTU and rate need to match up but I don't see that as a real restriction -- the SM needs to allows for least-common-denominator values anyway, since the least-capable node on the fabric might join an existing group. I don't see why one MCG with an MTU of 2048 and one MCG with an MTU of 1024 can't share the same MLID, as long as the underlying fabric is capable of supporting an MTU of 2048. Actually, I wonder what the spec says about what switches should do if they're asked to forward packets with too-big MTUs? Maybe it all works out anyway. - R. 
From rdreier at cisco.com Fri Feb 16 10:10:55 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 10:10:55 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: (Shirley Ma's message of "Fri, 16 Feb 2007 10:05:32 -0800") References: Message-ID: > > But what if the full-member join fails on node A for node A's > > solicited node group, but then node B succeeds in joining that group > > as a send-only member (perhaps because some other nodes have dropped > > off the fabric in the meantime). Then node B will send the ND message > > on a MCG that A is not a member of. > Yes. B can send ND to A, and A responds without being a member so IPv6 ND > works. Is there any security or other problems here? Node A is not a member of the group B is sending on, so SM does not have to set up any routes for the messages to even reach node A. So it doesn't see the messages and doesn't respond to ND. - R. From xma at us.ibm.com Fri Feb 16 10:23:39 2007 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 16 Feb 2007 10:23:39 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: Message-ID: Roland Dreier wrote on 02/16/2007 10:10:55 AM: > > > But what if the full-member join fails on node A for node A's > > > solicited node group, but then node B succeeds in joining that group > > > as a send-only member (perhaps because some other nodes have dropped > > > off the fabric in the meantime). Then node B will send the ND message > > > on a MCG that A is not a member of. > > > Yes. B can send ND to A, and A responds without being a member so IPv6 ND > > works. Is there any security or other problems here? > > Node A is not a member of the group B is sending on, so SM does not > have to set up any routes for the messages to even reach node A. So > it doesn't see the messages and doesn't respond to ND. > > - R. 
Two MCG groups must be established before the IPoIB link comes up: one is broadcast for IPv4, one is all-hosts multicast for IPv6. So node A is a member of the all-hosts address; the patch directs ND sends to all hosts, so node A responds to it. Thanks Shirley Ma From xma at us.ibm.com Fri Feb 16 10:31:54 2007 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 16 Feb 2007 10:31:54 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: Message-ID: Roland Dreier wrote on 02/16/2007 09:25:30 AM: > > Even SM supports 1000 MCGs, it's still not sufficient for 250 nodes > > cluster, each node have 4 links for IPv6 without any scope/global IPv6 > > address configured. (250*4 + a few default MCGs) There will be a MCG overflow > > problem anyway in IPv6oIB. > > But what's the problem with supporting 1000 or even 10000 MCGs? > > - R. I am not sure whether I understand your question. I am trying to answer it; please let me know whether I am wrong. Each IPv6 link-local address will create a unique solicited-node multicast address, which will create a unique full-member IB MCG; each other IPv6 address will create a solicited-node multicast address, which may or may not be unique depending on the group ID. So when the IPv6 module is loaded in the kernel (or it might be a part of the kernel in the future), we will see more than 1000 MCGs in the SM when the IPoIB links come up. Some of the nodes can't join any MCGs. Then IPv6 ND is broken for the nodes whose joins failed. Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mshefty at ichips.intel.com Fri Feb 16 11:12:07 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 16 Feb 2007 11:12:07 -0800 Subject: [openib-general] SA multicast patches In-Reply-To: References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> <45D4E133.3000302@ichips.intel.com> Message-ID: <45D60207.6080406@ichips.intel.com> Roland Dreier wrote: > OK, another question about the multicast.c code: > > > +static struct mcast_group *mcast_find(struct mcast_port *port, > > + union ib_gid *mgid) > > +{ > > + struct rb_node *node = port->table.rb_node; > > + struct mcast_group *group; > > + int ret; > > + > > + while (node) { > > + group = rb_entry(node, struct mcast_group, node); > > + ret = memcmp(mgid->raw, group->rec.mgid.raw, sizeof *mgid); > > + if (!ret) > > + return group; > > + > > + if (ret < 0) > > + node = node->rb_left; > > + else > > + node = node->rb_right; > > + } > > + return NULL; > > +} > > + > > +static struct mcast_group *mcast_insert(struct mcast_port *port, > > + struct mcast_group *group, > > + int allow_duplicates) > > +{ > > + struct rb_node **link = &port->table.rb_node; > > + struct rb_node *parent = NULL; > > + struct mcast_group *cur_group; > > + int ret; > > + > > + while (*link) { > > + parent = *link; > > + cur_group = rb_entry(parent, struct mcast_group, node); > > + > > + ret = memcmp(group->rec.mgid.raw, cur_group->rec.mgid.raw, > > + sizeof group->rec.mgid); > > + if (ret < 0) > > + link = &(*link)->rb_left; > > + else if (ret > 0) > > + link = &(*link)->rb_right; > > + else if (allow_duplicates) > > + link = &(*link)->rb_left; > > + else > > + return cur_group; > > + } > > + rb_link_node(&group->node, parent, link); > > + rb_insert_color(&group->node, &port->table); > > + return NULL; > > +} > > How does it work to put duplicates into the RB tree? It seems > especially strange that the lookup code does: The only duplicates that should appear in the tree are for MGID 0. 
After a join for MGID 0 completes, the group is removed from the tree and re-inserted based on the MGID that was assigned by the SA. All multicast groups need to be tracked, which is why even groups with MGID 0 are inserted into the tree. > > + if (ret < 0) > > + node = node->rb_left; > > + else > > + node = node->rb_right; > > so if ret == 0 (ie the two GIDs being tested are the same) then it > continues to traverse to the right, while the insert code does: Immediately above this code, the group is returned if ret == 0. Calling mcast_find() for MGID 0 isn't useful, so the code avoids doing this, but I think that it would work. The caller would just get an arbitrary group. > Also I'd be really worried that the rebalancing code freaks out when > duplicate keys are inserted in the tree. I would guess that the rebalancing code is based on left/right branching, and isn't aware of the actual key values. Having duplicate keys would work fine then, with the restriction that code searching for a duplicated key would get an unpredictable match that is based on the current tree layout. - Sean From halr at voltaire.com Fri Feb 16 10:47:58 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Feb 2007 13:47:58 -0500 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: References: <1171646341.22446.266964.camel@hal.voltaire.com> <1171647108.22446.267617.camel@hal.voltaire.com> <1171648438.22446.268637.camel@hal.voltaire.com> Message-ID: <1171651661.22446.271399.camel@hal.voltaire.com> On Fri, 2007-02-16 at 13:07, Roland Dreier wrote: > > For starters, I think that rate, MTU, and SL (and maybe PKey too) need > > to be the same. There may be others too if I stare at the spec for a > > while... > > Can you expand on why? For example I definitely can send to the same > MLID with different SLs. Sure but I think this complicates the SL2VL tables in the subnet to accomodate this. I think a similar thing is true for PKeys. 
So to me this is an SM complexity issue when mapping multiple MGRPs to same MLID. > Of course MTU and rate need to match up but > I don't see that as a real restriction -- the SM needs to allows for > least-common-denominator values anyway, since the least-capable node > on the fabric might join an existing group. In theory, the least capable node could join any group but is this reality in operation ? Different groups could have different LCDs so this would make things less granular (one rather than multiple LCDs). This seems less constraining to me. > I don't see why one MCG with an MTU of 2048 and one MCG with an MTU of > 1024 can't share the same MLID, as long as the underlying fabric is > capable of supporting an MTU of 2048. From a pure MTU standpoint, the (only) downside of this is that the group with MTU 1024 could send larger packets. > Actually, I wonder what the > spec says about what switches should do if they're asked to forward > packets with too-big MTUs? Maybe it all works out anyway. They get dropped on the output port as packet length > NeighborMTU. That's part of what PortXmitDiscards counts. Bottom line: I'm not sure anything precludes what you are saying (I do need to look at the spec more in terms of this), but I do think there are different levels of complexity in SM implementation depending on how much flexibility in mapping multiple MGRPs to the same MLID is "desired". -- Hal > - R. From sean.hefty at intel.com Fri Feb 16 11:48:49 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 16 Feb 2007 11:48:49 -0800 Subject: [openib-general] SA multicast patches In-Reply-To: <000001c751f4$943af3b0$e598070a@amr.corp.intel.com> Message-ID: <000101c75203$7ace2be0$e598070a@amr.corp.intel.com> >For now, I think the safest thing to do is just remove the entire 'else' >portion >from the function and return an error if the MGID is 0. Neither of the places >that call into ib_sa_get_mcmember_rec() should pass in an MGID of 0. 
(I'm >testing this now to verify.) I'm not sure if you'll need this, but I've updated the two multicast patches in my for-roland branch based on a couple of comments. All changes are minor. * Converted a list_del/list_add combo to list_move. * Changed a couple of kzalloc calls to kmalloc. * Modified ib_sa_get_mcmember_rec to no longer return default MCMemberRecord settings. The for-roland branch is based on the tip of Linus' tree from two days ago, but I re-tested the changes against 2.6.20. If there's an easier way for you to handle these types of updates, just let me know. - Sean From rdreier at cisco.com Fri Feb 16 13:31:29 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 13:31:29 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: <1171651661.22446.271399.camel@hal.voltaire.com> (Hal Rosenstock's message of "16 Feb 2007 13:47:58 -0500") References: <1171646341.22446.266964.camel@hal.voltaire.com> <1171647108.22446.267617.camel@hal.voltaire.com> <1171648438.22446.268637.camel@hal.voltaire.com> <1171651661.22446.271399.camel@hal.voltaire.com> Message-ID: > Sure but I think this complicates the SL2VL tables in the subnet to > accomodate this. I think a similar thing is true for PKeys. So to me > this is an SM complexity issue when mapping multiple MGRPs to same MLID. I'm still confused. Aren't SL2VL and P_Key tables completely orthogonal from forwarding tables? Obviously there's no problem using multiple different SLs or P_Keys to reach the same endport using the same LID, so I don't understand why MLIDs would be different. - R. 
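[Editor's sketch] On the MTU side of the MLID-sharing discussion in this thread (one MCG with MTU 2048 and one with MTU 1024 sharing an MLID, provided the fabric can carry 2048): a small check along those lines might look like the following. It is only an illustration of that argument. The function names and the idea of a single per-tree MTU are assumptions for this example, and the SL, rate, and P_Key questions raised in the thread are deliberately not modeled. The MTU encoding (1 = 256 through 5 = 4096 bytes) is the standard IB one.

```c
#include <assert.h>
#include <stddef.h>

/* IB MTU encoding: 1 = 256, 2 = 512, 3 = 1024, 4 = 2048, 5 = 4096 bytes. */
static unsigned int ib_mtu_bytes(unsigned int mtu_enum)
{
	assert(mtu_enum >= 1 && mtu_enum <= 5);
	return 128u << mtu_enum;
}

/* Largest group MTU, in bytes, among the groups that would share one MLID. */
static unsigned int shared_mlid_required_mtu(const unsigned int *group_mtu_enum,
					     size_t n)
{
	unsigned int need = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int b = ib_mtu_bytes(group_mtu_enum[i]);

		if (b > need)
			need = b;
	}
	return need;
}

/*
 * Nonzero if a multicast tree whose links all support tree_mtu_enum can
 * carry every group in the set, i.e. the tree MTU covers the largest
 * group MTU (hypothetical helper, MTU criterion only).
 */
static int mlid_can_be_shared(unsigned int tree_mtu_enum,
			      const unsigned int *group_mtu_enum, size_t n)
{
	return shared_mlid_required_mtu(group_mtu_enum, n) <=
	       ib_mtu_bytes(tree_mtu_enum);
}
```

For a 2048-byte group and a 1024-byte group, the shared tree needs 2048 bytes; a tree limited to 1024 would fail the check, matching the observation in the thread that oversized packets are dropped at the output port.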
From rdreier at cisco.com Fri Feb 16 13:47:23 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 13:47:23 -0800 Subject: [openib-general] SA multicast patches In-Reply-To: <45D60207.6080406@ichips.intel.com> (Sean Hefty's message of "Fri, 16 Feb 2007 11:12:07 -0800") References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> <45D4E133.3000302@ichips.intel.com> <45D60207.6080406@ichips.intel.com> Message-ID: > All multicast groups need to be tracked, which is why even groups with > MGID 0 are inserted into the tree. OK... > Immediately above this code, the group is returned if ret == 0. Right, I missed that. But... > Calling mcast_find() for MGID 0 isn't useful, so the code avoids doing > this, but I think that it would work. The caller would just get an > arbitrary group. Now this is confusing -- you say the code avoids looking up MGID 0 in the rbtree. So why do you have to insert those groups in the tree and have the allow_duplicates() flag etc? If you're never going to look up the group, I assume you have some other way of finding it and so you don't actually have to insert MGID 0 groups after all... right? Or is it that you want to be able to iterate through the whole rbtree and get the MGID 0 groups too? - R. From rdreier at cisco.com Fri Feb 16 13:55:06 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 13:55:06 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: (Shirley Ma's message of "Fri, 16 Feb 2007 10:23:39 -0800") References: Message-ID: > Two MCGs groups must be establised before IPoIB link up, one is broadcast > for IPv4, one is all hosts multicast for IPv6. So Node A is a member of all > hosts address, the patch directs ND sends to all hosts, so node A responses > it. I'm still confused. How do you interoperate with other RFC-compliant nodes (they might not have your patch or might not even be running Linux) that send ND messages to the solicited node group? 
If node A has your patch and doesn't try to join its own solicited node group, then another node that doesn't know to send ND messages to the all nodes group will not be able to find it. - R. From mshefty at ichips.intel.com Fri Feb 16 13:56:38 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 16 Feb 2007 13:56:38 -0800 Subject: [openib-general] SA multicast patches In-Reply-To: References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> <45D4E133.3000302@ichips.intel.com> <45D60207.6080406@ichips.intel.com> Message-ID: <45D62896.9060008@ichips.intel.com> > Or is it that you want to be able to iterate through the whole rbtree > and get the MGID 0 groups too? This is it - see mcast_groups_lost(). That call transitions all multicast groups into an error state, and reports to the user that the group information may have been lost by the SA. (We can't trust that a successful join response is still valid, even if it is reported after we receive a fatal event.) - Sean From xma at us.ibm.com Fri Feb 16 14:25:28 2007 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 16 Feb 2007 14:25:28 -0800 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: Message-ID: Roland Dreier wrote on 02/16/2007 01:55:06 PM: > > Two MCGs groups must be establised before IPoIB link up, one is broadcast > > for IPv4, one is all hosts multicast for IPv6. So Node A is a member of all > > hosts address, the patch directs ND sends to all hosts, so node Aresponses > > it. > > I'm still confused. How do you interoperate with other RFC-compliant > nodes (they might not have your patch or might not even be running > Linux) that send ND messages to the solicited node group? If node A > has your patch and doesn't try to join its own solicited node group, > then another node that doesn't know to send ND messages to the all > nodes group will not be able to find it. > > - R. All nodes in the subnet join all hosts multicast group by default. 
What the patch does differently than before is that, when the join fails, it sends to the all-hosts multicast group instead of sending to a particular solicited-node multicast address; the node with the destination solicited-node multicast address will respond to it, so the network will not lose connectivity when MCGs overflow. There is no interoperability issue here between patched and unpatched nodes, or between Linux and non-Linux nodes. I don't think the IPoIB RFC covers this corner case, so there is no RFC-compliance problem here. I will discuss this with the author. Thanks Shirley Ma From rdreier at cisco.com Fri Feb 16 14:36:50 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 14:36:50 -0800 Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support In-Reply-To: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> (Sean Hefty's message of "Tue, 6 Feb 2007 12:00:22 -0800") References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> Message-ID: OK, I pulled this in to my for-2.6.21 branch and I will ask Linus to pull later today. Thanks. - R. From ardavis at ichips.intel.com Fri Feb 16 14:38:51 2007 From: ardavis at ichips.intel.com (Arlin Davis) Date: Fri, 16 Feb 2007 14:38:51 -0800 Subject: [openib-general] OFED 1.2 dapl and dat.conf In-Reply-To: <1171561783.3161.165.camel@fc6.xsintricity.com> References: <1171397522.21471.7.camel@stevo-desktop> <45D37E8E.5050800@ichips.intel.com> <1171561783.3161.165.camel@fc6.xsintricity.com> Message-ID: <45D6327B.4060606@ichips.intel.com> Doug Ledford wrote: >On Wed, 2007-02-14 at 13:26 -0800, Arlin Davis wrote: > > >>Steve Wise wrote: >> >> >> >>>Currently, the dapl rpms don't install dat.conf. I think they probably >>>should, eh? 
Maybe in /etc/dat.conf >>> >>> >>> >>> >>my specfile is setup to target sysconfdir which is typically set to >>`$(prefix)/etc' >> >>%{_sysconfdir}/dat.conf >> >>I am not sure how the 1.2 scripts are building the rpms. Maybe Vladimir >>can help explain? >> >> > >Note that this setup is problematic on multilib arches. Since the >dat.conf file hard codes a library path that's different for 32bit/64bit >arches, installing both a 32bit and 64bit dapl library is impossible >without munging things. > >For RHEL4U5/RHEL5 I changed the dat library to read dat.conf and >have two separate conf files. A probably better approach would be to >change the library to use a relative library name that it looks for >starting from the libraries own directory. Hence if the dapl library is >in /usr/lib, it looks in /usr/lib. Doing that would allow the >32bit/64bit libraries to share the same config file. > > > This is a good idea. I will take a look at dladdr options to set appropriate starting path for dapl libraries when absolute paths are not specified. James, do you see any issues with this approach? Vladimir, can you tell me how the OFED 1.2 install scripts are handling the dat.conf? -arlin From mshefty at ichips.intel.com Fri Feb 16 14:41:39 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 16 Feb 2007 14:41:39 -0800 Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast support In-Reply-To: References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> Message-ID: <45D63323.1000404@ichips.intel.com> Roland Dreier wrote: > OK, I pulled this in to my for-2.6.21 branch and I will ask Linus to > pull later today. Thanks for the review. 
From halr at voltaire.com Fri Feb 16 15:09:26 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Feb 2007 18:09:26 -0500 Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow In-Reply-To: References: <1171646341.22446.266964.camel@hal.voltaire.com> <1171647108.22446.267617.camel@hal.voltaire.com> <1171648438.22446.268637.camel@hal.voltaire.com> <1171651661.22446.271399.camel@hal.voltaire.com> Message-ID: <1171667328.22446.285104.camel@hal.voltaire.com> On Fri, 2007-02-16 at 16:31, Roland Dreier wrote: > > Sure but I think this complicates the SL2VL tables in the subnet to > > accomodate this. I think a similar thing is true for PKeys. So to me > > this is an SM complexity issue when mapping multiple MGRPs to same MLID. > > I'm still confused. Aren't SL2VL and P_Key tables completely > orthogonal from forwarding tables? Sure they are separate mechanisms but by overloading MLIDs I think that this ends up making them interdependent. I think it complicates the use/configuration of those mechanisms depending on how flexible this is. Either of those mechanisms could filter out the packet long before it ever gets to some destination. > Obviously there's no problem using > multiple different SLs or P_Keys to reach the same endport using the > same LID, so I don't understand why MLIDs would be different. In terms of PKeys and overloaded MLIDs, p. 149 line 11 states: "When a multicast LID is overloaded, the multicast groups sharing the same MLID must have the same P_Key. This simplification is required to allow switches and routers that implement optional P_Key enforcement for multicast operations." -- Hal > - R. From rdreier at cisco.com Fri Feb 16 15:32:46 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 15:32:46 -0800 Subject: [openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path() Message-ID: Guys, any reason not to merge this? 
It's step one of the cleanups from Jason's patch to make IPoIB work with global routes... The static rate from the path record should be put into the address vector -- a long time ago the rate in the address attributes needed to be a relative rate, which required more munging, but now that the conversion from absolute to relative is done in the low-level driver, it's easy for ib_init_ah_from_path() to put the absolute rate in. Cc: Jason Gunthorpe Cc: Sean Hefty Signed-off-by: Roland Dreier --- drivers/infiniband/core/sa_query.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index d7d4a53..68db633 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -471,6 +471,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num, ah_attr->sl = rec->sl; ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f; ah_attr->port_num = port_num; + ah_attr->static_rate = rec->rate; if (rec->hop_limit > 1) { ah_attr->ah_flags = IB_AH_GRH; -- 1.4.4.4 From sean.hefty at intel.com Fri Feb 16 15:35:38 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 16 Feb 2007 15:35:38 -0800 Subject: [openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path() In-Reply-To: Message-ID: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com> >Guys, any reason not to merge this? It's step one of the cleanups >from Jason's patch to make IPoIB work with global routes... I would like to see this merged in. 
- Sean From rdreier at cisco.com Fri Feb 16 15:48:19 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 16 Feb 2007 15:48:19 -0800 Subject: [openib-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This adds IB multicast tracking, to allow userspace to use multicast groups in a sane way, an ehca interrupt handling fixup, and a few other minor things. I don't think there is anything major left, so we should be good for 2.6.21-rc1 after this pull. Dotan Barak (1): IB/mthca: Allow the QP state transition RESET->RESET Hoang-Nam Nguyen (4): IB/ehca: Rework irq handler IB/ehca: Fix race condition/locking issues in scaling code IB/ehca: Allow en/disabling scaling code via module parameter IB/ehca: Change query_port() to return LINK_UP instead UNKNOWN Michael S. 
Tsirkin (1): IPoIB: CM error handling thinko fix Roland Dreier (5): IB/mthca: Fix allocation of ICM chunks in coherent memory IPoIB: Only allow root to change between datagram and connected mode IB/core: Fix sparse warnings about shadowed declarations IB/ipath: Make ipath_map_sg() static IB/core: Set static rate in ib_init_ah_from_path() Sean Hefty (2): IB/sa: Track multicast join/leave requests RDMA/cma: Add multicast communication support Steve Wise (3): RDMA/iwcm: iw_cm_id destruction race fixes RDMA/cxgb3: Fail posts synchronously when in TERMINATE state RDMA/cxgb3: Remove Open Grid Computing copyrights in iw_cxgb3 driver drivers/infiniband/core/Makefile | 2 +- drivers/infiniband/core/cma.c | 359 +++++++++-- drivers/infiniband/core/fmr_pool.c | 4 +- drivers/infiniband/core/iwcm.c | 47 +- drivers/infiniband/core/multicast.c | 837 ++++++++++++++++++++++++ drivers/infiniband/core/sa.h | 66 ++ drivers/infiniband/core/sa_query.c | 30 +- drivers/infiniband/core/sysfs.c | 2 - drivers/infiniband/core/ucma.c | 204 ++++++- drivers/infiniband/hw/cxgb3/cxio_dbg.c | 1 - drivers/infiniband/hw/cxgb3/cxio_hal.c | 1 - drivers/infiniband/hw/cxgb3/cxio_hal.h | 1 - drivers/infiniband/hw/cxgb3/cxio_resource.c | 1 - drivers/infiniband/hw/cxgb3/cxio_resource.h | 1 - drivers/infiniband/hw/cxgb3/cxio_wr.h | 1 - drivers/infiniband/hw/cxgb3/iwch.c | 1 - drivers/infiniband/hw/cxgb3/iwch.h | 1 - drivers/infiniband/hw/cxgb3/iwch_cm.c | 1 - drivers/infiniband/hw/cxgb3/iwch_cm.h | 1 - drivers/infiniband/hw/cxgb3/iwch_cq.c | 1 - drivers/infiniband/hw/cxgb3/iwch_ev.c | 1 - drivers/infiniband/hw/cxgb3/iwch_mem.c | 1 - drivers/infiniband/hw/cxgb3/iwch_provider.c | 1 - drivers/infiniband/hw/cxgb3/iwch_provider.h | 1 - drivers/infiniband/hw/cxgb3/iwch_qp.c | 3 +- drivers/infiniband/hw/cxgb3/iwch_user.h | 1 - drivers/infiniband/hw/ehca/Kconfig | 8 - drivers/infiniband/hw/ehca/ehca_classes.h | 19 +- drivers/infiniband/hw/ehca/ehca_eq.c | 1 + drivers/infiniband/hw/ehca/ehca_hca.c | 3 + 
drivers/infiniband/hw/ehca/ehca_irq.c | 307 +++++---- drivers/infiniband/hw/ehca/ehca_irq.h | 1 + drivers/infiniband/hw/ehca/ehca_main.c | 32 +- drivers/infiniband/hw/ehca/ipz_pt_fn.h | 11 +- drivers/infiniband/hw/ipath/ipath_dma.c | 4 +- drivers/infiniband/hw/mthca/mthca_memfree.c | 4 +- drivers/infiniband/hw/mthca/mthca_qp.c | 5 + drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 +- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 195 ++---- include/rdma/ib_addr.h | 6 + include/rdma/ib_sa.h | 159 ++--- include/rdma/rdma_cm.h | 21 +- include/rdma/rdma_cm_ib.h | 4 +- include/rdma/rdma_user_cm.h | 13 +- 44 files changed, 1889 insertions(+), 478 deletions(-) create mode 100644 drivers/infiniband/core/multicast.c create mode 100644 drivers/infiniband/core/sa.h From vlad at lists.openfabrics.org Sat Feb 17 02:24:35 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sat, 17 Feb 2007 02:24:35 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070217-0200 daily build status Message-ID: <20070217102436.CC885E6080C@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.18 Passed on x86_64 with 
linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.12 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Failed: From halr at voltaire.com Sat Feb 17 06:00:21 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Feb 2007 09:00:21 -0500 Subject: [openib-general] Unknown SMP Recv In-Reply-To: <000801c750a6$cd925120$21606d86@one7> References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> <1171051141.2767.7.camel@localhost> <001001c74c87$8b653470$21606d86@one7> <1171122546.31538.251673.camel@hal.voltaire.com> <000801c750a6$cd925120$21606d86@one7> Message-ID: <1171720820.4380.40177.camel@hal.voltaire.com> On Wed, 2007-02-14 at 21:12, Michael Arndt wrote: > Hi, > > what I forgot was that the write function in umad_send returns with -1 if > the error occurs. That's looks like EPERM. Not sure why write would return this. 
The only thing I see that might return this is handle_outgoing_dr_smp on some errors but I didn't chase this all the way through. -- Hal > Maybe that helps. > > Thanks Michael > From michael.arndt at informatik.tu-chemnitz.de Sat Feb 17 06:43:21 2007 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Sat, 17 Feb 2007 15:43:21 +0100 Subject: [openib-general] Unknown SMP Recv References: <000901c74938$e10b2a30$21606d86@one7> <1170689654.4525.201415.camel@hal.voltaire.com> <001401c74946$a664a2e0$21606d86@one7> <1170695591.4525.207604.camel@hal.voltaire.com> <002001c74a33$c2ec1db0$21606d86@one7> <1170807564.4525.324195.camel@hal.voltaire.com> <001e01c74be2$b4889310$21606d86@one7> <1170994529.31538.124584.camel@hal.voltaire.com> <000401c74c6d$ce4875f0$21606d86@one7> <1171044773.31538.175280.camel@hal.voltaire.com> <000401c74c79$74439b50$21606d86@one7> <1171051141.2767.7.camel@localhost> <001001c74c87$8b653470$21606d86@one7> <1171122546.31538.251673.camel@hal.voltaire.com> <000801c750a6$cd925120$21606d86@one7> <1171720820.4380.40177.camel@hal.voltaire.com> Message-ID: <000401c752a1$f8d8f990$21606d86@one7> Hi, I have solved the problem on my own. There was an error in the TID changing process in my own code. That caused MADs to be sent with duplicate TIDs, which triggered the error. Now it works fine. Thank you for your efforts. Michael From dotanb at dev.mellanox.co.il Sat Feb 17 09:16:09 2007 From: dotanb at dev.mellanox.co.il (dotanb at dev.mellanox.co.il) Date: Sat, 17 Feb 2007 19:16:09 +0200 (IST) Subject: [openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path() In-Reply-To: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com> References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com> Message-ID: <1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il> Hi guys. >>Guys, any reason not to merge this? It's step one of the cleanups >>from Jason's patch to make IPoIB work with global routes... > > I would like to see this merged in. 
> > - Sean In issue number 296, which I opened several months ago in Bugzilla, I reported two missing attributes: the first is static_rate, and the second is src_path_bits, which is not being filled in correctly. Can someone look at this issue? Thanks, Dotan From yipeeyipeeyipeeyipee at yahoo.com Sat Feb 17 23:33:53 2007 From: yipeeyipeeyipeeyipee at yahoo.com (yipeeyipeeyipeeyipee) Date: Sun, 18 Feb 2007 07:33:53 +0000 (UTC) Subject: [openib-general] bad port physstate References: <1171556073.22446.185292.camel@hal.voltaire.com> Message-ID: Hal Rosenstock voltaire.com> writes: [snip] > I would expect an smpquery of portinfo of this or ibnetdiscover would > now show this. Nope. After I start getting the link state error it doesn't recover and I keep getting the same error. From sweitzen at cisco.com Sun Feb 18 00:42:18 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 18 Feb 2007 00:42:18 -0800 Subject: [openib-general] MVAPICH2 working with OFED 1.2 alpha1 and IB? Message-ID: I get this on both RHEL4 and SLES10 trying to run any programs over IB with MVAPICH2: $ /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/bin/mpiexec -n 2 `pwd`/osu_latency.x rank 0 in job 6 svbu-qa1850-1_35332 caused collective abort of all ranks exit status of rank 0: killed by signal 9 Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems 
From vlad at lists.openfabrics.org Sun Feb 18 02:24:19 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sun, 18 Feb 2007 02:24:19 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070218-0200 daily build status Message-ID: <20070218102419.5140AE6080C@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on ia64 with linux-2.6.12 Passed on ia64 with 
linux-2.6.15 Failed: From halr at voltaire.com Sun Feb 18 03:41:21 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Feb 2007 06:41:21 -0500 Subject: [openib-general] bad port physstate In-Reply-To: References: <1171556073.22446.185292.camel@hal.voltaire.com> Message-ID: <1171798880.4380.118535.camel@hal.voltaire.com> On Sun, 2007-02-18 at 02:33, yipeeyipeeyipeeyipee wrote: > Hal Rosenstock voltaire.com> writes: > [snip] > > > I would expect an smpquery of portinfo of this or ibnetdiscover would > > now show this. > > nope. After I start getting the link state error it doesn't recover and I keep > getting the same error. Try swapping cables on that port to a known good one and see if this helps. -- Hal > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From panda at cse.ohio-state.edu Sun Feb 18 06:32:32 2007 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Sun, 18 Feb 2007 09:32:32 -0500 (EST) Subject: [openib-general] MVAPICH2 working with OFED 1.2 alpha1 and IB? In-Reply-To: from "Scott Weitzenkamp (sweitzen)" at Feb 18, 2007 12:42:18 AM Message-ID: <200702181432.l1IEWWqn024394@xi.cse.ohio-state.edu> > I get this on both RHEL4 and SLES10 trying to run any programs over IB > with MVAPICH2: > > $ /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/bin/mpiexec -n 2 > `pwd`/osu_latency.x > > rank 0 in job 6 svbu-qa1850-1_35332 caused collective abort of all > ranks > exit status of rank 0: killed by signal 9 It looks like you are using an older version of the SRPM: mvapich2-0.9.8-3. This version had some shared library issues with the ofed 1.2 build. The latest MVAPICH2 SRPM version is mvapich2-0.9.8-4. Shaun posted the following e-mail on Feb 15th. Please use this latest version and let us know whether the problem still persists. 
Thanks, DK ============================================= Steve Wise wrote: > Shaun, > > Lemme know if you have an mvapich2 kit that I can test with iwarp... Hi Steve. I've updated our SRPM: https://www.openfabrics.org/~rowland/ofed_1_2/ The latest is mvapich2-0.9.8-4.src.rpm. This version should solve the shared library linking issues. This can be built outside of the OFED 1.2 alpha1 release with the information in the README file or can replace the previous SRPM in the OFED-1.2-alpha1/SRPMS/ directory. To use iWARP, use the OFA build of the SRPM and set MV2_ENABLE_IWARP_MODE=1 in your environment. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From sashak at voltaire.com Sun Feb 18 07:50:06 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 18 Feb 2007 17:50:06 +0200 Subject: [openib-general] Fwd: [ANNOUNCE] GIT 1.5.0 In-Reply-To: <20070215071537.GD11866@mellanox.co.il> References: <20070215071537.GD11866@mellanox.co.il> Message-ID: <20070218155006.GS27414@sashak.voltaire.com> On 09:15 Thu 15 Feb , Michael S. Tsirkin wrote: > FYI. > I suggest we update git on the openfabrics server to 1.5.0: > "Detached HEAD" feature will be useful for nightly build scripts. > Sasha? git-1.5.0 feature list looks fine for me. But let's wait with upgrade a couple of days for 1.5.0.1. Sasha From tziporet at mellanox.co.il Sun Feb 18 08:29:12 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 18 Feb 2007 18:29:12 +0200 Subject: [openib-general] how to handle OFEd 1.2 bugs in bugzilla Message-ID: <6C2C79E72C305246B504CBA17B5500C9A0DE2A@mtlexch01.mtl.com> All, Please clean bugs that you opened for OFED 1.1/1.0 so we can work with bugzilla in efficient manner with OFED 1.2. For bugs that were found in previous OFED releases and are still relevant for OFED 1.2 please change product version so I will see them when I look at OFED 1.2 bugs. 
Thanks, Tziporet -----Original Message----- From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Wednesday, February 14, 2007 9:13 PM To: Tziporet Koren; Scott Weitzenkamp (sweitzen) Cc: EWG; OPENIB Subject: RE: how to handle OFEd 1.2 bugs in bugzilla Yes, I'd like to add alpha1, etc. version numbers in bugzilla. For existing bugs, the Reporter and Assignee should try to communicate/negotiate Priority/Severity. For bugs in areas that Cisco supports, I review the bugs and try to ask for desired ones to be fixed. I was happy with the responses I got for OFED 1.1 from Mellanox and Open MPI. If you want a bug scrub, I suggest a distributed one, where someone from each company scrubs the bugs in areas they are responsible for. Scott > -----Original Message----- > From: Tziporet Koren [mailto:tziporet at mellanox.co.il] > Sent: Wednesday, February 14, 2007 6:18 AM > To: Scott Weitzenkamp (sweitzen) > Cc: EWG; OPENIB > Subject: how to handle OFEd 1.2 bugs in bugzilla > > Hi Scott and all, > I wish to consult with you in the way we will treat OFED 1.2 bugs in > bugzilla. > > 1. Do we want to have 1.2-alpha 1.2-beta, 1.2-rcX in version, or just > 1.2 as we have now > 2. What do we wish to do with bugs that were opened for 1.1 and are > still open? > 3. What to do with old bugs that where open to gen2 in general? > 4. What is our methodology for priority and severity setup? 
> (There are > too many blocker bugs still open in OFED 1.1 so they are not > actually > blockers or they were fixed but not updated) > > Thanks, > Tziporet > From rdreier at cisco.com Sun Feb 18 09:37:13 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 18 Feb 2007 09:37:13 -0800 Subject: [openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path() References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com> <1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il> Message-ID: > In issue number 296 that i opened several months ago in the Bugzilla, i > reported about two missing attributes: the first one is the static_rate, > and the second one is the src_path_bits which is not being filled right. The patch I posted fixes the static rate, right? You'll need to explain what you mean about src_path_bits, because at first glance the code looks OK to me. - R. From sweitzen at cisco.com Sun Feb 18 11:39:34 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 18 Feb 2007 11:39:34 -0800 Subject: [openib-general] MVAPICH2 working with OFED 1.2 alpha1 and IB? In-Reply-To: <200702181432.l1IEWWqn024394@xi.cse.ohio-state.edu> References: from "Scott Weitzenkamp (sweitzen)" at Feb 18, 2007 12:42:18 AM <200702181432.l1IEWWqn024394@xi.cse.ohio-state.edu> Message-ID: > It looks like you are using an older version of the SRPM: > mvapich2-0.9.8-3. This version had some shared library issues with the > ofed 1.2 build. The latest MVAPICH2 SRPM version is > mvapich2-0.9.8-4. Shaun posted the following e-mail on Feb 15th. > > Please use this latest version and let us know whether the problem > still persists. This fixed it, thanks. 
Scott From kaiser at lfbs.RWTH-Aachen.DE Sun Feb 18 14:41:29 2007 From: kaiser at lfbs.RWTH-Aachen.DE (Christian Kaiser) Date: Sun, 18 Feb 2007 23:41:29 +0100 Subject: [openib-general] uDAPL: RDMA Write example Message-ID: <45D8D619.9020904@lfbs.rwth-aachen.de> Hello, I'm trying to find a small sample program that uses RDMA Write instead of Send/Recv. There is not a single uDAPL example program in the sources, nor on the net. Could someone please help me find something useful? Thanks! Christian From ogerlitz at voltaire.com Sun Feb 18 21:12:47 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 19 Feb 2007 07:12:47 +0200 Subject: [openib-general] uDAPL: RDMA Write example In-Reply-To: <45D8D619.9020904@lfbs.rwth-aachen.de> References: <45D8D619.9020904@lfbs.rwth-aachen.de> Message-ID: <45D931CF.4060601@voltaire.com> Christian Kaiser wrote: > I'm trying to find a small sample program that uses RDMA Write instead > of Send/Recv. There is not a single uDAPL example program in the sources, > nor on the net. > Could someone please help me find something useful? See http://dapl.svn.sourceforge.net/viewvc/dapl/trunk/test/dapltest Anyway, can you comment on what using uDAPL buys you that you don't get from coding to the verbs (libibverbs) and rdmacm (librdmacm)? Or. From ogerlitz at voltaire.com Sun Feb 18 22:40:19 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 19 Feb 2007 08:40:19 +0200 (IST) Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey Message-ID: Hi Sean, this fixes a bug that prevented librdmacm apps from running on a node that is a partial member of a partition. The patch takes the approach of the kernel ib_find_cached_pkey implementation. If you approve this, I suggest also pushing it into OFED 1.2 as a bug fix. Or. 
---------------------------------------------------------------------- The pkey extracted by the RDMA CM from the IPoIB device hardware address always has the full membership bit set. However, when looking in the pkey table the search must mask out the full membership bit. Signed-off-by: Or Gerlitz Signed-off-by: Olga Shern diff --git a/src/cma.c b/src/cma.c index c5f8cd9..9c24c6a 100644 --- a/src/cma.c +++ b/src/cma.c @@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev for (i = 0, ret = 0; !ret; i++) { ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey); - if (!ret && pkey == chk_pkey) { + if ((!ret && pkey == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff) == chk_pkey)) { *pkey_index = (uint16_t) i; return 0; } From ogerlitz at voltaire.com Mon Feb 19 01:29:36 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 19 Feb 2007 11:29:36 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.2 alpha release In-Reply-To: <45D42B26.10709@mellanox.co.il> References: <45D337E2.200@mellanox.co.il> <45D42B26.10709@mellanox.co.il> Message-ID: <45D96E00.4080108@voltaire.com> Tziporet Koren wrote: > Regarding RHEL4 U4 and IPoIB bug - Or just prepared a patch that should > fix it. We will merge it and test for the beta. The patch will only fix the bug for RDMA CM multicast consumers, since unlike IPoIB who gets the (wrong in the RH4 U4 case) L2 multicast address from the stack, the rdma cm has the multicast IP address and is able to compute the correct L2 address. This is confusing, i know... 
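As a companion to the librdmacm pkey patch above, here is a minimal sketch of a lookup that ignores the membership bit. It assumes, as the patch description states, that the top bit of the 16-bit P_Key marks full membership and the low 15 bits identify the partition. Both helper names are hypothetical and not part of librdmacm, and the array stands in for the values that the patch reads one at a time with ibv_query_pkey(). Unlike the patch, which tries an exact match and then a match with the bit cleared, this version masks both sides, which appears to be what the kernel helper named in the patch description does.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Low 15 bits of a P_Key: the partition number without the membership bit. */
#define PKEY_BASE_MASK 0x7fff

/* Hypothetical helper: compare two network-order P_Keys by partition only. */
static int pkey_base_equal(uint16_t a_be, uint16_t b_be)
{
    return (ntohs(a_be) & PKEY_BASE_MASK) == (ntohs(b_be) & PKEY_BASE_MASK);
}

/*
 * Hypothetical helper: search a snapshot of a port's P_Key table
 * (network byte order) and return the first matching index, or -1.
 */
static int find_pkey_index(const uint16_t *table_be, int len, uint16_t pkey_be)
{
    int i;

    for (i = 0; i < len; i++)
        if (pkey_base_equal(table_be[i], pkey_be))
            return i;
    return -1;
}
```

With this comparison, a full-membership pkey taken from the IPoIB hardware address still matches a table entry that holds only the partial-membership form of the same partition.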
From bugzilla-daemon at lists.openfabrics.org Mon Feb 19 01:50:30 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Mon, 19 Feb 2007 01:50:30 -0800 (PST) Subject: [openib-general] [Bug 371] New: IPoIB HA not working properly with OFED1.2-alpha Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=371 Summary: IPoIB HA not working properly with OFED1.2-alpha Product: OpenFabrics Linux Version: 1.2alpha1 Platform: X86-64 OS/Version: RHEL 4 Status: NEW Severity: major Priority: P2 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: karun.sharma at qlogic.com I configured IPoIB HA with the OFED1.2-alpha release and it is not working for me. I configured IPoIB HA on a RHEL4up4 machine with both ports up. Before configuring IPoIB HA, both IB interfaces were able to ping the other machine. Then I ran the ipoib_ha.pl script and configured ib0 as the primary and ib1 as the secondary interface. The IP address of the ib1 interface was removed, and up to this point things seemed to be working fine. The problem started when I pulled the IB cable connected to port1. I can see the ib0 interface going down and the ib1 interface taking over the IP address of ib0, but ping doesn't work after that. Even if I reinsert the cable in port1, ping still does not work. I have attached some logs below. 
################################################################ [root at ss27 ~]# ibv_devinfo hca_id: mthca0 fw_ver: 5.1.400 node_guid: 0006:6a00:9800:6b90 sys_image_guid: 0006:6a00:9800:6b90 vendor_id: 0x066a vendor_part_id: 25218 hw_ver: 0xA0 board_id: SS_0000000002 phys_port_cnt: 2 port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 6 port_lid: 2 port_lmc: 0x00 port: 2 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 6 port_lid: 3 port_lmc: 0x00 [root at ss27 ~]# ifconfig eth0 Link encap:Ethernet HWaddr 00:A0:D1:E4:53:DA inet addr:172.20.50.227 Bcast:172.20.50.255 Mask:255.255.255.0 inet6 addr: fe80::2a0:d1ff:fee4:53da/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:125 errors:0 dropped:0 overruns:0 frame:0 TX packets:115 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:17236 (16.8 KiB) TX bytes:15347 (14.9 KiB) Interrupt:201 ib0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 inet addr:172.20.51.227 Bcast:172.20.51.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) ib1 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 inet addr:172.20.52.227 Bcast:172.20.52.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:1543 errors:0 dropped:0 overruns:0 frame:0 TX packets:1543 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:1648528 (1.5 MiB) TX bytes:1648528 (1.5 MiB) [root at ss27 
~]# ping 172.20.51.226 -c 1 PING 172.20.51.226 (172.20.51.226) 56(84) bytes of data. 64 bytes from 172.20.51.226: icmp_seq=0 ttl=64 time=1.44 ms --- 172.20.51.226 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 1.442/1.442/1.442/0.000 ms, pipe 2 [root at ss27 ~]# ping 172.20.52.226 -c 1 PING 172.20.52.226 (172.20.52.226) 56(84) bytes of data. 64 bytes from 172.20.52.226: icmp_seq=0 ttl=64 time=1.67 ms --- 172.20.52.226 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 1.671/1.671/1.671/0.000 ms, pipe 2 [root at ss27 ~]# [root at ss27 ~]# ipoib_ha.pl -p ib0 -s ib1 --with-arping -vv get_cfg: Got /etc/sysconfig/network-scripts/ifcfg-ib0 Date:Mon Feb 19 02:32:22 2007 ib0: ====================================== BOOTPROTO = static status = HA = 0 DEVICE = ib0 NETMASK = 255.255.255.0 BROADCAST = 172.20.51.255 IPADDR = 172.20.51.227 NETWORK = 172.20.51.0 ONBOOT = yes pkey = ffff Date:Mon Feb 19 02:32:22 2007 Bond: ====================================== BOOTPROTO = static status = HA = 0 DEVICE = ib0 NETMASK = 255.255.255.0 BROADCAST = 172.20.51.255 IPADDR = 172.20.51.227 NETWORK = 172.20.51.0 ONBOOT = yes pkey = ffff Date:Mon Feb 19 02:32:23 2007 Got NO-CARRIER event on ib0. Interface ib0 is down. Currently Active : ib0 Other device: ib1 is UP migrate_conf: Migrating from ib0 to ib1 Date:Mon Feb 19 02:33:37 2007 Date:Mon Feb 19 02:33:37 2007 set_up_bond: Going to set up ib1 with 172.20.51.227 set_up_bond: Arping ib1 172.20.51.227. Got CARRIER-ON event on ib1. Got CARRIER-ON event on ib1. Got CARRIER-ON event on ib1. Got NO-CARRIER event on ib0. Interface ib0 is down. Currently Active : ib1 Got CARRIER-ON event on ib1. Got CARRIER-ON event on ib0. Got CARRIER-ON event on ib0. Got NO-CARRIER event on ib1. Interface ib1 is down. 
Currently Active : ib1 Other device: ib0 is UP migrate_conf: Migrating from ib1 to ib0 Date:Mon Feb 19 02:35:48 2007 Date:Mon Feb 19 02:35:48 2007 set_up_bond: Going to set up ib0 with 172.20.51.227 set_up_bond: Arping ib0 172.20.51.227. Got CARRIER-ON event on ib0. Got CARRIER-ON event on ib0. Got CARRIER-ON event on ib1. [root at ss27 ~]# ####################################################### -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From vlad at lists.openfabrics.org Mon Feb 19 02:24:13 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Mon, 19 Feb 2007 02:24:13 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070219-0200 daily build status Message-ID: <20070219102414.4122EE6080C@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.19 Passed on powerpc with 
linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Failed: From grossmann at hlrs.de Mon Feb 19 03:37:41 2007 From: grossmann at hlrs.de (Thomas Großmann) Date: Mon, 19 Feb 2007 12:37:41 +0100 Subject: [openib-general] Problem with SRP with 512 byte sector size with > 2 TB LUNs In-Reply-To: References: <200702071203.45309.grossmann@hlrs.de> Message-ID: <200702191237.42016.grossmann@hlrs.de> Hello, I also contacted DDN about that problem and am still waiting for a response. I cannot test this DDN target over fibre channel, because you can only connect to it over IB. I have the same impression: the DDN target somehow does not handle READ CAPACITY(16) properly. Best regards, Thomas Großmann On Wednesday 07 February 2007 18:58, you wrote: > > Is it possible to add LUNs with > 2 TB and 512 byte sectors? > > Why does the READ CAPACITY(16) command fail? > > It seems that the DDN target is not reporting good information -- I > don't see anything obviously wrong in what the kernel is doing (now > that SRP sends a READ CAPACITY command). Do you know if the same type > of config works over fibre channel? > > - R. 
-- Thomas Großmann High Performance Computing Center Stuttgart (HLRS) Allmandring 30 70550 Stuttgart, Germany E-Mail: grossmann at hlrs.de Phone: ++49-711-685-65529 Fax: ++49-711-685-65832 From bugzilla-daemon at lists.openfabrics.org Mon Feb 19 03:56:10 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Mon, 19 Feb 2007 03:56:10 -0800 (PST) Subject: [openib-general] [Bug 371] IPoIB HA not working properly with OFED1.2-alpha In-Reply-To: Message-ID: <20070219115610.33273E60810@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=371 ------- Comment #1 from karun.sharma at qlogic.com 2007-02-19 03:56 ------- It is working fine on SLES10 systems. From monil at voltaire.com Mon Feb 19 04:00:27 2007 From: monil at voltaire.com (Moni Levy) Date: Mon, 19 Feb 2007 14:00:27 +0200 Subject: [openib-general] [PATCH] IB/ipoib: Fix ipoib handling for pkey reordering Message-ID: <45D9915B.6070202@voltaire.com> This issue was found during partitioning & SM fail over testing. The fix was tested for 24 hours with pkey reshuffling every few seconds. The patch applies to Roland's master branch. SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP. 
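[Archive note] The stale-index problem described above can be illustrated outside the kernel. The following is only a minimal sketch under stated assumptions, not the IPoIB code: `struct pkey_table`, `find_pkey_index()` and `refresh_pkey_index()` are hypothetical names introduced here, and the table is a plain array standing in for the port P_Key table. It shows why an index looked up once at QP creation can point at the wrong entry after the SM reshuffles the table, and why re-resolving it on a P_Key change event detects that.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port's P_Key table; not a kernel structure. */
struct pkey_table {
	const uint16_t *entries;
	size_t len;
};

/*
 * Return the index that currently holds 'pkey', or -1 if it is absent.
 * The result is only valid until the table is rewritten.
 */
static int find_pkey_index(const struct pkey_table *tbl, uint16_t pkey)
{
	size_t i;

	for (i = 0; i < tbl->len; i++)
		if (tbl->entries[i] == pkey)
			return (int)i;
	return -1;
}

/*
 * To be called when a (simulated) P_Key change event arrives: look the
 * index up again and report whether the cached value had gone stale,
 * i.e. whether a QP using the old index would need to be reconfigured.
 */
static int refresh_pkey_index(const struct pkey_table *tbl, uint16_t pkey,
			      int *cached_index)
{
	int fresh = find_pkey_index(tbl, pkey);
	int stale = (fresh != *cached_index);

	*cached_index = fresh;
	return stale;
}
```

A caller would cache the result of `find_pkey_index()` when it sets up its QP, then call `refresh_pkey_index()` from its event handling path; a nonzero return means the cached index no longer matches the table.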
Signed-off-by: Moni Levy --- ipoib.h | 2 ++ ipoib_ib.c | 22 ++++++++++++++++++++-- ipoib_main.c | 1 + ipoib_verbs.c | 4 +++- 4 files changed, 26 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 07deee8..ed854e8 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -139,6 +139,7 @@ struct ipoib_dev_priv { struct delayed_work pkey_task; struct delayed_work mcast_task; struct work_struct flush_task; + struct work_struct flush_restart_qp_task; struct work_struct restart_task; struct delayed_work ah_reap_task; @@ -261,6 +262,7 @@ struct ipoib_dev_priv *ipoib_intf_alloc( int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_ib_dev_flush(struct work_struct *work); +void ipoib_ib_dev_flush_restart_qp(struct work_struct *work); void ipoib_ib_dev_cleanup(struct net_device *dev); int ipoib_ib_dev_open(struct net_device *dev); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 59d9594..5e2ada9 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -611,7 +611,7 @@ int ipoib_ib_dev_init(struct net_device return 0; } -void ipoib_ib_dev_flush(struct work_struct *work) +static void __ipoib_ib_dev_flush(struct work_struct *work, int restart_qp) { struct ipoib_dev_priv *cpriv, *priv = container_of(work, struct ipoib_dev_priv, flush_task); @@ -630,6 +630,12 @@ void ipoib_ib_dev_flush(struct work_stru ipoib_dbg(priv, "flushing\n"); ipoib_ib_dev_down(dev, 0); + + if (restart_qp) { + ipoib_dbg(priv, "restarting the device QP\n"); + ipoib_ib_dev_stop(dev); + ipoib_ib_dev_open(dev); + } /* * The device could have been brought down between the start and when @@ -644,11 +650,23 @@ void ipoib_ib_dev_flush(struct work_stru /* Flush any child interfaces too */ list_for_each_entry(cpriv, &priv->child_intfs, list) - 
ipoib_ib_dev_flush(&cpriv->flush_task); + __ipoib_ib_dev_flush(&cpriv->flush_task, restart_qp); mutex_unlock(&priv->vlan_mutex); } +void ipoib_ib_dev_flush(struct work_struct *work) +{ + /* We only restart the QP in case of PKEY change event */ + __ipoib_ib_dev_flush(work, 0); +} + +void ipoib_ib_dev_flush_restart_qp(struct work_struct *work) +{ + /* We only restart the QP in case of PKEY change event */ + __ipoib_ib_dev_flush(work, 1); +} + void ipoib_ib_dev_cleanup(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 705eb1d..da46b79 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -942,6 +942,7 @@ static void ipoib_setup(struct net_devic INIT_DELAYED_WORK(&priv->pkey_task, ipoib_pkey_poll); INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush); + INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 7b717c6..c249915 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -252,12 +252,14 @@ void ipoib_event(struct ib_event_handler container_of(handler, struct ipoib_dev_priv, event_handler); if (record->event == IB_EVENT_PORT_ERR || - record->event == IB_EVENT_PKEY_CHANGE || record->event == IB_EVENT_PORT_ACTIVE || record->event == IB_EVENT_LID_CHANGE || record->event == IB_EVENT_SM_CHANGE || record->event == IB_EVENT_CLIENT_REREGISTER) { ipoib_dbg(priv, "Port state change event\n"); queue_work(ipoib_workqueue, &priv->flush_task); + } else if (record->event == IB_EVENT_PKEY_CHANGE) { + ipoib_dbg(priv, "PKEY change event\n"); + 
queue_work(ipoib_workqueue, &priv->flush_restart_qp_task); } } From monil at voltaire.com Mon Feb 19 04:15:39 2007 From: monil at voltaire.com (Moni Levy) Date: Mon, 19 Feb 2007 14:15:39 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: References: Message-ID: <6a122cc00702190415p7de43bam97348447d807ac1f@mail.gmail.com> Or, On 2/19/07, Or Gerlitz wrote: > Hi Sean, > > this fixes a bug which did not allow to run librdmacm apps over a node > which is partial member of a partition. The patch takes the approach of the > kernel ib_find_cached_pkey implementation. > > If you approve this, i suggest pushing it also into OFED 1.2 as a bug fix. > > Or. > > ---------------------------------------------------------------------- > The pkey extracted by the RDMA CM from the IPoIB device hardware address always > has the full membership bit set. However, when looking in the pkey table the > search must mask out the full membership bit. 
> > Signed-off-by: Or Gerlitz > Signed-off-by: Olga Shern > > diff --git a/src/cma.c b/src/cma.c > index c5f8cd9..9c24c6a 100644 > --- a/src/cma.c > +++ b/src/cma.c > @@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev > > for (i = 0, ret = 0; !ret; i++) { > ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey); > - if (!ret && pkey == chk_pkey) { > + if ((!ret && pkey == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff) == chk_pkey)) { What about just using: if (!ret && pkey | 0x8000 == chk_pkey | 0x8000) { even if not there is no need to check the ret twice in case of limited membership -- Moni > *pkey_index = (uint16_t) i; > return 0; > } > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From ossrosch at linux.vnet.ibm.com Mon Feb 19 04:38:33 2007 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Mon, 19 Feb 2007 13:38:33 +0100 Subject: [openib-general] 32-bit build for ppc64 is required In-Reply-To: References: Message-ID: <200702191338.34623.ossrosch@linux.vnet.ibm.com> On Thursday 15 February 2007 20:30, Hoang-Nam Nguyen wrote: > > > Yuk. I suppose I could write one, but I don't (and can't) use any of > > > the OFED supplied build scripts in our build system, so it's hard for > me > > > to test since our build system is the only way I have to access > > > ppc/ppc64 hardware. > > Oh, well. > > Other takers? > OK, I've no choice to say no. Haven't look at the scripts yet. But will do > in next couple of days! > Nam > > Hi, Did I interpret the conclusion of this thread correctly? Nam will create a patch against the OFED1.2 build scripts, which provides 32 and 64 bit binaries for ppc. Do you agree? 
regards Stefan From mplee at sandia.gov Mon Feb 19 07:43:23 2007 From: mplee at sandia.gov (Lee, Michael Paichi) Date: Mon, 19 Feb 2007 08:43:23 -0700 Subject: [openib-general] Address List Change for Friday, 2/23/2007 Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov> We're in the process of migrating the maillists from the old openib.org server to the new lists.openfabrics.org machine. The list openib-promoters will be moved this Friday, February 23, 2007. The new address for the maillist will be general at lists.openfabrics.org. What this means is that messages will come from general at lists.openfabrics.org. Conversely, replies should be made to this address as well. Messages will also have a new subject line prefix of [OFA General]. If you have configured your e-mail client to filter based on maillist address or subject headers, you may need to make some adjustments for filtering. However, for the sake of transition, messages sent to the previous maillist address on the old server will forward to the new server. This forward will remain in place until the old server is taken offline and final DNS changes are made. We expect the old server to go offline sometime in early March. The web archives will also be migrated to the new web address shortly, http://lists.openfabrics.org. If you have any questions, please don't hesitate to contact me at mplee at sandia.gov. Regards, Michael Lee From mplee at sandia.gov Mon Feb 19 08:01:21 2007 From: mplee at sandia.gov (Lee, Michael Paichi) Date: Mon, 19 Feb 2007 09:01:21 -0700 Subject: [openib-general] Minor correction regarding list migration Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F03669480@ES22SNLNT.srn.sandia.gov> Sorry for the follow-up, but I made a minor error in the previous e-mail. The reference to "openib-promoters" should have been "openib-general." 
So just to reiterate: openib-general will become general at lists.openfabrics.org this Friday, 2/23/2007 Thanks, Michael From caitlinb at broadcom.com Mon Feb 19 08:51:38 2007 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Mon, 19 Feb 2007 08:51:38 -0800 Subject: [openib-general] uDAPL: RDMA Write example In-Reply-To: <45D8D619.9020904@lfbs.rwth-aachen.de> Message-ID: <54AD0F12E08D1541B826BE97C98F99F101091E5E@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > Hello, > > I'm trying to find a small sample program, that uses RDMA > Write instead of Send/Recv. In the sources there is no single > uDAPL example program and on the net neither. > Could someone please help me to find something useful? > > Thanks! > Christian > With uDAPL, you don't use RDMA Write "instead of" Send/Recv, you use it in addition to Send/Recv. The Send/Recv is still required for synchronization. From afriedle at open-mpi.org Mon Feb 19 08:58:26 2007 From: afriedle at open-mpi.org (Andrew Friedley) Date: Mon, 19 Feb 2007 11:58:26 -0500 Subject: [openib-general] OFA 1.2 tarball creation Message-ID: <45D9D732.8070100@open-mpi.org> How exactly is various developers' source code pulled together to create the nightly OFA tarballs at www.openfabrics.org/builds (could this be put on the wiki somewhere?)? I went looking to see if some of Sean's work on RDMA CM had made it into these tarballs, and am not seeing code with the patches I'm looking for. The exact patch I'm after was 'rdma_cm: allow joins to return a unique address'. I remember seeing this patch on the ofed_1_2 branch in Sean's rdma-dev git repository about two weeks ago, though I don't see the ofed_1_2 branch anymore (the patch does exist on the multicast branch). Sean, was this patch supposed to make it to the nightly 1.2 tarballs? 
I'm trying to avoid having to figure out how to build source from git into suitable RPM's for RHEL4; documentation on that would be great too :) Andrew From arlin.r.davis at intel.com Mon Feb 19 10:32:27 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 19 Feb 2007 10:32:27 -0800 Subject: [openib-general] Fork issues with simple MPI program Message-ID: <000001c75454$523660f0$eed4180a@amr.corp.intel.com> We are seeing some fork issues with a simple MPI program (attached) running on a 2.6.16+ kernels and OFED 1.1. We have tried both Intel MPI and mvapich2 with the same results: t_fork> mpiexec -n 2 t_system_fork parent process [0] started child process with pid=31552 send desc error parent process [0] Abort: [] Got completion with error 1, vendor code=69, dest rank=1 at line 540 in file ibv_channel_manager.c [1] I am child process with pid=25437 [1] started child process with pid=25437 [0] I am child process with pid=31552 child process [1] finished pid=25437 child process [0] finished pid=31552 rank 0 in job 2 svlmpicl400_32925 caused collective abort of all ranks exit status of rank 0: return code 252 If you run mvapich2 for uDAPL, it hangs before second MPI_Barrier() just like Intel MPI. If you use the I_MPI_RDMA_USE_EVD_FALLBACK=1 option with Intel MPI you get the following error similar to mvapich2: parent process parent process [0] I am child process with pid=9596 [0] started child process with pid=9596 [1] I am child process with pid=11477 [1] started child process with pid=11477 [0][rdma_iba.c:1007] Intel MPI fatal error: DTO operation completed with error. status=0x2. cookie=0x1 [1][rdma_iba.c:1007] Intel MPI fatal error: DTO operation completed with error. status=0x2. cookie=0x1 child process [1] finished pid=11477 child process [0] finished pid=9596 rank 0 in job 8 cst-19_54707 caused collective abort of all ranks exit status of rank 0: return code 255 Any insight would be greatly appreciated. 
It was our assumption that the parent process can continue to use IB resources after the fixes went into 2.6.16 and OFED 1.1. Is this true? Thanks, -arlin -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: t_system_fork.c URL: From kaiser at lfbs.RWTH-Aachen.DE Mon Feb 19 10:50:33 2007 From: kaiser at lfbs.RWTH-Aachen.DE (Christian Kaiser) Date: Mon, 19 Feb 2007 19:50:33 +0100 Subject: [openib-general] uDAPL: RDMA Write example In-Reply-To: <45D931CF.4060601@voltaire.com> References: <45D8D619.9020904@lfbs.rwth-aachen.de> <45D931CF.4060601@voltaire.com> Message-ID: <45D9F179.7010103@lfbs.rwth-aachen.de> We are working on a new provider support for uDAPL. So we don't use verbs and rdmacm. Your example is ok but not what I was looking for. I am searching for a really small programm, that does RDMA with uDAPL in a few lines (I know a few lines is impossible but a few hundred lines). The dapltest suite is not really small. Christian Or Gerlitz schrieb: > Christian Kaiser wrote: > >> I'm trying to find a small sample program, that uses RDMA Write instead >> of Send/Recv. In the sources there is no single uDAPL example program >> and on the net neither. >> Could someone please help me to find something useful? >> > > see http://dapl.svn.sourceforge.net/viewvc/dapl/trunk/test/dapltest > > Anyway, can you comment what using udapl buys you which you don't get > from coding to the verbs (libibverbs) and rdmacm (librdmacm) ??? > > Or. 
> > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From kaiser at lfbs.RWTH-Aachen.DE Mon Feb 19 10:58:09 2007 From: kaiser at lfbs.RWTH-Aachen.DE (Christian Kaiser) Date: Mon, 19 Feb 2007 19:58:09 +0100 Subject: [openib-general] uDAPL: RDMA Write example In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F101091E5E@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F101091E5E@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <45D9F341.7010001@lfbs.rwth-aachen.de> Caitlin Bestler schrieb: > openib-general-bounces at openib.org wrote: > >> Hello, >> >> I'm trying to find a small sample program, that uses RDMA >> Write instead of Send/Recv. In the sources there is no single >> uDAPL example program and on the net neither. >> Could someone please help me to find something useful? >> >> Thanks! >> Christian >> >> > With uDAPL, you don't use RDMA Write "instead of" Send/Recv, you use > it in addition to Send/Recv. The Send/Recv is still required for > synchronization. > > So you put a Send/Recv before and after the dat_ep_post_rdma_write()? I tried it once with a zero byte Send/Recv but I had the impression that it doesn't work so that I have to do a one byte Send/Recv? From greg.lindahl at qlogic.com Mon Feb 19 11:28:08 2007 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 19 Feb 2007 11:28:08 -0800 Subject: [openib-general] Address List Change for Friday, 2/23/2007 In-Reply-To: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov> References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov> Message-ID: <20070219192808.GA6801@localhost.localdomain> I see that the EWG list is now calling itself the Engineering Working Group, has it been renamed from the Enterprise Working Group? 
If so, did the nature of the list change? Or was it a typo? -- greg From jsquyres at cisco.com Mon Feb 19 12:06:15 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Mon, 19 Feb 2007 15:06:15 -0500 Subject: [openib-general] [ewg] Re: Address List Change for Friday, 2/23/2007 In-Reply-To: <20070219192808.GA6801@localhost.localdomain> References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov> <20070219192808.GA6801@localhost.localdomain> Message-ID: Heh. Probably a typo in the transition to the new server. Michael -- can you fix? On Feb 19, 2007, at 2:28 PM, Greg Lindahl wrote: > I see that the EWG list is now calling itself the Engineering Working > Group, has it been renamed from the Enterprise Working Group? If so, > did the nature of the list change? Or was it a typo? > > -- greg > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mplee at sandia.gov Mon Feb 19 12:08:09 2007 From: mplee at sandia.gov (Lee, Michael Paichi) Date: Mon, 19 Feb 2007 13:08:09 -0700 Subject: [openib-general] [ewg] Re: Address List Change for Friday, 2/23/2007 References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov> <20070219192808.GA6801@localhost.localdomain> Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F03669483@ES22SNLNT.srn.sandia.gov> Greg, Yes, it was a typo. It's been taken care of now. Michael -----Original Message----- From: ewg-bounces at lists.openfabrics.org on behalf of Greg Lindahl Sent: Mon 2/19/2007 11:28 AM To: openib-general at openib.org; ewg at lists.openfabrics.org Subject: [ewg] Re: [openib-general] Address List Change for Friday, 2/23/2007 I see that the EWG list is now calling itself the Engineering Working Group, has it been renamed from the Enterprise Working Group? If so, did the nature of the list change? Or was it a typo? 
-- greg From scarter at ornl.gov Mon Feb 19 12:53:36 2007 From: scarter at ornl.gov (Steven Carter) Date: Mon, 19 Feb 2007 15:53:36 -0500 Subject: [openib-general] Port error rate detection Message-ID: <45DA0E50.7010002@ornl.gov> I have a Nagios module that alerts on connectivity, port errors, speed/width problems. I would like to give it the ability to change the severity of the alert depending on whether errors are just present or if they are increasing faster than a specified rate. The intent is to equip the module to keep the state of the last query and possibly history, but I wanted to make sure that I was not re-inventing the wheel first. Is there an attribute or utility that I am overlooking that will help me do this? Thanks, Steven. From sashak at voltaire.com Mon Feb 19 13:46:30 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 19 Feb 2007 23:46:30 +0200 Subject: [openib-general] [PATCH] osm_vendor_ibumad: termination crash fix Message-ID: <20070219214630.GW27414@sashak.voltaire.com> When OpenSM is terminated, the umad_receiver thread keeps running even after the structures are destroyed and freed; this causes random (but easily reproducible) crashes. The reason is that osm_vendor_delete() does not take care of thread termination. This patch adds receiver thread cancellation (using pthread_cancel() and pthread_join()) and takes care to leave all mutexes unlocked upon termination. There is also a minor consolidation of the termination code in the osm_vendor_close_port() function. 
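[Archive note] The termination pattern named in the description (pthread_cancel() followed by pthread_join(), with mutexes released on the way out) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the OpenSM code: `struct worker`, `worker_main()`, `worker_start()` and `worker_stop()` are hypothetical names, and the counter increment stands in for a callback invoked under the lock. Only standard POSIX calls are used.

```c
#include <pthread.h>
#include <time.h>

/* Hypothetical worker state; the names are illustrative, not from OpenSM. */
struct worker {
	pthread_t tid;
	pthread_mutex_t lock;
	int callbacks_run;
};

/* Cleanup handler: releases the mutex if the thread is cancelled. */
static void unlock_mutex(void *arg)
{
	pthread_mutex_unlock(arg);
}

static void *worker_main(void *arg)
{
	struct worker *w = arg;
	struct timespec pause = { 0, 1000000 };	/* 1 ms */

	for (;;) {
		nanosleep(&pause, NULL);	/* cancellation point */

		pthread_mutex_lock(&w->lock);
		pthread_cleanup_push(unlock_mutex, &w->lock);
		w->callbacks_run++;		/* stand-in for a callback */
		pthread_testcancel();		/* may be cancelled with lock held */
		pthread_cleanup_pop(1);		/* also unlocks on the normal path */
	}
	return NULL;
}

static int worker_start(struct worker *w)
{
	w->callbacks_run = 0;
	if (pthread_mutex_init(&w->lock, NULL) != 0)
		return -1;
	/* pthread_create() returns 0 on success, a positive errno otherwise */
	if (pthread_create(&w->tid, NULL, worker_main, w) != 0) {
		pthread_mutex_destroy(&w->lock);
		return -1;
	}
	return 0;
}

/* Cancel the thread, wait for it, and verify the mutex was left unlocked. */
static int worker_stop(struct worker *w)
{
	void *res;

	pthread_cancel(w->tid);
	if (pthread_join(w->tid, &res) != 0)
		return -1;
	if (pthread_mutex_trylock(&w->lock) != 0)
		return -1;	/* lock leaked by the cancelled thread */
	pthread_mutex_unlock(&w->lock);
	pthread_mutex_destroy(&w->lock);
	return res == PTHREAD_CANCELED ? 0 : -1;
}
```

Two design points are worth noting. pthread_cleanup_push() and pthread_cleanup_pop() must be paired in the same lexical scope, which is why both sit inside the loop body. And because pthread_create() reports failure with a positive error number rather than a negative value, the sketch tests its result with `!= 0`.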
Signed-off-by: Sasha Khapyorsky --- osm/include/vendor/osm_vendor_ibumad.h | 6 +- osm/libvendor/osm_vendor_ibumad.c | 157 +++++++++++++++----------------- osm/libvendor/osm_vendor_ibumad_sa.c | 3 +- 3 files changed, 77 insertions(+), 89 deletions(-) diff --git a/osm/include/vendor/osm_vendor_ibumad.h b/osm/include/vendor/osm_vendor_ibumad.h index 4cbd59f..f6e3d69 100644 --- a/osm/include/vendor/osm_vendor_ibumad.h +++ b/osm/include/vendor/osm_vendor_ibumad.h @@ -39,7 +39,6 @@ #include #include -#include #include #include @@ -87,7 +86,6 @@ typedef struct _osm_ca_info ib_net64_t guid; size_t attr_size; ib_ca_attr_t *p_attr; - } osm_ca_info_t; /* * FIELDS @@ -170,8 +168,8 @@ typedef struct _osm_vendor vendor_match_tbl_t mtbl; umad_ca_t umad_ca; umad_port_t umad_port; - cl_spinlock_t cb_lock; - cl_spinlock_t match_tbl_lock; + pthread_mutex_t cb_mutex; + pthread_mutex_t match_tbl_mutex; int umad_port_id; void *receiver; int issmfd; diff --git a/osm/libvendor/osm_vendor_ibumad.c b/osm/libvendor/osm_vendor_ibumad.c index 35f127a..7320738 100644 --- a/osm/libvendor/osm_vendor_ibumad.c +++ b/osm/libvendor/osm_vendor_ibumad.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2007 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -58,13 +58,11 @@ #include #include -#include #include #include #include #include -#include #include #include #include @@ -97,12 +95,13 @@ typedef struct _osm_umad_bind_info typedef struct _umad_receiver { - cl_event_t signal; - cl_thread_t receiver; - osm_vendor_t *p_vend; - osm_log_t *p_log; + pthread_t tid; + osm_vendor_t *p_vend; + osm_log_t *p_log; } umad_receiver_t; +static void osm_vendor_close_port(osm_vendor_t* const p_vend); + static void clear_madw(osm_vendor_t *p_vend) { @@ -110,7 +109,7 @@ clear_madw(osm_vendor_t *p_vend) ib_net64_t old_tid; OSM_LOG_ENTER( p_vend->p_log, clear_madw ); - cl_spinlock_acquire( &p_vend->match_tbl_lock ); + pthread_mutex_lock(&p_vend->match_tbl_mutex); for (m = p_vend->mtbl.tbl, e = m + p_vend->mtbl.max; m < e; m++) { if (m->tid) { old_m = m; @@ -119,7 +118,7 @@ clear_madw(osm_vendor_t *p_vend) osm_mad_pool_put( ((osm_umad_bind_info_t *)((osm_madw_t *)m->v)->h_bind)->p_mad_pool, m->v); - cl_spinlock_release( &p_vend->match_tbl_lock ); + pthread_mutex_unlock(&p_vend->match_tbl_mutex); osm_log(p_vend->p_log, OSM_LOG_ERROR, "clear_madw: ERR 5401: " "evicting entry %p (tid was 0x%"PRIx64")\n", @@ -127,7 +126,7 @@ clear_madw(osm_vendor_t *p_vend) goto Exit; } } - cl_spinlock_release( &p_vend->match_tbl_lock ); + pthread_mutex_unlock(&p_vend->match_tbl_mutex); Exit: OSM_LOG_EXIT( p_vend->p_log ); @@ -147,18 +146,18 @@ get_madw(osm_vendor_t *p_vend, ib_net64_t *tid) if (mtid == 0) return 0; - cl_spinlock_acquire( &p_vend->match_tbl_lock ); + pthread_mutex_lock(&p_vend->match_tbl_mutex); for (m = p_vend->mtbl.tbl, e = m + p_vend->mtbl.max; m < e; m++) { if (m->tid == mtid) { m->tid = 0; *tid = mtid; res = m->v; - cl_spinlock_release( &p_vend->match_tbl_lock ); + pthread_mutex_unlock(&p_vend->match_tbl_mutex); return res; } } - cl_spinlock_release( &p_vend->match_tbl_lock ); + pthread_mutex_unlock(&p_vend->match_tbl_mutex); return 0; } @@ -171,13 +170,13 @@ put_madw(osm_vendor_t *p_vend, osm_madw_t *p_madw, ib_net64_t *tid) 
ib_net64_t old_tid; uint32_t oldest = ~0; - cl_spinlock_acquire( &p_vend->match_tbl_lock ); + pthread_mutex_lock(&p_vend->match_tbl_mutex); for (m = p_vend->mtbl.tbl, e = m + p_vend->mtbl.max; m < e; m++) { if (m->tid == 0) { m->tid = *tid; m->v = p_madw; m->version = cl_atomic_inc((atomic32_t *)&p_vend->mtbl.last_version); - cl_spinlock_release( &p_vend->match_tbl_lock ); + pthread_mutex_unlock(&p_vend->match_tbl_mutex); return; } if (oldest > m->version) { @@ -191,13 +190,13 @@ put_madw(osm_vendor_t *p_vend, osm_madw_t *p_madw, ib_net64_t *tid) p_req_madw = old_lru->v; p_bind = p_req_madw->h_bind; p_req_madw->status = IB_CANCELED; - cl_spinlock_acquire( &p_vend->cb_lock ); + pthread_mutex_lock(&p_vend->cb_mutex); (*p_bind->send_err_callback)(p_bind->client_context, old_lru->v); - cl_spinlock_release( &p_vend->cb_lock ); + pthread_mutex_unlock(&p_vend->cb_mutex); lru->tid = *tid; lru->v = p_madw; lru->version = cl_atomic_inc((atomic32_t *)&p_vend->mtbl.last_version); - cl_spinlock_release( &p_vend->match_tbl_lock ); + pthread_mutex_unlock(&p_vend->match_tbl_mutex); osm_log(p_vend->p_log, OSM_LOG_ERROR, "put_madw: ERR 5402: " "evicting entry %p (tid was 0x%"PRIx64")\n", old_lru, old_tid); @@ -237,7 +236,12 @@ swap_mad_bufs(osm_madw_t *p_madw, void *umad) return old; } -void +static void unlock_mutex(void *arg) +{ + pthread_mutex_unlock(arg); +} + +void * umad_receiver(void *p_ptr) { umad_receiver_t* const p_ur = (umad_receiver_t *)p_ptr; @@ -356,9 +360,10 @@ umad_receiver(void *p_ptr) } else { p_req_madw->status = IB_TIMEOUT; /* cb frees req_madw */ - cl_spinlock_acquire( &p_vend->cb_lock ); + pthread_mutex_lock(&p_vend->cb_mutex); + pthread_cleanup_push(unlock_mutex, &p_vend->cb_mutex); (*p_bind->send_err_callback)(p_bind->client_context, p_req_madw); - cl_spinlock_release( &p_vend->cb_lock ); + pthread_cleanup_pop(1); } osm_mad_pool_put(p_bind->p_mad_pool, p_madw); @@ -398,47 +403,37 @@ umad_receiver(void *p_ptr) #endif /* call the CB */ - cl_spinlock_acquire( 
&p_vend->cb_lock ); + pthread_mutex_lock(&p_vend->cb_mutex); + pthread_cleanup_push(unlock_mutex, &p_vend->cb_mutex); (*p_bind->mad_recv_callback)(p_madw, p_bind->client_context, p_req_madw); - cl_spinlock_release( &p_vend->cb_lock ); + pthread_cleanup_pop(1); } OSM_LOG_EXIT( p_vend->p_log ); - return; + return NULL; } -static int -umad_receiver_init(osm_vendor_t *p_vend) +static int umad_receiver_start(osm_vendor_t *p_vend) { umad_receiver_t *p_ur = p_vend->receiver; - int r = -1; - - OSM_LOG_ENTER( p_vend->p_log, umad_receiver_init ); p_ur->p_vend = p_vend; p_ur->p_log = p_vend->p_log; - cl_event_construct(&p_ur->signal); - cl_thread_construct(&p_ur->receiver); - - if (cl_event_init(&p_ur->signal, FALSE)) - goto Exit; - - /* - * Initialize the thread after all other dependent objects - * have been initialized. - */ - if (cl_thread_init( &p_ur->receiver, umad_receiver, p_ur, - "umad receiver" )) - goto Exit; - - r = 0; /* success */ + if (pthread_create(&p_ur->tid, NULL, umad_receiver, p_ur) < 0) + return -1; -Exit: - OSM_LOG_EXIT( p_vend->p_log ); - return r; + return 0; } +static void umad_receiver_stop(umad_receiver_t *p_ur) +{ + pthread_cancel(p_ur->tid); + pthread_join(p_ur->tid, NULL); + p_ur->tid = 0; + p_ur->p_vend = NULL; + p_ur->p_log = NULL; +} /********************************************************************** **********************************************************************/ ib_api_status_t @@ -454,23 +449,11 @@ osm_vendor_init( p_vend->p_log = p_log; p_vend->timeout = timeout; p_vend->max_retries = OSM_DEFAULT_RETRY_COUNT; - cl_spinlock_construct( &p_vend->cb_lock ); - cl_spinlock_construct( &p_vend->match_tbl_lock ); + pthread_mutex_init(&p_vend->cb_mutex, NULL); + pthread_mutex_init(&p_vend->match_tbl_mutex, NULL); p_vend->umad_port_id = -1; p_vend->issmfd = -1; - if ((r = cl_spinlock_init( &p_vend->cb_lock ))) { - osm_log(p_vend->p_log, OSM_LOG_ERROR, - "osm_vendor_init: ERR 5435: Error initializing cb spinlock\n"); - goto Exit; - } - - 
if ((r = cl_spinlock_init( &p_vend->match_tbl_lock ))) { - osm_log(p_vend->p_log, OSM_LOG_ERROR, - "osm_vendor_init: ERR 5434: Error initializing match tbl spinlock\n"); - goto Exit; - } - /* * Open our instance of UMAD. */ @@ -541,29 +524,14 @@ void osm_vendor_delete( IN osm_vendor_t** const pp_vend ) { - umad_receiver_t *p_ur; - int agent_id; - - if ((*pp_vend)->umad_port_id >= 0) { - /* unregister UMAD agents */ - for (agent_id = 0; agent_id < UMAD_CA_MAX_AGENTS; agent_id++) - if ( (*pp_vend)->agents[agent_id] ) - umad_unregister((*pp_vend)->umad_port_id, - agent_id ); - umad_close_port((*pp_vend)->umad_port_id); - (*pp_vend)->umad_port_id = -1; - } + osm_vendor_close_port(*pp_vend); clear_madw( *pp_vend ); /* make sure all ports are closed */ umad_done(); - /* umad receiver thread ? */ - p_ur = (*pp_vend)->receiver; - if (p_ur) - cl_event_destroy( &p_ur->signal ); - cl_spinlock_destroy( &(*pp_vend)->cb_lock ); - cl_spinlock_destroy( &(*pp_vend)->match_tbl_lock ); + pthread_mutex_destroy(&(*pp_vend)->cb_mutex); + pthread_mutex_destroy(&(*pp_vend)->match_tbl_mutex); free( *pp_vend ); *pp_vend = NULL; } @@ -780,7 +748,7 @@ osm_vendor_open_port( p_vend->umad_port_id = umad_port_id = -1; goto Exit; } - if (umad_receiver_init(p_vend) != 0) { + if (umad_receiver_start(p_vend) != 0) { osm_log( p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_open_port: ERR 5420: " "umad_receiver_init failed\n" ); @@ -793,6 +761,27 @@ Exit: return umad_port_id; } +static void osm_vendor_close_port(osm_vendor_t* const p_vend) +{ + umad_receiver_t *p_ur; + int i; + + p_ur = p_vend->receiver; + p_vend->receiver = NULL; + if (p_ur) { + umad_receiver_stop(p_ur); + free(p_ur); + } + + if (p_vend->umad_port_id >= 0) { + for (i = 0; i < UMAD_CA_MAX_AGENTS; i++) + if (p_vend->agents[i]) + umad_unregister(p_vend->umad_port_id, i); + umad_close_port(p_vend->umad_port_id); + p_vend->umad_port_id = -1; + } +} + static int set_bit(int nr, void *method_mask) { int mask, retval; @@ -985,10 +974,10 @@ 
osm_vendor_unbind( OSM_LOG_ENTER( p_vend->p_log, osm_vendor_unbind ); - cl_spinlock_acquire( &p_vend->cb_lock ); + pthread_mutex_lock(&p_vend->cb_mutex); p_bind->mad_recv_callback = __osm_vendor_recv_dummy_cb; p_bind->send_err_callback = __osm_vendor_send_err_dummy_cb; - cl_spinlock_release( &p_vend->cb_lock ); + pthread_mutex_unlock(&p_vend->cb_mutex); OSM_LOG_EXIT( p_vend->p_log); } @@ -1154,9 +1143,9 @@ Resp: "Send p_madw = %p of size %d failed %d (%m)\n", p_madw, sent_mad_size, ret); p_madw->status = IB_ERROR; - cl_spinlock_acquire( &p_vend->cb_lock ); + pthread_mutex_lock(&p_vend->cb_mutex); (*p_bind->send_err_callback)(p_bind->client_context, p_madw); /* cb frees madw */ - cl_spinlock_release( &p_vend->cb_lock ); + pthread_mutex_unlock(&p_vend->cb_mutex); goto Exit; } diff --git a/osm/libvendor/osm_vendor_ibumad_sa.c b/osm/libvendor/osm_vendor_ibumad_sa.c index e3978ef..a110e81 100644 --- a/osm/libvendor/osm_vendor_ibumad_sa.c +++ b/osm/libvendor/osm_vendor_ibumad_sa.c @@ -39,9 +39,10 @@ #include #include +#include #include #include -#include +#include #define MAX_PORTS 64 -- 1.5.0.1.40.gb40d From sashak at voltaire.com Mon Feb 19 13:55:39 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 19 Feb 2007 23:55:39 +0200 Subject: [openib-general] Fwd: [ANNOUNCE] GIT 1.5.0 In-Reply-To: <20070218155006.GS27414@sashak.voltaire.com> References: <20070215071537.GD11866@mellanox.co.il> <20070218155006.GS27414@sashak.voltaire.com> Message-ID: <20070219215539.GX27414@sashak.voltaire.com> On 17:50 Sun 18 Feb , Sasha Khapyorsky wrote: > On 09:15 Thu 15 Feb , Michael S. Tsirkin wrote: > > FYI. > > I suggest we update git on the openfabrics server to 1.5.0: > > "Detached HEAD" feature will be useful for nightly build scripts. > > Sasha? > > git-1.5.0 feature list looks fine for me. But let's wait with upgrade a > couple of days for 1.5.0.1. Upgraded to git-1.5.0.1. 
Sasha From sean.hefty at intel.com Mon Feb 19 14:38:55 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 19 Feb 2007 14:38:55 -0800 Subject: [openib-general] OFA 1.2 tarball creation In-Reply-To: <45D9D732.8070100@open-mpi.org> Message-ID: <000001c75476$bcf2eea0$6dcd180a@amr.corp.intel.com> >How exactly is various developers' source code pulled together to create >the nightly OFA tarballs at www.openfabrics.org/builds (could this be >put on the wiki somewhere?)? I went looking to see if some of Sean's >work on RDMA CM had made it into these tarballs, and am not seeing code >with the patches I'm looking for. I do not know how OFED creates their tarballs or manages their source. >The exact patch I'm after was 'rdma_cm: allow joins to return a unique >address'. I remember seeing this patch on the ofed_1_2 branch in Sean's >rdma-dev git repository about two weeks ago, though I don't see the >ofed_1_2 branch anymore (the patch does exist on the multicast branch). > Sean, was this patch supposed to make it to the nightly 1.2 tarballs? Assuming that ~vlad/ofed_1_2.git is the OFED kernel tree, then this patch does not appear to be included. I was asked by OFED to publish an ofed_1_2 branch, which I did, but I do not know if it was used in constructing the OFED tree. The patch was intended to go into OFED. - Sean From sashak at voltaire.com Mon Feb 19 15:01:39 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 20 Feb 2007 01:01:39 +0200 Subject: [openib-general] [PATCH] complib: thread_pool rework Message-ID: <20070219230139.GZ27414@sashak.voltaire.com> This reworks complib's thread_pool implementation (used by the opensm dispatcher). It prevents event signals from being merged, prevents termination races, eliminates use of the broken cl_atomic code, and reduces memory allocations and code complexity.
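As a rough illustration of the approach described above, here is a minimal sketch, assuming a pool that counts pending events under a mutex and wakes its workers through a condition variable. All names in it (event_pool, pool_signal, demo_run and so on) are hypothetical and are not the complib API, and the user callback is replaced by a simple counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the complib API. */
struct event_pool {
	pthread_mutex_t mutex;
	pthread_cond_t cond;     /* wakes workers when events > 0 */
	pthread_cond_t drained;  /* demo only: tells the caller work was done */
	unsigned events;         /* signals posted but not yet consumed */
	unsigned handled;        /* demo only: stand-in for the callback */
	unsigned nthreads;
	pthread_t *tid;
};

static void unlock_mutex(void *arg)
{
	pthread_mutex_unlock(arg);
}

static void *worker(void *arg)
{
	struct event_pool *p = arg;

	for (;;) {
		pthread_mutex_lock(&p->mutex);
		/* A thread cancelled inside pthread_cond_wait() owns the
		 * mutex again, so the cleanup handler must release it. */
		pthread_cleanup_push(unlock_mutex, &p->mutex);
		while (!p->events)
			pthread_cond_wait(&p->cond, &p->mutex);
		p->events--;
		p->handled++;	/* a real pool would run its callback unlocked */
		pthread_cond_broadcast(&p->drained);
		pthread_cleanup_pop(1);	/* pops the handler and unlocks */
	}
	return NULL;
}

static int pool_init(struct event_pool *p, unsigned count)
{
	unsigned i;

	memset(p, 0, sizeof(*p));
	pthread_mutex_init(&p->mutex, NULL);
	pthread_cond_init(&p->cond, NULL);
	pthread_cond_init(&p->drained, NULL);
	p->tid = calloc(count, sizeof(*p->tid));
	if (!p->tid)
		return -1;
	for (i = 0; i < count; i++) {
		/* pthread_create() returns 0 or a positive error number */
		if (pthread_create(&p->tid[i], NULL, worker, p) != 0)
			return -1;
		p->nthreads++;
	}
	return 0;
}

static int pool_signal(struct event_pool *p)
{
	int ret;

	pthread_mutex_lock(&p->mutex);
	p->events++;	/* counted, so back-to-back signals cannot merge */
	ret = pthread_cond_signal(&p->cond);
	pthread_mutex_unlock(&p->mutex);
	return ret;
}

static void pool_destroy(struct event_pool *p)
{
	unsigned i;

	for (i = 0; i < p->nthreads; i++)
		pthread_cancel(p->tid[i]);
	for (i = 0; i < p->nthreads; i++)
		pthread_join(p->tid[i], NULL);
	free(p->tid);
	pthread_cond_destroy(&p->drained);
	pthread_cond_destroy(&p->cond);
	pthread_mutex_destroy(&p->mutex);
}

/* Posts nsignals events to a two-thread pool and returns how many ran. */
static unsigned demo_run(unsigned nsignals)
{
	struct event_pool pool;
	unsigned i, handled;

	if (pool_init(&pool, 2) != 0) {
		pool_destroy(&pool);
		return 0;
	}
	for (i = 0; i < nsignals; i++)
		pool_signal(&pool);
	pthread_mutex_lock(&pool.mutex);
	while (pool.handled < nsignals)
		pthread_cond_wait(&pool.drained, &pool.mutex);
	assert(pool.events == 0);
	handled = pool.handled;
	pthread_mutex_unlock(&pool.mutex);
	pool_destroy(&pool);
	return handled;
}
```

Because each signal increments a counter rather than setting a flag, two signals posted back to back cannot collapse into a single wakeup, and cancelling threads that block in pthread_cond_wait() replaces the exit-flag handshake. In this sketch, demo_run(5) returns 5.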
Signed-off-by: Sasha Khapyorsky --- osm/complib/cl_async_proc.c | 1 - osm/complib/cl_dispatcher.c | 2 +- osm/complib/cl_thread.c | 13 -- osm/complib/cl_threadpool.c | 208 +++++++++++----------------------- osm/complib/libosmcomp.map | 1 - osm/include/complib/cl_thread.h | 16 --- osm/include/complib/cl_threadpool.h | 84 ++++---------- osm/osmtest/osmt_multicast.c | 1 + 8 files changed, 92 insertions(+), 234 deletions(-) diff --git a/osm/complib/cl_async_proc.c b/osm/complib/cl_async_proc.c index 51561af..7ac96bb 100644 --- a/osm/complib/cl_async_proc.c +++ b/osm/complib/cl_async_proc.c @@ -55,7 +55,6 @@ cl_async_proc_construct( cl_qlist_init( &p_async_proc->item_queue ); cl_spinlock_construct( &p_async_proc->lock ); - cl_thread_pool_construct( &p_async_proc->thread_pool ); } cl_status_t diff --git a/osm/complib/cl_dispatcher.c b/osm/complib/cl_dispatcher.c index a7c0ac7..4a1960c 100644 --- a/osm/complib/cl_dispatcher.c +++ b/osm/complib/cl_dispatcher.c @@ -49,6 +49,7 @@ #include #include +#include #include /* give some guidance when we build our cl_pool of messages */ @@ -132,7 +133,6 @@ cl_disp_construct( cl_qlist_init( &p_disp->reg_list ); cl_ptr_vector_construct( &p_disp->reg_vec ); - cl_thread_pool_construct( &p_disp->worker_threads ); cl_qlist_init( &p_disp->msg_fifo ); cl_spinlock_construct( &p_disp->lock ); cl_qpool_construct( &p_disp->msg_pool ); diff --git a/osm/complib/cl_thread.c b/osm/complib/cl_thread.c index f131480..eecc7d6 100644 --- a/osm/complib/cl_thread.c +++ b/osm/complib/cl_thread.c @@ -39,7 +39,6 @@ #include #include -#include #include /* @@ -129,18 +128,6 @@ cl_thread_stall( usleep( pause_us ); } -uint32_t -cl_proc_count( void ) -{ - uint32_t ret; - - ret = get_nprocs(); - if( !ret) - return 1;/* Workaround for PPC where get_nprocs() returns 0 */ - - return ret; -} - boolean_t cl_is_current_thread( IN const cl_thread_t* const p_thread ) diff --git a/osm/complib/cl_threadpool.c b/osm/complib/cl_threadpool.c index ff8bf90..ca4e261 100644 --- 
a/osm/complib/cl_threadpool.c +++ b/osm/complib/cl_threadpool.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2007 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -49,134 +49,85 @@ #include #include +#include +#include #include -#include -void -__cl_thread_pool_routine( - IN void* const context ) +static int proc_count( void ) { - cl_status_t status = CL_SUCCESS; - cl_thread_pool_t *p_thread_pool = (cl_thread_pool_t*)context; - - /* Continue looping until signalled to end. */ - while( !p_thread_pool->exit ) - { - /* Wait for the specified event to occur. */ - status = cl_event_wait_on( &p_thread_pool->wakeup_event, - EVENT_NO_TIMEOUT, TRUE ); - - /* See if we've been signalled to end execution. */ - if( (p_thread_pool->exit) || (status == CL_NOT_DONE) ) - break; - - /* The event has been signalled. Invoke the callback. */ - (*p_thread_pool->pfn_callback)( (void*)p_thread_pool->context ); - } + int ret = get_nprocs(); + if (!ret) + return 1;/* Workaround for PPC where get_nprocs() returns 0 */ + return ret; +} - /* - * Decrement the running count to notify the destroying thread - * that the event was received and processed. 
- */ - cl_atomic_dec( &p_thread_pool->running_count ); - cl_event_signal( &p_thread_pool->destroy_event ); +static void cleanup_mutex(void *arg) +{ + pthread_mutex_unlock(&((cl_thread_pool_t *)arg)->mutex); } -void -cl_thread_pool_construct( - IN cl_thread_pool_t* const p_thread_pool ) +static void *thread_pool_routine(void* context) { - CL_ASSERT( p_thread_pool); + cl_thread_pool_t *p_thread_pool = (cl_thread_pool_t*)context; + + do { + pthread_mutex_lock(&p_thread_pool->mutex); + pthread_cleanup_push(cleanup_mutex, p_thread_pool); + while(!p_thread_pool->events) + pthread_cond_wait(&p_thread_pool->cond, + &p_thread_pool->mutex); + p_thread_pool->events--; + pthread_cleanup_pop(1); + /* The event has been signalled. Invoke the callback. */ + (*p_thread_pool->pfn_callback)(p_thread_pool->context); + } while (1); - memset( p_thread_pool, 0, sizeof(cl_thread_pool_t) ); - cl_event_construct( &p_thread_pool->wakeup_event ); - cl_event_construct( &p_thread_pool->destroy_event ); - cl_list_construct( &p_thread_pool->thread_list ); - p_thread_pool->state = CL_UNINITIALIZED; + return NULL; } cl_status_t cl_thread_pool_init( - IN cl_thread_pool_t* const p_thread_pool, - IN uint32_t count, - IN cl_pfn_thread_callback_t pfn_callback, - IN const void* const context, - IN const char* const name ) + IN cl_thread_pool_t* const p_thread_pool, + IN unsigned count, + IN void (*pfn_callback)(void*), + IN void *context, + IN const char* const name ) { - cl_status_t status; - cl_thread_t *p_thread; - uint32_t i; + int i; CL_ASSERT( p_thread_pool ); CL_ASSERT( pfn_callback ); - cl_thread_pool_construct( p_thread_pool ); + memset(p_thread_pool, 0, sizeof(*p_thread_pool)); - if( !count ) - count = cl_proc_count(); + if(!count) + count = proc_count(); - status = cl_list_init( &p_thread_pool->thread_list, count ); - if( status != CL_SUCCESS ) - { - cl_thread_pool_destroy( p_thread_pool ); - return( status ); - } + pthread_mutex_init(&p_thread_pool->mutex, NULL); + 
pthread_cond_init(&p_thread_pool->cond, NULL); - /* Initialize the event that the threads wait on. */ - status = cl_event_init( &p_thread_pool->wakeup_event, FALSE ); - if( status != CL_SUCCESS ) - { - cl_thread_pool_destroy( p_thread_pool ); - return( status ); - } + p_thread_pool->events = 0; - /* Initialize the event used to destroy the threadpool. */ - status = cl_event_init( &p_thread_pool->destroy_event, FALSE ); - if( status != CL_SUCCESS ) - { + p_thread_pool->pfn_callback = pfn_callback; + p_thread_pool->context = context; + + p_thread_pool->tid = calloc(count, sizeof(*p_thread_pool->tid)); + if (!p_thread_pool->tid) { cl_thread_pool_destroy( p_thread_pool ); - return( status ); + return CL_INSUFFICIENT_MEMORY; } - p_thread_pool->pfn_callback = pfn_callback; - p_thread_pool->context = context; + p_thread_pool->running_count = count; for( i = 0; i < count; i++ ) { - /* Create a new thread. */ - p_thread = (cl_thread_t*)malloc( sizeof(cl_thread_t) ); - if( !p_thread ) - { + if (pthread_create(&p_thread_pool->tid[i], NULL, + thread_pool_routine, p_thread_pool) < 0) { cl_thread_pool_destroy( p_thread_pool ); - return( CL_INSUFFICIENT_MEMORY ); + return CL_INSUFFICIENT_RESOURCES; } - - cl_thread_construct( p_thread ); - - /* - * Add it to the list. This is guaranteed to work since we - * initialized the list to hold at least the number of threads we want - * to store there. - */ - status = cl_list_insert_head( &p_thread_pool->thread_list, p_thread ); - CL_ASSERT( status == CL_SUCCESS ); - - /* Start the thread. */ - status = cl_thread_init( p_thread, __cl_thread_pool_routine, - p_thread_pool, name ); - if( status != CL_SUCCESS ) - { - cl_thread_pool_destroy( p_thread_pool ); - return( status ); - } - - /* - * Increment the running count to insure that a destroying thread - * will signal all the threads. 
- */ - cl_atomic_inc( &p_thread_pool->running_count ); } - p_thread_pool->state = CL_INITIALIZED; + return( CL_SUCCESS ); } @@ -184,59 +135,34 @@ void cl_thread_pool_destroy( IN cl_thread_pool_t* const p_thread_pool ) { - cl_thread_t *p_thread; + int i; CL_ASSERT( p_thread_pool ); - CL_ASSERT( cl_is_state_valid( p_thread_pool->state ) ); - /* Indicate to all threads that they need to exit. */ - p_thread_pool->exit = TRUE; + for (i = 0 ; i < p_thread_pool->running_count; i++) + if (p_thread_pool->tid[i]) + pthread_cancel(p_thread_pool->tid[i]); - /* - * Signal the threads until they have all exited. Signalling - * once for each thread is not guaranteed to work since two events - * could release only a single thread, depending on the rate at which - * the events are set and how the thread scheduler processes notifications. - */ + for (i = 0 ; i < p_thread_pool->running_count; i++) + if (p_thread_pool->tid[i]) + pthread_join(p_thread_pool->tid[i], NULL); - while( p_thread_pool->running_count ) - { - cl_event_signal( &p_thread_pool->wakeup_event ); - /* - * Wait for the destroy event to occur, indicating that the thread - * has exited. - */ - cl_event_wait_on( &p_thread_pool->destroy_event, - EVENT_NO_TIMEOUT, TRUE ); - } - - /* - * Stop each thread one at a time. Note that this cannot be done in the - * above for loop because signal will wake up an unknown thread. 
- */ - if( cl_is_list_inited( &p_thread_pool->thread_list ) ) - { - while( !cl_is_list_empty( &p_thread_pool->thread_list ) ) - { - p_thread = - (cl_thread_t*)cl_list_remove_head( &p_thread_pool->thread_list ); - cl_thread_destroy( p_thread ); - free( p_thread ); - } - } + p_thread_pool->running_count = 0; + pthread_cond_destroy(&p_thread_pool->cond); + pthread_mutex_destroy(&p_thread_pool->mutex); - cl_event_destroy( &p_thread_pool->destroy_event ); - cl_event_destroy( &p_thread_pool->wakeup_event ); - cl_list_destroy( &p_thread_pool->thread_list ); - p_thread_pool->state = CL_UNINITIALIZED; + p_thread_pool->events = 0; } cl_status_t cl_thread_pool_signal( IN cl_thread_pool_t* const p_thread_pool ) { + int ret; CL_ASSERT( p_thread_pool ); - CL_ASSERT( p_thread_pool->state == CL_INITIALIZED ); - - return( cl_event_signal( &p_thread_pool->wakeup_event ) ); + pthread_mutex_lock(&p_thread_pool->mutex); + p_thread_pool->events++; + ret = pthread_cond_signal(&p_thread_pool->cond); + pthread_mutex_unlock(&p_thread_pool->mutex); + return ret; } diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map index e2e58b1..3b8c040 100644 --- a/osm/complib/libosmcomp.map +++ b/osm/complib/libosmcomp.map @@ -138,7 +138,6 @@ OSMCOMP_1.1 { cl_thread_destroy; cl_thread_suspend; cl_thread_stall; - cl_proc_count; cl_is_current_thread; __cl_thread_pool_routine; cl_thread_pool_construct; diff --git a/osm/include/complib/cl_thread.h b/osm/include/complib/cl_thread.h index 4752278..9635e22 100644 --- a/osm/include/complib/cl_thread.h +++ b/osm/include/complib/cl_thread.h @@ -312,22 +312,6 @@ cl_thread_stall( * Thread, cl_thread_suspend *********/ -/****f* Component Library: Thread/cl_proc_count -* NAME -* cl_proc_count -* -* DESCRIPTION -* The cl_proc_count function returns the number of processors in the system. -* -* SYNOPSIS -*/ -uint32_t -cl_proc_count( void ); -/* -* RETURN VALUE -* Returns the number of processors in the system. 
-*********/ - /****i* Component Library: Thread/cl_is_current_thread * NAME * cl_is_current_thread diff --git a/osm/include/complib/cl_threadpool.h b/osm/include/complib/cl_threadpool.h index aa1e066..30b5f86 100644 --- a/osm/include/complib/cl_threadpool.h +++ b/osm/include/complib/cl_threadpool.h @@ -46,9 +46,8 @@ #ifndef _CL_THREAD_POOL_H_ #define _CL_THREAD_POOL_H_ -#include -#include -#include +#include +#include #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { @@ -100,15 +99,13 @@ BEGIN_C_DECLS */ typedef struct _cl_thread_pool { - cl_pfn_thread_callback_t pfn_callback; - const void *context; - cl_list_t thread_list; - cl_event_t wakeup_event; - cl_event_t destroy_event; - boolean_t exit; - cl_state_t state; - atomic32_t running_count; - + void (*pfn_callback)(void*); + void *context; + unsigned running_count; + unsigned events; + pthread_cond_t cond; + pthread_mutex_t mutex; + pthread_t *tid; } cl_thread_pool_t; /* * FIELDS @@ -118,58 +115,23 @@ typedef struct _cl_thread_pool * context * Context to pass to the thread callback function. * -* thread_list -* List of threads managed by the thread pool. -* -* event -* Event used to signal threads to wake up and do work. -* -* destroy_event -* Event used to signal threads to exit. -* -* exit -* Flag used to indicates threads to exit. -* -* state -* State of the thread pool. -* * running_count * Number of threads running. * -* SEE ALSO -* Thread Pool -*********/ - -/****f* Component Library: Thread Pool/cl_thread_pool_construct -* NAME -* cl_thread_pool_construct +* events +* events counter * -* DESCRIPTION -* The cl_thread_pool_construct function initializes the state of a -* thread pool. +* mutex +* mutex for cond variable protection * -* SYNOPSIS -*/ -void -cl_thread_pool_construct( - IN cl_thread_pool_t* const p_thread_pool ); -/* -* PARAMETERS -* p_thread_pool -* [in] Pointer to a thread pool structure. 
+* cond +* conditional variable to signal an event to thread * -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* Allows calling cl_thread_pool_destroy without first calling -* cl_thread_pool_init. -* -* Calling cl_thread_pool_construct is a prerequisite to calling any other -* thread pool function except cl_thread_pool_init. +* tid +* array of allocated thread ids. * * SEE ALSO -* Thread Pool, cl_thread_pool_init, cl_thread_pool_destroy +* Thread Pool *********/ /****f* Component Library: Thread Pool/cl_thread_pool_init @@ -184,11 +146,11 @@ cl_thread_pool_construct( */ cl_status_t cl_thread_pool_init( - IN cl_thread_pool_t* const p_thread_pool, - IN uint32_t thread_count, - IN cl_pfn_thread_callback_t pfn_callback, - IN const void* const context, - IN const char* const name ); + IN cl_thread_pool_t* const p_thread_pool, + IN unsigned count, + IN void (*pfn_callback)(void*), + IN void *context, + IN const char* const name ); /* * PARAMETERS * p_thread_pool diff --git a/osm/osmtest/osmt_multicast.c b/osm/osmtest/osmt_multicast.c index d5519eb..724a0bb 100644 --- a/osm/osmtest/osmt_multicast.c +++ b/osm/osmtest/osmt_multicast.c @@ -51,6 +51,7 @@ #include #include #include +#include #include "osmtest.h" /********************************************************************** -- 1.5.0.1.40.gb40d From sashak at voltaire.com Mon Feb 19 15:04:41 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 20 Feb 2007 01:04:41 +0200 Subject: [openib-general] [PATCH] osm/libvendor: compilation fixes In-Reply-To: <20070219214630.GW27414@sashak.voltaire.com> References: <20070219214630.GW27414@sashak.voltaire.com> Message-ID: <20070219230441.GA27414@sashak.voltaire.com> This adds the needed header file inclusions to prevent compilation failures. Signed-off-by: Sasha Khapyorsky --- Those compilation failures were detected during the ibutils/ibmgtsim build.
osm/libvendor/osm_vendor_mlx_sa.c | 1 + osm/libvendor/osm_vendor_mlx_sim.c | 1 + 2 files changed, 2 insertions(+), 0 deletions(-) diff --git a/osm/libvendor/osm_vendor_mlx_sa.c b/osm/libvendor/osm_vendor_mlx_sa.c index ab37adb..37fa618 100644 --- a/osm/libvendor/osm_vendor_mlx_sa.c +++ b/osm/libvendor/osm_vendor_mlx_sa.c @@ -43,6 +43,7 @@ #include #include #include +#include #include #include diff --git a/osm/libvendor/osm_vendor_mlx_sim.c b/osm/libvendor/osm_vendor_mlx_sim.c index d3e6eeb..bcd2bdc 100644 --- a/osm/libvendor/osm_vendor_mlx_sim.c +++ b/osm/libvendor/osm_vendor_mlx_sim.c @@ -57,6 +57,7 @@ #include #include #include +#include /* the simulator messages definition */ #include -- 1.5.0.1.40.gb40d From sashak at voltaire.com Mon Feb 19 15:06:22 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 20 Feb 2007 01:06:22 +0200 Subject: [openib-general] [PATCH] ibutils/ibis: compilation fixes In-Reply-To: <20070219214630.GW27414@sashak.voltaire.com> References: <20070219214630.GW27414@sashak.voltaire.com> Message-ID: <20070219230622.GB27414@sashak.voltaire.com> This adds needed header file inclusions. Signed-off-by: Sasha Khapyorsky --- Those compilation failures were detected during the ibutils/ibmgtsim build.
ibis/src/ibbbm.h | 1 + ibis/src/ibis_gsi_mad_ctrl.c | 1 + 2 files changed, 2 insertions(+), 0 deletions(-) diff --git a/ibis/src/ibbbm.h b/ibis/src/ibbbm.h index d998179..026a49c 100644 --- a/ibis/src/ibbbm.h +++ b/ibis/src/ibbbm.h @@ -50,6 +50,7 @@ #include #include #include +#include #include #include #include diff --git a/ibis/src/ibis_gsi_mad_ctrl.c b/ibis/src/ibis_gsi_mad_ctrl.c index a147642..3c7ea86 100644 --- a/ibis/src/ibis_gsi_mad_ctrl.c +++ b/ibis/src/ibis_gsi_mad_ctrl.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include "ibis_gsi_mad_ctrl.h" #include "ibis.h" -- 1.5.0.1.40.gb40d From sashak at voltaire.com Mon Feb 19 15:07:57 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 20 Feb 2007 01:07:57 +0200 Subject: [openib-general] [PATCH] complib: remove unused stuff In-Reply-To: <20070219230139.GZ27414@sashak.voltaire.com> References: <20070219230139.GZ27414@sashak.voltaire.com> Message-ID: <20070219230757.GC27414@sashak.voltaire.com> This removes some unused complib stuff - cl_memory, cl_async_proc, cl_perf. 
Signed-off-by: Sasha Khapyorsky --- osm/complib/Makefile.am | 11 +- osm/complib/cl_async_proc.c | 147 -------- osm/complib/cl_memory.c | 515 ------------------------- osm/complib/cl_memory_osd.c | 93 ----- osm/complib/cl_perf.c | 273 -------------- osm/complib/libosmcomp.map | 33 -- osm/include/Makefile.am | 4 - osm/include/complib/cl_async_proc.h | 334 ----------------- osm/include/complib/cl_memory.h | 663 -------------------------------- osm/include/complib/cl_memtrack.h | 96 ----- osm/include/complib/cl_perf.h | 708 ----------------------------------- 11 files changed, 4 insertions(+), 2873 deletions(-) delete mode 100644 osm/complib/cl_async_proc.c delete mode 100644 osm/complib/cl_memory.c delete mode 100644 osm/complib/cl_memory_osd.c delete mode 100644 osm/complib/cl_perf.c delete mode 100644 osm/include/complib/cl_async_proc.h delete mode 100644 osm/include/complib/cl_memory.h delete mode 100644 osm/include/complib/cl_memtrack.h delete mode 100644 osm/include/complib/cl_perf.h diff --git a/osm/complib/Makefile.am b/osm/complib/Makefile.am index 7bdf34b..be26bb7 100644 --- a/osm/complib/Makefile.am +++ b/osm/complib/Makefile.am @@ -17,10 +17,10 @@ else libosmcomp_version_script = endif -libosmcomp_la_SOURCES = cl_async_proc.c cl_complib.c \ +libosmcomp_la_SOURCES = cl_complib.c \ cl_dispatcher.c cl_event.c cl_event_wheel.c \ - cl_list.c cl_log.c cl_map.c cl_memory.c \ - cl_memory_osd.c cl_perf.c cl_pool.c \ + cl_list.c cl_log.c cl_map.c \ + cl_pool.c \ cl_ptr_vector.c \ cl_spinlock.c cl_statustext.c \ cl_thread.c cl_threadpool.c \ @@ -32,7 +32,7 @@ libosmcomp_la_DEPENDENCIES = $(srcdir)/libosmcomp.map libosmcompincludedir = $(includedir)/infiniband/complib -libosmcompinclude_HEADERS = $(srcdir)/../include/complib/cl_async_proc.h \ +libosmcompinclude_HEADERS = \ $(srcdir)/../include/complib/cl_atomic.h \ $(srcdir)/../include/complib/cl_atomic_osd.h \ $(srcdir)/../include/complib/cl_byteswap.h \ @@ -49,12 +49,9 @@ libosmcompinclude_HEADERS = 
$(srcdir)/../include/complib/cl_async_proc.h \ $(srcdir)/../include/complib/cl_log.h \ $(srcdir)/../include/complib/cl_map.h \ $(srcdir)/../include/complib/cl_math.h \ - $(srcdir)/../include/complib/cl_memory.h \ - $(srcdir)/../include/complib/cl_memtrack.h \ $(srcdir)/../include/complib/cl_packoff.h \ $(srcdir)/../include/complib/cl_packon.h \ $(srcdir)/../include/complib/cl_passivelock.h \ - $(srcdir)/../include/complib/cl_perf.h \ $(srcdir)/../include/complib/cl_pool.h \ $(srcdir)/../include/complib/cl_ptr_vector.h \ $(srcdir)/../include/complib/cl_qcomppool.h \ diff --git a/osm/complib/cl_async_proc.c b/osm/complib/cl_async_proc.c deleted file mode 100644 index 7ac96bb..0000000 --- a/osm/complib/cl_async_proc.c +++ /dev/null @@ -1,147 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. 
- * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include - -#define CL_ASYNC_PROC_MIN 16 -#define CL_ASYNC_PROC_GROWSIZE 16 - -/* Worker function declaration. */ -static void -__cl_async_proc_worker( - IN void* const context ); - -void -cl_async_proc_construct( - IN cl_async_proc_t* const p_async_proc ) -{ - CL_ASSERT( p_async_proc ); - - cl_qlist_init( &p_async_proc->item_queue ); - cl_spinlock_construct( &p_async_proc->lock ); -} - -cl_status_t -cl_async_proc_init( - IN cl_async_proc_t* const p_async_proc, - IN const uint32_t thread_count, - IN const char* const name ) -{ - cl_status_t status; - - CL_ASSERT( p_async_proc ); - - cl_async_proc_construct( p_async_proc ); - - status = cl_spinlock_init( &p_async_proc->lock ); - if( status != CL_SUCCESS ) - { - cl_async_proc_destroy( p_async_proc ); - return( status ); - } - - status = cl_thread_pool_init( &p_async_proc->thread_pool, thread_count, - __cl_async_proc_worker, p_async_proc, name ); - if( status != CL_SUCCESS ) - cl_async_proc_destroy( p_async_proc ); - - return( status ); -} - -void -cl_async_proc_destroy( - IN cl_async_proc_t* const p_async_proc ) -{ - /* Destroy the thread pool first so that the threads stop. */ - cl_thread_pool_destroy( &p_async_proc->thread_pool ); - - /* Flush all queued callbacks. */ - __cl_async_proc_worker( p_async_proc ); - - /* Destroy the spinlock. 
*/ - cl_spinlock_destroy( &p_async_proc->lock ); -} - -void -cl_async_proc_queue( - IN cl_async_proc_t* const p_async_proc, - IN cl_async_proc_item_t* const p_item ) -{ - CL_ASSERT( p_async_proc ); - CL_ASSERT( p_item->pfn_callback ); - - /* Enqueue this item for processing. */ - cl_spinlock_acquire( &p_async_proc->lock ); - cl_qlist_insert_tail( &p_async_proc->item_queue, - &p_item->pool_item.list_item ); - cl_spinlock_release( &p_async_proc->lock ); - - /* Signal the thread pool to wake up. */ - cl_thread_pool_signal( &p_async_proc->thread_pool ); -} - -static void -__cl_async_proc_worker( - IN void* const context) -{ - cl_async_proc_t *p_async_proc = (cl_async_proc_t*)context; - cl_list_item_t *p_list_item; - cl_async_proc_item_t *p_item; - - /* Process items from the head of the queue until it is empty. */ - cl_spinlock_acquire( &p_async_proc->lock ); - p_list_item = cl_qlist_remove_head( &p_async_proc->item_queue ); - while( p_list_item != cl_qlist_end( &p_async_proc->item_queue ) ) - { - /* Release the lock during the user's callback. */ - cl_spinlock_release( &p_async_proc->lock ); - - /* Invoke the user callback. */ - p_item = (cl_async_proc_item_t*)p_list_item; - p_item->pfn_callback( p_item ); - - /* Acquire the lock again to continue processing. */ - cl_spinlock_acquire( &p_async_proc->lock ); - /* Get the next item in the queue. */ - p_list_item = cl_qlist_remove_head( &p_async_proc->item_queue ); - } - - /* The queue is empty. Release the lock and return. */ - cl_spinlock_release( &p_async_proc->lock ); -} diff --git a/osm/complib/cl_memory.c b/osm/complib/cl_memory.c deleted file mode 100644 index daf7fe1..0000000 --- a/osm/complib/cl_memory.c +++ /dev/null @@ -1,515 +0,0 @@ -/* - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
- * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/* - * Abstract: - * Implementation of memory allocation tracking functions. - * - * Environment: - * All - * - * $Revision: 1.4 $ - */ - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include -#define _MEM_DEBUG_MODE_ 0 -#ifdef _MEM_DEBUG_MODE_ -/* - In the mem debug mode we will be wrapping up the allocated buffer - with magic constants and the required size and then check during free. 
- - The memory layout will be: - ||||| - -*/ - -#define _MEM_DEBUG_MAGIC_SIZE_ 4 -#define _MEM_DEBUG_EXTRA_SIZE_ sizeof(size) + 8 -static uint8_t _MEM_DEBUG_MAGIC_START_[4] = {0x12, 0x34, 0x56, 0x78, }; -static uint8_t _MEM_DEBUG_MAGIC_END_[4] = {0x87, 0x65, 0x43, 0x21, }; -#endif - -cl_mem_tracker_t *gp_mem_tracker = NULL; - -/* - * Allocates memory. - */ -void* -__cl_malloc_priv( - IN const size_t size ); - -/* - * Deallocates memory. - */ -void -__cl_free_priv( - IN void* const p_memory ); - -/* - * Allocate and initialize the memory tracker object. - */ -static inline void -__cl_mem_track_start( void ) -{ - cl_status_t status; - - if( gp_mem_tracker ) - return; - - /* Allocate the memory tracker object. */ - gp_mem_tracker = (cl_mem_tracker_t*) - __cl_malloc_priv( sizeof(cl_mem_tracker_t) ); - - if( !gp_mem_tracker ) - return; - - /* Initialize the free list. */ - cl_qlist_init( &gp_mem_tracker->free_hdr_list ); - /* Initialize the allocation list. */ - cl_qlist_init( &gp_mem_tracker->alloc_list ); - - /* Initialize the spin lock to protect list operations. */ - status = cl_spinlock_init( &gp_mem_tracker->lock ); - if( status != CL_SUCCESS ) - { - __cl_free_priv( gp_mem_tracker ); - return; - } - - cl_msg_out( "\n\n\n*** Memory tracker object address = %p ***\n\n\n", - gp_mem_tracker ); -} - -/* - * Clean up memory tracking. - */ -static inline void -__cl_mem_track_stop( void ) -{ - cl_list_item_t *p_list_item; - - if( !gp_mem_tracker ) - return; - - if( !cl_is_qlist_empty( &gp_mem_tracker->alloc_list ) ) - { - /* There are still items in the list. Print them out. */ - cl_mem_display(); - } - - /* Free all allocated headers. 
*/ - cl_spinlock_acquire( &gp_mem_tracker->lock ); - while( !cl_is_qlist_empty( &gp_mem_tracker->alloc_list ) ) - { - p_list_item = cl_qlist_remove_head( &gp_mem_tracker->alloc_list ); - __cl_free_priv( - PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item ) ); - } - - while( !cl_is_qlist_empty( &gp_mem_tracker->free_hdr_list ) ) - { - p_list_item = cl_qlist_remove_head( &gp_mem_tracker->free_hdr_list ); - __cl_free_priv( - PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item ) ); - } - cl_spinlock_release( &gp_mem_tracker->lock ); - - /* Destory all objects in the memory tracker object. */ - cl_spinlock_destroy( &gp_mem_tracker->lock ); - - /* Free the memory allocated for the memory tracker object. */ - __cl_free_priv( gp_mem_tracker ); -} - -/* - * Enables memory allocation tracking. - */ -void -__cl_mem_track( - IN const boolean_t start ) -{ - if( start ) - __cl_mem_track_start(); - else - __cl_mem_track_stop(); -} - -/* - * Display memory usage. - */ -void -cl_mem_display( void ) -{ - cl_list_item_t *p_list_item; - cl_malloc_hdr_t *p_hdr; - - if( !gp_mem_tracker ) - return; - - cl_spinlock_acquire( &gp_mem_tracker->lock ); - cl_msg_out( "\n\n\n*** Memory Usage ***\n" ); - p_list_item = cl_qlist_head( &gp_mem_tracker->alloc_list ); - while( p_list_item != cl_qlist_end( &gp_mem_tracker->alloc_list ) ) - { - /* - * Get the pointer to the header. Note that the object member of the - * list item will be used to store the pointer to the user's memory. - */ - p_hdr = PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item ); - - cl_msg_out( "\tMemory block at %p allocated in file %s line %d\n", - p_hdr->p_mem, p_hdr->file_name, p_hdr->line_num ); - - p_list_item = cl_qlist_next( p_list_item ); - } - cl_msg_out( "*** End of Memory Usage ***\n\n" ); - cl_spinlock_release( &gp_mem_tracker->lock ); -} - -/* - * Check the memory using the magic bits to see if anything corrupted - * our memory. 
- */ -boolean_t -cl_mem_check( void ) -{ - boolean_t res = TRUE; - -#ifdef _MEM_DEBUG_MODE_ - { - cl_list_item_t *p_list_item; - cl_malloc_hdr_t *p_hdr; - size_t size; - void *p_mem; - - if( !gp_mem_tracker ) - return res; - - cl_spinlock_acquire( &gp_mem_tracker->lock ); - /* cl_msg_out( "\n\n\n*** Memory Checker ***\n" ); */ - p_list_item = cl_qlist_head( &gp_mem_tracker->alloc_list ); - while( p_list_item != cl_qlist_end( &gp_mem_tracker->alloc_list ) ) - { - /* - * Get the pointer to the header. Note that the object member of the - * list item will be used to store the pointer to the user's memory. - */ - p_hdr = PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item ); - - /* cl_msg_out( "\tMemory block at %p allocated in file %s line %d\n", - p_hdr->p_mem, p_hdr->file_name, p_hdr->line_num ); */ - - /* calc the start */ - p_mem = (char*)p_hdr->p_mem - sizeof(size) - _MEM_DEBUG_MAGIC_SIZE_; - /* check the header magic: */ - if (memcmp(p_mem, &_MEM_DEBUG_MAGIC_START_, _MEM_DEBUG_MAGIC_SIZE_)) - { - cl_msg_out("\n *** cl_mem_check ERROR: BAD Magic Start in free of memory:%p file:%s line:%d\n", - p_hdr->p_mem , p_hdr->file_name, p_hdr->line_num - ); - res = FALSE; - } - else - { - /* obtain the size from the header */ - memcpy(&size, (char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, sizeof(size)); - - if (memcmp((char*)p_mem + sizeof(size) + _MEM_DEBUG_MAGIC_SIZE_ + size, - &_MEM_DEBUG_MAGIC_END_, _MEM_DEBUG_MAGIC_SIZE_)) - { - cl_msg_out("\n *** cl_mem_check ERROR: BAD Magic End in free of memory:%p file:%s line:%d\n", - p_hdr->p_mem , p_hdr->file_name, p_hdr->line_num - ); - res = FALSE; - } - } - - p_list_item = cl_qlist_next( p_list_item ); - } - /* cl_msg_out( "*** End of Memory Checker ***\n\n" ); */ - cl_spinlock_release( &gp_mem_tracker->lock ); - } -#endif - return res; -} - -/* - * Allocates memory and stores information about the allocation in a list. - * The contents of the list can be printed out by calling the function - * "MemoryReportUsage". 
Memory allocation will succeed even if the list - * cannot be created. - */ -void* -__cl_malloc_trk( - IN const char* const p_file_name, - IN const int32_t line_num, - IN const size_t size ) -{ - cl_malloc_hdr_t *p_hdr; - cl_list_item_t *p_list_item; - void *p_mem; - char temp_buf[FILE_NAME_LENGTH]; - int32_t temp_line; - -#ifdef _MEM_DEBUG_MODE_ - /* If we are running in MEM_DEBUG_MODE then - the cl_mem_check will be called on every run */ - if (cl_mem_check() == FALSE) - { - cl_msg_out( "*** MEMORY ERROR !!! ***\n" ); - CL_ASSERT(0); - } -#endif - - /* - * Allocate the memory first, so that we give the user's allocation - * priority over the the header allocation. - */ -#ifndef _MEM_DEBUG_MODE_ - p_mem = __cl_malloc_priv( size ); - if( !p_mem ) - return( NULL ); -#else - p_mem = __cl_malloc_priv( size + sizeof(size) + 32 ); - if( !p_mem ) - return( NULL ); - /* now poisen */ - memset(p_mem, 0xA5, size + _MEM_DEBUG_EXTRA_SIZE_); - /* special layout */ - memcpy(p_mem, &_MEM_DEBUG_MAGIC_START_, _MEM_DEBUG_MAGIC_SIZE_); - memcpy((char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, &size, sizeof(size)); - memcpy((char*)p_mem + sizeof(size) + size + _MEM_DEBUG_MAGIC_SIZE_, - &_MEM_DEBUG_MAGIC_END_, _MEM_DEBUG_MAGIC_SIZE_); - p_mem = (char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_ + sizeof(size); -#endif - - if( !gp_mem_tracker ) - return( p_mem ); - - /* - * Make copies of the file name and line number in case those - * parameters are in paged pool. - */ - temp_line = line_num; - strncpy( temp_buf, p_file_name, FILE_NAME_LENGTH ); - /* Make sure the string is null terminated. */ - temp_buf[FILE_NAME_LENGTH - 1] = '\0'; - - cl_spinlock_acquire( &gp_mem_tracker->lock ); - - /* Get a header from the free header list. */ - p_list_item = cl_qlist_remove_head( &gp_mem_tracker->free_hdr_list ); - if( p_list_item != cl_qlist_end( &gp_mem_tracker->free_hdr_list ) ) - { - /* Set the header pointer to the header retrieved from the list. 
*/ - p_hdr = PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item ); - } - else - { - /* We failed to get a free header. Allocate one. */ - p_hdr = __cl_malloc_priv( sizeof(cl_malloc_hdr_t) ); - if( !p_hdr ) - { - /* We failed to allocate the header. Return the user's memory. */ - cl_spinlock_release( &gp_mem_tracker->lock ); - return( p_mem ); - } - } - memcpy( p_hdr->file_name, temp_buf, FILE_NAME_LENGTH ); - p_hdr->line_num = temp_line; - /* - * We store the pointer to the memory returned to the user. This allows - * searching the list of allocated memory even if the buffer allocated is - * not in the list without dereferencing memory we do not own. - */ - p_hdr->p_mem = p_mem; - - /* Insert the header structure into our allocation list. */ - cl_qlist_insert_tail( &gp_mem_tracker->alloc_list, &p_hdr->list_item ); - cl_spinlock_release( &gp_mem_tracker->lock ); - - return( p_mem ); -} - -/* - * Allocate non-tracked memory. - */ -void* -__cl_malloc_ntrk( - IN const size_t size ) -{ - return( __cl_malloc_priv( size ) ); -} - -void* -__cl_zalloc_trk( - IN const char* const p_file_name, - IN const int32_t line_num, - IN const size_t size ) -{ - void *p_buffer; - - p_buffer = __cl_malloc_trk( p_file_name, line_num, size ); - if( p_buffer ) - memset( p_buffer, 0, size ); - - return( p_buffer ); -} - -void* -__cl_zalloc_ntrk( - IN const size_t size ) -{ - void *p_buffer; - - p_buffer = __cl_malloc_priv( size ); - if( p_buffer ) - memset( p_buffer, 0, size ); - - return( p_buffer ); -} - -static cl_status_t -__cl_find_mem( - IN const cl_list_item_t* const p_list_item, - IN void* const p_memory ) -{ - cl_malloc_hdr_t *p_hdr; - - /* Get the pointer to the header. 
*/ - p_hdr = PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item ); - - if( p_memory == p_hdr->p_mem ) - return( CL_SUCCESS ); - - return( CL_NOT_FOUND ); -} - -void -__cl_free_trk( - IN const char* const p_file_name, - IN const int32_t line_num, - IN void* const p_memory ) -{ - cl_malloc_hdr_t *p_hdr; - cl_list_item_t *p_list_item; - -#ifdef _MEM_DEBUG_MODE_ - /* If we are running in MEM_DEBUG_MODE then - the cl_mem_check will be called on every run */ - if (cl_mem_check() == FALSE) - { - cl_msg_out( "*** MEMORY ERROR !!! ***\n" ); - CL_ASSERT(0); - } -#endif - - if( gp_mem_tracker ) - { - cl_spinlock_acquire( &gp_mem_tracker->lock ); - - /* - * Removes an item from the allocation tracking list given a pointer - * To the user's data and returns the pointer to header referencing the - * allocated memory block. - */ - p_list_item = cl_qlist_find_from_tail( &gp_mem_tracker->alloc_list, - __cl_find_mem, p_memory ); - - if( p_list_item != cl_qlist_end(&gp_mem_tracker->alloc_list) ) - { - /* Get the pointer to the header. */ - p_hdr = PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item ); - /* Remove the item from the list. */ - cl_qlist_remove_item( &gp_mem_tracker->alloc_list, p_list_item ); - - /* Return the header to the free header list. 
*/ - cl_qlist_insert_head( &gp_mem_tracker->free_hdr_list, - &p_hdr->list_item ); - } else { - cl_msg_out("\n *** cl_free ERROR: free of non tracked memory:%p file:%s line:%d\n", - p_memory , p_file_name, line_num - ); - } - cl_spinlock_release( &gp_mem_tracker->lock ); - } - -#ifdef _MEM_DEBUG_MODE_ - { - size_t size; - void *p_mem; - - /* calc the start */ - p_mem = (char*)p_memory - sizeof(size) - _MEM_DEBUG_MAGIC_SIZE_; - /* check the header magic: */ - if (memcmp(p_mem, &_MEM_DEBUG_MAGIC_START_, _MEM_DEBUG_MAGIC_SIZE_)) - { - cl_msg_out("\n *** cl_free ERROR: BAD Magic Start in free of memory:%p file:%s line:%d\n", - p_memory , p_file_name, line_num - ); - } - else - { - /* obtain the size from the header */ - memcpy(&size, (char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, sizeof(size)); - - if (memcmp((char*)p_mem + sizeof(size) + _MEM_DEBUG_MAGIC_SIZE_ + size, - &_MEM_DEBUG_MAGIC_END_, _MEM_DEBUG_MAGIC_SIZE_)) - { - cl_msg_out("\n *** cl_free ERROR: BAD Magic End in free of memory:%p file:%s line:%d\n", - p_memory , p_file_name, line_num - ); - } - /* now poisen */ - memset(p_mem, 0x5A, size + _MEM_DEBUG_EXTRA_SIZE_); - } - __cl_free_priv( p_mem ); - } -#else - __cl_free_priv( p_memory ); -#endif -} - -void -__cl_free_ntrk( - IN void* const p_memory ) -{ - __cl_free_priv( p_memory ); -} diff --git a/osm/complib/cl_memory_osd.c b/osm/complib/cl_memory_osd.c deleted file mode 100644 index ac2658b..0000000 --- a/osm/complib/cl_memory_osd.c +++ /dev/null @@ -1,93 +0,0 @@ -/* - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. 
You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/* - * Abstract: - * Implementation of memory manipulation functions for Linux user mode. 
- * - * Environment: - * Linux User Mode - * - * $Revision: 1.3 $ - */ - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include - -void* -__cl_malloc_priv( - IN const size_t size ) -{ - return malloc( size ); -} - -void -__cl_free_priv( - IN void* const p_memory ) -{ - free( p_memory ); -} - -void -cl_memset( - IN void* const p_memory, - IN const uint8_t fill, - IN const size_t count ) -{ - memset( p_memory, fill, count ); -} - -void* -cl_memcpy( - IN void* const p_dest, - IN const void* const p_src, - IN const size_t count ) -{ - return( memcpy( p_dest, p_src, count ) ); -} - -int32_t -cl_memcmp( - IN const void* const p_mem, - IN const void* const p_ref, - IN const size_t count ) -{ - return( memcmp( p_mem, p_ref, count ) ); -} - diff --git a/osm/complib/cl_perf.c b/osm/complib/cl_perf.c deleted file mode 100644 index 9450bb1..0000000 --- a/osm/complib/cl_perf.c +++ /dev/null @@ -1,273 +0,0 @@ -/* - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. 
- * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/* - * Abstract: - * Implementation of performance tracking. - * - * Environment: - * All supported environments. - * - * $Revision: 1.3 $ - */ - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include - -/* - * Always turn on performance tracking when building this file to allow the - * performance counter functions to be built into the component library. - * Users control their use of the functions by defining the PERF_TRACK_ON - * keyword themselves before including cl_perf.h to enable the macros to - * resolve to the internal functions. - */ -#define PERF_TRACK_ON - -#include -#include - -uint64_t -__cl_perf_run_calibration( - IN cl_perf_t* const p_perf ); - -/* - * Initialize the state of the performance tracker. - */ -void -__cl_perf_construct( - IN cl_perf_t* const p_perf ) -{ - memset( p_perf, 0, sizeof(cl_perf_t) ); - p_perf->state = CL_UNINITIALIZED; -} - -/* - * Initialize the performance tracker. - */ -cl_status_t -__cl_perf_init( - IN cl_perf_t* const p_perf, - IN const uintn_t num_counters ) -{ - cl_status_t status; - cl_spinlock_t lock; - uintn_t i; - static uint64_t locked_calibration_time = 0; - static uint64_t normal_calibration_time; - - CL_ASSERT( p_perf ); - CL_ASSERT( !p_perf->size && num_counters ); - - /* Construct the performance tracker. */ - __cl_perf_construct( p_perf ); - - /* Allocate an array of counters. 
*/ - p_perf->size = num_counters; - p_perf->data_array = (cl_perf_data_t*) - malloc( sizeof(cl_perf_data_t) * num_counters ); - - if( !p_perf->data_array ) - return( CL_INSUFFICIENT_MEMORY ); - else - memset( p_perf->data_array, 0, - sizeof(cl_perf_data_t) * num_counters ); - - /* Initialize the user's counters. */ - for( i = 0; i < num_counters; i++ ) - { - p_perf->data_array[i].min_time = ((uint64_t)~0); - cl_spinlock_construct( &p_perf->data_array[i].lock ); - } - - for( i = 0; i < num_counters; i++ ) - { - status = cl_spinlock_init( &p_perf->data_array[i].lock ); - if( status != CL_SUCCESS ) - { - __cl_perf_destroy( p_perf, FALSE ); - return( status ); - } - } - - /* - * Run the calibration only if it has not been run yet. Subsequent - * calls will use the results from the first calibration. - */ - if( !locked_calibration_time ) - { - /* - * Perform the calibration under lock to prevent thread context - * switches. - */ - cl_spinlock_construct( &lock ); - status = cl_spinlock_init( &lock ); - if( status != CL_SUCCESS ) - { - __cl_perf_destroy( p_perf, FALSE ); - return( status ); - } - - /* Measure the impact when running at elevated thread priority. */ - cl_spinlock_acquire( &lock ); - locked_calibration_time = __cl_perf_run_calibration( p_perf ); - cl_spinlock_release( &lock ); - cl_spinlock_destroy( &lock ); - - /* Measure the impact when runnin at normal thread priority. */ - normal_calibration_time = __cl_perf_run_calibration( p_perf ); - } - - /* Reset the user's performance counter. */ - p_perf->normal_calibration_time = locked_calibration_time; - p_perf->locked_calibration_time = normal_calibration_time; - p_perf->data_array[0].count = 0; - p_perf->data_array[0].total_time = 0; - p_perf->data_array[0].min_time = ((uint64_t)~0); - - p_perf->state = CL_INITIALIZED; - - return( CL_SUCCESS ); -} - -/* - * Measure the time to take performance counters. 
- */ -uint64_t -__cl_perf_run_calibration( - IN cl_perf_t* const p_perf ) -{ - uint64_t start_time; - uintn_t i; - PERF_DECLARE( 0 ); - - /* Start timing. */ - start_time = cl_get_time_stamp(); - - /* - * Get the performance counter repeatedly in a loop. Use the first - * user counter as our test counter. - */ - for( i = 0; i < PERF_CALIBRATION_TESTS; i++ ) - { - cl_perf_start( 0 ); - cl_perf_stop( p_perf, 0 ); - } - - /* Calculate the total time for the calibration. */ - return( cl_get_time_stamp() - start_time ); -} - -/* - * Destroy the performance tracker. - */ -void -__cl_perf_destroy( - IN cl_perf_t* const p_perf, - IN const boolean_t display ) -{ - uintn_t i; - - CL_ASSERT( cl_is_state_valid( p_perf->state ) ); - - if( !p_perf->data_array ) - return; - - /* Display the performance data as requested. */ - if( display && p_perf->state == CL_INITIALIZED ) - __cl_perf_display( p_perf ); - - /* Destroy the user's counters. */ - for( i = 0; i < p_perf->size; i++ ) - cl_spinlock_destroy( &p_perf->data_array[i].lock ); - - free( p_perf->data_array ); - p_perf->data_array = NULL; - - p_perf->state = CL_UNINITIALIZED; -} - -/* - * Reset the performance counters. - */ -void -__cl_perf_reset( - IN cl_perf_t* const p_perf ) -{ - uintn_t i; - - for( i = 0; i < p_perf->size; i++ ) - { - cl_spinlock_acquire( &p_perf->data_array[i].lock ); - p_perf->data_array[i].min_time = ((uint64_t)~0); - p_perf->data_array[i].total_time = 0; - p_perf->data_array[i].count = 0; - cl_spinlock_release( &p_perf->data_array[i].lock ); - } -} - -/* - * Display the captured performance data. 
- */ -void -__cl_perf_display( - IN const cl_perf_t* const p_perf ) -{ - uintn_t i; - - CL_ASSERT( p_perf ); - CL_ASSERT( p_perf->state == CL_INITIALIZED ); - - cl_msg_out( "\n\n\nCL Perf:\tPerformance Data\n" ); - - cl_msg_out( "CL Perf:\tCounter Calibration Time\n" ); - cl_msg_out( "CL Perf:\tLocked TotalTime\tNormal TotalTime\tTest Count\n" ); - cl_msg_out( "CL Perf:\t%"PRIu64"\t%"PRIu64"\t%u\n", - p_perf->locked_calibration_time, p_perf->normal_calibration_time, - PERF_CALIBRATION_TESTS ); - - cl_msg_out( "CL Perf:\tUser Performance Counters\n" ); - cl_msg_out( "CL Perf:\tIndex\tTotalTime\tMinTime\tCount\n" ); - for( i = 0; i < p_perf->size; i++ ) - { - cl_msg_out( "CL Perf:\t%lu\t%"PRIu64"\t%"PRIu64"\t%"PRIu64"\n", - i, p_perf->data_array[i].total_time, - p_perf->data_array[i].min_time, p_perf->data_array[i].count ); - } - cl_msg_out( "CL Perf:\tEnd of User Performance Counters\n" ); -} diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map index 3b8c040..9d9588b 100644 --- a/osm/complib/libosmcomp.map +++ b/osm/complib/libosmcomp.map @@ -1,9 +1,5 @@ OSMCOMP_1.1 { global: - cl_async_proc_construct; - cl_async_proc_init; - cl_async_proc_destroy; - cl_async_proc_queue; complib_init; complib_exit; cl_is_debug; @@ -75,28 +71,6 @@ OSMCOMP_1.1 { cl_fmap_remove; cl_fmap_merge; cl_fmap_delta; - __cl_malloc_priv; - __cl_free_priv; - __cl_mem_track; - cl_mem_display; - cl_mem_check; - __cl_malloc_trk; - __cl_malloc_ntrk; - __cl_zalloc_trk; - __cl_zalloc_ntrk; - __cl_find_mem; - __cl_free_trk; - __cl_free_ntrk; - cl_memset; - cl_memcpy; - cl_memcmp; - __cl_perf_run_calibration; - __cl_perf_construct; - __cl_perf_init; - __cl_perf_run_calibration; - __cl_perf_destroy; - __cl_perf_reset; - __cl_perf_display; cl_qcpool_construct; cl_qcpool_init; cl_qcpool_destroy; @@ -171,13 +145,6 @@ OSMCOMP_1.1 { cl_vector_find_from_end; cl_atomic_spinlock; cl_atomic_dec; - cl_free; - cl_malloc; - cl_perf_construct; - cl_perf_destroy; - cl_perf_display; - cl_perf_init; - 
cl_perf_reset; cl_zalloc; ib_error_str; ib_async_event_str; diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am index 5efc11a..cf1b0e7 100644 --- a/osm/include/Makefile.am +++ b/osm/include/Makefile.am @@ -105,7 +105,6 @@ EXTRA_DIST = \ $(srcdir)/complib/cl_qlockpool.h \ $(srcdir)/complib/cl_event_wheel.h \ $(srcdir)/complib/cl_thread.h \ - $(srcdir)/complib/cl_memory.h \ $(srcdir)/complib/cl_packoff.h \ $(srcdir)/complib/cl_pool.h \ $(srcdir)/complib/cl_types_osd.h \ @@ -118,12 +117,9 @@ EXTRA_DIST = \ $(srcdir)/complib/cl_dispatcher.h \ $(srcdir)/complib/cl_spinlock_osd.h \ $(srcdir)/complib/cl_debug_osd.h \ - $(srcdir)/complib/cl_perf.h \ $(srcdir)/complib/cl_qmap.h \ $(srcdir)/complib/cl_byteswap.h \ - $(srcdir)/complib/cl_async_proc.h \ $(srcdir)/complib/cl_threadpool.h \ - $(srcdir)/complib/cl_memtrack.h \ $(srcdir)/complib/cl_types.h \ $(srcdir)/complib/cl_fleximap.h \ $(srcdir)/complib/cl_qcomppool.h \ diff --git a/osm/include/complib/cl_async_proc.h b/osm/include/complib/cl_async_proc.h deleted file mode 100644 index 8d6a71f..0000000 --- a/osm/include/complib/cl_async_proc.h +++ /dev/null @@ -1,334 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. 
- * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/* - * Abstract: - * Declaration of the asynchronous processing module. - * - * Environment: - * All - * - * $Revision: 1.3 $ - */ - -#ifndef _CL_ASYNC_PROC_H_ -#define _CL_ASYNC_PROC_H_ - -#include -#include -#include -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/****h* Component Library/Asynchronous Processor -* NAME -* Asynchronous Processor -* -* DESCRIPTION -* The asynchronous processor provides threads for executing queued callbacks. -* -* The threads in the asynchronous processor wait for callbacks to be queued. -* -* The asynchronous processor functions operate on a cl_async_proc_t structure -* which should be treated as opaque and manipulated only through the provided -* functions. -* -* SEE ALSO -* Structures: -* cl_async_proc_t, cl_async_proc_item_t -* -* Initialization: -* cl_async_proc_construct, cl_async_proc_init, cl_async_proc_destroy -* -* Manipulation: -* cl_async_proc_queue -*********/ - -/****s* Component Library: Asynchronous Processor/cl_async_proc_t -* NAME -* cl_async_proc_t -* -* DESCRIPTION -* Asynchronous processor structure. 
-* -* The cl_async_proc_t structure should be treated as opaque, and should be -* manipulated only through the provided functions. -* -* SYNOPSIS -*/ -typedef struct _cl_async_proc -{ - cl_thread_pool_t thread_pool; - cl_qlist_t item_queue; - cl_spinlock_t lock; - -} cl_async_proc_t; -/* -* FIELDS -* item_pool -* Pool of items storing the callback function and contexts to be invoked -* by the asynchronous processor's threads. -* -* thread_pool -* Thread pool that will invoke the callbacks. -* -* item_queue -* Queue of items that the threads should process. -* -* lock -* Lock used to synchronize access to the item pool and queue. -* -* SEE ALSO -* Asynchronous Processor -*********/ - -/* - * Declare the structure so we can reference it in the following function - * prototype. - */ -typedef struct _cl_async_proc_item *__p_cl_async_proc_item_t; - -/****d* Component Library: Asynchronous Processor/cl_pfn_async_proc_cb_t -* NAME -* cl_pfn_async_proc_cb_t -* -* DESCRIPTION -* The cl_pfn_async_proc_cb_t function type defines the prototype for -* callbacks queued to and invoked by the asynchronous processor. -* -* SYNOPSIS -*/ -typedef void -(*cl_pfn_async_proc_cb_t)( - IN struct _cl_async_proc_item *p_item ); -/* -* PARAMETERS -* p_item -* Pointer to the cl_async_proc_item_t structure that was queued in -* a call to cl_async_proc_queue. -* -* NOTES -* This function type is provided as function prototype reference for the -* function provided by users as a parameter to the cl_async_proc_queue -* function. -* -* SEE ALSO -* Asynchronous Processor, cl_async_proc_item_t -*********/ - -/****s* Component Library: Asynchronous Processor/cl_async_proc_item_t -* NAME -* cl_async_proc_item_t -* -* DESCRIPTION -* Asynchronous processor item structure passed to the cl_async_proc_queue -* function to queue a callback for execution. 
-* -* SYNOPSIS -*/ -typedef struct _cl_async_proc_item -{ - cl_pool_item_t pool_item; - cl_pfn_async_proc_cb_t pfn_callback; - -} cl_async_proc_item_t; -/* -* FIELDS -* pool_item -* Pool item for queuing the item to be invoked by the asynchronous -* processor's threads. This field is defined as a pool item to -* allow items to be managed by a pool. -* -* pfn_callback -* Pointer to a callback function to invoke when the item is dequeued. -* -* SEE ALSO -* Asynchronous Processor, cl_async_proc_queue, cl_pfn_async_proc_cb_t -*********/ - -/****f* Component Library: Asynchronous Processor/cl_async_proc_construct -* NAME -* cl_async_proc_construct -* -* DESCRIPTION -* The cl_async_proc_construct function initializes the state of a -* thread pool. -* -* SYNOPSIS -*/ -void -cl_async_proc_construct( - IN cl_async_proc_t* const p_async_proc ); -/* -* PARAMETERS -* p_async_proc -* [in] Pointer to an asynchronous processor structure. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* Allows calling cl_async_proc_destroy without first calling -* cl_async_proc_init. -* -* Calling cl_async_proc_construct is a prerequisite to calling any other -* thread pool function except cl_async_proc_init. -* -* SEE ALSO -* Asynchronous Processor, cl_async_proc_init, cl_async_proc_destroy -*********/ - -/****f* Component Library: Asynchronous Processor/cl_async_proc_init -* NAME -* cl_async_proc_init -* -* DESCRIPTION -* The cl_async_proc_init function initialized an asynchronous processor -* for use. -* -* SYNOPSIS -*/ -cl_status_t -cl_async_proc_init( - IN cl_async_proc_t* const p_async_proc, - IN const uint32_t thread_count, - IN const char* const name ); -/* -* PARAMETERS -* p_async_proc -* [in] Pointer to an asynchronous processor structure to initialize. -* -* thread_count -* [in] Number of threads to be managed by the asynchronous processor. -* -* name -* [in] Name to associate with the threads. 
The name may be up to 16 -* characters, including a terminating null character. All threads -* created in the asynchronous processor have the same name. -* -* RETURN VALUES -* CL_SUCCESS if the asynchronous processor creation succeeded. -* -* CL_INSUFFICIENT_MEMORY if there was not enough memory to inititalize -* the asynchronous processor. -* -* CL_ERROR if the threads could not be created. -* -* NOTES -* cl_async_proc_init creates and starts the specified number of threads. -* If thread_count is zero, the asynchronous processor creates as many -* threads as there are processors in the system. -* -* SEE ALSO -* Asynchronous Processor, cl_async_proc_construct, cl_async_proc_destroy, -* cl_async_proc_queue -*********/ - -/****f* Component Library: Asynchronous Processor/cl_async_proc_destroy -* NAME -* cl_async_proc_destroy -* -* DESCRIPTION -* The cl_async_proc_destroy function performs any necessary cleanup -* for a thread pool. -* -* SYNOPSIS -*/ -void -cl_async_proc_destroy( - IN cl_async_proc_t* const p_async_proc ); -/* -* PARAMETERS -* p_async_proc -* [in] Pointer to an asynchronous processor structure to destroy. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* This function blocks until all threads exit, and must therefore not -* be called from any of the asynchronous processor's threads. Because of -* its blocking nature, callers of cl_async_proc_destroy must ensure that -* entering a wait state is valid from the calling thread context. -* -* This function should only be called after a call to -* cl_async_proc_construct or cl_async_proc_init. -* -* SEE ALSO -* Asynchronous Processor, cl_async_proc_construct, cl_async_proc_init -*********/ - -/****f* Component Library: Asynchronous Processor/cl_async_proc_queue -* NAME -* cl_async_proc_queue -* -* DESCRIPTION -* The cl_async_proc_queue function queues a callback to an asynchronous -* processor. 
-* -* SYNOPSIS -*/ -void -cl_async_proc_queue( - IN cl_async_proc_t* const p_async_proc, - IN cl_async_proc_item_t* const p_item ); -/* -* PARAMETERS -* p_async_proc -* [in] Pointer to an asynchronous processor structure to initialize. -* -* p_item -* [in] Pointer to an asynchronous processor item to queue for execution. -* The callback and context fields of the item must be valid. -* -* RETURN VALUES -* This function does not return a value. -* -* SEE ALSO -* Asynchronous Processor, cl_async_proc_init, cl_pfn_async_proc_cb_t -*********/ - -END_C_DECLS - -#endif /* !defined(_CL_ASYNC_PROC_H_) */ diff --git a/osm/include/complib/cl_memory.h b/osm/include/complib/cl_memory.h deleted file mode 100644 index 9a8580b..0000000 --- a/osm/include/complib/cl_memory.h +++ /dev/null @@ -1,663 +0,0 @@ -/* - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. 
- * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/* - * Abstract: - * Declaration of generic memory allocation calls. - * - * Environment: - * All - * - * $Revision: 1.4 $ - */ - -#ifndef _CL_MEMORY_H_ -#define _CL_MEMORY_H_ - -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/****h* Public/Memory Management -* NAME -* Memory Management -* -* DESCRIPTION -* The memory management functionality provides memory manipulation -* functions as well as powerful debugging tools. -* -* The Allocation Tracking functionality provides a means for tracking memory -* allocations in order to detect memory leaks. -* -* Memory allocation tracking stores the file name and line number where -* allocations occur. Gathering this information does have an adverse impact -* on performance, and memory tracking should therefore not be enabled in -* release builds of software. -* -* Memory tracking is compiled into the debug version of the library, -* and can be enabled for the release version as well. To Enable memory -* tracking in a release build of the public layer, users should define -* the MEM_TRACK_ON keyword for compilation. -*********/ - -/****i* Public: Memory Management/__cl_mem_track -* NAME -* __cl_mem_track -* -* DESCRIPTION -* The __cl_mem_track function enables or disables memory allocation tracking. 
-* -* SYNOPSIS -*/ -void __attribute__((deprecated)) -__cl_mem_track( - IN const boolean_t start ); -/* -* PARAMETERS -* start -* [in] Specifies whether to start or stop memory tracking. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* This function performs all necessary initialization for tracking -* allocations. Users should never call this function, as it is called by -* the component library framework. -* -* If the Start parameter is set to TRUE, the function starts tracking memory -* usage if not already started. When set to FALSE, memory tracking is stoped -* and all remaining allocations are displayed to the applicable debugger, if -* any. -* -* Starting memory tracking when it is already started has no effect. -* Likewise, stoping memory tracking when it is already stopped has no effect. -* -* SEE ALSO -* Memory Management, cl_mem_display -**********/ - -/****f* Public: Memory Management/cl_mem_display -* NAME -* cl_mem_display -* -* DESCRIPTION -* The cl_mem_display function displays all tracked memory allocations to -* the applicable debugger. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) -cl_mem_display( void ); -/* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* Each tracked memory allocation is displayed along with the file name and -* line number that allocated it. -* -* Output is sent to the platform's debugging target, which may be the -* system log file. -* -* SEE ALSO -* Memory Management -**********/ - -/****f* Public: Memory Management/cl_mem_check -* NAME -* cl_mem_check -* -* DESCRIPTION -* The cl_mem_check function checks all tracked memory allocations to -* the applicable debugger. -* -* SYNOPSIS -*/ -boolean_t __attribute__((deprecated)) -cl_mem_check( void ); -/* -* RETURN VALUE -* TRUE if no errors were found. FALSE - otherwise. -* -* NOTES -* Each tracked memory allocation is displayed along with the file name and -* line number that allocated it. 
-* -* Output is sent to the platform's debugging target, which may be the -* system log file. -* -* SEE ALSO -* Memory Management -**********/ - -/****i* Public: Memory Management/__cl_malloc_trk -* NAME -* __cl_malloc_trk -* -* DESCRIPTION -* The __cl_malloc_trk function allocates and tracks a block of memory. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) * -__cl_malloc_trk( - IN const char* const p_file_name, - IN const int32_t line_num, - IN const size_t size ); -/* -* PARAMETERS -* p_file_name -* [in] Name of the source file initiating the allocation. -* -* line_num -* [in] Line number in the specified file where the allocation is -* initiated -* -* size -* [in] Size of the requested allocation. -* -* RETURN VALUES -* Pointer to allocated memory if successful. -* -* NULL otherwise. -* -* NOTES -* Allocated memory follows alignment rules specific to the different -* environments. -* This function is should not be called directly. The cl_malloc macro will -* redirect users to this function when memory tracking is enabled. -* -* SEE ALSO -* Memory Management, __cl_malloc_ntrk, __cl_zalloc_trk, __cl_free_trk -**********/ - -/****i* Public: Memory Management/__cl_zalloc_trk -* NAME -* __cl_zalloc_trk -* -* DESCRIPTION -* The __cl_zalloc_trk function allocates and tracks a block of memory -* initialized to zero. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) * -__cl_zalloc_trk( - IN const char* const p_file_name, - IN const int32_t line_num, - IN const size_t bytes ); -/* -* PARAMETERS -* p_file_name -* [in] Name of the source file initiating the allocation. -* -* line_num -* [in] Line number in the specified file where the allocation is -* initiated -* -* size -* [in] Size of the requested allocation. -* -* RETURN VALUES -* Pointer to allocated memory if successful. -* -* NULL otherwise. -* -* NOTES -* Allocated memory follows alignment rules specific to the different -* environments. -* This function should not be called directly. 
The cl_zalloc macro will -* redirect users to this function when memory tracking is enabled. -* -* SEE ALSO -* Memory Management, __cl_zalloc_ntrk, __cl_malloc_trk, __cl_free_trk -**********/ - -/****i* Public: Memory Management/__cl_malloc_ntrk -* NAME -* __cl_malloc_ntrk -* -* DESCRIPTION -* The __cl_malloc_ntrk function allocates a block of memory. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) * -__cl_malloc_ntrk( - IN const size_t size ); -/* -* PARAMETERS -* size -* [in] Size of the requested allocation. -* -* RETURN VALUES -* Pointer to allocated memory if successful. -* -* NULL otherwise. -* -* NOTES -* Allocated memory follows alignment rules specific to the different -* environments. -* This function is should not be called directly. The cl_malloc macro will -* redirect users to this function when memory tracking is not enabled. -* -* SEE ALSO -* Memory Management, __cl_malloc_trk, __cl_zalloc_ntrk, __cl_free_ntrk -**********/ - -/****i* Public: Memory Management/__cl_zalloc_ntrk -* NAME -* __cl_zalloc_ntrk -* -* DESCRIPTION -* The __cl_zalloc_ntrk function allocates a block of memory -* initialized to zero. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) * -__cl_zalloc_ntrk( - IN const size_t bytes ); -/* -* PARAMETERS -* size -* [in] Size of the requested allocation. -* -* RETURN VALUES -* Pointer to allocated memory if successful. -* -* NULL otherwise. -* -* NOTES -* Allocated memory follows alignment rules specific to the different -* environments. -* This function should not be called directly. The cl_zalloc macro will -* redirect users to this function when memory tracking is not enabled. -* -* SEE ALSO -* Memory Management, __cl_zalloc_trk, __cl_malloc_ntrk, __cl_free_ntrk -**********/ - -/****i* Public: Memory Management/__cl_free_trk -* NAME -* __cl_free_trk -* -* DESCRIPTION -* The __cl_free_trk function deallocates a block of tracked memory. 
-* -* SYNOPSIS -*/ -void __attribute__((deprecated)) -__cl_free_trk( - IN const char* const p_file_name, - IN const int32_t line_num, - IN void* const p_memory ); -/* -* PARAMETERS -* p_memory -* [in] Pointer to a memory block. -* -* p_file_name -* [in] Name of the source file initiating the allocation. -* -* line_num -* [in] Line number in the specified file where the allocation is -* initiated -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* The p_memory parameter is the pointer returned by a previous call to -* __cl_malloc_trk, or __cl_zalloc_trk. -* -* __cl_free_trk has no effect if p_memory is NULL. -* -* This function should not be called directly. The cl_free macro will -* redirect users to this function when memory tracking is enabled. -* -* SEE ALSO -* Memory Management, __cl_free_ntrk, __cl_malloc_trk, __cl_zalloc_trk -**********/ - -/****i* Public: Memory Management/__cl_free_ntrk -* NAME -* __cl_free_ntrk -* -* DESCRIPTION -* The __cl_free_ntrk function deallocates a block of memory. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) -__cl_free_ntrk( - IN void* const p_memory ); -/* -* PARAMETERS -* p_memory -* [in] Pointer to a memory block. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* The p_memory parameter is the pointer returned by a previous call to -* __cl_malloc_ntrk, or __cl_zalloc_ntrk. -* -* __cl_free_ntrk has no effect if p_memory is NULL. -* -* This function should not be called directly. The cl_free macro will -* redirect users to this function when memory tracking is not enabled. -* -* SEE ALSO -* Memory Management, __cl_free_ntrk, __cl_malloc_trk, __cl_zalloc_trk -**********/ - -/****f* Public: Memory Management/cl_malloc -* NAME -* cl_malloc -* -* DESCRIPTION -* The cl_malloc function allocates a block of memory. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) * -cl_malloc( - IN const size_t size ); -/* -* PARAMETERS -* size -* [in] Size of the requested allocation. 
-* -* RETURN VALUES -* Pointer to allocated memory if successful. -* -* NULL otherwise. -* -* NOTES -* Allocated memory follows alignment rules specific to the different -* environments. -* -* SEE ALSO -* Memory Management, cl_free, cl_zalloc, cl_memset, cl_memclr, cl_memcpy, cl_memcmp -**********/ - -/****f* Public: Memory Management/cl_zalloc -* NAME -* cl_zalloc -* -* DESCRIPTION -* The cl_zalloc function allocates a block of memory initialized to zero. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) * -cl_zalloc( - IN const size_t size ); -/* -* PARAMETERS -* size -* [in] Size of the requested allocation. -* -* RETURN VALUES -* Pointer to allocated memory if successful. -* -* NULL otherwise. -* -* NOTES -* Allocated memory follows alignment rules specific to the different -* environments. -* -* SEE ALSO -* Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, cl_memcmp -**********/ - -/****f* Public: Memory Management/cl_free -* NAME -* cl_free -* -* DESCRIPTION -* The cl_free function deallocates a block of memory. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) -cl_free( - IN void* const p_memory ); -/* -* PARAMETERS -* p_memory -* [in] Pointer to a memory block. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* The p_memory parameter is the pointer returned by a previous call to -* cl_malloc, or cl_zalloc. -* -* cl_free has no effect if p_memory is NULL. -* -* SEE ALSO -* Memory Management, cl_alloc, cl_zalloc -**********/ - -/****f* Public: Memory Management/cl_memset -* NAME -* cl_memset -* -* DESCRIPTION -* The cl_memset function sets every byte in a memory range to a given value. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) -cl_memset( - IN void* const p_memory, - IN const uint8_t fill, - IN const size_t count ); -/* -* PARAMETERS -* p_memory -* [in] Pointer to a memory block. -* -* fill -* [in] Byte value with which to fill the memory. -* -* count -* [in] Number of bytes to set. 
-* -* RETURN VALUE -* This function does not return a value. -* -* SEE ALSO -* Memory Management, cl_memclr, cl_memcpy, cl_memcmp -**********/ - -/****f* Public: Memory Management/cl_memclr -* NAME -* cl_memclr -* -* DESCRIPTION -* The cl_memclr function sets every byte in a memory range to zero. -* -* SYNOPSIS -*/ -static inline void __attribute__((deprecated)) -cl_memclr( - IN void* const p_memory, - IN const size_t count ) -{ - memset( p_memory, 0, count ); -} -/* -* PARAMETERS -* p_memory -* [in] Pointer to a memory block. -* -* count -* [in] Number of bytes to set. -* -* RETURN VALUE -* This function does not return a value. -* -* SEE ALSO -* Memory Management, cl_memset, cl_memcpy, cl_memcmp -**********/ - -/****f* Public: Memory Management/cl_memcpy -* NAME -* cl_memcpy -* -* DESCRIPTION -* The cl_memcpy function copies a given number of bytes from -* one buffer to another. -* -* SYNOPSIS -*/ -void __attribute__((deprecated)) * -cl_memcpy( - IN void* const p_dest, - IN const void* const p_src, - IN const size_t count ); -/* -* PARAMETERS -* p_dest -* [in] Pointer to the buffer being copied to. -* -* p_src -* [in] Pointer to the buffer being copied from. -* -* count -* [in] Number of bytes to copy from the source buffer to the -* destination buffer. -* -* RETURN VALUE -* This function does not return a value. -* -* SEE ALSO -* Memory Management, cl_memset, cl_memclr, cl_memcmp -**********/ - -/****f* Public: Memory Management/cl_memcmp -* NAME -* cl_memcmp -* -* DESCRIPTION -* The cl_memcmp function compares two memory buffers. -* -* SYNOPSIS -*/ -int32_t __attribute__((deprecated)) -cl_memcmp( - IN const void* const p_mem, - IN const void* const p_ref, - IN const size_t count ); -/* -* PARAMETERS -* p_mem -* [in] Pointer to a memory block being compared. -* -* p_ref -* [in] Pointer to the reference memory block to compare against. -* -* count -* [in] Number of bytes to compare. -* -* RETURN VALUES -* Returns less than zero if p_mem is less than p_ref. 
-* -* Returns greater than zero if p_mem is greater than p_ref. -* -* Returns zero if the two memory regions are the identical. -* -* SEE ALSO -* Memory Management, cl_memset, cl_memclr, cl_memcpy -**********/ - -#if defined( CL_NO_TRACK_MEM ) && defined( CL_TRACK_MEM ) - #error Conflict: Cannot define both CL_NO_TRACK_MEM and CL_TRACK_MEM. -#endif - -/* - * Turn on memory allocation tracking in debug builds if not explicitly - * disabled or already turned on. - */ -#if defined( _DEBUG_ ) && \ - !defined( CL_NO_TRACK_MEM ) && \ - !defined( CL_TRACK_MEM ) - #define CL_TRACK_MEM -#endif - -/* - * Define allocation macro. - */ -#if defined( CL_TRACK_MEM ) - -#define cl_malloc( a ) \ - __cl_malloc_trk( __FILE__, __LINE__, a ) - -#define cl_zalloc( a ) \ - __cl_zalloc_trk( __FILE__, __LINE__, a ) - -#define cl_free( a ) \ - __cl_free_trk( __FILE__, __LINE__, a ) - -#else /* !defined( CL_TRACK_MEM ) */ - -#define cl_malloc( a ) \ - __cl_malloc_ntrk( a ) - -#define cl_zalloc( a ) \ - __cl_zalloc_ntrk( a ) - -#define cl_free( a ) \ - __cl_free_ntrk( a ) - -#endif /* defined( CL_TRACK_MEM ) */ - -END_C_DECLS - -#endif /* _CL_MEMORY_H_ */ diff --git a/osm/include/complib/cl_memtrack.h b/osm/include/complib/cl_memtrack.h deleted file mode 100644 index 9e97136..0000000 --- a/osm/include/complib/cl_memtrack.h +++ /dev/null @@ -1,96 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. 
You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/* - * Abstract: - * Definitions of Data-Structures for memory allocation tracking functions. - * - * Environment: - * All - * - * $Revision: 1.3 $ - */ - - -#ifndef _CL_MEMTRACK_H_ -#define _CL_MEMTRACK_H_ - -#include -#include -#include -#include -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/* Structure to track memory allocations. */ -typedef struct _cl_mem_tracker -{ - /* List for tracking memory allocations. */ - cl_qlist_t alloc_list; - - /* Lock for synchronization. */ - cl_spinlock_t lock; - - /* List to manage free headers. 
*/ - cl_qlist_t free_hdr_list; - -} cl_mem_tracker_t __attribute__((deprecated)); - -#define FILE_NAME_LENGTH 64 - -/* Header for all memory allocations. */ -typedef struct _cl_malloc_hdr -{ - cl_list_item_t list_item; - void *p_mem; - char file_name[FILE_NAME_LENGTH]; - int32_t line_num; - -} cl_malloc_hdr_t __attribute__((deprecated)); - -extern cl_mem_tracker_t *gp_mem_tracker; - -END_C_DECLS - -#endif /* _CL_MEMTRACK_H_ */ diff --git a/osm/include/complib/cl_perf.h b/osm/include/complib/cl_perf.h deleted file mode 100644 index 522f23f..0000000 --- a/osm/include/complib/cl_perf.h +++ /dev/null @@ -1,708 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/* - * Abstract: - * Declaration of performance tracking. - * - * Environment: - * All - * - * $Revision: 1.3 $ - */ - -#ifndef _CL_PERF_H_ -#define _CL_PERF_H_ - -#include -#include -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/****h* Component Library/Performance Counters -* NAME -* Performance Counters -* -* DESCRIPTION -* The performance counters allows timing operations to benchmark -* software performance and help identify potential bottlenecks. -* -* All performance counters are NULL macros when disabled, preventing them -* from adversly affecting performance in builds where the counters are not -* used. -* -* Each counter records elapsed time in micro-seconds, minimum time elapsed, -* and total number of samples. -* -* Each counter is independently protected by a spinlock, allowing use of -* the counters in multi-processor environments. -* -* The impact of serializing access to performance counters is measured, -* allowing measurements to be corrected as necessary. -* -* NOTES -* Performance counters do impact performance, and should only be enabled -* when gathering data. Counters can be enabled or disabled on a per-user -* basis at compile time. To enable the counters, users should define -* the PERF_TRACK_ON keyword before including the cl_perf.h file. -* Undefining the PERF_TRACK_ON keyword disables the performance counters. -* When disabled, all performance tracking calls resolve to no-ops. -* -* When using performance counters, it is the user's responsibility to -* maintain the counter indexes. 
It is recomended that users define an -* enumerated type to use for counter indexes. It improves readability -* and simplifies maintenance by reducing the work necessary in managing -* the counter indexes. -* -* SEE ALSO -* Structures: -* cl_perf_t -* -* Initialization: -* cl_perf_construct, cl_perf_init, cl_perf_destroy -* -* Manipulation -* cl_perf_reset, cl_perf_display, cl_perf_start, cl_perf_update, -* cl_perf_log, cl_perf_stop -* -* Macros: -* PERF_DECLARE, PERF_DECLARE_START -*********/ - -/* - * Number of times the counter calibration test is executed. This is used - * to determine the average time to use a performance counter. - */ -#define PERF_CALIBRATION_TESTS 100000 - -/****i* Component Library: Performance Counters/cl_perf_data_t -* NAME -* cl_perf_data_t -* -* DESCRIPTION -* The cl_perf_data_t structure is used to tracking information -* for a single counter. -* -* SYNOPSIS -*/ -typedef struct _cl_perf_data -{ - uint64_t count; - uint64_t total_time; - uint64_t min_time; - cl_spinlock_t lock; - -} cl_perf_data_t; -/* -* FIELDS -* count -* Number of samples in the counter. -* -* total_time -* Total time for all samples, in microseconds. -* -* min_time -* Minimum time for any sample in the counter, in microseconds. -* -* lock -* Spinlock to serialize counter updates. -* -* SEE ALSO -* Performance Counters -*********/ - -/****i* Component Library: Performance Counters/cl_perf_t -* NAME -* cl_perf_t -* -* DESCRIPTION -* The cl_perf_t structure serves as a container for a group of performance -* counters and related calibration data. -* -* This structure should be treated as opaque and be manipulated only through -* the provided functions. -* -* SYNOPSIS -*/ -typedef struct _cl_perf -{ - cl_perf_data_t *data_array; - uintn_t size; - uint64_t locked_calibration_time; - uint64_t normal_calibration_time; - cl_state_t state; - -} cl_perf_t; -/* -* FIELDS -* data_array -* Pointer to the array of performance counters. 
-* -* size -* Number of counters in the counter array. -* -* locked_calibration_time -* Time needed to update counters while holding a spinlock. -* -* normal_calibration_time -* Time needed to update counters while not holding a spinlock. -* -* state -* State of the performance counter provider. -* -* SEE ALSO -* Performance Counters, cl_perf_data_t -*********/ - -/****f* Component Library: Performance Counters/cl_perf_construct -* NAME -* cl_perf_construct -* -* DESCRIPTION -* The cl_perf_construct macro constructs a performance -* tracking container. -* -* SYNOPSIS -*/ -void -cl_perf_construct( - IN cl_perf_t* const p_perf ); -/* -* PARAMETERS -* p_perf -* [in] Pointer to a performance counter container to construct. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* cl_perf_construct allows calling cl_perf_destroy without first calling -* cl_perf_init. -* -* Calling cl_perf_construct is a prerequisite to calling any other -* perfromance counter function except cl_perf_init. -* -* This function is implemented as a macro and has no effect when -* performance counters are disabled. -* -* SEE ALSO -* Performance Counters, cl_perf_init, cl_perf_destroy -*********/ - -/****f* Component Library: Performance Counters/cl_perf_init -* NAME -* cl_perf_init -* -* DESCRIPTION -* The cl_perf_init function initializes a performance counter container -* for use. -* -* SYNOPSIS -*/ -cl_status_t -cl_perf_init( - IN cl_perf_t* const p_perf, - IN const uintn_t num_counters ); -/* -* PARAMETERS -* p_perf -* [in] Pointer to a performance counter container to initalize. -* -* num_cntrs -* [in] Number of counters to allocate in the container. -* -* RETURN VALUES -* CL_SUCCESS if initialization was successful. -* -* CL_INSUFFICIENT_MEMORY if there was not enough memory to initialize -* the container. -* -* CL_ERROR if an error was encountered initializing the locks for the -* performance counters. 
-* -* NOTES -* This function allocates all memory required for the requested number of -* counters and initializes all locks protecting those counters. After a -* successful initialization, cl_perf_init calibrates the counters and -* resets their value. -* -* This function is implemented as a macro and has no effect when -* performance counters are disabled. -* -* SEE ALSO -* Performance Counters, cl_perf_construct, cl_perf_destroy, cl_perf_display -*********/ - -/****f* Component Library: Performance Counters/cl_perf_destroy -* NAME -* cl_perf_destroy -* -* DESCRIPTION -* The cl_perf_destroy function destroys a performance tracking container. -* -* SYNOPSIS -*/ -void -cl_perf_destroy( - IN cl_perf_t* const p_perf, - IN const boolean_t display ); -/* -* PARAMETERS -* p_perf -* [in] Pointer to a performance counter container to destroy. -* -* display -* [in] If TRUE, causes the performance counters to be displayed. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* cl_perf_destroy frees all resources allocated in a call to cl_perf_init. -* If the display parameter is set to TRUE, displays all counter values -* before deallocating resources. -* -* This function should only be called after a call to cl_perf_construct -* or cl_perf_init. -* -* This function is implemented as a macro and has no effect when -* performance counters are disabled. -* -* SEE ALSO -* Performance Counters, cl_perf_construct, cl_perf_init -*********/ - -/****f* Component Library: Performance Counters/cl_perf_reset -* NAME -* cl_perf_reset -* -* DESCRIPTION -* The cl_perf_reset function resets the counters contained in -* a performance tracking container. -* -* SYNOPSIS -*/ -void -cl_perf_reset( - IN cl_perf_t* const p_perf ); -/* -* PARAMETERS -* p_perf -* [in] Pointer to a performance counter container whose counters -* to reset. -* -* RETURN VALUE -* This function does not return a value. 
-* -* NOTES -* This function is implemented as a macro and has no effect when -* performance counters are disabled. -* -* SEE ALSO -* Performance Counters -*********/ - -/****f* Component Library: Performance Counters/cl_perf_display -* NAME -* cl_perf_display -* -* DESCRIPTION -* The cl_perf_display function displays the current performance -* counter values. -* -* SYNOPSIS -*/ -void -cl_perf_display( - IN const cl_perf_t* const p_perf ); -/* -* PARAMETERS -* p_perf -* [in] Pointer to a performance counter container whose counter -* values to display. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* This function is implemented as a macro and has no effect when -* performance counters are disabled. -* -* SEE ALSO -* Performance Counters, cl_perf_init -*********/ - -/****d* Component Library: Performance Counters/PERF_DECLARE -* NAME -* PERF_DECLARE -* -* DESCRIPTION -* The PERF_DECLARE macro declares a performance counter variable used -* to store the starting time of a timing sequence. -* -* SYNOPSIS -* PERF_DECLARE( index ) -* -* PARAMETERS -* index -* [in] Index of the performance counter for which to use this -* variable. -* -* NOTES -* Variables should generally be declared on the stack to support -* multi-threading. In cases where a counter needs to be used to -* time operations accross multiple functions, care must be taken to -* ensure that the start time stored in this variable is not overwritten -* before the related performance counter has been updated. -* -* This macro has no effect when performance counters are disabled. -* -* SEE ALSO -* Performance Counters, PERF_DECLARE_START, cl_perf_start, cl_perf_log, -* cl_perf_stop -*********/ - -/****d* Component Library: Performance Counters/PERF_DECLARE_START -* NAME -* PERF_DECLARE_START -* -* DESCRIPTION -* The PERF_DECLARE_START macro declares a performance counter variable -* and sets it to the starting time of a timed sequence. 
-* -* SYNOPSIS -* PERF_DECLARE_START( index ) -* -* PARAMETERS -* index -* [in] Index of the performance counter for which to use this -* variable. -* -* NOTES -* Variables should generally be declared on the stack to support -* multi-threading. -* -* This macro has no effect when performance counters are disabled. -* -* SEE ALSO -* Performance Counters, PERF_DECLARE, cl_perf_start, cl_perf_log, -* cl_perf_stop -*********/ - -/****d* Component Library: Performance Counters/cl_perf_start -* NAME -* cl_perf_start -* -* DESCRIPTION -* The cl_perf_start macro sets the starting value of a timed sequence. -* -* SYNOPSIS -*/ -void -cl_perf_start( - IN const uintn_t index ); -/* -* PARAMETERS -* index -* [in] Index of the performance counter to set. -* -* NOTES -* This macro has no effect when performance counters are disabled. -* -* SEE ALSO -* Performance Counters, PERF_DECLARE, PERF_DECLARE_START, cl_perf_log, -* cl_perf_update, cl_perf_stop -*********/ - -/****d* Component Library: Performance Counters/cl_perf_update -* NAME -* cl_perf_update -* -* DESCRIPTION -* The cl_perf_update macro adds a timing sample based on a provided start -* time to a counter in a performance counter container. -* -* SYNOPSIS -*/ -void -cl_perf_update( - IN cl_perf_t* const p_perf, - IN const uintn_t index, - IN const uint64_t start_time ); -/* -* PARAMETERS -* p_perf -* [in] Pointer to a performance counter container to whose counter -* the sample should be added. -* -* index -* [in] Number of the performance counter to update with a new sample. -* -* start_time -* [in] Timestamp to use as the start time for the timing sample. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* This macro has no effect when performance counters are disabled. 
-* -* SEE ALSO -* Performance Counters, PERF_DECLARE, PERF_DECLARE_START, cl_perf_start, -* cl_perf_lob, cl_perf_stop -*********/ - -/****d* Component Library: Performance Counters/cl_perf_log -* NAME -* cl_perf_log -* -* DESCRIPTION -* The cl_perf_log macro adds a given timing sample to a -* counter in a performance counter container. -* -* SYNOPSIS -*/ -void -cl_perf_log( - IN cl_perf_t* const p_perf, - IN const uintn_t index, - IN const uint64_t pc_total_time ); -/* -* PARAMETERS -* p_perf -* [in] Pointer to a performance counter container to whose counter -* the sample should be added. -* -* index -* [in] Number of the performance counter to update with a new sample. -* -* pc_total_time -* [in] Total elapsed time for the sample being added. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* This macro has no effect when performance counters are disabled. -* -* SEE ALSO -* Performance Counters, PERF_DECLARE, PERF_DECLARE_START, cl_perf_start, -* cl_perf_update, cl_perf_stop -*********/ - -/****d* Component Library: Performance Counters/cl_perf_stop -* NAME -* cl_perf_stop -* -* DESCRIPTION -* The cl_perf_log macro updates a counter in a performance counter -* container with a new timing sample. -* -* SYNOPSIS -*/ -void -cl_perf_stop( - IN cl_perf_t* const p_perf, - IN const uintn_t index ); -/* -* PARAMETERS -* p_perf -* [in] Pointer to a performance counter container to whose counter -* a sample should be added. -* -* index -* [in] Number of the performance counter to update with a new sample. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* The ending time stamp is taken and elapsed time calculated before updating -* the specified counter. -* -* This macro has no effect when performance counters are disabled. 
-* -* SEE ALSO -* Performance Counters, PERF_DECLARE, PERF_DECLARE_START, cl_perf_start, -* cl_perf_log -*********/ - -/* - * PERF_TRACK_ON must be defined by the user before including this file to - * enable performance tracking. To disable tracking, users should undefine - * PERF_TRACK_ON. - */ -#if defined( PERF_TRACK_ON ) -/* - * Enable performance tracking. - */ - -#define cl_perf_construct( p_perf ) \ - __cl_perf_construct( p_perf ) -#define cl_perf_init( p_perf, num_counters ) \ - __cl_perf_init( p_perf, num_counters ) -#define cl_perf_destroy( p_perf, display ) \ - __cl_perf_destroy( p_perf, display ) -#define cl_perf_reset( p_perf ) \ - __cl_perf_reset( p_perf ) -#define cl_perf_display( p_perf ) \ - __cl_perf_display( p_perf ) -#define PERF_DECLARE( index ) \ - uint64_t Pc##index -#define PERF_DECLARE_START( index ) \ - uint64 Pc##index = cl_get_time_stamp() -#define cl_perf_start( index ) \ - (Pc##index = cl_get_time_stamp()) -#define cl_perf_log( p_perf, index, pc_total_time ) \ -{\ - /* Update the performance data. This requires synchronization. */ \ - cl_spinlock_acquire( &((cl_perf_t*)p_perf)->data_array[index].lock ); \ - \ - ((cl_perf_t*)p_perf)->data_array[index].total_time += pc_total_time; \ - ((cl_perf_t*)p_perf)->data_array[index].count++; \ - if( pc_total_time < ((cl_perf_t*)p_perf)->data_array[index].min_time ) \ - ((cl_perf_t*)p_perf)->data_array[index].min_time = pc_total_time; \ - \ - cl_spinlock_release( &((cl_perf_t*)p_perf)->data_array[index].lock ); \ -} -#define cl_perf_update( p_perf, index, start_time ) \ -{\ - /* Get the ending time stamp, and calculate the total time. 
*/ \ - uint64_t pc_total_time = cl_get_time_stamp() - start_time;\ - /* Using stack variable for start time, stop and log */ \ - cl_perf_log( p_perf, index, pc_total_time ); \ -} - -#define cl_perf_stop( p_perf, index ) \ -{\ - cl_perf_update( p_perf, index, Pc##index );\ -} - -#define cl_get_perf_values( p_perf, index, p_total, p_min, p_count ) \ -{\ - *p_total = p_perf->data_array[index].total_time; \ - *p_min = p_perf->data_array[index].min_time; \ - *p_count = p_perf->data_array[index].count; \ -} - -#define cl_get_perf_calibration( p_perf, p_locked_time, p_normal_time ) \ -{\ - *p_locked_time = p_perf->locked_calibration_time; \ - *p_normal_time = p_perf->normal_calibration_time; \ -} - -#define cl_get_perf_string( p_perf, i ) \ -"CL Perf:\t%lu\t%"PRIu64"\t%"PRIu64"\t%"PRIu64"\n", \ - i, p_perf->data_array[i].total_time, \ - p_perf->data_array[i].min_time, p_perf->data_array[i].count - -#else /* PERF_TRACK_ON */ -/* - * Disable performance tracking. - */ - -#define cl_perf_construct( p_perf ) -#define cl_perf_init( p_perf, num_cntrs ) CL_SUCCESS -#define cl_perf_destroy( p_perf, display ) -#define cl_perf_reset( p_perf ) -#define cl_perf_display( p_perf ) -#define PERF_DECLARE( index ) -#define PERF_DECLARE_START( index ) -#define cl_perf_start( index ) -#define cl_perf_log( p_perf, index, pc_total_time ) -#define cl_perf_upadate( p_perf, index, start_time ) -#define cl_perf_stop( p_perf, index ) -#define cl_get_perf_values( p_perf, index, p_total, p_min, p_count ) -#define cl_get_perf_calibration( p_perf, p_locked_time, p_normal_time ) -#endif /* PERF_TRACK_ON */ - -/* - * Internal performance tracking functions. Users should never call these - * functions directly. Instead, use the macros defined above to resolve - * to these functions when PERF_TRACK_ON is defined, which allows disabling - * performance tracking. - */ - -/* - * Initialize the state of the performance tracking structure. 
- */ -void -__cl_perf_construct( - IN cl_perf_t* const p_perf ); - -/* - * Size the performance tracking information and initialize all - * related structures. - */ -cl_status_t -__cl_perf_init( - IN cl_perf_t* const p_perf, - IN const uintn_t num_counters ); - -/* - * Destroy the performance tracking data. - */ -void -__cl_perf_destroy( - IN cl_perf_t* const p_perf, - IN const boolean_t display ); - -/* - * Reset the performance tracking data. - */ -void -__cl_perf_reset( - IN cl_perf_t* const p_perf ); - -/* - * Display the current performance tracking data. - */ -void -__cl_perf_display( - IN const cl_perf_t* const p_perf ); - - -END_C_DECLS - -#endif /* _CL_PERF_H_ */ -- 1.5.0.1.40.gb40d From swise at opengridcomputing.com Mon Feb 19 15:24:20 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 19 Feb 2007 17:24:20 -0600 Subject: [openib-general] OFA 1.2 tarball creation In-Reply-To: <000001c75476$bcf2eea0$6dcd180a@amr.corp.intel.com> References: <000001c75476$bcf2eea0$6dcd180a@amr.corp.intel.com> Message-ID: <1171927460.8180.70.camel@stevo-desktop> On Mon, 2007-02-19 at 14:38 -0800, Sean Hefty wrote: > >How exactly is various developers' source code pulled together to create > >the nightly OFA tarballs at www.openfabrics.org/builds (could this be > >put on the wiki somewhere?)? I went looking to see if some of Sean's > >work on RDMA CM had made it into these tarballs, and am not seeing code > >with the patches I'm looking for. > > I do not know how OFED creates their tarballs or manages their source. > > >The exact patch I'm after was 'rdma_cm: allow joins to return a unique > >address'. I remember seeing this patch on the ofed_1_2 branch in Sean's > >rdma-dev git repository about two weeks ago, though I don't see the > >ofed_1_2 branch anymore (the patch does exist on the multicast branch). > > Sean, was this patch supposed to make it to the nightly 1.2 tarballs? 
> > Assuming that ~vlad/ofed_1_2.git is the OFED kernel tree, then this patch does > not appear to be included. I was asked by OFED to publish an ofed_1_2 branch, > which I did, but I do not know if it was used in constructing the OFED tree. > The patch was intended to go into OFED. > The ofed_1_2 tree has the 2.6.20 drivers/modules in drivers/infiniband. They are, I think, the stock 2.6.20 drivers and modules. If there are fixes to any driver post 2.6.20, then patches get created in kernel_patches/fixes directory. These are applied as part of the configuration process when the tree is being built. Look in there to see if your change is in the form of a patch file. So you can't necessarily look at ofed_1_2/drivers/infiniband/core for the exact code, because it may get modified/patched as part of the configure done on the kernel tree during build/installation. Note in addition to patches from kernel_patches/fixes, there are other patches applied based on the kernel version (backports). Also, Sean: If you have changes that you want in OFED 1.2, you need to explicitly post patches to vlad and cc the group. They don't, by process, go pull your kernel tree to get the latest. This is different from the user libs which they pull each time they create a user package. Vlad/Michael, correct me if I'm wrong here. This all took me some time to understand and perhaps documenting this would help folks... Steve. 
> - Sean > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From afriedle at open-mpi.org Mon Feb 19 18:41:49 2007 From: afriedle at open-mpi.org (Andrew Friedley) Date: Mon, 19 Feb 2007 18:41:49 -0800 Subject: [openib-general] OFA 1.2 tarball creation In-Reply-To: <1171927460.8180.70.camel@stevo-desktop> References: <000001c75476$bcf2eea0$6dcd180a@amr.corp.intel.com> <1171927460.8180.70.camel@stevo-desktop> Message-ID: <45DA5FED.709@open-mpi.org> Steve Wise wrote: > The ofed_1_2 tree has the 2.6.20 drivers/modules in drivers/infiniband. > They are, I think, the stock 2.6.20 drivers and modules. If there are > fixes to any driver post 2.6.20, then patches get created in > kernel_patches/fixes directory. These are applied as part of the > configuration process when the tree is being built. Look in there to > see if your change is in the form of a patch file. > > So you can't necessarily look at ofed_1_2/drivers/infiniband/core for > the exact code, because it may get modified/patched as part of the > configure done on the kernel tree during build/installation. Note in > addition to patches from kernel_patches/fixes, there are other patches > applied based on the kernel version (backports). Thanks for the information. I found what I'm after in 'merged_sean_rdma_dev_ofed_1_2.patch' -- should have found that on my own. It's good to know this is going into the builds though, that wasn't obvious to me. 
Andrew From bunk at stusta.de Mon Feb 19 16:02:11 2007 From: bunk at stusta.de (Adrian Bunk) Date: Tue, 20 Feb 2007 01:02:11 +0100 Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: possible cleanups Message-ID: <20070220000211.GZ13958@stusta.de> This patch contains the following possible cleanups: - don't mark static functions in C files as inline - gcc should know best whether inlining makes sense - never compile the unused cxio_dbg.c - make the following needlessly global functions static: - cxio_hal.c: cxio_hal_clear_qp_ctx() - iwch_provider.c: iwch_get_qp() - #if 0 the following unused global functions: - cxio_hal.c: cxio_allocate_stag() - cxio_resource.: cxio_hal_get_rhdl() - cxio_resource.: cxio_hal_put_rhdl() Signed-off-by: Adrian Bunk --- drivers/infiniband/hw/cxgb3/Makefile | 1 drivers/infiniband/hw/cxgb3/cxio_hal.c | 22 +++++++-------- drivers/infiniband/hw/cxgb3/cxio_hal.h | 5 --- drivers/infiniband/hw/cxgb3/cxio_resource.c | 8 ++++- drivers/infiniband/hw/cxgb3/iwch_cm.c | 5 +-- drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 - drivers/infiniband/hw/cxgb3/iwch_provider.h | 1 drivers/infiniband/hw/cxgb3/iwch_qp.c | 29 ++++++++------------ 8 files changed, 33 insertions(+), 40 deletions(-) --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile.old 2007-02-17 17:21:03.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile 2007-02-17 17:21:08.000000000 +0100 @@ -8,5 +8,4 @@ ifdef CONFIG_INFINIBAND_CXGB3_DEBUG EXTRA_CFLAGS += -DDEBUG -iw_cxgb3-y += cxio_dbg.o endif --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h.old 2007-02-17 17:22:53.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h 2007-02-17 17:25:08.000000000 +0100 @@ -144,7 +144,6 @@ void cxio_rdev_close(struct cxio_rdev *rdev); int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq, enum t3_cq_opcode op, u32 credit); -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid); int cxio_create_cq(struct cxio_rdev *rdev, 
struct t3_cq *cq); int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq); int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq); @@ -155,8 +154,6 @@ int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq, struct cxio_ucontext *uctx); int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode); -int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid, - enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr); int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid, enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, u8 page_size, __be64 *pbl, u32 *pbl_size, @@ -172,8 +169,6 @@ int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr); void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb); void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb); -u32 cxio_hal_get_rhdl(void); -void cxio_hal_put_rhdl(u32 rhdl); u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp); void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid); int __init cxio_hal_init(void); --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.c.old 2007-02-17 17:23:11.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.c 2007-02-17 17:36:40.000000000 +0100 @@ -46,7 +46,7 @@ static LIST_HEAD(rdev_list); static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL; -static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) +static struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) { struct cxio_rdev *rdev; @@ -56,8 +56,7 @@ return NULL; } -static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev - *tdev) +static struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev) { struct cxio_rdev *rdev; @@ -119,7 +118,7 @@ return 0; } -static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) +static int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) { struct rdma_cq_setup setup; setup.id = cqid; @@ -131,7 +130,7 @@ return 
(rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); } -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) +static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) { u64 sge_cmd; struct t3_modify_qp_wr *wqe; @@ -426,7 +425,7 @@ } } -static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) +static int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) { if (CQE_OPCODE(*cqe) == T3_TERMINATE) return 0; @@ -761,6 +760,7 @@ return err; } +#if 0 /* IN : stag key, pdid, pbl_size * Out: stag index, actaul pbl_size, and pbl_addr allocated. */ @@ -771,6 +771,7 @@ return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR, perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr)); } +#endif /* 0 */ int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid, enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, @@ -1030,7 +1031,7 @@ cxio_hal_destroy_rhdl_resource(); } -static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) +static void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) { struct t3_swsq *sqp; __u32 ptr = wq->sq_rptr; @@ -1059,9 +1060,8 @@ break; } -static inline void create_read_req_cqe(struct t3_wq *wq, - struct t3_cqe *hw_cqe, - struct t3_cqe *read_cqe) +static void create_read_req_cqe(struct t3_wq *wq, struct t3_cqe *hw_cqe, + struct t3_cqe *read_cqe) { read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr; read_cqe->len = wq->oldest_read->read_len; @@ -1074,7 +1074,7 @@ /* * Return a ptr to the next read wr in the SWSQ or NULL. 
*/ -static inline void advance_oldest_read(struct t3_wq *wq) +static void advance_oldest_read(struct t3_wq *wq) { u32 rptr = wq->oldest_read - wq->sq + 1; --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_resource.c.old 2007-02-17 17:24:42.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_resource.c 2007-02-17 17:27:17.000000000 +0100 @@ -180,7 +180,7 @@ /* * returns 0 if no resource available */ -static inline u32 cxio_hal_get_resource(struct kfifo *fifo) +static u32 cxio_hal_get_resource(struct kfifo *fifo) { u32 entry; if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32))) @@ -189,11 +189,13 @@ return 0; /* fifo emptry */ } -static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry) +static void cxio_hal_put_resource(struct kfifo *fifo, u32 entry) { BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0); } +#if 0 + u32 cxio_hal_get_rhdl(void) { return cxio_hal_get_resource(rhdl_fifo); @@ -204,6 +206,8 @@ cxio_hal_put_resource(rhdl_fifo, rhdl); } +#endif /* 0 */ + u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp) { return cxio_hal_get_resource(rscp->tpt_fifo); --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h.old 2007-02-17 17:25:35.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h 2007-02-17 17:25:41.000000000 +0100 @@ -179,7 +179,6 @@ void iwch_qp_add_ref(struct ib_qp *qp); void iwch_qp_rem_ref(struct ib_qp *qp); -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn); struct iwch_ucontext { struct ib_ucontext ibucontext; --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c.old 2007-02-17 17:25:50.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c 2007-02-17 17:25:57.000000000 +0100 @@ -949,7 +949,7 @@ wake_up(&(to_iwch_qp(qp)->wait)); } -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) +static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) { PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, 
qpn); return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn); --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c.old 2007-02-17 17:27:31.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c 2007-02-17 17:38:07.000000000 +0100 @@ -37,8 +37,8 @@ #define NO_SUPPORT -1 -static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, - u8 * flit_cnt) +static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, + u8 * flit_cnt) { int i; u32 plen; @@ -97,8 +97,8 @@ return 0; } -static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, - u8 *flit_cnt) +static int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, + u8 *flit_cnt) { int i; u32 plen; @@ -138,8 +138,8 @@ return 0; } -static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, - u8 *flit_cnt) +static int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, + u8 *flit_cnt) { if (wr->num_sge > 1) return -EINVAL; @@ -159,9 +159,8 @@ /* * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now. 
*/ -static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp, - struct ib_sge *sg_list, u32 num_sgle, - u32 * pbl_addr, u8 * page_size) +static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, + u32 num_sgle, u32 * pbl_addr, u8 * page_size) { int i; struct iwch_mr *mhp; @@ -207,9 +206,8 @@ return 0; } -static inline int iwch_build_rdma_recv(struct iwch_dev *rhp, - union t3_wr *wqe, - struct ib_recv_wr *wr) +static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe, + struct ib_recv_wr *wr) { int i, err = 0; u32 pbl_addr[4]; @@ -474,8 +472,7 @@ return err; } -static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, - int tagged) +static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged) { switch (t3err) { case TPT_ERR_STAG: @@ -673,7 +670,7 @@ spin_lock_irqsave(&qhp->lock, *flag); } -static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag) +static void flush_qp(struct iwch_qp *qhp, unsigned long *flag) { if (t3b_device(qhp->rhp)) cxio_set_wq_in_error(&qhp->wq); @@ -685,7 +682,7 @@ /* * Return non zero if at least one RECV was pre-posted. */ -static inline int rqes_posted(struct iwch_qp *qhp) +static int rqes_posted(struct iwch_qp *qhp) { return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV; } --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c.old 2007-02-17 17:27:53.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c 2007-02-17 17:38:23.000000000 +0100 @@ -210,8 +210,7 @@ return state; } -static inline void __state_set(struct iwch_ep_common *epc, - enum iwch_ep_state new) +static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new) { epc->state = new; } @@ -1460,7 +1459,7 @@ /* * Returns whether an ABORT_REQ_RSS message is a negative advice. 
*/ -static inline int is_neg_adv_abort(unsigned int status) +static int is_neg_adv_abort(unsigned int status) { return status == CPL_ERR_RTX_NEG_ADVICE || status == CPL_ERR_PERSIST_NEG_ADVICE; From bunk at stusta.de Mon Feb 19 16:02:13 2007 From: bunk at stusta.de (Adrian Bunk) Date: Tue, 20 Feb 2007 01:02:13 +0100 Subject: [openib-general] [2.6 patch] infiniband/hw/mthca/mthca_mr.c: make 2 functions static Message-ID: <20070220000213.GA13958@stusta.de> This patch makes two needlessly global functions static. Signed-off-by: Adrian Bunk --- drivers/infiniband/hw/mthca/mthca_mr.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) --- linux-2.6.20-mm1/drivers/infiniband/hw/mthca/mthca_mr.c.old 2007-02-17 17:41:39.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/mthca/mthca_mr.c 2007-02-17 17:42:22.000000000 +0100 @@ -310,8 +310,9 @@ return mthca_is_memfree(dev) ? (PAGE_SIZE / sizeof (u64)) : 0x7ffffff; } -void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt, - int start_index, u64 *buffer_list, int list_len) +static void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, + struct mthca_mtt *mtt, int start_index, + u64 *buffer_list, int list_len) { u64 __iomem *mtts; int i; @@ -323,8 +324,9 @@ mtts + i); } -void mthca_arbel_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt, - int start_index, u64 *buffer_list, int list_len) +static void mthca_arbel_write_mtt_seg(struct mthca_dev *dev, + struct mthca_mtt *mtt, int start_index, + u64 *buffer_list, int list_len) { __be64 *mtts; dma_addr_t dma_handle; From arlin.r.davis at intel.com Mon Feb 19 16:09:07 2007 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 19 Feb 2007 16:09:07 -0800 Subject: [openib-general] uDAPL: RDMA Write example Message-ID: Christian, dtest is a simple dapl test with snd/rcv and rdma write/read examples. 
http://www.openfabrics.org/git/?p=~ardavis/dapl.git;a=blob;f=test/dtest/dtest.c

-arlin

From rdreier at cisco.com Mon Feb 19 16:11:27 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 19 Feb 2007 16:11:27 -0800
Subject: [openib-general] [2.6 patch] infiniband/hw/mthca/mthca_mr.c: make 2 functions static
In-Reply-To: <20070220000213.GA13958@stusta.de> (Adrian Bunk's message of "Tue, 20 Feb 2007 01:02:13 +0100")
References: <20070220000213.GA13958@stusta.de>
Message-ID:

Queued for my next merge, thanks.

From monil at voltaire.com Mon Feb 19 22:15:04 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 20 Feb 2007 08:15:04 +0200
Subject: [openib-general] [PATCH] IB/ipoib: Fix ipoib handling for pkey reordering
In-Reply-To: <45D9915B.6070202@voltaire.com>
References: <45D9915B.6070202@voltaire.com>
Message-ID: <6a122cc00702192215s6ef799abud5ebd27951dbab8b@mail.gmail.com>

On 2/19/07, Moni Levy wrote:
> This issue was found during partitioning & SM fail over testing. The fix was tested for 24
> hours with pkey reshuffling every few seconds. The patch applies to Roland's master
> branch.

I found an issue with that patch, I'll post an updated one soon.

-- Moni

>
> SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey
> table. The current implementation only queries for the index of the pkey once, when it
> creates the device QP and after that moves it into working state, and hence does not
> address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to
> reconfigure the device QP.
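The failure mode described above (a P_Key index looked up once and then cached while the table is reshuffled underneath it) can be illustrated with a small standalone sketch. This is not the IPoIB code: `find_pkey_index` and `cached_index_is_stale` are hypothetical helper names, the table is a plain array, and values are compared as full 16-bit quantities without any special handling of the membership bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): return the current index of
 * 'pkey' in a port's P_Key table, or -1 if the value is not present.
 */
static int find_pkey_index(const uint16_t *table, size_t len, uint16_t pkey)
{
	assert(table != NULL);

	for (size_t i = 0; i < len; i++)
		if (table[i] == pkey)
			return (int)i;
	return -1;
}

/*
 * Hypothetical helper: report whether an index cached at QP creation
 * time still points at 'pkey' after the table may have been reshuffled.
 * Returns 1 when the cached index is stale, 0 when it is still valid.
 */
static int cached_index_is_stale(const uint16_t *table, size_t len,
				 uint16_t pkey, int cached_index)
{
	return find_pkey_index(table, len, pkey) != cached_index;
}
```

Under these assumptions, a caller would run `cached_index_is_stale()` from whatever handles the P_Key change notification and redo the lookup when it returns 1, which is the role the patch gives to restarting the device QP.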
> > Signed-off-by: Moni Levy > --- > ipoib.h | 2 ++ > ipoib_ib.c | 22 ++++++++++++++++++++-- > ipoib_main.c | 1 + > ipoib_verbs.c | 4 +++- > 4 files changed, 26 insertions(+), 3 deletions(-) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h > index 07deee8..ed854e8 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib.h > +++ b/drivers/infiniband/ulp/ipoib/ipoib.h > @@ -139,6 +139,7 @@ struct ipoib_dev_priv { > struct delayed_work pkey_task; > struct delayed_work mcast_task; > struct work_struct flush_task; > + struct work_struct flush_restart_qp_task; > struct work_struct restart_task; > struct delayed_work ah_reap_task; > > @@ -261,6 +262,7 @@ struct ipoib_dev_priv *ipoib_intf_alloc( > > int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port); > void ipoib_ib_dev_flush(struct work_struct *work); > +void ipoib_ib_dev_flush_restart_qp(struct work_struct *work); > void ipoib_ib_dev_cleanup(struct net_device *dev); > > int ipoib_ib_dev_open(struct net_device *dev); > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > index 59d9594..5e2ada9 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > @@ -611,7 +611,7 @@ int ipoib_ib_dev_init(struct net_device > return 0; > } > > -void ipoib_ib_dev_flush(struct work_struct *work) > +static void __ipoib_ib_dev_flush(struct work_struct *work, int restart_qp) > { > struct ipoib_dev_priv *cpriv, *priv = > container_of(work, struct ipoib_dev_priv, flush_task); > @@ -630,6 +630,12 @@ void ipoib_ib_dev_flush(struct work_stru > ipoib_dbg(priv, "flushing\n"); > > ipoib_ib_dev_down(dev, 0); > + > + if (restart_qp) { > + ipoib_dbg(priv, "restarting the device QP\n"); > + ipoib_ib_dev_stop(dev); > + ipoib_ib_dev_open(dev); > + } > > /* > * The device could have been brought down between the start and when > @@ -644,11 +650,23 @@ void ipoib_ib_dev_flush(struct work_stru > > /* Flush 
any child interfaces too */ > list_for_each_entry(cpriv, &priv->child_intfs, list) > - ipoib_ib_dev_flush(&cpriv->flush_task); > + __ipoib_ib_dev_flush(&cpriv->flush_task, restart_qp); > > mutex_unlock(&priv->vlan_mutex); > } > > +void ipoib_ib_dev_flush(struct work_struct *work) > +{ > + /* We only restart the QP in case of PKEY change event */ > + __ipoib_ib_dev_flush(work, 0); > +} > + > +void ipoib_ib_dev_flush_restart_qp(struct work_struct *work) > +{ > + /* We only restart the QP in case of PKEY change event */ > + __ipoib_ib_dev_flush(work, 1); > +} > + > void ipoib_ib_dev_cleanup(struct net_device *dev) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > index 705eb1d..da46b79 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > @@ -942,6 +942,7 @@ static void ipoib_setup(struct net_devic > INIT_DELAYED_WORK(&priv->pkey_task, ipoib_pkey_poll); > INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); > INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush); > + INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp); > INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); > INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); > } > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > index 7b717c6..c249915 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > @@ -252,12 +252,14 @@ void ipoib_event(struct ib_event_handler > container_of(handler, struct ipoib_dev_priv, event_handler); > > if (record->event == IB_EVENT_PORT_ERR || > - record->event == IB_EVENT_PKEY_CHANGE || > record->event == IB_EVENT_PORT_ACTIVE || > record->event == IB_EVENT_LID_CHANGE || > record->event == IB_EVENT_SM_CHANGE || > record->event == IB_EVENT_CLIENT_REREGISTER) { > ipoib_dbg(priv, "Port state 
change event\n");
> queue_work(ipoib_workqueue, &priv->flush_task);
> + } else if (record->event == IB_EVENT_PKEY_CHANGE) {
> + ipoib_dbg(priv, "PKEY change event\n");
> + queue_work(ipoib_workqueue, &priv->flush_restart_qp_task);
> }
> }
>

From ogerlitz at voltaire.com Tue Feb 20 00:40:29 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 20 Feb 2007 10:40:29 +0200
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <000001c75454$523660f0$eed4180a@amr.corp.intel.com>
References: <000001c75454$523660f0$eed4180a@amr.corp.intel.com>
Message-ID: <45DAB3FD.8060606@voltaire.com>

Arlin Davis wrote:
> Any insight would be greatly appreciated. It was our assumption that the parent process can continue
> to use IB resources after the fixes went into 2.6.16 and OFED 1.1. Is this true?

As was discussed on this list on a few occasions: contrary to popular belief, fork support was deployed in libibverbs 1.1, whereas OFED 1.1 contains libibverbs 1.0.

Or.
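For context on the fork discussion above, here is a minimal Linux-only sketch of the kernel facility that, to my understanding, the "fixes in 2.6.16" refer to: the `MADV_DONTFORK` advice flag for `madvise()`, which the madvise man page lists as available since Linux 2.6.16. libibverbs 1.1 is understood to build its fork support on this through `ibv_fork_init()`; that call is deliberately not used here so the example stays self-contained. `map_dontfork` is a hypothetical helper name, and the sketch only marks a region; it does not register memory with any HCA.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map 'len' bytes of anonymous memory and mark the
 * region MADV_DONTFORK, so that a child created by fork() does not get
 * a copy-on-write view of these pages. Returns NULL on failure.
 */
static void *map_dontfork(size_t len)
{
	void *buf;

	assert(len > 0);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;

	/* mmap() returns page-aligned memory, as madvise() requires. */
	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}
```

A region obtained this way is released with `munmap(buf, len)`. The point of the flag, as I understand it, is that the parent keeps the physical pages the hardware was told about, instead of having them replaced by copy-on-write after a fork.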
From vlad at lists.openfabrics.org Tue Feb 20 02:24:10 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Tue, 20 Feb 2007 02:24:10 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070220-0200 daily build status Message-ID: <20070220102411.8D72EE6080C@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 
Failed:

From halr at voltaire.com Tue Feb 20 05:27:00 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 08:27:00 -0500
Subject: [openib-general] Port error rate detection
In-Reply-To: <45DA0E50.7010002@ornl.gov>
References: <45DA0E50.7010002@ornl.gov>
Message-ID: <1171978018.4380.298013.camel@hal.voltaire.com>

On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
> I have a Nagios module that alerts on connectivity, port errors,
> speed/width problems. I would like to give it the ability to change the
> severity of the alert depending on whether errors are just present or if
> they are increasing faster than a specified rate. The intent is to
> equip the module to keep the state of the last query and possibly
> history, but I wanted to make sure that I was not re-inventing the wheel
> first. Is there an attribute or utility that I am overlooking that will
> help me do this?

Not currently (to my knowledge). The rate-thresholding aspect is similar to what will be supported in the proposed PerfManager.

-- Hal

> Thanks,
>
> Steven.
>

From jlentini at netapp.com Tue Feb 20 06:29:09 2007
From: jlentini at netapp.com (James Lentini)
Date: Tue, 20 Feb 2007 09:29:09 -0500 (EST)
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <45D6327B.4060606@ichips.intel.com>
References: <1171397522.21471.7.camel@stevo-desktop> <45D37E8E.5050800@ichips.intel.com> <1171561783.3161.165.camel@fc6.xsintricity.com> <45D6327B.4060606@ichips.intel.com>
Message-ID:

On Fri, 16 Feb 2007, Arlin Davis wrote:
> Doug Ledford wrote:
>
> > On Wed, 2007-02-14 at 13:26 -0800, Arlin Davis wrote:
> >
> > > Steve Wise wrote:
> > >
> > >
> > > > Currently, the dapl rpms don't install dat.conf. I think they probably
> > > > should, eh?
Maybe in /etc/dat.conf > > > > > > > > > > > > > > > my specfile is setup to target sysconfdir which is typically set to > > > `$(prefix)/etc' > > > > > > %{_sysconfdir}/dat.conf > > > > > > I am not sure how the 1.2 scripts are building the rpms. Maybe Vladimir > > > can help explain? > > > > > > > Note that this setup is problematic on multilib arches. Since the > > dat.conf file hard codes a library path that's different for 32bit/64bit > > arches, installing both a 32bit and 64bit dapl library is impossible > > without munging things. > > > > For RHEL4U5/RHEL5 I changed the dat library to read dat.conf and > > have two separate conf files. A probably better approach would be to > > change the library to use a relative library name that it looks for > > starting from the libraries own directory. Hence if the dapl library is > > in /usr/lib, it looks in /usr/lib. Doing that would allow the > > 32bit/64bit libraries to share the same config file. > > > > > This is a good idea. I will take a look at dladdr options to set > appropriate starting path for dapl libraries when absolute paths are > not specified. > > James, do you see any issues with this approach? Nope. The dat registry should be able to handle provider libraries at any location in the file namespace (provided they are accessible of course). > Vladimir, can you tell me how the OFED 1.2 install scripts are > handling the dat.conf? 
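Doug's suggestion quoted above (let dat.conf name a provider by a relative library name and resolve it against the directory the DAT library itself was loaded from) might look roughly like the sketch below. It covers only the path arithmetic. `resolve_provider` is a hypothetical helper, not part of the DAT registry, and the library names in the usage note are illustrative. Obtaining the library's own path is left to the caller; Arlin mentions `dladdr`, whose `Dl_info.dli_fname` field would be one plausible source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the path of a provider library.
 *  - An absolute 'provider' path is copied unchanged.
 *  - A relative 'provider' name is joined to the directory part of
 *    'own_lib_path' (the path the calling library was loaded from).
 * Returns 0 on success, -1 if the result does not fit in 'out'.
 */
static int resolve_provider(const char *own_lib_path, const char *provider,
			    char *out, size_t outlen)
{
	const char *slash;
	int n;

	assert(own_lib_path != NULL && provider != NULL && out != NULL);

	slash = strrchr(own_lib_path, '/');
	if (provider[0] == '/' || slash == NULL) {
		/* Absolute name, or no directory to resolve against. */
		n = snprintf(out, outlen, "%s", provider);
	} else {
		/* Directory part of own_lib_path, then the provider name. */
		n = snprintf(out, outlen, "%.*s/%s",
			     (int)(slash - own_lib_path), own_lib_path,
			     provider);
	}
	return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

With this scheme a single dat.conf entry such as `libdaplcma.so` would resolve to `/usr/lib/libdaplcma.so` when called from a 32-bit `/usr/lib/libdat.so` and to `/usr/lib64/libdaplcma.so` from the 64-bit copy, which is the multilib property Doug is after.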
> > -arlin > From swise at opengridcomputing.com Tue Feb 20 06:43:06 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 20 Feb 2007 08:43:06 -0600 Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: possible cleanups In-Reply-To: <20070220000211.GZ13958@stusta.de> References: <20070220000211.GZ13958@stusta.de> Message-ID: <1171982587.2101.0.camel@stevo-desktop> On Tue, 2007-02-20 at 01:02 +0100, Adrian Bunk wrote: > This patch contains the following possible cleanups: > - don't mark static functions in C files as inline - gcc should know > best whether inlining makes sense > - never compile the unused cxio_dbg.c > - make the following needlessly global functions static: > - cxio_hal.c: cxio_hal_clear_qp_ctx() > - iwch_provider.c: iwch_get_qp() > - #if 0 the following unused global functions: > - cxio_hal.c: cxio_allocate_stag() > - cxio_resource.: cxio_hal_get_rhdl() > - cxio_resource.: cxio_hal_put_rhdl() > You could just remove the code instead of #if 0... 
> Signed-off-by: Adrian Bunk > > --- > > drivers/infiniband/hw/cxgb3/Makefile | 1 > drivers/infiniband/hw/cxgb3/cxio_hal.c | 22 +++++++-------- > drivers/infiniband/hw/cxgb3/cxio_hal.h | 5 --- > drivers/infiniband/hw/cxgb3/cxio_resource.c | 8 ++++- > drivers/infiniband/hw/cxgb3/iwch_cm.c | 5 +-- > drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 - > drivers/infiniband/hw/cxgb3/iwch_provider.h | 1 > drivers/infiniband/hw/cxgb3/iwch_qp.c | 29 ++++++++------------ > 8 files changed, 33 insertions(+), 40 deletions(-) > > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile.old 2007-02-17 17:21:03.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile 2007-02-17 17:21:08.000000000 +0100 > @@ -8,5 +8,4 @@ > > ifdef CONFIG_INFINIBAND_CXGB3_DEBUG > EXTRA_CFLAGS += -DDEBUG > -iw_cxgb3-y += cxio_dbg.o > endif > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h.old 2007-02-17 17:22:53.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h 2007-02-17 17:25:08.000000000 +0100 > @@ -144,7 +144,6 @@ > void cxio_rdev_close(struct cxio_rdev *rdev); > int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq, > enum t3_cq_opcode op, u32 credit); > -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid); > int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq); > int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq); > int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq); > @@ -155,8 +154,6 @@ > int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq, > struct cxio_ucontext *uctx); > int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode); > -int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid, > - enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr); > int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid, > enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, > u8 page_size, __be64 *pbl, u32 *pbl_size, > @@ -172,8 +169,6 @@ > int 
cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr); > void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb); > void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb); > -u32 cxio_hal_get_rhdl(void); > -void cxio_hal_put_rhdl(u32 rhdl); > u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp); > void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid); > int __init cxio_hal_init(void); > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.c.old 2007-02-17 17:23:11.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.c 2007-02-17 17:36:40.000000000 +0100 > @@ -46,7 +46,7 @@ > static LIST_HEAD(rdev_list); > static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL; > > -static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) > +static struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) > { > struct cxio_rdev *rdev; > > @@ -56,8 +56,7 @@ > return NULL; > } > > -static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev > - *tdev) > +static struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev) > { > struct cxio_rdev *rdev; > > @@ -119,7 +118,7 @@ > return 0; > } > > -static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) > +static int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) > { > struct rdma_cq_setup setup; > setup.id = cqid; > @@ -131,7 +130,7 @@ > return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); > } > > -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) > +static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) > { > u64 sge_cmd; > struct t3_modify_qp_wr *wqe; > @@ -426,7 +425,7 @@ > } > } > > -static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) > +static int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) > { > if (CQE_OPCODE(*cqe) == T3_TERMINATE) > return 0; > @@ -761,6 +760,7 @@ > return err; > } > > +#if 0 > /* IN : stag key, 
pdid, pbl_size > * Out: stag index, actaul pbl_size, and pbl_addr allocated. > */ > @@ -771,6 +771,7 @@ > return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR, > perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr)); > } > +#endif /* 0 */ > > int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid, > enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, > @@ -1030,7 +1031,7 @@ > cxio_hal_destroy_rhdl_resource(); > } > > -static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) > +static void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) > { > struct t3_swsq *sqp; > __u32 ptr = wq->sq_rptr; > @@ -1059,9 +1060,8 @@ > break; > } > > -static inline void create_read_req_cqe(struct t3_wq *wq, > - struct t3_cqe *hw_cqe, > - struct t3_cqe *read_cqe) > +static void create_read_req_cqe(struct t3_wq *wq, struct t3_cqe *hw_cqe, > + struct t3_cqe *read_cqe) > { > read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr; > read_cqe->len = wq->oldest_read->read_len; > @@ -1074,7 +1074,7 @@ > /* > * Return a ptr to the next read wr in the SWSQ or NULL. 
> */ > -static inline void advance_oldest_read(struct t3_wq *wq) > +static void advance_oldest_read(struct t3_wq *wq) > { > > u32 rptr = wq->oldest_read - wq->sq + 1; > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_resource.c.old 2007-02-17 17:24:42.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_resource.c 2007-02-17 17:27:17.000000000 +0100 > @@ -180,7 +180,7 @@ > /* > * returns 0 if no resource available > */ > -static inline u32 cxio_hal_get_resource(struct kfifo *fifo) > +static u32 cxio_hal_get_resource(struct kfifo *fifo) > { > u32 entry; > if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32))) > @@ -189,11 +189,13 @@ > return 0; /* fifo emptry */ > } > > -static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry) > +static void cxio_hal_put_resource(struct kfifo *fifo, u32 entry) > { > BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0); > } > > +#if 0 > + > u32 cxio_hal_get_rhdl(void) > { > return cxio_hal_get_resource(rhdl_fifo); > @@ -204,6 +206,8 @@ > cxio_hal_put_resource(rhdl_fifo, rhdl); > } > > +#endif /* 0 */ > + > u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp) > { > return cxio_hal_get_resource(rscp->tpt_fifo); > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h.old 2007-02-17 17:25:35.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h 2007-02-17 17:25:41.000000000 +0100 > @@ -179,7 +179,6 @@ > > void iwch_qp_add_ref(struct ib_qp *qp); > void iwch_qp_rem_ref(struct ib_qp *qp); > -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn); > > struct iwch_ucontext { > struct ib_ucontext ibucontext; > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c.old 2007-02-17 17:25:50.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c 2007-02-17 17:25:57.000000000 +0100 > @@ -949,7 +949,7 @@ > wake_up(&(to_iwch_qp(qp)->wait)); > } > > -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) > 
+static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) > { > PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, qpn); > return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn); > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c.old 2007-02-17 17:27:31.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c 2007-02-17 17:38:07.000000000 +0100 > @@ -37,8 +37,8 @@ > > #define NO_SUPPORT -1 > > -static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, > - u8 * flit_cnt) > +static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, > + u8 * flit_cnt) > { > int i; > u32 plen; > @@ -97,8 +97,8 @@ > return 0; > } > > -static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, > - u8 *flit_cnt) > +static int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, > + u8 *flit_cnt) > { > int i; > u32 plen; > @@ -138,8 +138,8 @@ > return 0; > } > > -static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, > - u8 *flit_cnt) > +static int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, > + u8 *flit_cnt) > { > if (wr->num_sge > 1) > return -EINVAL; > @@ -159,9 +159,8 @@ > /* > * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now. 
> */ > -static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp, > - struct ib_sge *sg_list, u32 num_sgle, > - u32 * pbl_addr, u8 * page_size) > +static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, > + u32 num_sgle, u32 * pbl_addr, u8 * page_size) > { > int i; > struct iwch_mr *mhp; > @@ -207,9 +206,8 @@ > return 0; > } > > -static inline int iwch_build_rdma_recv(struct iwch_dev *rhp, > - union t3_wr *wqe, > - struct ib_recv_wr *wr) > +static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe, > + struct ib_recv_wr *wr) > { > int i, err = 0; > u32 pbl_addr[4]; > @@ -474,8 +472,7 @@ > return err; > } > > -static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, > - int tagged) > +static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged) > { > switch (t3err) { > case TPT_ERR_STAG: > @@ -673,7 +670,7 @@ > spin_lock_irqsave(&qhp->lock, *flag); > } > > -static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag) > +static void flush_qp(struct iwch_qp *qhp, unsigned long *flag) > { > if (t3b_device(qhp->rhp)) > cxio_set_wq_in_error(&qhp->wq); > @@ -685,7 +682,7 @@ > /* > * Return non zero if at least one RECV was pre-posted. > */ > -static inline int rqes_posted(struct iwch_qp *qhp) > +static int rqes_posted(struct iwch_qp *qhp) > { > return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV; > } > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c.old 2007-02-17 17:27:53.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c 2007-02-17 17:38:23.000000000 +0100 > @@ -210,8 +210,7 @@ > return state; > } > > -static inline void __state_set(struct iwch_ep_common *epc, > - enum iwch_ep_state new) > +static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new) > { > epc->state = new; > } > @@ -1460,7 +1459,7 @@ > /* > * Returns whether an ABORT_REQ_RSS message is a negative advice. 
> */ > -static inline int is_neg_adv_abort(unsigned int status) > +static int is_neg_adv_abort(unsigned int status) > { > return status == CPL_ERR_RTX_NEG_ADVICE || > status == CPL_ERR_PERSIST_NEG_ADVICE; > From scarter at ornl.gov Tue Feb 20 06:44:59 2007 From: scarter at ornl.gov (Steven Carter) Date: Tue, 20 Feb 2007 09:44:59 -0500 Subject: [openib-general] Port error rate detection In-Reply-To: <1171978018.4380.298013.camel@hal.voltaire.com> References: <45DA0E50.7010002@ornl.gov> <1171978018.4380.298013.camel@hal.voltaire.com> Message-ID: <45DB096B.2060306@ornl.gov> Hal Rosenstock wrote: > On Mon, 2007-02-19 at 15:53, Steven Carter wrote: > >> I have a Nagios module that alerts on connectivity, port errors, >> speed/width problems. I would like to give it the ability to change the >> severity of the alert depending on whether errors are just present or if >> they are increasing faster than a specified rate. The intent is to >> equip the module to keep the state of the last query and possibly >> history, but I wanted to make sure that I was not re-inventing the wheel >> first. Is there an attribute or utility that I am overlooking that will >> help me do this? >> > > Not currently (to my knowledge). The thresholding of rate aspect is > similar to what will be supported in the proposed PerfManager. > I noticed that in your RFC. How are you planning on presenting the data to other agents (e.g. Nagios, Openview, MRTG, etc.)? One comment that I should have made on your RFC is that I wonder if it is necessary to include the data analysis/reduction part. Just having a central location that collects the values and presents them via SNMP is extremely useful since there are a plethora of monitoring apps (free and commercial) that do what you are proposing. That way, a network manager can leverage existing tools currently used for monitoring Ethernet Nodes, Hosts, etc. 
You can still include a last change attribute with each counter so that simple utilities (like the one that I am writing) can get an idea of how quickly errors are occurring. Steven. > -- Hal > > >> Thanks, >> >> Steven. >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> >> > > From halr at voltaire.com Tue Feb 20 06:43:01 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Feb 2007 09:43:01 -0500 Subject: [openib-general] [PATCH] osm_vendor_ibumad: termination crash fix In-Reply-To: <20070219214630.GW27414@sashak.voltaire.com> References: <20070219214630.GW27414@sashak.voltaire.com> Message-ID: <1171982581.4380.302584.camel@hal.voltaire.com> On Mon, 2007-02-19 at 16:46, Sasha Khapyorsky wrote: > When OpenSM is terminated, the umad_receiver thread keeps running even after > the structures are destroyed and freed, which causes random (but easily > reproducible) crashes. The reason is that osm_vendor_delete() does not > handle thread termination. This patch adds receiver thread > cancellation (using pthread_cancel() and pthread_join()) and takes care to > keep all mutexes unlocked upon termination. There is also a minor > termination code consolidation: the osm_vendor_port_close() function. > > Signed-off-by: Sasha Khapyorsky Good find. Thanks! Applied (to both master and ofed_1_2). 
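[Editorial sketch] The termination order described in the patch summary above (cancel the receiver thread, join it, and only then destroy what it was using, with no mutex left locked) might look roughly like the following. This is a minimal illustration using plain POSIX threads, not the actual osm_vendor_ibumad code: struct receiver, receiver_start(), receiver_stop() and unlock_on_cancel() are hypothetical names invented for this example.

```c
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the vendor layer's receiver state. */
struct receiver {
	pthread_t tid;
	pthread_mutex_t lock;
	int running;            /* nonzero once the thread has been created */
	unsigned long polls;    /* touched only while holding the lock */
};

/* Cleanup handler: releases the lock if the thread is cancelled while
 * holding it, so the mutex is never left locked after termination. */
static void unlock_on_cancel(void *arg)
{
	pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *receiver_main(void *arg)
{
	struct receiver *r = arg;
	struct timespec pause = { 0, 1000000 };	/* 1 ms */

	for (;;) {
		pthread_mutex_lock(&r->lock);
		pthread_cleanup_push(unlock_on_cancel, &r->lock);
		r->polls++;
		pthread_testcancel();	/* cancellation point inside the critical section */
		pthread_cleanup_pop(1);	/* normal path: run the handler, i.e. unlock */
		nanosleep(&pause, NULL);	/* cancellation point outside the lock */
	}
	return NULL;	/* not reached; the thread ends only by cancellation */
}

static int receiver_start(struct receiver *r)
{
	int rc;

	r->running = 0;
	r->polls = 0;
	rc = pthread_mutex_init(&r->lock, NULL);
	if (rc)
		return rc;
	rc = pthread_create(&r->tid, NULL, receiver_main, r);
	if (rc) {
		pthread_mutex_destroy(&r->lock);
		return rc;
	}
	r->running = 1;
	return 0;
}

/* Stop the thread first, then tear down the state it was using. */
static int receiver_stop(struct receiver *r)
{
	void *res;
	int rc;

	if (!r->running)
		return 0;	/* nothing to stop */
	pthread_cancel(r->tid);
	rc = pthread_join(r->tid, &res);
	if (rc)
		return rc;
	r->running = 0;
	/* Safe now: the thread is gone and its cleanup handler has
	 * released the lock if it was held at cancellation time. */
	rc = pthread_mutex_destroy(&r->lock);
	if (rc)
		return rc;
	return res == PTHREAD_CANCELED ? 0 : -1;
}
```

The point of the cleanup handler is that pthread_cancel() may take effect while the lock is held; without it, the later pthread_mutex_destroy() (or any other thread taking the lock) would be operating on a mutex owned by a thread that no longer exists.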
-- Hal From dledford at redhat.com Tue Feb 20 06:48:08 2007 From: dledford at redhat.com (Doug Ledford) Date: Tue, 20 Feb 2007 09:48:08 -0500 Subject: [openib-general] OFED 1.2 dapl and dat.conf In-Reply-To: References: <1171397522.21471.7.camel@stevo-desktop> <45D37E8E.5050800@ichips.intel.com> <1171561783.3161.165.camel@fc6.xsintricity.com> <45D6327B.4060606@ichips.intel.com> Message-ID: <1171982888.3161.283.camel@fc6.xsintricity.com> On Tue, 2007-02-20 at 09:29 -0500, James Lentini wrote: > > On Fri, 16 Feb 2007, Arlin Davis wrote: > > > Doug Ledford wrote: > > > > > On Wed, 2007-02-14 at 13:26 -0800, Arlin Davis wrote: > > > > > > > Steve Wise wrote: > > > > > > > > > > > > > Currently, the dapl rpms don't install dat.conf. I think they probably > > > > > should, eh? Maybe in /etc/dat.conf > > > > > > > > > > > > > > > > > > > my specfile is setup to target sysconfdir which is typically set to > > > > `$(prefix)/etc' > > > > > > > > %{_sysconfdir}/dat.conf > > > > > > > > I am not sure how the 1.2 scripts are building the rpms. Maybe Vladimir > > > > can help explain? > > > > > > > > > > Note that this setup is problematic on multilib arches. Since the > > > dat.conf file hard codes a library path that's different for 32bit/64bit > > > arches, installing both a 32bit and 64bit dapl library is impossible > > > without munging things. > > > > > > For RHEL4U5/RHEL5 I changed the dat library to read dat.conf and > > > have two separate conf files. A probably better approach would be to > > > change the library to use a relative library name that it looks for > > > starting from the libraries own directory. Hence if the dapl library is > > > in /usr/lib, it looks in /usr/lib. Doing that would allow the > > > 32bit/64bit libraries to share the same config file. > > > > > > > > This is a good idea. I will take a look at dladdr options to set > > appropriate starting path for dapl libraries when absolute paths are > > not specified. 
> > > > James, do you see any issues with this approach? > > Nope. The dat registry should be able to handle provider libraries at > any location in the file namespace (provided they are accessible of > course). Yep. Although if you want the 64bit and 32bit dat.conf to be identical, then the best bet would be something like putting the main library in /usr/lib or /usr/lib64 and then doing a relative path from there to the provider libs, such as dapl/provider/libname.so. That way, the same filespec in the dat.conf file will find either the 64bit or 32bit provider lib depending on whether the 64bit or 32bit main library is the one searching for it. > > Vladimir, can you tell me how the OFED 1.2 install scripts are > > handling the dat.conf? > > > > -arlin > > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From halr at voltaire.com Tue Feb 20 06:47:52 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Feb 2007 09:47:52 -0500 Subject: [openib-general] Port error rate detection In-Reply-To: <45DB096B.2060306@ornl.gov> References: <45DA0E50.7010002@ornl.gov> <1171978018.4380.298013.camel@hal.voltaire.com> <45DB096B.2060306@ornl.gov> Message-ID: <1171982868.4380.302888.camel@hal.voltaire.com> On Tue, 2007-02-20 at 09:44, Steven Carter wrote: > Hal Rosenstock wrote: > > On Mon, 2007-02-19 at 15:53, Steven Carter wrote: > > > >> I have a Nagios module that alerts on connectivity, port errors, > >> speed/width problems. I would like to give it the ability to change the > >> severity of the alert depending on whether errors are just present or if > >> they are increasing faster than a specified rate. 
The intent is to > >> equip the module to keep the state of the last query and possibly > >> history, but I wanted to make sure that I was not re-inventing the wheel > >> first. Is there an attribute or utility that I am overlooking that will > >> help me do this? > >> > > > > Not currently (to my knowledge). The thresholding of rate aspect is > > similar to what will be supported in the proposed PerfManager. > > > I noticed that in your RFC. How are you planning on presenting the data > to other agents (e.g. Nagios, Openview, MRTG, etc.)? One comment that I > should have made on your RFC is that I wonder if it is necessary to > include the data analysis/reduction part. I think it is because there is too much data to push up the tree to one manager. > Just having a central location that collects the values and presents it via SNMP is extremely > useful since there are a plethora of monitoring apps (free and > commercial) that do what you are proposing. In general, this information can be exported via SNMP or whatever the management infrastructure is. BTW, are there SNMP MIBs for all of this information ? To my knowledge, some of these were started but never completed. Also, the MIBs were geared at the agents rather than the managers (in the PerfMgt arena). -- Hal > That way, a network manager can leverage existing tools currently used for monitoring > Ethernet Nodes, Hosts, etc. You can still include a last change > attribute with each counter so that simple utilities (like the one that > I am writing) can get an idea of how quickly errors are occurring. > Steven. > > > -- Hal > > > > > >> Thanks, > >> > >> Steven. 
> >> > >> _______________________________________________ > >> openib-general mailing list > >> openib-general at openib.org > >> http://openib.org/mailman/listinfo/openib-general > >> > >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > >> > >> > > > > > From vlad at mellanox.co.il Tue Feb 20 07:05:48 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 20 Feb 2007 17:05:48 +0200 Subject: [openib-general] OFED 1.2 dapl and dat.conf In-Reply-To: References: <1171397522.21471.7.camel@stevo-desktop> <45D37E8E.5050800@ichips.intel.com> <1171561783.3161.165.camel@fc6.xsintricity.com> <45D6327B.4060606@ichips.intel.com> Message-ID: <1171983948.4051.13.camel@vladsk-laptop> > > Vladimir, can you tell me how the OFED 1.2 install scripts are > > handling the dat.conf? > > > > -arlin > > dat.conf updated by rpmbuild process: /usr/lib is replaced by %{_libdir} (/lib for x86, ppc, ia64 and /lib64 otherwise). -- Vladimir Sokolovsky Mellanox Technologies Ltd. From halr at voltaire.com Tue Feb 20 07:06:51 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Feb 2007 10:06:51 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: References: Message-ID: <1171984010.4380.304008.camel@hal.voltaire.com> On Mon, 2007-02-19 at 01:40, Or Gerlitz wrote: > Hi Sean, > > this fixes a bug which did not allow to run librdmacm apps over a node > which is partial member of a partition. The patch takes the approach of the > kernel ib_find_cached_pkey implementation. > > If you approve this, i suggest pushing it also into OFED 1.2 as a bug fix. > > Or. > > ---------------------------------------------------------------------- > The pkey extracted by the RDMA CM from the IPoIB device hardware address always > has the full membership bit set. However, when looking in the pkey table the > search must mask out the full membership bit. 
> > Signed-off-by: Or Gerlitz > Signed-off-by: Olga Shern > > diff --git a/src/cma.c b/src/cma.c > index c5f8cd9..9c24c6a 100644 > --- a/src/cma.c > +++ b/src/cma.c > @@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev > > for (i = 0, ret = 0; !ret; i++) { > ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey); > - if (!ret && pkey == chk_pkey) { > + if ((!ret && pkey == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff) == chk_pkey)) { Is this true for both RC and UD QPs ? I thought that at least the UD QPs were being used for multicast in which case wouldn't full member be required for this ? -- Hal > *pkey_index = (uint16_t) i; > return 0; > } > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ogerlitz at voltaire.com Tue Feb 20 07:38:29 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 20 Feb 2007 17:38:29 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <1171984010.4380.304008.camel@hal.voltaire.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> Message-ID: <45DB15F5.4090406@voltaire.com> Hal Rosenstock wrote: >> The pkey extracted by the RDMA CM from the IPoIB device hardware address always >> has the full membership bit set. However, when looking in the pkey table the >> search must mask out the full membership bit. > Is this true for both RC and UD QPs ? I thought that at least the UD QPs > were being used for multicast in which case wouldn't full member be > required for this ? Yes. It's a little bit confusing: partial and full members of an IPoIB IB partition use the same MGID. When an IPoIB MGID is constructed, the pkey placed by the driver is --always-- the full membership one. 
However, on a node with partial membership, what's plugged into the QP is the pkey index of the partial instance... In the kernel all this is nicely hidden from the IB ULPs in ib_find_cached_pkey(). Or. From vlad at mellanox.co.il Tue Feb 20 07:44:49 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 20 Feb 2007 17:44:49 +0200 Subject: [openib-general] OFED 1.2 dapl and dat.conf In-Reply-To: <1171984868.3161.293.camel@fc6.xsintricity.com> References: <1171397522.21471.7.camel@stevo-desktop> <45D37E8E.5050800@ichips.intel.com> <1171561783.3161.165.camel@fc6.xsintricity.com> <45D6327B.4060606@ichips.intel.com> <1171983948.4051.13.camel@vladsk-laptop> <1171984868.3161.293.camel@fc6.xsintricity.com> Message-ID: <1171986289.4051.24.camel@vladsk-laptop> On Tue, 2007-02-20 at 10:21 -0500, Doug Ledford wrote: > On Tue, 2007-02-20 at 17:05 +0200, Vladimir Sokolovsky wrote: > > > > Vladimir, can you tell me how the OFED 1.2 install scripts are > > > > handling the dat.conf? > > > > > > > > -arlin > > > > > > > > dat.conf updated by rpmbuild process: > > /usr/lib is replaced by %{_libdir} (/lib for x86, ppc, ia64 and /lib64 otherwise). > > Which creates a multilib regression, aka when you install both the i386 > and x86_64 versions of the dapl rpm, they both contain a dat.conf file > at the same location in the filesystem, but with different contents. > Whether you get the 32bit or 64bit version of the dat.conf file depends > on which is installed later. Correspondingly, whichever version of the > library was installed first will be rendered inoperative by this problem > as it will be either a 32 or 64bit library that is searching for a > provider library, and the one it finds will be the opposite arch type of > itself, thereby preventing the dapl library from doing a dlopen on the > file. Therefore, whatever version of the dapl library is installed > first will no longer be able to find any valid provider libraries. 
This > is considered an error condition by our automated package testing tools > and we are not allowed to ship a package in this state. > I can create /etc/dat32.conf and /etc/dat64.conf. Currently, in the OFED there is no separation into 32 and 64 bit RPMs. That is on x86_64, for example, if 32bit libraries compilation succeeded then both 32 and 64bit libraries will be a part of the same RPM. -- Vladimir Sokolovsky Mellanox Technologies Ltd. From halr at voltaire.com Tue Feb 20 07:42:40 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Feb 2007 10:42:40 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DB15F5.4090406@voltaire.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> <45DB15F5.4090406@voltaire.com> Message-ID: <1171986159.4380.306117.camel@hal.voltaire.com> On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote: > Hal Rosenstock wrote: > > >> The pkey extracted by the RDMA CM from the IPoIB device hardware address always > >> has the full membership bit set. However, when looking in the pkey table the > >> search must mask out the full membership bit. > > > Is this true for both RC and UD QPs ? I thought that at least the UD QPs > > were being used for multicast in which case wouldn't full member be > > required for this ? > > Yes. It's a little bit confusing: partial and full members of an IPoIB IB > partition use the same MGID. When an IPoIB MGID is constructed, the pkey > placed by the driver is --always-- the full membership one. However, on > a node with partial membership, what's plugged into the QP is the pkey > index of the partial instance... So in this case, do both the full and partial keys need configuring for that port ? -- Hal > In the kernel all this is nicely hidden from the IB ULPs in > ib_find_cached_pkey(). > > Or. 
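[Editorial sketch] The matching rule discussed in this thread (treat two P_Keys as the same partition when they differ only in the membership bit) can be illustrated with a small lookup over a P_Key table. This is a simplified sketch, not the librdmacm or kernel code: pkey_same_partition() and find_pkey_index() are hypothetical helpers, the table is assumed to be in host byte order (values returned by ibv_query_pkey() would need ntohs() first), and, unlike the posted patch, which clears the bit only on the requested pkey, this version masks both sides of the comparison.

```c
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u	/* top bit: 1 = full member, 0 = partial member */
#define PKEY_BASE_MASK   0x7fffu	/* low 15 bits identify the partition */

/* Compare two host-order P_Keys while ignoring the membership bit. */
static int pkey_same_partition(uint16_t a, uint16_t b)
{
	return (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK);
}

/*
 * Return the index of the first table entry that belongs to the same
 * partition as pkey, or -1 if there is none.
 */
static int find_pkey_index(const uint16_t *table, size_t len, uint16_t pkey)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (pkey_same_partition(table[i], pkey))
			return (int)i;
	}
	return -1;
}
```

With a table holding 0x0001 (partial member), a lookup for 0x8001 (the full-membership form that the thread says is derived from the IPoIB hardware address) still finds the entry, which is the case the patch is meant to fix.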
> From dledford at redhat.com Tue Feb 20 07:55:27 2007 From: dledford at redhat.com (Doug Ledford) Date: Tue, 20 Feb 2007 10:55:27 -0500 Subject: [openib-general] OFED 1.2 dapl and dat.conf In-Reply-To: <1171986289.4051.24.camel@vladsk-laptop> References: <1171397522.21471.7.camel@stevo-desktop> <45D37E8E.5050800@ichips.intel.com> <1171561783.3161.165.camel@fc6.xsintricity.com> <45D6327B.4060606@ichips.intel.com> <1171983948.4051.13.camel@vladsk-laptop> <1171984868.3161.293.camel@fc6.xsintricity.com> <1171986289.4051.24.camel@vladsk-laptop> Message-ID: <1171986927.3161.297.camel@fc6.xsintricity.com> On Tue, 2007-02-20 at 17:44 +0200, Vladimir Sokolovsky wrote: > On Tue, 2007-02-20 at 10:21 -0500, Doug Ledford wrote: > > On Tue, 2007-02-20 at 17:05 +0200, Vladimir Sokolovsky wrote: > > > > > Vladimir, can you tell me how the OFED 1.2 install scripts are > > > > > handling the dat.conf? > > > > > > > > > > -arlin > > > > > > > > > > > dat.conf updated by rpmbuild process: > > > /usr/lib is replaced by %{_libdir} (/lib for x86, ppc, ia64 and /lib64 otherwise). > > > > Which creates a multilib regression, aka when you install both the i386 > > and x86_64 versions of the dapl rpm, they both contain a dat.conf file > > at the same location in the filesystem, but with different contents. > > Whether you get the 32bit or 64bit version of the dat.conf file depends > > on which is installed later. Correspondingly, whichever version of the > > library was installed first will be rendered inoperative by this problem > > as it will be either a 32 or 64bit library that is searching for a > > provider library, and the one it finds will be the opposite arch type of > > itself, thereby preventing the dapl library from doing a dlopen on the > > file. Therefore, whatever version of the dapl library is installed > > first will no longer be able to find any valid provider libraries. 
This > > is considered an error condition by our automated package testing tools > > and we are not allowed to ship a package in this state. > > > I can create /etc/dat32.conf and /etc/dat64.conf. That's pretty much what I did for our next release, but it's crude. The other solution we've been discussing would be far preferable. > Currently, in the OFED there is no separation into 32 and 64 bit RPMs. > That is on x86_64, for example, if 32bit libraries compilation succeeded > then both 32 and 64bit libraries will be a part of the same RPM. Assuming you actually built both 32 and 64bit dapl libraries, them being in the same rpm wouldn't solve the problem that the generated dat.conf would only be correct for one or the other, not for both. And if you want to create a dat32.conf and dat64.conf, then you need to munge the dapl source code so that during a 64bit build it looks for dat64.conf and in a 32bit build it looks for dat32.conf. However, you need to munge the source code in such a way as to have it be the same on both 32bit and 64bit builds, aka something like:

#ifdef __i386__
default_dapl_file = "/usr/local/ofed/etc/dat32.conf";
#else
default_dapl_file = "/usr/local/ofed/etc/dat64.conf";
#endif

If you make the mistake I made, which was to do one patch on 32bit arches and a different patch on 64bit arches, then the source code between the 32 and 64bit arches differs, and guess what, that throws a multilib regression as well because it breaks debuginfo packages :-/ I'll fix that in our next release. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From sean.hefty at intel.com Tue Feb 20 10:12:28 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 20 Feb 2007 10:12:28 -0800 Subject: [openib-general] OFA 1.2 tarball creation In-Reply-To: <1171927460.8180.70.camel@stevo-desktop> Message-ID: <000501c7551a$ae3fb5b0$8698070a@amr.corp.intel.com> >The ofed_1_2 tree has the 2.6.20 drivers/modules in drivers/infiniband. >They are, I think, the stock 2.6.20 drivers and modules. If there are >fixes to any driver post 2.6.20, then patches get created in >kernel_patches/fixes directory. These are applied as part of the >configuration process when the tree is being built. Look in there to >see if your change is in the form of a patch file. The patch is part of a merged_sean_rdma_dev_ofed_1_2.patch file, so it looks like it is in OFED 1.2. - Sean From mst at mellanox.co.il Tue Feb 20 10:17:55 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Feb 2007 20:17:55 +0200 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth Message-ID: <20070220181755.GC11825@mellanox.co.il> Avoid overhead of freeing/reallocating and mapping/unmapping for dma for pages that have not been written to by hardware. Signed-off-by: Michael S. Tsirkin --- This gives >10% boost in BW for message sizes up to 32K. Please queue for 2.6.21. before: # ./netperf-2.4.2/src/netperf -f M -H 11.4.3.68 -c -C -- -m 32000 TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68) port 0 AF_INET : demo Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 
MBytes  /s  % S      % S      us/KB   us/KB

 87380  16384  32000    10.00       716.23   26.22    23.94    1.430   1.306

after:
# ./netperf-2.4.2/src/netperf -f M -H 11.4.3.68 -c -C -- -m 32000
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

 87380  16384  32000    10.00       888.67   24.13    25.08    1.061   1.102

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 8ee6f06..a23c8e3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -68,14 +68,14 @@ struct ipoib_cm_id {
 static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
 			       struct ib_cm_event *event);
 
-static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
+static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
 				  u64 mapping[IPOIB_CM_RX_SG])
 {
 	int i;
 
 	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
 
-	for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i)
+	for (i = 0; i < frags; ++i)
 		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
 }
 
@@ -93,7 +93,8 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
 	ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr);
 	if (unlikely(ret)) {
 		ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret);
-		ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[id].mapping);
+		ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+				      priv->cm.srq_ring[id].mapping);
 		dev_kfree_skb_any(priv->cm.srq_ring[id].skb);
 		priv->cm.srq_ring[id].skb = NULL;
 	}
@@ -101,8 +102,8 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
 	return ret;
 }
 
-static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
-				 u64 mapping[IPOIB_CM_RX_SG])
+static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, int frags,
+					     u64 mapping[IPOIB_CM_RX_SG])
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct sk_buff *skb;
@@ -110,7 +111,7 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
 
 	skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
 	if (unlikely(!skb))
-		return -ENOMEM;
+		return NULL;
 
 	/*
 	 * IPoIB adds a 4 byte header. So we need 12 more bytes to align the
@@ -122,10 +123,10 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
 					  DMA_FROM_DEVICE);
 	if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) {
 		dev_kfree_skb_any(skb);
-		return -EIO;
+		return NULL;
 	}
 
-	for (i = 0; i < IPOIB_CM_RX_SG - 1; i++) {
+	for (i = 0; i < frags; i++) {
 		struct page *page = alloc_page(GFP_ATOMIC);
 
 		if (!page)
@@ -139,7 +140,7 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
 	}
 
 	priv->cm.srq_ring[id].skb = skb;
-	return 0;
+	return skb;
 
 partial_error:
 
@@ -148,8 +149,8 @@ partial_error:
 	for (; i >= 0; --i)
 		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
 
-	dev_kfree_skb_any(skb);
-	return -ENOMEM;
+	dev_kfree_skb_any(skb);
+	return NULL;
 }
 
 static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev,
@@ -312,7 +313,7 @@ static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id,
 }
 /* Adjust length of skb with fragments to match received data */
 static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
-			  unsigned int length)
+			  unsigned int length, struct sk_buff *toskb)
 {
 	int i, num_frags;
 	unsigned int size;
@@ -329,7 +330,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 
 		if (length == 0) {
 			/* don't need this page */
-			__free_page(frag->page);
+			skb_fill_page_desc(toskb, i, frag->page, 0, PAGE_SIZE);
 			--skb_shinfo(skb)->nr_frags;
 		} else {
 			size = min(length, (unsigned) PAGE_SIZE);
@@ -347,10 +348,11 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
-	struct sk_buff *skb;
+	struct sk_buff *skb, *newskb;
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
 	u64 mapping[IPOIB_CM_RX_SG];
+	int frags;
 
 	ipoib_dbg_data(priv, "cm recv completion: id %d, op %d, status: %d\n",
 		       wr_id, wc->opcode, wc->status);
@@ -386,7 +388,11 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 		}
 	}
 
-	if (unlikely(ipoib_cm_alloc_rx_skb(dev, wr_id, mapping))) {
+	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
+					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
+
+	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id, frags, mapping);
+	if (unlikely(!newskb)) {
 		/*
 		 * If we can't allocate a new RX buffer, dump
 		 * this packet and reuse the old buffer.
@@ -396,13 +402,13 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 		goto repost;
 	}
 
-	ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[wr_id].mapping);
-	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, sizeof mapping);
+	ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping);
+	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping);
 
 	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
 		       wc->byte_len, wc->slid);
 
-	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len);
+	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newskb);
 
 	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
 	skb->mac.raw = skb->data;
@@ -1196,7 +1202,8 @@ int ipoib_cm_dev_init(struct net_device *dev)
 	priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG;
 
 	for (i = 0; i < ipoib_recvq_size; ++i) {
-		if (ipoib_cm_alloc_rx_skb(dev, i, priv->cm.srq_ring[i].mapping)) {
+		if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1,
+					   priv->cm.srq_ring[i].mapping)) {
 			ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
 			ipoib_cm_dev_cleanup(dev);
 			return -ENOMEM;
@@ -1231,7 +1238,8 @@ void ipoib_cm_dev_cleanup(struct net_device *dev)
 		return;
 	for (i = 0; i < ipoib_recvq_size; ++i)
 		if (priv->cm.srq_ring[i].skb) {
-			ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[i].mapping);
+			ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+					      priv->cm.srq_ring[i].mapping);
 			dev_kfree_skb_any(priv->cm.srq_ring[i].skb);
 			priv->cm.srq_ring[i].skb = NULL;
 		}

--
MST

From halr at voltaire.com Tue Feb 20 10:21:38 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 13:21:38 -0500
Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]]
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com>
Message-ID: <1171995697.4380.315840.camel@hal.voltaire.com>

Hi Tzachi, On Thu, 2007-02-08 at 16:24, Tzachi Dar wrote: > See bellow. I would like to get back to trying to close on this discussion. > Thanks > Tzachi > > > -----Original Message----- > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > Sent: Thursday, February 08, 2007 9:47 PM > > To: Tzachi Dar > > Cc: Yossi Leybovich; Gilad Shainer; Yevgeny Kliteynik; > > OPENIB; Michael S. Tsirkin; Hal Rosenstock > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] > > opensm: sigusr1: syslog() fixes]] > > > > On 20:31 Thu 08 Feb , Tzachi Dar wrote: > > > The windows open IB has decided on using a BSD only license. > > > The common implementation of pthreads as far as I know is > > LGPL, which > > > means that it can not be used in open IB. > > > > Why not? AFAIK it works perfectly (see (5,6 and Preamble)): > > http://www.gnu.org/copyleft/lesser.html > > > > And of course there are tons of examples when BSD software > > links against LGPLed glibc. > > I can of course write you an answer that will be more than 5 pages long > of why *I* don't think that > Using GPL software is bad for everyone, but I guess that my opinion > doesn't really meter, so I > Won't do it. > The page that you have referenced is of the GNU org, and even there it > is hard to say that they > are trying to encourage you to use the LGPL license.
In any case, the > main point is that > When open IB windows was formed there was a general decision that it > will use BSD license. If we > Start having components with the LGPL this will break that decision, and > therefore this requires > some voting of the open IB organization. I may be missing your point but is there something in the Windows OpenIB/OpenFabrics license that precludes using Windows OpenIB licensed code (e.g. BSD like license) in concert with non OpenIB code (like LGPL) ? Isn't that essentially what using the Windows pthreads DLL with OpenSM would be like ? As I understand it, I don't think this requires a license change or anything in the OpenIB Windows charter prevents this or needs changing. > > > The only two ways that I see around this are 1) Change the > > license of > > > open IB windows which might be a complicated thing. 2) Find an > > > implementation of pthreads that is BSD. > > > > BTW, just wondering... What is relation between windows open > > IB and OFA (and OFA's "dual-license rule")? > Well, the way I see it one can take code from the Linux part under the > BSD licance and use it in > The windows part. The otherway around seems fine to me but some say that > since the windows BSD liscance > Reqires that some text will always remain there, the other way around is > not possibale. As I'm not an > Expert in that erea I don't know who is right. I don't see how this affects what is being discussed about OpenSM. In all the cases I'm aware of, the portability is from Linux to Windows and not the other way around. -- Hal > > Sasha > > > > > > > > Thanks > > > Tzachi > > > > > > > -----Original Message----- > > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > > > Sent: Thursday, February 08, 2007 7:46 PM > > > > To: Tzachi Dar; Yossi Leybovich > > > > Cc: Yevgeny Kliteynik; OPENIB; Michael S. 
Tsirkin; Hal Rosenstock > > > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] > > > > opensm: sigusr1: syslog() fixes]] > > > > > > > > On 11:24 Sun 21 Jan , Yevgeny Kliteynik wrote: > > > > > Tzachi, Yossi, please join the thread. > > > > > What do you think about distributing a copy of the pthread DLL > > > > > with opensm? > > > > > > > > Any news here? Thanks. > > > > > > > > Sasha > > > > > > > > > > > > > > -- Yevgeny. > > > > > > > > > > -------- Original Message -------- > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: > > > > > syslog() fixes] > > > > > Date: Fri, 19 Jan 2007 00:20:32 +0200 > > > > > From: Sasha Khapyorsky > > > > > To: Michael S. Tsirkin > > > > > CC: Yevgeny Kliteynik , > > > > OPENIB > > > > > References: <20070118194403.GA23783 at sashak.voltaire.com> > > > > > <20070118215023.GP9890 at mellanox.co.il> > > > > > > > > > > On 23:50 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > > > > Quoting Sasha Khapyorsky : > > > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] > > opensm: sigusr1: > > > > > > > syslog() fixes] > > > > > > > > > > > > > > On 07:00 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > > > > > > What about pure opensource - > > > > > > > > > http://sourceware.org/pthreads-win32/? It is licensed > > > > > > > > > under LGPL, I see on the net many positive reports about > > > > stability and usability. > > > > > > > > > > > > > > > > I used it to do a windows port of linux complib at some > > > > > > > > point and opensm seemed to work fine with it. What it was > > > > lacking at > > > > > > > > that point was support for 64 bit applications, > > and for some > > > > > > > > reason (which is still unclear to me) there was a > > > > strong desire to run opensm in 64 bit mode. > > > > > > > > Seems to have been fixed now, BTW. > > > > > > > > > > > > > > So this seems to be good option for OpenSM on > > Windows. Right? > > > > > > > > > > > > No idea. 
Distributing a copy of the pthread DLL with > > > > opensm does not > > > > > > look like a problem. But is it worth it? > > > > > > > > > > Sure, it makes windows porting much more transparent and > > > > let us to use > > > > > standard *nix stuff w/out #ifndef WIN32. Other > > (generic) benefit > > > > > is that posix is more standard and powerful than > > wrappers like complib. > > > > > > > > > > Sasha > > > > > > > > > > >

From halr at voltaire.com Tue Feb 20 10:37:45 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 13:37:45 -0500
Subject: [openib-general] [Fwd: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
Message-ID: <1171996664.4380.316818.camel@hal.voltaire.com>

Also, looping in the OpenFabrics Windows email list on this.

-- Hal

-----Forwarded Message-----
From: Hal Rosenstock
To: Tzachi Dar
Cc: OPENIB , Gilad Shainer
Subject: Re: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]]
Date: 20 Feb 2007 13:21:38 -0500

_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From halr at voltaire.com Tue Feb 20 10:42:38 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 13:42:38 -0500
Subject: [openib-general] Port error rate detection
In-Reply-To: <45DB12FB.7010501@ornl.gov>
References: <45DA0E50.7010002@ornl.gov> <1171978018.4380.298013.camel@hal.voltaire.com> <45DB096B.2060306@ornl.gov> <1171982868.4380.302888.camel@hal.voltaire.com> <45DB12FB.7010501@ornl.gov>
Message-ID: <1171996956.4380.317120.camel@hal.voltaire.com>

On Tue, 2007-02-20 at 10:25, Steven Carter wrote: > Hal Rosenstock wrote: > > On Tue, 2007-02-20 at 09:44, Steven Carter wrote: > > > >> Hal Rosenstock wrote: > >> > >>> On Mon, 2007-02-19 at 15:53, Steven Carter wrote: > >>> > >>> > >>>> I have a Nagios module that alerts on connectivity, port
errors, > >>>> speed/width problems. I would like to give it the ability to change the > >>>> severity of the alert depending on whether errors are just present or if > >>>> they are increasing faster than a specified rate. The intent is to > >>>> equip the module to keep the state of the last query and possibly > >>>> history, but I wanted to make sure that I was not re-inventing the wheel > >>>> first. Is there an attribute or utility that I am overlooking that will > >>>> help me do this? > >>>> > >>>> > >>> Not currently (to my knowledge). The thresholding of rate aspect is > >>> similat to what will be supported in the proposed PerfManager. > >>> > >>> > >> I noticed that in your RFC. How are you planning on presenting the data > >> to other agents (e.g. Nagios, Openview, MRTG, etc.)? One comment that I > >> should have made on your RFC is that I wonder if it is necessary to > >> include the data analysis/reduction part. > >> > > > > I think it is because there is too much data to push up the tree to one > > manager. > > > I agree, but does the data need to be pushed to one node? If you go > with a distributed approach where information is aggregated per network > device (switch or group of switches), The proposal includes a distributed approach. > then a third-party monitoring > server can collect and present it in the same way that it does for an > Ethernet network. That way, you do not need to pass information up to a > central node. You can just have a third party monitoring application > collect and present the information. I guess it just depends on how > much you want to leverage existing monitoring solutions and/or how much > capability you want inherent in the OFA software. Third party monitoring agents can hook in at the intermediate nodes in the collection hierarchy if that is what is desired. 
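The check Steven describes (alert when errors are present, escalate when they grow faster than a given rate) can be sketched as below. This is only an illustration, not code from the thread or from any OFA tool: the struct layout, the function names, and the threshold parameter are all hypothetical, and the only outside convention it relies on is the usual Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL). It assumes the module saves the previous reading of one port error counter together with the time it was taken.

```c
#include <stdint.h>

/* Nagios plugin return codes (standard plugin convention). */
enum { NAGIOS_OK = 0, NAGIOS_WARNING = 1, NAGIOS_CRITICAL = 2 };

/* One saved reading of a port error counter (hypothetical layout). */
struct counter_sample {
	uint64_t errors;   /* counter value at the time of the query */
	uint64_t time_sec; /* time of the query, in seconds */
};

/*
 * Errors per second between two readings. Returns 0.0 when no time has
 * elapsed or when the counter went backwards (e.g. it was cleared between
 * queries), since no meaningful rate can be derived in those cases.
 */
double error_rate(struct counter_sample prev, struct counter_sample cur)
{
	if (cur.time_sec <= prev.time_sec || cur.errors < prev.errors)
		return 0.0;
	return (double)(cur.errors - prev.errors) /
	       (double)(cur.time_sec - prev.time_sec);
}

/*
 * Severity for one port: OK if no errors have been counted, CRITICAL if
 * errors are increasing faster than max_rate (errors per second),
 * WARNING if errors are present but not growing that fast.
 */
int port_error_status(struct counter_sample prev, struct counter_sample cur,
		      double max_rate)
{
	if (cur.errors == 0)
		return NAGIOS_OK;
	if (error_rate(prev, cur) > max_rate)
		return NAGIOS_CRITICAL;
	return NAGIOS_WARNING;
}
```

With readings one minute apart and a threshold of 1 error per second, a counter going from 10 to 130 gives a rate of 2 per second and maps to CRITICAL, while a counter that stays at 10 maps to WARNING. Keeping only the last reading is enough for this; a longer history would only be needed for smoothing.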
> >> Just having a central location that collects the values and presents it via SNMP is extremely > >> useful since there are a plethora of monitoring apps (free and > >> commercial) that do what you are proposing. > >> > I should have said 'a location' and not 'a central location'. Since > most monitoring applications support multiple agents, it is not > necessary to aggregate the information into one place. > > > > In general, this information can be exported via SNMP or whatever the > > management infrastructure is. > > > > BTW, are there SNMP MIBs for all of this information ? To my knowledge, > > some of these were started but never completed. Also, the MIBs were > > geared at the agents rather than the managers (in the PerfMgt arena). > > > There are standard MIBS (e.g. mib-2's ifTable) that can present most of > the useful information (in/out octets, errors, etc.) Not most of the useful IB information. > , but I would suspect that you would have to supplement that with a private MIB as > most other technologies/vendors have. Yes, as this may be data out of a non IBTA specified manager, it is likely a private MIB unless one goes for all the agent (PMA) data. There was a proposed MIB for the PMA at the IETF IPoIB WG. -- Hal > Steven. > > > -- Hal > > > > > >> That way, a network manager can leverage existing tools currently used for monitoring > >> Ethernet Nodes, Hosts, etc. You can still include a last change > >> attribute with each counter so that simple utilities (like the one that > >> I am writing) can get an idea of how quickly errors are occurring. > >> > > > > > >> Steven. > >> > >> > >>> -- Hal > >>> > >>> > >>> > >>>> Thanks, > >>>> > >>>> Steven. 
> >>>> > >>>> _______________________________________________ > >>>> openib-general mailing list > >>>> openib-general at openib.org > >>>> http://openib.org/mailman/listinfo/openib-general > >>>> > >>>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > >>>> > >>>> > >>>> > >>> > >>> > > > > > From ftillier at windows.microsoft.com Tue Feb 20 10:56:33 2007 From: ftillier at windows.microsoft.com (Fab Tillier) Date: Tue, 20 Feb 2007 10:56:33 -0800 Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related [was: Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] In-Reply-To: <1171996664.4380.316818.camel@hal.voltaire.com> References: <1171996664.4380.316818.camel@hal.voltaire.com> Message-ID: Submissions to the OFW project are supposed to be bound by the contributor's agreement: http://windows.openib.org/openib/contribute.aspx Contributing code under anything but a BSD license violates condition 1, though there shouldn't be issues with dual licenses as long as one of the available licenses is a BSD license. In any case, we're not talking about putting the pthreads library in source or binary form in the OFW SVN, right? We're just talking about having OpenSM link to the pthreads library that is out-of-tree. So the question is whether there are any licensing issues with having a BSD code include an out-of-tree LGPL file that would affect the ability to retain the BSD license on the OpenSM files. I can see this causing problems for builds, as people would need to find/install the pthreads library before OpenSM would build successfully. 
-Fab

-----Original Message-----
From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
Sent: Tuesday, February 20, 2007 10:38 AM
To: ofw at lists.openfabrics.org
Cc: Gilad Shainer; OPENIB
Subject: [ofw] [Fwd: Re: [openib-general] [Fwd: Re: win related [was: Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]

Also, looping in the OpenFabrics Windows email list on this.

-- Hal

-----Forwarded Message-----
From: Hal Rosenstock
To: Tzachi Dar
Cc: OPENIB , Gilad Shainer
Subject: Re: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]]
Date: 20 Feb 2007 13:21:38 -0500

_______________________________________________
ofw mailing list
ofw at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw

From krause at cup.hp.com Tue Feb 20 10:56:20 2007
From: krause at cup.hp.com (Michael Krause)
Date: Tue, 20 Feb 2007 10:56:20 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <45D4D924.8070507@ichips.intel.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com> <45D4B705.5020805@ichips.intel.com> <6.2.0.14.2.20070215123631.09692088@esmail.cup.hp.com> <45D4D924.8070507@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070220103929.02953a20@esmail.cup.hp.com>

At 02:05 PM 2/15/2007, Sean Hefty wrote: >>Is this first an IBTA problem to solve if you believe there is a problem? > >Based on my interpretation, I do not believe that there's an error in the >architecture. It seems consistent.
Additional clarification of what >PathRecord fields mean when the GIDs are on different subnets may be >needed, and a change to the architecture may make things easier to >implement, but that's a separate matter. > >>I contend CM does not require anything that is subnet local other than to >>target a given router port which should be derived from local SM/SA only > >Then please state how the passive side obtains the information (e.g. >SLID/DLID) it needs in order to configure its QP. I claim that >information is carried in the CM REQ. It should not be carried in the CM REQ. The SLID / DLID of the router ports should be derived through local subnet SA / SM query. When a CM REQ traverses one or more subnets there will be potentially many SLID / DLID involved in the communication. Each router should be populating its routing tables in order to build the new LRH attached to the GRH / CM REQ that it is forwarding to the next hop. >The alternatives that I see are: > >1. The passive side extracts the data from the LRH that carries the CM REQ. >2. The passive side issues its own local path record query. > >Will you please clarify where this information comes from? The router protocol determines path to the next hop. As noted in prior e-mails, the router works in conjunction with the SM/SA to populate its database so that any CM or other query for a path record to get to / from the router can be derived and optimized based on local policy, e.g. QoS, within each subnet. >>I will further state that SA-SA communication sans perhaps a >>P_Key / Q_Key service lookup should be avoided wherever possible. > >I agree - which is why my proposal avoided SA-SA communication. I see >nothing in the architecture that prohibits a node from querying an SA that >is not on its local subnet. I'd need to go back but the architecture is predicated that the SM and SA are strictly local and for security purposes their communication should remain local. 
Higher level management entities built to communicate with SM and SA are responsible for cross subnet communications without exposing the SA or SM to direct interaction. P_Key and Q_Key management across subnets is an example of such communication across subnets that would not be exposed to the SA and SM. Mike From halr at voltaire.com Tue Feb 20 10:56:38 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Feb 2007 13:56:38 -0500 Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related [was: Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] In-Reply-To: References: <1171996664.4380.316818.camel@hal.voltaire.com> Message-ID: <1171997797.4380.318016.camel@hal.voltaire.com> On Tue, 2007-02-20 at 13:56, Fab Tillier wrote: > Submissions to the OFW project are supposed to be bound by the > contributor's agreement: > > http://windows.openib.org/openib/contribute.aspx > > Contributing code under anything but a BSD license violates condition 1, > though there shouldn't be issues with dual licenses as long as one of > the available licenses is a BSD license. > > In any case, we're not talking about putting the pthreads library in > source or binary form in the OFW SVN, right? Right (we're not). > We're just talking about > having OpenSM link to the pthreads library that is out-of-tree. Yes. > So the > question is whether there are any licensing issues with having a BSD > code include an out-of-tree LGPL file that would affect the ability to > retain the BSD license on the OpenSM files. I don't think this is an issue as there are other instances of this being done (outside of OpenIB). > I can see this causing > problems for builds, as people would need to find/install the pthreads > library before OpenSM would build successfully. Could install documentation for OpenSM on Windows minimize this as an issue ? 
-- Hal > -Fab > > -----Original Message----- > From: ofw-bounces at lists.openfabrics.org > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock > Sent: Tuesday, February 20, 2007 10:38 AM > To: ofw at lists.openfabrics.org > Cc: Gilad Shainer; OPENIB > Subject: [ofw] [Fwd: Re: [openib-general] [Fwd: Re: win related [was: > Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] > > Also, looping in the OpenFabrics Windows email list on this. > > -- Hal > > -----Forwarded Message----- > > From: Hal Rosenstock > To: Tzachi Dar > Cc: OPENIB , Gilad Shainer > > Subject: Re: [openib-general] [Fwd: Re: win related [was: Re: [PATCH > 1/2] opensm: sigusr1: syslog() fixes]] > Date: 20 Feb 2007 13:21:38 -0500 > > Hi Tzachi, > > On Thu, 2007-02-08 at 16:24, Tzachi Dar wrote: > > See bellow. > > I would like to get back to trying to close on this discussion. > > > Thanks > > Tzachi > > > > > -----Original Message----- > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > > Sent: Thursday, February 08, 2007 9:47 PM > > > To: Tzachi Dar > > > Cc: Yossi Leybovich; Gilad Shainer; Yevgeny Kliteynik; > > > OPENIB; Michael S. Tsirkin; Hal Rosenstock > > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] > > > opensm: sigusr1: syslog() fixes]] > > > > > > On 20:31 Thu 08 Feb , Tzachi Dar wrote: > > > > The windows open IB has decided on using a BSD only license. > > > > The common implementation of pthreads as far as I know is > > > LGPL, which > > > > means that it can not be used in open IB. > > > > > > Why not? AFAIK it works perfectly (see (5,6 and Preamble)): > > > http://www.gnu.org/copyleft/lesser.html > > > > > > And of course there are tons of examples when BSD software > > > links against LGPLed glibc. > > > > I can of course write you an answer that will be more than 5 pages > long > > of why *I* don't think that > > Using GPL software is bad for everyone, but I guess that my opinion > > doesn't really meter, so I > > Won't do it. 
> > The page that you have referenced is of the GNU org, and even there it > > is hard to say that they > > are trying to encourage you to use the LGPL license. In any case, the > > main point is that > > When open IB windows was formed there was a general decision that it > > will use BSD license. If we > > Start having components with the LGPL this will break that decision, > and > > therefore this requires > > some voting of the open IB organization. > > I may be missing your point but is there something in the Windows > OpenIB/OpenFabrics license that precludes using Windows OpenIB licensed > code (e.g. BSD like license) in concert with non OpenIB code (like LGPL) > ? Isn't that essentially what using the Windows pthreads DLL with OpenSM > would be like ? As I understand it, I don't think this requires a > license change or anything in the OpenIB Windows charter prevents this > or needs changing. > > > > > The only two ways that I see around this are 1) Change the > > > license of > > > > open IB windows which might be a complicated thing. 2) Find an > > > > implementation of pthreads that is BSD. > > > > > > BTW, just wondering... What is relation between windows open > > > IB and OFA (and OFA's "dual-license rule")? > > Well, the way I see it one can take code from the Linux part under the > > BSD licance and use it in > > The windows part. The otherway around seems fine to me but some say > that > > since the windows BSD liscance > > Reqires that some text will always remain there, the other way around > is > > not possibale. As I'm not an > > Expert in that erea I don't know who is right. > > I don't see how this affects what is being discussed about OpenSM. In > all the cases I'm aware of, the portability is from Linux to Windows and > not the other way around. 
> > -- Hal > > > > Sasha > > > > > > > > > > > Thanks > > > > Tzachi > > > > > > > > > -----Original Message----- > > > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > > > > Sent: Thursday, February 08, 2007 7:46 PM > > > > > To: Tzachi Dar; Yossi Leybovich > > > > > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal > Rosenstock > > > > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] > > > > > opensm: sigusr1: syslog() fixes]] > > > > > > > > > > On 11:24 Sun 21 Jan , Yevgeny Kliteynik wrote: > > > > > > Tzachi, Yossi, please join the thread. > > > > > > What do you think about distributing a copy of the pthread DLL > > > > > > > with opensm? > > > > > > > > > > Any news here? Thanks. > > > > > > > > > > Sasha > > > > > > > > > > > > > > > > > -- Yevgeny. > > > > > > > > > > > > -------- Original Message -------- > > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: > sigusr1: > > > > > > syslog() fixes] > > > > > > Date: Fri, 19 Jan 2007 00:20:32 +0200 > > > > > > From: Sasha Khapyorsky > > > > > > To: Michael S. Tsirkin > > > > > > CC: Yevgeny Kliteynik , > > > > > OPENIB > > > > > > References: <20070118194403.GA23783 at sashak.voltaire.com> > > > > > > <20070118215023.GP9890 at mellanox.co.il> > > > > > > > > > > > > On 23:50 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > > > > > Quoting Sasha Khapyorsky : > > > > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] > > > opensm: sigusr1: > > > > > > > > syslog() fixes] > > > > > > > > > > > > > > > > On 07:00 Thu 18 Jan , Michael S. Tsirkin wrote: > > > > > > > > > > What about pure opensource - > > > > > > > > > > http://sourceware.org/pthreads-win32/? It is licensed > > > > > > > > > > under LGPL, I see on the net many positive reports > about > > > > > stability and usability. > > > > > > > > > > > > > > > > > > I used it to do a windows port of linux complib at some > > > > > > > > > point and opensm seemed to work fine with it. 
What it > was > > > > > lacking at > > > > > > > > > that point was support for 64 bit applications, > > > and for some > > > > > > > > > reason (which is still unclear to me) there was a > > > > > strong desire to run opensm in 64 bit mode. > > > > > > > > > Seems to have been fixed now, BTW. > > > > > > > > > > > > > > > > So this seems to be good option for OpenSM on > > > Windows. Right? > > > > > > > > > > > > > > No idea. Distributing a copy of the pthread DLL with > > > > > opensm does not > > > > > > > look like a problem. But is it worth it? > > > > > > > > > > > > Sure, it makes windows porting much more transparent and > > > > > let us to use > > > > > > standard *nix stuff w/out #ifndef WIN32. Other > > > (generic) benefit > > > > > > is that posix is more standard and powerful than > > > wrappers like complib. > > > > > > > > > > > > Sasha > > > > > > > > > > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > > _______________________________________________ > ofw mailing list > ofw at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw From arlin.r.davis at intel.com Tue Feb 20 11:06:17 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 20 Feb 2007 11:06:17 -0800 Subject: [openib-general] Fork issues with simple MPI program In-Reply-To: <45DAB3FD.8060606@voltaire.com> Message-ID: <000001c75522$334f6a00$4297070a@amr.corp.intel.com> >Arlin Davis wrote: >> Any insight would be greatly appreciated. It was our assumption that the parent process can >continue >> to use IB resources after the fixes went into 2.6.16 and OFED 1.1. Is this true? 
> >As was discussed over this list in few occasions: in contrast to popular >thought the fork support was deployed in libibverbs1.1 where OFED 1.1 >contains libibverbs1.0 OFED 1.2 alpha (libibverbs 1.1) on 2.6.20 fails the same way. Does the following disclaimer still apply? "Fork support from kernel 2.6.12 and above is available provided that applications do not use threads. The fork() is supported as long as parent process does not run before child exits or calls exec(). The former can be achieved by calling wait(childpid) the later can be achieved by application specific means. Posix system() call is supported." From tziporet at mellanox.co.il Tue Feb 20 11:54:35 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 20 Feb 2007 21:54:35 +0200 Subject: [openib-general] Fork issues with simple MPI program In-Reply-To: <45DAB3FD.8060606@voltaire.com> References: <000001c75454$523660f0$eed4180a@amr.corp.intel.com> <45DAB3FD.8060606@voltaire.com> Message-ID: <45DB51FB.5090500@mellanox.co.il> Or Gerlitz wrote: > Arlin Davis wrote: > >> Any insight would be greatly appreciated. It was our assumption that the parent process can continue >> to use IB resources after the fixes went into 2.6.16 and OFED 1.1. Is this true? >> > > As was discussed over this list in few occasions: in contrast to popular > thought the fork support was deployed in libibverbs1.1 where OFED 1.1 > contains libibverbs1.0 > > Or. > > > The only fork support in OFED 1.1 is system() or fork & exec. Note that the support in OFED 1.2 (actually changes in libibverbs 1.1) needs some change in the application. 
Tziporet From tziporet at mellanox.co.il Tue Feb 20 12:01:32 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 20 Feb 2007 22:01:32 +0200 Subject: [openib-general] Fork issues with simple MPI program In-Reply-To: <000001c75522$334f6a00$4297070a@amr.corp.intel.com> References: <000001c75522$334f6a00$4297070a@amr.corp.intel.com> Message-ID: <45DB539C.3050905@mellanox.co.il> Arlin Davis wrote: > > OFED 1.2 alpha (libibverbs 1.1) on 2.6.20 fails the same way. Does the following disclaimer still > apply? > > "Fork support from kernel 2.6.12 and above is available provided > that applications do not use threads. The fork() is supported as long > as parent process does not run before child exits or calls exec(). > The former can be achieved by calling wait(childpid) the later can be > achieved by application specific means. Posix system() call is > supported." > > As replied before - if you want full fork support you need to change the application. Look at the verbs header for details. Tziporet From rdreier at cisco.com Tue Feb 20 12:24:37 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Feb 2007 12:24:37 -0800 Subject: [openib-general] Fork issues with simple MPI program In-Reply-To: <45DB539C.3050905@mellanox.co.il> (Tziporet Koren's message of "Tue, 20 Feb 2007 22:01:32 +0200") References: <000001c75522$334f6a00$4297070a@amr.corp.intel.com> <45DB539C.3050905@mellanox.co.il> Message-ID: > As replied before - if you want full fork support you need to change the > application. Look at the verbs header for details. Or you could try setting the IBV_FORK_SAFE environment variable before running your application. I guess for MPI jobs you need to make sure that environment variable is propagated to every process. 
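One way to make sure the variable reaches every process is a small launcher that exports it and then execs the real program. This is only a sketch: the helper names are hypothetical, the value "1" is an assumption (as far as I know libibverbs only checks whether the variable is set), and an MPI launcher may offer its own way to forward environment variables to remote ranks.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() under strict -std= modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Export IBV_FORK_SAFE in the current environment so that this process
 * and anything it later fork()s or exec()s inherits it.
 * Returns 0 on success, -1 on failure (same convention as setenv). */
int export_fork_safe(void)
{
    return setenv("IBV_FORK_SAFE", "1", 1 /* overwrite any old value */);
}

/* Hypothetical launcher body: export the variable, then replace this
 * process with the program named in argv[0]. Only returns on error. */
int run_fork_safe(char *const argv[])
{
    if (export_fork_safe() != 0) {
        perror("setenv");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");             /* reached only if exec failed */
    return -1;
}
```

A wrapper's main() would simply call run_fork_safe(argv + 1), so that "wrapper ./my_mpi_app args" starts the application with the variable already set.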
From arlin.r.davis at intel.com Tue Feb 20 12:40:31 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 20 Feb 2007 12:40:31 -0800 Subject: [openib-general] Fork issues with simple MPI program In-Reply-To: Message-ID: <000101c7552f$5cf14c90$4297070a@amr.corp.intel.com> > >Or you could try setting the IBV_FORK_SAFE environment variable before >running your application. I guess for MPI jobs you need to make sure >that environment variable is propagated to every process. Ahh! That's what I was looking for. Thanks! This information is scattered around in various email threads, header files, and code. Can someone please add relevant text to the OFED 1.2 release notes or a Wiki page? From tziporet at mellanox.co.il Tue Feb 20 12:57:02 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 20 Feb 2007 22:57:02 +0200 Subject: [openib-general] Fork issues with simple MPI program In-Reply-To: <000101c7552f$5cf14c90$4297070a@amr.corp.intel.com> References: <000101c7552f$5cf14c90$4297070a@amr.corp.intel.com> Message-ID: <45DB609E.6020701@mellanox.co.il> Arlin Davis wrote: >> Or you could try setting the IBV_FORK_SAFE environment variable before >> running your application. I guess for MPI jobs you need to make sure >> that environment variable is propagated to every process. >> > > Ahh! That's what I was looking for. Thanks! > > This information is scattered around in various email threads, header files, and code. Can someone > please add relevant text to the OFED 1.2 release notes or a Wiki page? 
> Roland, If you can send me the details (since you implemented it) I will add it to the Wiki Thanks, Tziporet From rdreier at cisco.com Tue Feb 20 12:58:03 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Feb 2007 12:58:03 -0800 Subject: [openib-general] Fork issues with simple MPI program In-Reply-To: <45DB609E.6020701@mellanox.co.il> (Tziporet Koren's message of "Tue, 20 Feb 2007 22:57:02 +0200") References: <000101c7552f$5cf14c90$4297070a@amr.corp.intel.com> <45DB609E.6020701@mellanox.co.il> Message-ID: > If you can send me the details (since you implemented it) I will add > it to the Wiki An application that wants fork() to work with libibverbs should either call ibv_fork_init() before doing anything else with libibverbs, or else a user can set the IBV_FORK_SAFE or RDMAV_FORK_SAFE environment variable to get the same effect. There is some overhead to making fork() work so it is not enabled by default. This is described in the ibv_fork_init manpage in the latest libibverbs git tree. - R. From ftillier at windows.microsoft.com Tue Feb 20 13:08:40 2007 From: ftillier at windows.microsoft.com (Fab Tillier) Date: Tue, 20 Feb 2007 13:08:40 -0800 Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related [was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] In-Reply-To: <1171997797.4380.318016.camel@hal.voltaire.com> References: <1171996664.4380.316818.camel@hal.voltaire.com> <1171997797.4380.318016.camel@hal.voltaire.com> Message-ID: -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Tuesday, February 20, 2007 10:57 AM On Tue, 2007-02-20 at 13:56, Fab Tillier wrote: > Submissions to the OFW project are supposed to be bound by the > contributor's agreement: > > I can see this causing > problems for builds, as people would need to find/install the pthreads > library before OpenSM would build successfully. Could install documentation for OpenSM on Windows minimize this as an issue ? 
[ftillier] This isn't just an install issue - it's a build issue. Anyone that wants to build OpenSM will need to find/download/install the pthreads library so that the build will succeed. If linking statically, the resulting executable will not require any special installation. It's only an install issue if you link dynamically to pthreads. -Fab From halr at voltaire.com Tue Feb 20 13:43:00 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Feb 2007 16:43:00 -0500 Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related [was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] In-Reply-To: References: <1171996664.4380.316818.camel@hal.voltaire.com> <1171997797.4380.318016.camel@hal.voltaire.com> Message-ID: <1172007778.4380.328202.camel@hal.voltaire.com> On Tue, 2007-02-20 at 16:08, Fab Tillier wrote: > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, February 20, 2007 10:57 AM > > On Tue, 2007-02-20 at 13:56, Fab Tillier wrote: > > Submissions to the OFW project are supposed to be bound by the > > contributor's agreement: > > > > I can see this causing > > problems for builds, as people would need to find/install the pthreads > > library before OpenSM would build successfully. > > Could install documentation for OpenSM on Windows minimize this as an > issue ? > > [ftillier] This isn't just an install issue - it's a build issue. > Anyone that wants to build OpenSM will need to find/download/install the > pthreads library so that the build will succeed. If linking statically, > the resulting executable will not require any special installation. > It's only an install issue if you link dynamically to pthreads. OK; then build and install. How big an issue is this ? I thought DLLs were dynamically linked but I'm a Windows plebe.
-- Hal > -Fab From ftillier at windows.microsoft.com Tue Feb 20 13:56:11 2007 From: ftillier at windows.microsoft.com (Fab Tillier) Date: Tue, 20 Feb 2007 13:56:11 -0800 Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] In-Reply-To: <1172007778.4380.328202.camel@hal.voltaire.com> References: <1171996664.4380.316818.camel@hal.voltaire.com><1171997797.4380.318016.camel@hal.voltaire.com> <1172007778.4380.328202.camel@hal.voltaire.com> Message-ID: -----Original Message----- From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock Sent: Tuesday, February 20, 2007 1:43 PM On Tue, 2007-02-20 at 16:08, Fab Tillier wrote: > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, February 20, 2007 10:57 AM > > On Tue, 2007-02-20 at 13:56, Fab Tillier wrote: > [ftillier] This isn't just an install issue - it's a build issue. > Anyone that wants to build OpenSM will need to find/download/install the > pthreads library so that the build will succeed. If linking statically, > the resulting executable will not require any special installation. > It's only an install issue if you link dynamically to pthreads. OK; then build and install. How big an issue is this ? I thought DLLs were dynamically linked but I'm a Windows plebe. [ftillier] When you build, the linker needs the import library for pthreads so that the functions get resolved as being imported from the pthreads DLL. The dependency on the pthreads DLL is then created and the DLL will be loaded dynamically, assuming it can be found in the path. So for the build process, you need to have the pthreads library available to the build tool (path to the lib). This requires installing the pthreads developer package or however it's done.
If you statically link the pthreads lib, rather than dynamically link, then all the pthreads goodies go directly into the executable and you remove the dependency on an external DLL. The build process requirements are no different than for the dynamically linked case. There is also the possibility to remove the link-time dependency by calling GetProcAddress to explicitly resolve the pthreads entrypoints. This method still requires having the DLL loaded on the user's systems. Personally, I would rather see static linkage to the pthreads library so that only the builds are affected (something only 'experts' will be doing), while not affecting the common user. -Fab From halr at voltaire.com Tue Feb 20 14:23:49 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Feb 2007 17:23:49 -0500 Subject: [openib-general] [PATCH] osm/libvendor: compilation fixes In-Reply-To: <20070219230441.GA27414@sashak.voltaire.com> References: <20070219214630.GW27414@sashak.voltaire.com> <20070219230441.GA27414@sashak.voltaire.com> Message-ID: <1172010229.4380.330691.camel@hal.voltaire.com> On Mon, 2007-02-19 at 18:04, Sasha Khapyorsky wrote: > This adds needed header files inclusion to prevent compilation failures. > > Signed-off-by: Sasha Khapyorsky > --- Thanks. Applied (to both master and ofed_1_2). -- Hal From arlin.r.davis at intel.com Tue Feb 20 14:29:40 2007 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Tue, 20 Feb 2007 14:29:40 -0800 Subject: [openib-general] Fork issues with simple MPI program In-Reply-To: Message-ID: >An application that wants fork() to work with libibverbs should >either call ibv_fork_init() before doing anything else with >libibverbs, or else a user can set the IBV_FORK_SAFE or >RDMAV_FORK_SAFE environment variable to get the same effect. There is >some overhead to making fork() work so it is not enabled by default. >This is described in the ibv_fork_init manpage in the latest >libibverbs git tree. Does this require 2.6.16 or better kernel support?
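For anyone wanting to check a given kernel directly: to my understanding the fork support in libibverbs depends on the MADV_DONTFORK advice to madvise(), which the Linux man page lists as available since 2.6.16. The probe below is a rough sketch, not what libibverbs itself does. The function name is hypothetical, and it only tests whether the kernel accepts the flag on a scratch page.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE           /* MAP_ANONYMOUS, MADV_DONTFORK */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if the running kernel accepts MADV_DONTFORK on an
 * anonymous page, -1 if it does not (or if the probe page could
 * not be mapped). */
int kernel_has_dontfork(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf;
    int rc;

    if (page <= 0)
        return -1;

    /* Scratch page used only for the probe. */
    buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Ask the kernel not to make this range available to children
     * created by fork(); kernels without the flag reject the call. */
    rc = madvise(buf, (size_t)page, MADV_DONTFORK);

    munmap(buf, (size_t)page);
    return rc == 0 ? 0 : -1;
}
```

A return of -1 here suggests that fork safety cannot be enabled on that kernel, whichever way the application asks for it.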
From rdreier at cisco.com Tue Feb 20 14:33:02 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Feb 2007 14:33:02 -0800 Subject: [openib-general] Fork issues with simple MPI program In-Reply-To: (Arlin R. Davis's message of "Tue, 20 Feb 2007 14:29:40 -0800") References: Message-ID: > Does this require 2.6.16 or better kernel support? The kernel must support the MADV_DONTFORK flag to madvise(), not sure when exactly that was merged but 2.6.16 or so sounds right. ibv_fork_init() will return an error if the kernel support is missing and fork safety won't actually work. And if you use the environment variable a warning will be printed if ibv_fork_init() fails. - R. From rdreier at cisco.com Tue Feb 20 16:12:29 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Feb 2007 16:12:29 -0800 Subject: [openib-general] I created a git tree for the libibverbs man pages In-Reply-To: <45BF756B.1060500@dev.mellanox.co.il> (Dotan Barak's message of "Tue, 30 Jan 2007 18:42:19 +0200") References: <45BF63A1.6090402@dev.mellanox.co.il> <45BF756B.1060500@dev.mellanox.co.il> Message-ID: I merged all these manpages into my libibverbs tree and pushed the result out to kernel.org. Please send any future updates as diffs against the libibverbs tree. Thanks, Roland From greg at kroah.com Tue Feb 20 17:50:34 2007 From: greg at kroah.com (Greg KH) Date: Tue, 20 Feb 2007 17:50:34 -0800 Subject: [openib-general] [patch 09/18] IB/mad: Fix race between cancel and receive completion In-Reply-To: <20070221014927.GA3684@kroah.com> References: <20070221014413.282048309@mini.kroah.org> Message-ID: <20070221015034.GJ3684@kroah.com> -stable review patch. If anyone has any objections, please let us know. ------------------ From: Roland Dreier When ib_cancel_mad() is called, it puts the canceled send on a list and schedules a "flushed" callback from process context. However, this leaves a window where a receive completion could be processed before the send is fully flushed. 
This is fine, except that ib_find_send_mad() will find the MAD and return it to the receive processing, which results in the sender getting both a successful receive and a "flushed" send completion for the same request. Understandably, this confuses the sender, which is expecting only one of these two callbacks, and leads to grief such as a use-after-free in IPoIB. Fix this by changing ib_find_send_mad() to return a send struct only if the status is still successful (and not "flushed"). The search of the send_list already had this check, so this patch just adds the same check to the search of the wait_list. Signed-off-by: Roland Dreier Signed-off-by: Chris Wright Signed-off-by: Greg Kroah-Hartman --- --- drivers/infiniband/core/mad.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- linux-2.6.18.7.orig/drivers/infiniband/core/mad.c +++ linux-2.6.18.7/drivers/infiniband/core/mad.c @@ -1750,7 +1750,7 @@ ib_find_send_mad(struct ib_mad_agent_pri */ (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) || rcv_has_same_gid(mad_agent_priv, wr, wc))) - return wr; + return (wr->status == IB_WC_SUCCESS) ? 
wr : NULL; } /* -- From devesh28 at gmail.com Tue Feb 20 21:21:42 2007 From: devesh28 at gmail.com (Devesh Sharma) Date: Wed, 21 Feb 2007 10:51:42 +0530 Subject: [openib-general] Immediate data question In-Reply-To: <6.2.0.14.2.20070215071309.0979bed8@esmail.cup.hp.com> References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com> <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net> <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com> <309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com> <6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com> <309a667c0702142137p724172f5va93a0ef046a60483@mail.gmail.com> <6.2.0.14.2.20070215071309.0979bed8@esmail.cup.hp.com> Message-ID: <309a667c0702202121p52747748ic891a9d21a02e3d7@mail.gmail.com> On 2/15/07, Michael Krause wrote: > At 09:37 PM 2/14/2007, Devesh Sharma wrote: > >On 2/14/07, Michael Krause wrote: > >>At 05:37 AM 2/13/2007, Devesh Sharma wrote: > >> >On 2/12/07, Devesh Sharma wrote: > >> >>On 2/10/07, Tang, Changqing wrote: > >> >> > > > > >> >> > > >Not for the receiver, but the sender will be severely slowed down by > >> >> > > >having to wait for the RNR timeouts. > >> >> > > > >> >> > > RNR = Receiver Not Ready so by definition, the data flow > >> >> > > isn't going to > >> >> > > progress until the receiver is ready to receive data. If a > >> >> > > receive QP > >> >> > > enters RNR for a RC, then it is likely not progressing as > >> >> > > desired. RNR > >> >> > > was initially put in place to enable a receiver to create > >> >> > > back pressure to the sender without causing a fatal error > >> >> > > condition. It should rarely be entered and therefore should > >> >> > > have negligible impact on overall performance however when a > >> >> > > RNR occurs, no forward progress will occur so performance is > >> >> > > essentially zero. 
> >> >> > > >> >> > Mike: > >> >> > I still do not quite understand this issue. I have two > >> >> > situations that have RNR triggered. > >> >> > > >> >> > 1. process A and process B is connected with QP. A first post a send to > >> >> > B, B does not post receive. Then A and B are doing a long time > >> >> > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE > >> >> > message. Finally B will post a receive. Does the first pending send > >> in A > >> >> > block all the later RDMA_WRITE ? > >> >>According to IBTA spec HCA will process WR entries in strict order in > >> >>which they are posted so the send will block all WR posted after this > >> >>send, Until-unless HCA has multiple processing elements, I think even > >> >>then processing order will be maintained by HCA > >> >> If not, since RNR is triggered > >> >> > periodically till B post receive, does it affect the RDMA_WRITE > >> >> > performance between A and B ? > >> >> > > >> >> > 2. extend above to three processes, A connect to B, B connect to C, > >> so B > >> >> > has two QPs, but one CQ.A posts a send to B, B does not post receive, > >> >post ordering accross QP is not guaranteed hence presence of same CQ > >> >or different CQ will not affect any thing. > >> >> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B > >> >If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C, > >I am sorry I have missed that in both cases same DMA channel is in use. > >> >_may_ affect the performance, since load is on same HCA. In case of > >> >Send/Recv again _may_ affect the performance, with the same reason. > >> > >>Seems orthogonal. Any time h/w is shared, multiple flows will have an > >>impact on one another. That is why we have the different arbitration > >>mechanisms to enable one to control that impact. > >Please, can you explain it more clearly? > > Most I/O devices are shared by multiple applications / kernel > subsystems. 
Hence, the device acts as a serialization point for what goes > on the wire / link. Sharing = resource contention and in order to add any > structure to that contention, a number of technologies provide arbitration > options. In the case of IB, the arbitration is confined to VL arbitration > where a given data flow is assigned to a VL and that VL is services at some > particular rate. A number of years ago I wrote up how one might also > provide QP arbitration (not part of the IBTA specifications) and I > understand some implementations have incorporated that or a variation of > the mechanisms into their products. Thanks mike for a nice explanation. I am sorry for the late reply, Now I got it, here Chang is trying to find out performance hit due to RNR NAK, performance hit due to device sharing is any how going to be there so "load on same HCA" is not the proper explanation. Am I correct now? > > In addition to IB link contention, there is also PCI link / bus > contention. For PCIe, given most designs did not want to waste resources > on multiple VC, there really isn't any standard arbitration > mechanism. However, many devices, especially a device like a HCA or a > RNIC, already have the concept of separate resource domains, e.g. QP, and > they provide a mechanism to associate how the QP's DMA requests or > interrupts requests are scheduled to the PCIe link. > > > >> >> > must sends RNR periodically to A, right?. So does the pending message > >> >> > from A affects B's overall performance between B and C ? > >> >But RNR NAK is not for very long time.....possibly this performance > >> >hit you will not be able to observe even. The moment rnr_counter > >> >expires connection will be broken! > >> > >>Keep in mind the timeout can be infinite. RNR NAK are not expected to be > >>frequent so their performance impact was considered reasonable. > >Thanks I missed that. > > It is a subtlety within the specification that is easy to miss. 
> > Mike > > > From sweitzen at cisco.com Tue Feb 20 21:52:43 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 20 Feb 2007 21:52:43 -0800 Subject: [openib-general] fix SDP bug 108 for OFED 1.2 beta? Message-ID: Tziporet and Michael, every since the SDP rewrite in OFED 1.0 rc5, SDP throughput drops with message size > 64KB, see attached graph. Can you please fix this for OFED 1.2 beta? Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sdp2sdp.TCP_STREAM.000.tput_log.pdf Type: application/octet-stream Size: 2700 bytes Desc: sdp2sdp.TCP_STREAM.000.tput_log.pdf URL: From ogerlitz at voltaire.com Tue Feb 20 22:43:43 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 21 Feb 2007 08:43:43 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <1171986159.4380.306117.camel@hal.voltaire.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> <45DB15F5.4090406@voltaire.com> <1171986159.4380.306117.camel@hal.voltaire.com> Message-ID: <45DBEA1F.5090901@voltaire.com> Hal Rosenstock wrote: > On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote: >> Yes. Its a little bit confusing: partial and full members of an IPoIB IB >> partition use the same MGID. When an IPoIB MGID is constructed, the pkey >> placed by the driver is --always-- the full membership one. However, on >> a node with partial membership, what's plugged into the QP is the pkey >> index of the partial instance... > So in this case, do both the full and partial keys need configuring for > that port ? No. The SM configures --either-- the full or the partial pkey. However, no matter what the SM configures, the core & ipoib code act as the full pkey is there. This is nice simplification and it works well. Or. 
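The full/partial rule described above can be sketched in a few lines of C. This assumes the usual InfiniBand convention that bit 15 of the 16-bit P_Key is the membership bit (set for full, clear for partial) and the low 15 bits name the partition. The helper names are hypothetical and are not the librdmacm or kernel API.

```c
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_BIT  0x8000u   /* bit 15: 1 = full member, 0 = partial */
#define PKEY_BASE_MASK 0x7fffu   /* low 15 bits identify the partition   */

/* The value placed in the IPoIB MGID: always the full-membership form. */
uint16_t pkey_full(uint16_t pkey)
{
    return (uint16_t)(pkey | PKEY_FULL_BIT);
}

/* Two P_Keys name the same partition if they agree on the low 15 bits. */
int pkey_same_partition(uint16_t a, uint16_t b)
{
    return (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK);
}

/* Find the table index to plug into the QP. The port's table holds
 * either the full or the partial entry, so match on the partition
 * only and ignore the membership bit. Returns -1 if absent. */
int pkey_find_index(const uint16_t *table, size_t n, uint16_t pkey)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (pkey_same_partition(table[i], pkey))
            return (int)i;
    return -1;
}
```

With a table of {0xffff, 0x0012}, a lookup for the MGID-style key 0x8012 still lands on index 1, which is the partial entry that actually gets plugged into the QP.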
From rdreier at cisco.com Tue Feb 20 23:16:53 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Feb 2007 23:16:53 -0800 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth In-Reply-To: <20070220181755.GC11825@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 20 Feb 2007 20:17:55 +0200") References: <20070220181755.GC11825@mellanox.co.il> Message-ID: Thanks, queued for 2.6.21. With this patch I see small-packet latency down almost all the way back to what datagram mode gives -- on a pair of fast woodcrest systems I see latencies for netpipe tcp 1 byte messages like datagram 13.xx original CM 17.xx patched CM 14.xx so there is still a measurable difference but it is much less now. - R. From rdreier at cisco.com Tue Feb 20 23:19:28 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Feb 2007 23:19:28 -0800 Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: possible cleanups In-Reply-To: <1171982587.2101.0.camel@stevo-desktop> (Steve Wise's message of "Tue, 20 Feb 2007 08:43:06 -0600") References: <20070220000211.GZ13958@stusta.de> <1171982587.2101.0.camel@stevo-desktop> Message-ID: > You could just remove the code instead of #if 0... Steve, can you decide what the right thing to do with these changes is and send me the result (or just tell me to apply Adrian's patch as-is)? Thanks, Roland From ogerlitz at voltaire.com Wed Feb 21 00:48:47 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 21 Feb 2007 10:48:47 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <1171986159.4380.306117.camel@hal.voltaire.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> <45DB15F5.4090406@voltaire.com> <1171986159.4380.306117.camel@hal.voltaire.com> Message-ID: <45DC076F.4060607@voltaire.com> >> Yes. Its a little bit confusing: partial and full members of an IPoIB IB >> partition use the same MGID. 
When an IPoIB MGID is constructed, the pkey >> placed by the driver is --always-- the full membership one. However, on >> a node with partial membership, what's plugged into the QP is the pkey >> index of the partial instance... > So in this case, do both the full and partial keys need configuring for > that port ? No. The SM configures --either-- the full or the partial pkey. However, no matter what the SM configures, the core & ipoib code act as the full pkey is there. This is nice simplification and it works well. Or. From tzachid at mellanox.co.il Wed Feb 21 00:47:36 2007 From: tzachid at mellanox.co.il (Tzachi Dar) Date: Wed, 21 Feb 2007 10:47:36 +0200 Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] Message-ID: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com> OK, Hal let's try to close this. The windows openib project was agreed by everyone to be BSD only. The fact that it is BSD means that any partner (or non partner) of the Community can download the code and use it, the way he wants. This includes: 1) Running the code as is. 2) Making changes to the code and contributing them back. 3) Making changes to the code and *NOT* giving them back to the community. Starting to depend on GPL (or LGPL) code means that the freedom of the users to do (3) is broken. Mellanox thinks that this needs a wider agreement of the open-IB consortium, which we don't have. More than that, the ideas that were introduced here about sending users to other places in order for them to find the pthread implementation are also not that great as this starts to make the life of our users harder. Also it is not clear who will give support once there are problems, and who is responsible that the license of the library won't change. So, I hope this closes the subject of using LGPL software in open-IB. By the way, what implementation of pthreads were you thinking of? 
I have noticed that the first implementation that Google brings up was only tested on uniprocessor systems (http://sourceware.org/pthreads-win32/news.html). (This is really amazing, I thought that these servers were out of the market a long time ago.) To be more practical: Can you give us a better view of what you are trying to achieve? In other words, as far as I know OpenSM is using complib APIs to handle threads. The implementation of these functions on Windows is usually trivial. Do you intend to rewrite OpenSM so that it will use pthreads, or do you intend to do a find/replace and replace the complib functions with pthreads ones? If we are talking about the second, then one can simply implement the pthread functions using trivial win32 calls. And another question: What is the functionality that you are currently missing? Can this functionality be added? Thanks Tzachi By the way, rumors I have heard say that Voltaire doesn't always give its code back to the community, but these are just rumors, right? > -----Original Message----- > From: ofw-bounces at lists.openfabrics.org > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier > Sent: Tuesday, February 20, 2007 11:56 PM > To: Hal Rosenstock > Cc: ofw at lists.openfabrics.org; Gilad Shainer; OPENIB > Subject: RE: [ofw] [Fwd: Re: [openib-general] [Fwd: Re: > winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] > > -----Original Message----- > From: ofw-bounces at lists.openfabrics.org > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock > Sent: Tuesday, February 20, 2007 1:43 PM > > On Tue, 2007-02-20 at 16:08, Fab Tillier wrote: > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Tuesday, February 20, 2007 10:57 AM > > > > On Tue, 2007-02-20 at 13:56, Fab Tillier wrote: > > [ftillier] This isn't just an install issue - it's a build issue. 
> > Anyone that wants to build OpenSM will need to find/download/install > the > > pthreads library so that the build will succeed. If linking > statically, > > the resulting executable will not require any special installation. > > It's only an install issue if you link dynamically to pitheads. > > OK; then build and install. How big an issue is this ? > > I thought DLLs were dynamically linked but I'm a Windows plebe. > > [ftillier] When you build, the linker needs the import > library for pthreads so that the functions get resolved as > being imported from the pthreads DLL. The dependency on the > pthreads DLL is then created and the DLL will be loaded > dynamically, assuming it can be found in the path. > > So for the build process, you need to have the pthreads > library available to the build tool (path to the lib). This > requires installing the pthreads developer package or however > it's done. > > If you statically link the pthreads lib, rather than > dynamically link, then all the pthreads goodies go directly > into the executable and you remove the dependency on an > external DLL. The build process requirements are no > different than for the dynamically linked case. > > There is also the possibility to remove the link-time > dependency by calling GetProcAddress to explicitly resolve > the pthreads entrypoints. > This method still requires having the DLL loaded on the > user's systems. > > Pesonally, I would rather see static linkage to the pthreads > library so that only the builds are affected (something only > 'experts' will be doing), while not affecting the common user. 
> > -Fab > _______________________________________________ > ofw mailing list > ofw at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw > From vlad at lists.openfabrics.org Wed Feb 21 02:26:03 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Wed, 21 Feb 2007 02:26:03 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070221-0200 daily build status Message-ID: <20070221102603.B5502E60804@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.15 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.19 Passed on ia64 with 
linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.12 Failed: From bunk at stusta.de Wed Feb 21 02:52:49 2007 From: bunk at stusta.de (Adrian Bunk) Date: Wed, 21 Feb 2007 11:52:49 +0100 Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: cleanups In-Reply-To: <1171982587.2101.0.camel@stevo-desktop> References: <20070220000211.GZ13958@stusta.de> <1171982587.2101.0.camel@stevo-desktop> Message-ID: <20070221105249.GC13958@stusta.de> On Tue, Feb 20, 2007 at 08:43:06AM -0600, Steve Wise wrote: > On Tue, 2007-02-20 at 01:02 +0100, Adrian Bunk wrote: > > This patch contains the following possible cleanups: > > - don't mark static functions in C files as inline - gcc should know > > best whether inlining makes sense > > - never compile the unused cxio_dbg.c > > - make the following needlessly global functions static: > > - cxio_hal.c: cxio_hal_clear_qp_ctx() > > - iwch_provider.c: iwch_get_qp() > > - #if 0 the following unused global functions: > > - cxio_hal.c: cxio_allocate_stag() > > - cxio_resource.: cxio_hal_get_rhdl() > > - cxio_resource.: cxio_hal_put_rhdl() > > > > You could just remove the code instead of #if 0... >... Updated patch below. 
cu Adrian <-- snip --> This patch contains the following possible cleanups: - don't mark static functions in C files as inline - gcc should know best whether inlining makes sense - never compile the unused cxio_dbg.c - make the following needlessly global functions static: - cxio_hal.c: cxio_hal_clear_qp_ctx() - iwch_provider.c: iwch_get_qp() - remove the following unused global functions: - cxio_hal.c: cxio_allocate_stag() - cxio_resource.: cxio_hal_get_rhdl() - cxio_resource.: cxio_hal_put_rhdl() Signed-off-by: Adrian Bunk --- drivers/infiniband/hw/cxgb3/Makefile | 1 drivers/infiniband/hw/cxgb3/cxio_hal.c | 31 +++++--------------- drivers/infiniband/hw/cxgb3/cxio_hal.h | 5 --- drivers/infiniband/hw/cxgb3/cxio_resource.c | 14 +-------- drivers/infiniband/hw/cxgb3/iwch_cm.c | 5 +-- drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 - drivers/infiniband/hw/cxgb3/iwch_provider.h | 1 drivers/infiniband/hw/cxgb3/iwch_qp.c | 29 ++++++++---------- 8 files changed, 27 insertions(+), 61 deletions(-) --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile.old 2007-02-17 17:21:03.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile 2007-02-17 17:21:08.000000000 +0100 @@ -8,5 +8,4 @@ ifdef CONFIG_INFINIBAND_CXGB3_DEBUG EXTRA_CFLAGS += -DDEBUG -iw_cxgb3-y += cxio_dbg.o endif --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h.old 2007-02-17 17:22:53.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h 2007-02-17 17:25:08.000000000 +0100 @@ -144,7 +144,6 @@ void cxio_rdev_close(struct cxio_rdev *rdev); int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq, enum t3_cq_opcode op, u32 credit); -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid); int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq); int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq); int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq); @@ -155,8 +154,6 @@ int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq, struct 
cxio_ucontext *uctx); int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode); -int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid, - enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr); int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid, enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, u8 page_size, __be64 *pbl, u32 *pbl_size, @@ -172,8 +169,6 @@ int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr); void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb); void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb); -u32 cxio_hal_get_rhdl(void); -void cxio_hal_put_rhdl(u32 rhdl); u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp); void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid); int __init cxio_hal_init(void); --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h.old 2007-02-17 17:25:35.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h 2007-02-17 17:25:41.000000000 +0100 @@ -179,7 +179,6 @@ void iwch_qp_add_ref(struct ib_qp *qp); void iwch_qp_rem_ref(struct ib_qp *qp); -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn); struct iwch_ucontext { struct ib_ucontext ibucontext; --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c.old 2007-02-17 17:25:50.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c 2007-02-17 17:25:57.000000000 +0100 @@ -949,7 +949,7 @@ wake_up(&(to_iwch_qp(qp)->wait)); } -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) +static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) { PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, qpn); return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn); --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c.old 2007-02-17 17:27:31.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c 2007-02-17 17:38:07.000000000 +0100 @@ -37,8 +37,8 @@ #define NO_SUPPORT -1 -static inline int 
iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, - u8 * flit_cnt) +static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, + u8 * flit_cnt) { int i; u32 plen; @@ -97,8 +97,8 @@ return 0; } -static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, - u8 *flit_cnt) +static int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, + u8 *flit_cnt) { int i; u32 plen; @@ -138,8 +138,8 @@ return 0; } -static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, - u8 *flit_cnt) +static int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, + u8 *flit_cnt) { if (wr->num_sge > 1) return -EINVAL; @@ -159,9 +159,8 @@ /* * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now. */ -static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp, - struct ib_sge *sg_list, u32 num_sgle, - u32 * pbl_addr, u8 * page_size) +static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, + u32 num_sgle, u32 * pbl_addr, u8 * page_size) { int i; struct iwch_mr *mhp; @@ -207,9 +206,8 @@ return 0; } -static inline int iwch_build_rdma_recv(struct iwch_dev *rhp, - union t3_wr *wqe, - struct ib_recv_wr *wr) +static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe, + struct ib_recv_wr *wr) { int i, err = 0; u32 pbl_addr[4]; @@ -474,8 +472,7 @@ return err; } -static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, - int tagged) +static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged) { switch (t3err) { case TPT_ERR_STAG: @@ -673,7 +670,7 @@ spin_lock_irqsave(&qhp->lock, *flag); } -static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag) +static void flush_qp(struct iwch_qp *qhp, unsigned long *flag) { if (t3b_device(qhp->rhp)) cxio_set_wq_in_error(&qhp->wq); @@ -685,7 +682,7 @@ /* * Return non zero if at least one RECV was pre-posted. 
*/ -static inline int rqes_posted(struct iwch_qp *qhp) +static int rqes_posted(struct iwch_qp *qhp) { return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV; } --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c.old 2007-02-17 17:27:53.000000000 +0100 +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c 2007-02-17 17:38:23.000000000 +0100 @@ -210,8 +210,7 @@ return state; } -static inline void __state_set(struct iwch_ep_common *epc, - enum iwch_ep_state new) +static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new) { epc->state = new; } @@ -1460,7 +1459,7 @@ /* * Returns whether an ABORT_REQ_RSS message is a negative advice. */ -static inline int is_neg_adv_abort(unsigned int status) +static int is_neg_adv_abort(unsigned int status) { return status == CPL_ERR_RTX_NEG_ADVICE || status == CPL_ERR_PERSIST_NEG_ADVICE; --- linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_resource.c.old 2007-02-20 23:22:29.000000000 +0100 +++ linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_resource.c 2007-02-20 23:12:04.000000000 +0100 @@ -179,7 +179,7 @@ /* * returns 0 if no resource available */ -static inline u32 cxio_hal_get_resource(struct kfifo *fifo) +static u32 cxio_hal_get_resource(struct kfifo *fifo) { u32 entry; if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32))) @@ -188,21 +188,11 @@ return 0; /* fifo emptry */ } -static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry) +static void cxio_hal_put_resource(struct kfifo *fifo, u32 entry) { BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0); } -u32 cxio_hal_get_rhdl(void) -{ - return cxio_hal_get_resource(rhdl_fifo); -} - -void cxio_hal_put_rhdl(u32 rhdl) -{ - cxio_hal_put_resource(rhdl_fifo, rhdl); -} - u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp) { return cxio_hal_get_resource(rscp->tpt_fifo); --- linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_hal.c.old 2007-02-20 23:22:42.000000000 +0100 +++ 
linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_hal.c 2007-02-20 23:12:43.000000000 +0100 @@ -45,7 +45,7 @@ static LIST_HEAD(rdev_list); static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL; -static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) +static struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) { struct cxio_rdev *rdev; @@ -55,8 +55,7 @@ return NULL; } -static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev - *tdev) +static struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev) { struct cxio_rdev *rdev; @@ -118,7 +117,7 @@ return 0; } -static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) +static int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) { struct rdma_cq_setup setup; setup.id = cqid; @@ -130,7 +129,7 @@ return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); } -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) +static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) { u64 sge_cmd; struct t3_modify_qp_wr *wqe; @@ -425,7 +424,7 @@ } } -static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) +static int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) { if (CQE_OPCODE(*cqe) == T3_TERMINATE) return 0; @@ -760,17 +759,6 @@ return err; } -/* IN : stag key, pdid, pbl_size - * Out: stag index, actaul pbl_size, and pbl_addr allocated. 
- */ -int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid, - enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr) -{ - *stag = T3_STAG_UNSET; - return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR, - perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr)); -} - int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid, enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, u8 page_size, __be64 *pbl, u32 *pbl_size, @@ -1029,7 +1017,7 @@ cxio_hal_destroy_rhdl_resource(); } -static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) +static void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) { struct t3_swsq *sqp; __u32 ptr = wq->sq_rptr; @@ -1058,9 +1046,8 @@ break; } -static inline void create_read_req_cqe(struct t3_wq *wq, - struct t3_cqe *hw_cqe, - struct t3_cqe *read_cqe) +static void create_read_req_cqe(struct t3_wq *wq, struct t3_cqe *hw_cqe, + struct t3_cqe *read_cqe) { read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr; read_cqe->len = wq->oldest_read->read_len; @@ -1073,7 +1060,7 @@ /* * Return a ptr to the next read wr in the SWSQ or NULL. */ -static inline void advance_oldest_read(struct t3_wq *wq) +static void advance_oldest_read(struct t3_wq *wq) { u32 rptr = wq->oldest_read - wq->sq + 1; From halr at voltaire.com Wed Feb 21 03:46:10 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Feb 2007 06:46:10 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DBEA1F.5090901@voltaire.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> <45DB15F5.4090406@voltaire.com> <1171986159.4380.306117.camel@hal.voltaire.com> <45DBEA1F.5090901@voltaire.com> Message-ID: <1172058368.4380.379947.camel@hal.voltaire.com> On Wed, 2007-02-21 at 01:43, Or Gerlitz wrote: > Hal Rosenstock wrote: > > On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote: > > >> Yes. 
It's a little bit confusing: partial and full members of an IPoIB IB > >> partition use the same MGID. When an IPoIB MGID is constructed, the pkey > >> placed by the driver is --always-- the full membership one. However, on > >> a node with partial membership, what's plugged into the QP is the pkey > >> index of the partial instance... > > > So in this case, do both the full and partial keys need configuring for > > that port ? > > No. The SM configures --either-- the full or the partial pkey. That's what I was afraid of :-( > However, no matter what the SM configures, the core & ipoib code act as if > the full pkey is there. This is a nice simplification and it works well. I believe it is a spec (compliance) violation for the port to be a partial member and join as a full member. -- Hal > Or. > From ogerlitz at voltaire.com Wed Feb 21 04:35:34 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 21 Feb 2007 14:35:34 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <1172058368.4380.379947.camel@hal.voltaire.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> <45DB15F5.4090406@voltaire.com> <1171986159.4380.306117.camel@hal.voltaire.com> <45DBEA1F.5090901@voltaire.com> <1172058368.4380.379947.camel@hal.voltaire.com> Message-ID: <45DC3C96.8040100@voltaire.com> >> However, no matter what the SM configures, the core & ipoib code act as if >> the full pkey is there. This is a nice simplification and it works well. > I believe it is a spec (compliance) violation for the port to be a > partial member and join as a full member. Since partial members can't talk among themselves, there is no reason to form a multicast group containing --only-- ports that can --not-- talk to each other... So if the spec does not allow this (having a partial member joining with the full member pkey) - it's a spec bug... Or. 
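The membership argument in this thread can be sketched in a few lines of C. This is only an illustration, not code from the kernel or from librdmacm. It assumes the usual InfiniBand convention that bit 15 of a P_Key is the membership bit (1 = full, 0 = partial) and the IPoIB layout in which the P_Key sits in bytes 4-5 of the multicast GID. The names pkey_can_talk() and mgid_set_pkey() are hypothetical helpers made up for this sketch.

```c
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* bit 15: 1 = full member, 0 = partial member */
#define PKEY_BASE_MASK   0x7fffu /* low 15 bits identify the partition */

/*
 * Hypothetical helper: two ports can exchange packets only if they are
 * in the same partition (same low 15 bits) and at least one of them is
 * a full member. Two partial members never match each other.
 */
static int pkey_can_talk(uint16_t a, uint16_t b)
{
	if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
		return 0;
	return ((a | b) & PKEY_FULL_MEMBER) != 0;
}

/*
 * Hypothetical helper: store the P_Key in bytes 4-5 of an IPoIB MGID,
 * always with the full-membership bit set, whatever the local port's
 * own membership is. This is why partial and full members end up with
 * the same MGID.
 */
static void mgid_set_pkey(uint8_t mgid[16], uint16_t pkey)
{
	pkey |= PKEY_FULL_MEMBER;
	mgid[4] = (uint8_t)(pkey >> 8);
	mgid[5] = (uint8_t)(pkey & 0xff);
}
```

With these definitions a partial member 0x0001 and a full member 0x8001 can talk, two partial members 0x0001 cannot, and both kinds of member build an MGID carrying 0x8001.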
From ogerlitz at voltaire.com Wed Feb 21 04:52:57 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 21 Feb 2007 14:52:57 +0200 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth In-Reply-To: <20070220181755.GC11825@mellanox.co.il> References: <20070220181755.GC11825@mellanox.co.il> Message-ID: <45DC40A9.507@voltaire.com> Michael S. Tsirkin wrote: > Avoid overhead of freeing/reallocating and mapping/unmapping for dma > for pages that have not been written to by hardware. > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > index 8ee6f06..a23c8e3 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > @@ -68,14 +68,14 @@ struct ipoib_cm_id { > static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, > struct ib_cm_event *event); > > -static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, > +static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags, > u64 mapping[IPOIB_CM_RX_SG]) > { > int i; > > ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE); > > - for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i) > + for (i = 0; i < frags; ++i) > ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); > } I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on IPOIB_CM_RX_SG pages where here you call dma_unmap_single only $frags times, correct? does this means you are trashing the IOMMU etc etc of the system? Or. From mst at mellanox.co.il Wed Feb 21 05:22:00 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Feb 2007 15:22:00 +0200 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth In-Reply-To: <45DC40A9.507@voltaire.com> References: <45DC40A9.507@voltaire.com> Message-ID: <20070221132159.GC7711@mellanox.co.il> > Quoting r. 
Or Gerlitz : > Subject: Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth > > Michael S. Tsirkin wrote: > > Avoid overhead of freeing/reallocating and mapping/unmapping for dma > > for pages that have not been written to by hardware. > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > > index 8ee6f06..a23c8e3 100644 > > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > > @@ -68,14 +68,14 @@ struct ipoib_cm_id { > > static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, > > struct ib_cm_event *event); > > > > -static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, > > +static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags, > > u64 mapping[IPOIB_CM_RX_SG]) > > { > > int i; > > > > ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE); > > > > - for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i) > > + for (i = 0; i < frags; ++i) > > ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); > > } > > I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on > IPOIB_CM_RX_SG pages where here you call dma_unmap_single only $frags > times, correct? No. > does this means you are trashing the IOMMU etc etc of > the system? I don't think so. 
-- MST From halr at voltaire.com Wed Feb 21 05:20:23 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Feb 2007 08:20:23 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DC3C96.8040100@voltaire.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> <45DB15F5.4090406@voltaire.com> <1171986159.4380.306117.camel@hal.voltaire.com> <45DBEA1F.5090901@voltaire.com> <1172058368.4380.379947.camel@hal.voltaire.com> <45DC3C96.8040100@voltaire.com> Message-ID: <1172064021.4380.385825.camel@hal.voltaire.com> On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote: > >> However, no matter what the SM configures, the core & ipoib code act as if > >> the full pkey is there. This is a nice simplification and it works well. > > > I believe it is a spec (compliance) violation for the port to be a > > partial member and join as a full member. > > Since partial members can't talk among themselves, there is no reason to > form a multicast group containing --only-- ports that can --not-- talk > to each other... So if the spec does not allow this (having a partial > member joining with the full member pkey) - it's a spec bug... I think there are two issues here then: 1. If this is the case, getting the spec changed to accommodate this use case. 2. I believe that OpenIB code is supposed to be spec compliant. -- Hal > Or. > > From jsquyres at cisco.com Wed Feb 21 06:12:26 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 21 Feb 2007 09:12:26 -0500 Subject: [openib-general] Fwd: Address List Change for Friday, 2/23/2007 References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov> Message-ID: <00D1093C-4EE4-4580-A57F-2B302E694C06@cisco.com> FYI. In case you missed it the first time: THIS LIST IS CHANGING ON FRIDAY 2/23/2007 (2 days from now). Please update your address books! See the notice below for the details. 
Begin forwarded message: > From: "Lee, Michael Paichi" > Date: February 19, 2007 10:43:23 AM EST > To: openib-general at openib.org > Subject: [openib-general] Address List Change for Friday, 2/23/2007 > > We're in the process of migrating the maillists from the old > openib.org server to the new lists.openfabrics.org machine. The > list openib-general will be moved this Friday, February 23, 2007. > The new address for the maillist will be > general at lists.openfabrics.org. > > What this means is that messages will come from > general at lists.openfabrics.org. Conversely, replies should be made > to this address as well. Messages will also have a new subject > line prefix of [OFA General]. If you have configured your e-mail > client to filter based on maillist address or subject headers, you > may need to make some adjustments for filtering. > > However, for the sake of transition, messages sent to the previous > maillist address on the old server will forward to the new server. > This forward will remain in place until the old server is taken > offline and final DNS changes are made. We expect the old server > to go offline sometime in early March. > > The web archives will also be migrated to the new web address > shortly, http://lists.openfabrics.org. > > If you have any questions, please don't hesitate to contact me at > mplee at sandia.gov. 
> > Regards, > Michael Lee > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From swise at opengridcomputing.com Wed Feb 21 06:31:45 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 21 Feb 2007 08:31:45 -0600 Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: cleanups In-Reply-To: <20070221105249.GC13958@stusta.de> References: <20070220000211.GZ13958@stusta.de> <1171982587.2101.0.camel@stevo-desktop> <20070221105249.GC13958@stusta.de> Message-ID: <1172068305.21243.2.camel@stevo-desktop> Thanks Adrian! Acked-by: Steve Wise On Wed, 2007-02-21 at 11:52 +0100, Adrian Bunk wrote: > On Tue, Feb 20, 2007 at 08:43:06AM -0600, Steve Wise wrote: > > On Tue, 2007-02-20 at 01:02 +0100, Adrian Bunk wrote: > > > This patch contains the following possible cleanups: > > > - don't mark static functions in C files as inline - gcc should know > > > best whether inlining makes sense > > > - never compile the unused cxio_dbg.c > > > - make the following needlessly global functions static: > > > - cxio_hal.c: cxio_hal_clear_qp_ctx() > > > - iwch_provider.c: iwch_get_qp() > > > - #if 0 the following unused global functions: > > > - cxio_hal.c: cxio_allocate_stag() > > > - cxio_resource.: cxio_hal_get_rhdl() > > > - cxio_resource.: cxio_hal_put_rhdl() > > > > > > > You could just remove the code instead of #if 0... > >... > > Updated patch below. 
> > cu > Adrian > > > <-- snip --> > > > This patch contains the following possible cleanups: > - don't mark static functions in C files as inline - gcc should know > best whether inlining makes sense > - never compile the unused cxio_dbg.c > - make the following needlessly global functions static: > - cxio_hal.c: cxio_hal_clear_qp_ctx() > - iwch_provider.c: iwch_get_qp() > - remove the following unused global functions: > - cxio_hal.c: cxio_allocate_stag() > - cxio_resource.: cxio_hal_get_rhdl() > - cxio_resource.: cxio_hal_put_rhdl() > > Signed-off-by: Adrian Bunk > > --- > > drivers/infiniband/hw/cxgb3/Makefile | 1 > drivers/infiniband/hw/cxgb3/cxio_hal.c | 31 +++++--------------- > drivers/infiniband/hw/cxgb3/cxio_hal.h | 5 --- > drivers/infiniband/hw/cxgb3/cxio_resource.c | 14 +-------- > drivers/infiniband/hw/cxgb3/iwch_cm.c | 5 +-- > drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 - > drivers/infiniband/hw/cxgb3/iwch_provider.h | 1 > drivers/infiniband/hw/cxgb3/iwch_qp.c | 29 ++++++++---------- > 8 files changed, 27 insertions(+), 61 deletions(-) > > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile.old 2007-02-17 17:21:03.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile 2007-02-17 17:21:08.000000000 +0100 > @@ -8,5 +8,4 @@ > > ifdef CONFIG_INFINIBAND_CXGB3_DEBUG > EXTRA_CFLAGS += -DDEBUG > -iw_cxgb3-y += cxio_dbg.o > endif > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h.old 2007-02-17 17:22:53.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h 2007-02-17 17:25:08.000000000 +0100 > @@ -144,7 +144,6 @@ > void cxio_rdev_close(struct cxio_rdev *rdev); > int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq, > enum t3_cq_opcode op, u32 credit); > -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid); > int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq); > int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq); > int cxio_resize_cq(struct cxio_rdev *rdev, 
struct t3_cq *cq); > @@ -155,8 +154,6 @@ > int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq, > struct cxio_ucontext *uctx); > int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode); > -int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid, > - enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr); > int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid, > enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, > u8 page_size, __be64 *pbl, u32 *pbl_size, > @@ -172,8 +169,6 @@ > int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr); > void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb); > void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb); > -u32 cxio_hal_get_rhdl(void); > -void cxio_hal_put_rhdl(u32 rhdl); > u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp); > void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid); > int __init cxio_hal_init(void); > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h.old 2007-02-17 17:25:35.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h 2007-02-17 17:25:41.000000000 +0100 > @@ -179,7 +179,6 @@ > > void iwch_qp_add_ref(struct ib_qp *qp); > void iwch_qp_rem_ref(struct ib_qp *qp); > -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn); > > struct iwch_ucontext { > struct ib_ucontext ibucontext; > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c.old 2007-02-17 17:25:50.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c 2007-02-17 17:25:57.000000000 +0100 > @@ -949,7 +949,7 @@ > wake_up(&(to_iwch_qp(qp)->wait)); > } > > -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) > +static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) > { > PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, qpn); > return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn); > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c.old 
2007-02-17 17:27:31.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c 2007-02-17 17:38:07.000000000 +0100 > @@ -37,8 +37,8 @@ > > #define NO_SUPPORT -1 > > -static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, > - u8 * flit_cnt) > +static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, > + u8 * flit_cnt) > { > int i; > u32 plen; > @@ -97,8 +97,8 @@ > return 0; > } > > -static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, > - u8 *flit_cnt) > +static int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, > + u8 *flit_cnt) > { > int i; > u32 plen; > @@ -138,8 +138,8 @@ > return 0; > } > > -static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, > - u8 *flit_cnt) > +static int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, > + u8 *flit_cnt) > { > if (wr->num_sge > 1) > return -EINVAL; > @@ -159,9 +159,8 @@ > /* > * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now. 
> */ > -static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp, > - struct ib_sge *sg_list, u32 num_sgle, > - u32 * pbl_addr, u8 * page_size) > +static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, > + u32 num_sgle, u32 * pbl_addr, u8 * page_size) > { > int i; > struct iwch_mr *mhp; > @@ -207,9 +206,8 @@ > return 0; > } > > -static inline int iwch_build_rdma_recv(struct iwch_dev *rhp, > - union t3_wr *wqe, > - struct ib_recv_wr *wr) > +static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe, > + struct ib_recv_wr *wr) > { > int i, err = 0; > u32 pbl_addr[4]; > @@ -474,8 +472,7 @@ > return err; > } > > -static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, > - int tagged) > +static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged) > { > switch (t3err) { > case TPT_ERR_STAG: > @@ -673,7 +670,7 @@ > spin_lock_irqsave(&qhp->lock, *flag); > } > > -static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag) > +static void flush_qp(struct iwch_qp *qhp, unsigned long *flag) > { > if (t3b_device(qhp->rhp)) > cxio_set_wq_in_error(&qhp->wq); > @@ -685,7 +682,7 @@ > /* > * Return non zero if at least one RECV was pre-posted. > */ > -static inline int rqes_posted(struct iwch_qp *qhp) > +static int rqes_posted(struct iwch_qp *qhp) > { > return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV; > } > --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c.old 2007-02-17 17:27:53.000000000 +0100 > +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c 2007-02-17 17:38:23.000000000 +0100 > @@ -210,8 +210,7 @@ > return state; > } > > -static inline void __state_set(struct iwch_ep_common *epc, > - enum iwch_ep_state new) > +static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new) > { > epc->state = new; > } > @@ -1460,7 +1459,7 @@ > /* > * Returns whether an ABORT_REQ_RSS message is a negative advice. 
> */ > -static inline int is_neg_adv_abort(unsigned int status) > +static int is_neg_adv_abort(unsigned int status) > { > return status == CPL_ERR_RTX_NEG_ADVICE || > status == CPL_ERR_PERSIST_NEG_ADVICE; > --- linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_resource.c.old 2007-02-20 23:22:29.000000000 +0100 > +++ linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_resource.c 2007-02-20 23:12:04.000000000 +0100 > @@ -179,7 +179,7 @@ > /* > * returns 0 if no resource available > */ > -static inline u32 cxio_hal_get_resource(struct kfifo *fifo) > +static u32 cxio_hal_get_resource(struct kfifo *fifo) > { > u32 entry; > if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32))) > @@ -188,21 +188,11 @@ > return 0; /* fifo emptry */ > } > > -static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry) > +static void cxio_hal_put_resource(struct kfifo *fifo, u32 entry) > { > BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0); > } > > -u32 cxio_hal_get_rhdl(void) > -{ > - return cxio_hal_get_resource(rhdl_fifo); > -} > - > -void cxio_hal_put_rhdl(u32 rhdl) > -{ > - cxio_hal_put_resource(rhdl_fifo, rhdl); > -} > - > u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp) > { > return cxio_hal_get_resource(rscp->tpt_fifo); > --- linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_hal.c.old 2007-02-20 23:22:42.000000000 +0100 > +++ linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_hal.c 2007-02-20 23:12:43.000000000 +0100 > @@ -45,7 +45,7 @@ > static LIST_HEAD(rdev_list); > static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL; > > -static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) > +static struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) > { > struct cxio_rdev *rdev; > > @@ -55,8 +55,7 @@ > return NULL; > } > > -static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev > - *tdev) > +static struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev) > { > struct cxio_rdev *rdev; > > @@ 
-118,7 +117,7 @@ > return 0; > } > > -static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) > +static int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) > { > struct rdma_cq_setup setup; > setup.id = cqid; > @@ -130,7 +129,7 @@ > return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); > } > > -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) > +static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) > { > u64 sge_cmd; > struct t3_modify_qp_wr *wqe; > @@ -425,7 +424,7 @@ > } > } > > -static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) > +static int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) > { > if (CQE_OPCODE(*cqe) == T3_TERMINATE) > return 0; > @@ -760,17 +759,6 @@ > return err; > } > > -/* IN : stag key, pdid, pbl_size > - * Out: stag index, actaul pbl_size, and pbl_addr allocated. > - */ > -int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid, > - enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr) > -{ > - *stag = T3_STAG_UNSET; > - return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR, > - perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr)); > -} > - > int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid, > enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, > u8 page_size, __be64 *pbl, u32 *pbl_size, > @@ -1029,7 +1017,7 @@ > cxio_hal_destroy_rhdl_resource(); > } > > -static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) > +static void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) > { > struct t3_swsq *sqp; > __u32 ptr = wq->sq_rptr; > @@ -1058,9 +1046,8 @@ > break; > } > > -static inline void create_read_req_cqe(struct t3_wq *wq, > - struct t3_cqe *hw_cqe, > - struct t3_cqe *read_cqe) > +static void create_read_req_cqe(struct t3_wq *wq, struct t3_cqe *hw_cqe, > + struct t3_cqe *read_cqe) > { > read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr; > read_cqe->len = 
wq->oldest_read->read_len; > @@ -1073,7 +1060,7 @@ > /* > * Return a ptr to the next read wr in the SWSQ or NULL. > */ > -static inline void advance_oldest_read(struct t3_wq *wq) > +static void advance_oldest_read(struct t3_wq *wq) > { > > u32 rptr = wq->oldest_read - wq->sq + 1; > From mst at mellanox.co.il Wed Feb 21 06:21:17 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Feb 2007 16:21:17 +0200 Subject: [openib-general] Fwd: Address List Change for Friday, 2/23/2007 In-Reply-To: <00D1093C-4EE4-4580-A57F-2B302E694C06@cisco.com> References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov> <00D1093C-4EE4-4580-A57F-2B302E694C06@cisco.com> Message-ID: <20070221142117.GD13024@mellanox.co.il> Could an example message be please sent *today* to the new list, so that client rules can be updated? I can't access my inbox on Friday or Saturday, and this change will cause problems and message loss for me unless I can prepare beforehand. Quoting Jeff Squyres : Subject: Fwd: Address List Change for Friday, 2/23/2007 FYI. In case you missed it the first time: THIS LIST IS CHANGING ON FRIDAY 2/23/2007 (2 days from now). Please update your addressbooks! See the notice below for the details. Begin forwarded message: > From: "Lee, Michael Paichi" > Date: February 19, 2007 10:43:23 AM EST > To: openib-general at openib.org > Subject: [openib-general] Address List Change for Friday, 2/23/2007 > > We're in the process of migrating the maillists from the old > openib.org server to the new lists.openfabrics.org machine. The > list openib-general will be moved this Friday, February 23, 2007. > The new address for the maillist will be > general at lists.openfabrics.org. > > What this means is that messages will come from > general at lists.openfabrics.org. Conversely, replies should be made > to this address as well. Messages will also have a new subject > line prefix of [OFA General]. 
If you have configured your e-mail > client to filter based on maillist address or subject headers, you > may need to make some adjustments for filtering. > > However, for the sake of transition, messages sent to the previous > maillist address on the old server will forward to the new server. > This forward will remain in place until the old server is taken > offline and final DNS changes are made. We expect the old server > to go offline sometime in early March. > > The web archives will also be migrated to the new web address > shortly, http://lists.openfabrics.org. > > If you have any questions, please don't hesitate to contact me at > mplee at sandia.gov. > > Regards, > Michael Lee > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- MST From halr at voltaire.com Wed Feb 21 06:31:58 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Feb 2007 09:31:58 -0500 Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com> Message-ID: <1172068314.4380.390208.camel@hal.voltaire.com> Tzachi, On Wed, 2007-02-21 at 03:47, Tzachi Dar wrote: > OK, Hal let's try to close this. Thanks. > The windows openib project was agreed by everyone to be BSD only. 
> The fact that it is BSD means that any partner (or non partner) of the > Community can download the code and use it, the way he wants. > This includes: > 1) Running the code as is. > 2) Making changes to the code and contributing them back. > 3) Making changes to the code and *NOT* giving them back to the > community. > > Starting to depend on GPL (or LGPL) code means that the freedom of the > users to do (3) is broken. > Mellanox thinks that this needs a wider agreement of the open-IB > consortium, which we don't have. The package in question is licensed with LGPL. I don't think that LGPL precludes usage #3 (although GPL precludes usage #3). See http://www.gnu.org/licenses/lgpl.html particularly #5 and #6. > More than that, the ideas that were introduced here about sending users > to other places in order for > them to find the pthread implementation are also not that great as this > starts to make the life of our users harder. Is this a major hurdle ? Is it substantially harder ? > Also it is not clear who will give support once there are problems, I would think it is from wherever they get OpenSM support. That support may need to interact with this project on some basis. Is this different from Linux (and pthreads) ? I agree that it is a change from the existing model. > and who is responsible that the license of the library won't change. I'm not sure how to answer this one but I don't think the license can just change. I guess if it did, we would need to deal with this when that occurred. Are you aware of some impending change here ? > So, I hope this closes the subject of using LGPL software in open-IB. I don't think we're there yet... > By the way, what implementation of pthreads were you thinking of? I have > noticed that the first implementation that Google brings was only tested > on uni-processor system. > (http://sourceware.org/pthreads-win32/news.html). (this is really > amazing, I thought that these servers were out of the market a long time > ago). 
I think that is old information and this also supports 64 bit architectures as well. > To be more practical: > Can you give us a better view of what you are trying to achieve? In > other words, as far as I know > Opensm is using complib apis to handle threads. The implementation of > this functions on windows is usually trivial. > Do you intend to make a re-write of opensm so that it will use pthreads > or do you intend to make a find/replace > And replace the complib functions with Pthreads ones? If we are talking > about the second, than one can simply implement the pthread functions > using trivial win32 calls. > > And another question: What is the functionality that you are currently > missing? Can this functionality be added? There will be another posting addressing these questions. -- Hal > Thanks > Tzachi > > By the way, rumors I have heard say that Voltaire doesn't always give > it's code back to the community, but this are just rumors, right? > > -----Original Message----- > > From: ofw-bounces at lists.openfabrics.org > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier > > Sent: Tuesday, February 20, 2007 11:56 PM > > To: Hal Rosenstock > > Cc: ofw at lists.openfabrics.org; Gilad Shainer; OPENIB > > Subject: RE: [ofw] [Fwd: Re: [openib-general] [Fwd: Re: > > winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] > > > > -----Original Message----- > > From: ofw-bounces at lists.openfabrics.org > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock > > Sent: Tuesday, February 20, 2007 1:43 PM > > > > On Tue, 2007-02-20 at 16:08, Fab Tillier wrote: > > > -----Original Message----- > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > Sent: Tuesday, February 20, 2007 10:57 AM > > > > > > On Tue, 2007-02-20 at 13:56, Fab Tillier wrote: > > > [ftillier] This isn't just an install issue - it's a build issue. 
> > > Anyone that wants to build OpenSM will need to find/download/install > > the > > > pthreads library so that the build will succeed. If linking > > statically, > > > the resulting executable will not require any special installation. > > > It's only an install issue if you link dynamically to pthreads. > > > > OK; then build and install. How big an issue is this ? > > > > I thought DLLs were dynamically linked but I'm a Windows plebe. > > > > [ftillier] When you build, the linker needs the import > > library for pthreads so that the functions get resolved as > > being imported from the pthreads DLL. The dependency on the > > pthreads DLL is then created and the DLL will be loaded > > dynamically, assuming it can be found in the path. > > > > So for the build process, you need to have the pthreads > > library available to the build tool (path to the lib). This > > requires installing the pthreads developer package or however > > it's done. > > > > If you statically link the pthreads lib, rather than > > dynamically link, then all the pthreads goodies go directly > > into the executable and you remove the dependency on an > > external DLL. The build process requirements are no > > different than for the dynamically linked case. > > > > There is also the possibility to remove the link-time > > dependency by calling GetProcAddress to explicitly resolve > > the pthreads entrypoints. > > This method still requires having the DLL loaded on the > > user's systems. > > > > Personally, I would rather see static linkage to the pthreads > > library so that only the builds are affected (something only > > 'experts' will be doing), while not affecting the common user. 
> > > > -Fab > > _______________________________________________ > > ofw mailing list > > ofw at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw > > From halr at voltaire.com Wed Feb 21 06:38:40 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Feb 2007 09:38:40 -0500 Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] In-Reply-To: <1172068314.4380.390208.camel@hal.voltaire.com> References: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com> <1172068314.4380.390208.camel@hal.voltaire.com> Message-ID: <1172068719.4380.390591.camel@hal.voltaire.com> On Wed, 2007-02-21 at 09:31, Hal Rosenstock wrote: > Tzachi, > > On Wed, 2007-02-21 at 03:47, Tzachi Dar wrote: > > OK, Hal let's try to close this. > > Thanks. > > > The windows openib project was agreed by everyone to be BSD only. > > The fact that it is BSD means that any partner (or non partner) of the > > Community can download the code and use it, the way he wants. > > This includes: > > 1) Running the code as is. > > 2) Making changes to the code and contributing them back. > > 3) Making changes to the code and *NOT* giving them back to the > > community. > > > > Starting to depend on GPL (or LGPL) code means that the freedom of the > > users to do (3) is broken. > > Mellanox thinks that this needs a wider agreement of the open-IB > > consortium, which we don't have. > > The package in question is licensed with LGPL. I don't think that LGPL > precludes usage #3 (although GPL precludes usage #3). See > http://www.gnu.org/licenses/lgpl.html particularly #5 and #6. > > > More than that, the ideas that were introduced here about sending users > > to other places in order for > > them to find the pthread implementation are also not that great as this > > starts to make the life of our users harder. > > Is this a major hurdle ? Is it substantially harder ? 
> > > Also it is not clear who will give support once there are problems, > > I would think it is from wherever they get OpenSM support. That support > may need to interact with this project on some basis. Is this different > from Linux (and pthreads) ? I agree that it is a change from the > existing model. > > > and who is responsible that the license of the library won't change. > > I'm not sure how to answer this one but I don't think the license can > just change. I guess if it did, we would need to deal with this when > that occurred. Are you aware of some impending change here ? > > > So, I hope this closes the subject of using LGPL software in open-IB. > > I don't think we're there yet... > > > By the way, what implementation of pthreads were you thinking of? I have > > noticed that the first implementation that Google brings was only tested > > on uni-processor system. > > (http://sourceware.org/pthreads-win32/news.html). (this is really > > amazing, I thought that these servers were out of the market a long time > > ago). > > I think that is old information and this also supports 64 bit > architectures as well. I think I found what you were referring to: http://sources.redhat.com/pthreads-win32/news.html RELEASE 2.8.0 ------------- Testing and verification ------------------------ This release has not yet been tested on SMP architechtures. All tests pass on a uni-processor system. RELEASE 2.7.0 ------------- Testing and verification ------------------------ This release has been tested (passed the test suite) on both uni-processor and multi-processor systems. Release 2.8.0 is relatively new (2006-12-22). > > To be more practical: > > Can you give us a better view of what you are trying to achieve? In > > other words, as far as I know > > Opensm is using complib apis to handle threads. The implementation of > > this functions on windows is usually trivial. 
> > Do you intend to make a re-write of opensm so that it will use pthreads > > or do you intend to make a find/replace > > And replace the complib functions with Pthreads ones? If we are talking > > about the second, than one can simply implement the pthread functions > > using trivial win32 calls. > > > > And another question: What is the functionality that you are currently > > missing? Can this functionality be added? > > There will be another posting addressing these questions. > > -- Hal > > > Thanks > > Tzachi > > > > By the way, rumors I have heard say that Voltaire doesn't always give > > it's code back to the community, but this are just rumors, right? > > > > -----Original Message----- > > > From: ofw-bounces at lists.openfabrics.org > > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier > > > Sent: Tuesday, February 20, 2007 11:56 PM > > > To: Hal Rosenstock > > > Cc: ofw at lists.openfabrics.org; Gilad Shainer; OPENIB > > > Subject: RE: [ofw] [Fwd: Re: [openib-general] [Fwd: Re: > > > winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] > > > > > > -----Original Message----- > > > From: ofw-bounces at lists.openfabrics.org > > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock > > > Sent: Tuesday, February 20, 2007 1:43 PM > > > > > > On Tue, 2007-02-20 at 16:08, Fab Tillier wrote: > > > > -----Original Message----- > > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > > Sent: Tuesday, February 20, 2007 10:57 AM > > > > > > > > On Tue, 2007-02-20 at 13:56, Fab Tillier wrote: > > > > [ftillier] This isn't just an install issue - it's a build issue. > > > > Anyone that wants to build OpenSM will need to find/download/install > > > the > > > > pthreads library so that the build will succeed. If linking > > > statically, > > > > the resulting executable will not require any special installation. > > > > It's only an install issue if you link dynamically to pthreads. 
> > > > > > OK; then build and install. How big an issue is this ? > > > > > > I thought DLLs were dynamically linked but I'm a Windows plebe. > > > > > > [ftillier] When you build, the linker needs the import > > > library for pthreads so that the functions get resolved as > > > being imported from the pthreads DLL. The dependency on the > > > pthreads DLL is then created and the DLL will be loaded > > > dynamically, assuming it can be found in the path. > > > > > > So for the build process, you need to have the pthreads > > > library available to the build tool (path to the lib). This > > > requires installing the pthreads developer package or however > > > it's done. > > > > > > If you statically link the pthreads lib, rather than > > > dynamically link, then all the pthreads goodies go directly > > > into the executable and you remove the dependency on an > > > external DLL. The build process requirements are no > > > different than for the dynamically linked case. > > > > > > There is also the possibility to remove the link-time > > > dependency by calling GetProcAddress to explicitly resolve > > > the pthreads entrypoints. > > > This method still requires having the DLL loaded on the > > > user's systems. > > > > > > Personally, I would rather see static linkage to the pthreads > > > library so that only the builds are affected (something only > > > 'experts' will be doing), while not affecting the common user. 
> > > > > > -Fab > > > _______________________________________________ > > > ofw mailing list > > > ofw at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw > > > From sashak at voltaire.com Wed Feb 21 07:25:55 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 21 Feb 2007 17:25:55 +0200 Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com> Message-ID: <20070221152555.GK27414@sashak.voltaire.com> On 10:47 Wed 21 Feb , Tzachi Dar wrote: > OK, Hal let's try to close this. > > The windows openib project was agreed by everyone to be BSD only. > The fact that it is BSD means that any partner (or non partner) of the > Community can download the code and use it, the way he wants. > This includes: > 1) Running the code as is. > 2) Making changes to the code and contributing them back. > 3) Making changes to the code and *NOT* giving them back to the > community. > > Starting to depend on GPL (or LGPL) code means that the freedom of the > users to do (3) is broken. Indeed it would be broken with GPL, but it is _not_ the case with LGPL. It is fragment from LGPL: 5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. > By the way, what implementation of pthreads were you thinking of? I have > noticed that the first implementation that Google brings was only tested > on uni-processor system. > (http://sourceware.org/pthreads-win32/news.html). 
Release 2.7.0 of pthreads-w32 was tested on SMP too (as stated at http://sourceware.org/pthreads-win32/news.html). > > To be more practical: > Can you give us a better view of what you are trying to achieve? In > other words, as far as I know > Opensm is using complib apis to handle threads. Right, and it has very limited and sometimes broken functionality. > The implementation of > this functions on windows is usually trivial. > Do you intend to make a re-write of opensm so that it will use pthreads > or do you intend to make a find/replace > And replace the complib functions with Pthreads ones? If we are talking > about the second, than one can simply implement the pthread functions > using trivial win32 calls. I'm fine with this idea (additional functionality will be needed, however). I would suppose that using a ready-to-use pthread library is simpler, but it is up to you - I guess any working pthread implementation should be fine for us. Hal? > And another question: What is the functionality that you are currently > missing? Mainly condition variables (pthread_cond_wait(), pthread_cond_timedwait()) and proper thread cancellation primitives (including thread cleanup); probably some other things later. > Can this functionality be added? Probably, but AFAIK Windows doesn't have a pthread_cond_wait() equivalent, so I don't know. > > Thanks > Tzachi > > By the way, rumors I have heard say that Voltaire doesn't always give > it's code back to the community, but this are just rumors, right? Hey Tzachi, I will not spend time investigating "rumors you have heard". If you have something to say, just say it; I don't think anybody should have to deal with "rumors". 
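To make the condition-variable point above concrete, here is a minimal sketch in C of a timed wait built on pthread_cond_timedwait(). It only illustrates the primitive being asked for. The names sketch_event, sketch_event_init(), sketch_event_signal() and sketch_event_wait() are hypothetical and are not taken from complib or OpenSM, and the auto-reset behaviour is an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical event object (not a complib type): a flag guarded by a
 * mutex and signalled through a condition variable. */
struct sketch_event {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool signaled;
};

/* Returns 0 on success or a pthread error code. */
int sketch_event_init(struct sketch_event *ev)
{
	int rc = pthread_mutex_init(&ev->lock, NULL);

	if (rc)
		return rc;
	rc = pthread_cond_init(&ev->cond, NULL);
	if (rc) {
		pthread_mutex_destroy(&ev->lock);
		return rc;
	}
	ev->signaled = false;
	return 0;
}

void sketch_event_destroy(struct sketch_event *ev)
{
	pthread_cond_destroy(&ev->cond);
	pthread_mutex_destroy(&ev->lock);
}

/* Set the flag and wake every waiter. */
void sketch_event_signal(struct sketch_event *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->signaled = true;
	pthread_cond_broadcast(&ev->cond);
	pthread_mutex_unlock(&ev->lock);
}

/* Wait up to timeout_ms for the event. Returns 0 if it was signalled
 * (and clears the flag, an assumed auto-reset policy), or ETIMEDOUT. */
int sketch_event_wait(struct sketch_event *ev, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	assert(timeout_ms >= 0);

	/* pthread_cond_timedwait() takes an absolute deadline, by default
	 * measured against CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&ev->lock);
	/* Re-check the predicate in a loop: wakeups may be spurious. */
	while (!ev->signaled && rc == 0)
		rc = pthread_cond_timedwait(&ev->cond, &ev->lock, &deadline);
	if (ev->signaled) {
		ev->signaled = false;
		rc = 0;
	}
	pthread_mutex_unlock(&ev->lock);
	return rc;
}
```

In use, one thread would call sketch_event_wait(&ev, 500) while another calls sketch_event_signal(&ev). The atomic "release the mutex and sleep" step inside pthread_cond_timedwait() is the part that is awkward to reproduce with only simple Win32 calls.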
Sasha From jsquyres at cisco.com Wed Feb 21 07:28:15 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 21 Feb 2007 10:28:15 -0500 Subject: [openib-general] Address List Change for Friday, 2/23/2007 In-Reply-To: <20070221142117.GD13024@mellanox.co.il> References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov> <00D1093C-4EE4-4580-A57F-2B302E694C06@cisco.com> <20070221142117.GD13024@mellanox.co.il> Message-ID: <924EA79E-8FE2-49A5-85AB-84B7749D535C@cisco.com> Can you look at the other lists that have migrated for examples? (e.g., ewg) It may be complex to send an actual example message *before* the list moves. On Feb 21, 2007, at 9:21 AM, Michael S. Tsirkin wrote: > Could an example message be please sent *today* to the new list, > so that client rules can be updated? > > I can't access my inbox on Friday or Saturday, and this change > will cause problems and message loss for me unless I can prepare > beforehand. > > > Quoting Jeff Squyres : > Subject: Fwd: Address List Change for Friday, 2/23/2007 > > FYI. In case you missed it the first time: THIS LIST IS CHANGING ON > FRIDAY 2/23/2007 (2 days from now). Please update your addressbooks! > > See the notice below for the details. > > > > Begin forwarded message: > >> From: "Lee, Michael Paichi" >> Date: February 19, 2007 10:43:23 AM EST >> To: openib-general at openib.org >> Subject: [openib-general] Address List Change for Friday, 2/23/2007 >> >> We're in the process of migrating the maillists from the old >> openib.org server to the new lists.openfabrics.org machine. The >> list openib-general will be moved this Friday, February 23, 2007. >> The new address for the maillist will be >> general at lists.openfabrics.org. >> >> What this means is that messages will come from >> general at lists.openfabrics.org. Conversely, replies should be made >> to this address as well. Messages will also have a new subject >> line prefix of [OFA General]. 
If you have configured your e-mail >> client to filter based on maillist address or subject headers, you >> may need to make some adjustments for filtering. >> >> However, for the sake of transition, messages sent to the previous >> maillist address on the old server will forward to the new server. >> This forward will remain in place until the old server is taken >> offline and final DNS changes are made. We expect the old server >> to go offline sometime in early March. >> >> The web archives will also be migrated to the new web address >> shortly, http://lists.openfabrics.org. >> >> If you have any questions, please don't hesitate to contact me at >> mplee at sandia.gov. >> >> Regards, >> Michael Lee >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/ >> openib-general > > > -- > Jeff Squyres > Server Virtualization Business Unit > Cisco Systems > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general > > -- > MST -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Wed Feb 21 07:34:02 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Feb 2007 17:34:02 +0200 Subject: [openib-general] Address List Change for Friday, 2/23/2007 In-Reply-To: <924EA79E-8FE2-49A5-85AB-84B7749D535C@cisco.com> References: <924EA79E-8FE2-49A5-85AB-84B7749D535C@cisco.com> Message-ID: <20070221153402.GB17761@mellanox.co.il> > Quoting Jeff Squyres : > Subject: Re: Address List Change for Friday, 2/23/2007 > > Can you look at the other lists that have migrated for examples? 
> (e.g., ewg)

If I look at other lists, there's no guarantee the rule will catch the actual message.

> It may be complex to send an actual example message *before* the list
> moves.

In this case, maybe the migration can be done in the middle of the week?

--
MST

From jsquyres at cisco.com Wed Feb 21 08:08:59 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 21 Feb 2007 11:08:59 -0500
Subject: [openib-general] Address List Change for Friday, 2/23/2007
In-Reply-To: <20070221153402.GB17761@mellanox.co.il>
References: <924EA79E-8FE2-49A5-85AB-84B7749D535C@cisco.com> <20070221153402.GB17761@mellanox.co.il>
Message-ID:

On Feb 21, 2007, at 10:34 AM, Michael S. Tsirkin wrote:

>> Can you look at the other lists that have migrated for examples?
>> (e.g., ewg)
>
> If I look at other lists, there's no guarantee the rule will catch
> the actual message.

Can't you just paste in the new address of the list in your existing rules? I must be missing something.

>> It may be complex to send an actual example message *before* the list
>> moves.
>
> In this case, maybe the migration can be done in the middle of the
> week?

I'll let Michael Lee answer; we're currently driving off his goodwill and his schedule.

I guess I didn't see why this was complex -- if a few mails get misplaced over the weekend because cutting-n-pasting the new e-mail address into existing rules somehow didn't work, is there a huge problem?

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems

From sashak at voltaire.com Wed Feb 21 08:43:19 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 21 Feb 2007 18:43:19 +0200
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <1172068314.4380.390208.camel@hal.voltaire.com>
References: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com> <1172068314.4380.390208.camel@hal.voltaire.com>
Message-ID: <20070221164319.GO27414@sashak.voltaire.com>

On 09:31 Wed 21 Feb , Hal Rosenstock wrote:
>
> > and who is responsible that the license of the library won't change.
>
> I'm not sure how to answer this one but I don't think the license can
> just change.

A license change will not work "backward", only "forward". So if some version was released under LGPL, that version will still be usable under LGPL.

Sasha

From tzachid at mellanox.co.il Wed Feb 21 08:56:47 2007
From: tzachid at mellanox.co.il (Tzachi Dar)
Date: Wed, 21 Feb 2007 18:56:47 +0200
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re:winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
Message-ID: <6C2C79E72C305246B504CBA17B5500C9EBB1F2@mtlexch01.mtl.com>

What you are saying is true, but there is a problem with that: if the community decides on a different license, then probably most people will move to it. The rest of the people will stay with old software that is not supported at all.

There are two examples that one can give here:

1) Think of people who have started to write code under the GPL V1.
Can they still find support for that today? Are there still projects being developed? The version was changed and everyone had to accept it.

2) The second example (from recent time) is of course Novell. (I must say here that (1) I'm not a lawyer, (2) I'm not an expert on the case, and (3) I really don't want to start an argument about Novell.) Novell was using Linux under the GPL. It did things that were not in the spirit of the GPL but probably didn't break it. Now there is a movement to change the GPL so that Novell will not be able to use it any more. I really don't know who is right or who is wrong here, but if we can avoid being in that place, that is better.

To be on the practical side, I have read the introduction to pthreads in the past, and from what I saw it was relatively easy to implement that on Win32. I want to look at the functions that were mentioned before in this thread and see if that is still the case. Let me get back to you on this at the beginning of next week.

Thanks
Tzachi

From or.gerlitz at gmail.com Wed Feb 21 09:05:50 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Wed, 21 Feb 2007 19:05:50 +0200
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
In-Reply-To: <20070221132159.GC7711@mellanox.co.il>
References: <45DC40A9.507@voltaire.com> <20070221132159.GC7711@mellanox.co.il>
Message-ID: <15ddcffd0702210905v4bddbd06n656679c4985d0bf2@mail.gmail.com>

On 2/21/07, Michael S. Tsirkin wrote:
>> Quoting r. Or Gerlitz :
>> I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on
>> IPOIB_CM_RX_SG pages where here you call dma_unmap_single only $frags
>> times, correct?

> No.

OK, let's keep this simple: does the number of dma_map_xxx calls in the ipoib cm post recv flow equal the number of dma_unmap_xxx calls in the ipoib cm recv completion handling?

Or.

From msalisbury at interactivesupercomputing.com Wed Feb 21 09:08:05 2007
From: msalisbury at interactivesupercomputing.com (Mark Salisbury)
Date: Wed, 21 Feb 2007 12:08:05 -0500
Subject: [openib-general] initial setup problems
Message-ID: <200702211208.05613.msalisbury@interactivesupercomputing.com>

Trying to set up ofed-1.1 on Mellanox HW using Intel MPI.

Trying to run an MPI hello world equivalent, I get most of the way through startup and then it bombs out.
I am unable to find any info about unexpected DAPL event 4008 here is the output of an example run: running mpdallexit on raki1 LAUNCHED mpd on raki1 via RUNNING: mpd on raki1 LAUNCHED mpd on raki2 via raki1 LAUNCHED mpd on raki4 via raki1 RUNNING: mpd on raki4 RUNNING: mpd on raki2 I_MPI: [0] check_one_device(): attributes for device: I_MPI: [0] check_one_device(): NEEDS_LDAT MAYBE I_MPI: [0] check_one_device(): HAS_COLLECTIVES (null) I_MPI: [0] check_one_device(): I_MPI_LIBRARY_VERSION 3.0 I_MPI: [0] check_one_device(): I_MPI_VERSION_DATE_OF_BUILD Fri Sep 15 14:32:24 MSD 2006 I_MPI: [0] check_one_device(): I_MPI_VERSION_PKGNAME_UNTARRED mpi_src.32.svsmpi004.20060915 I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_NAME_CVS_ID ./BUILD_MPI.sh version: BUILD_MPI.sh,v 1.62 2006/09/15 08:43:15 Exp $ I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_LINE ./BUILD_MPI.sh -pkg_name mpi_src.32.svsmpi004.20060915.tar.gz -explode -explode_dirname mpi2.32e.svsmpi020.20060915 -all -copyout -noinstall I_MPI: [0] check_one_device(): I_MPI_VERSION_MACHINENAME svsmpi020 I_MPI: [0] check_one_device(): I_MPI_DEVICE_VERSION 3.0.20060915 I_MPI: [0] check_one_device(): I_MPI_GCC_VERSION 3.4.4 20050721 (Red Hat 3.4.4-2) I_MPI: [0] set_up_devices(): I_MPI_DAPL_PROVIDER = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST_SUFFIX = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_IP_ADDR = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_PORT = NULL I_MPI: [0] check_one_device(): attributes for device: I_MPI: [0] check_one_device(): NEEDS_LDAT MAYBE I_MPI: [0] check_one_device(): HAS_COLLECTIVES (null) I_MPI: [0] check_one_device(): I_MPI_LIBRARY_VERSION 3.0 I_MPI: [0] check_one_device(): I_MPI_VERSION_DATE_OF_BUILD Fri Sep 15 14:32:24 MSD 2006 I_MPI: [0] check_one_device(): I_MPI_VERSION_PKGNAME_UNTARRED mpi_src.32.svsmpi004.20060915 I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_NAME_CVS_ID ./BUILD_MPI.sh version: BUILD_MPI.sh,v 1.62 
2006/09/15 08:43:15 Exp $ I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_LINE ./BUILD_MPI.sh -pkg_name mpi_src.32.svsmpi004.20060915.tar.gz -explode -explode_dirname mpi2.32e.svsmpi020.20060915 -all -copyout -noinstall I_MPI: [0] check_one_device(): I_MPI_VERSION_MACHINENAME svsmpi020 I_MPI: [0] check_one_device(): I_MPI_DEVICE_VERSION 3.0.20060915 I_MPI: [0] check_one_device(): I_MPI_GCC_VERSION 3.4.4 20050721 (Red Hat 3.4.4-2) I_MPI: [0] set_up_devices(): I_MPI_DAPL_PROVIDER = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST_SUFFIX = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_IP_ADDR = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_PORT = NULL I_MPI: [0] check_one_device(): attributes for device: I_MPI: [0] check_one_device(): NEEDS_LDAT MAYBE I_MPI: [0] check_one_device(): HAS_COLLECTIVES (null) I_MPI: [0] check_one_device(): I_MPI_LIBRARY_VERSION 3.0 I_MPI: [0] check_one_device(): I_MPI_VERSION_DATE_OF_BUILD Fri Sep 15 14:32:24 MSD 2006 I_MPI: [0] check_one_device(): I_MPI_VERSION_PKGNAME_UNTARRED mpi_src.32.svsmpi004.20060915 I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_NAME_CVS_ID ./BUILD_MPI.sh version: BUILD_MPI.sh,v 1.62 2006/09/15 08:43:15 Exp $ I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_LINE ./BUILD_MPI.sh -pkg_name mpi_src.32.svsmpi004.20060915.tar.gz -explode -explode_dirname mpi2.32e.svsmpi020.20060915 -all -copyout -noinstall I_MPI: [0] check_one_device(): I_MPI_VERSION_MACHINENAME svsmpi020 I_MPI: [0] check_one_device(): I_MPI_DEVICE_VERSION 3.0.20060915 I_MPI: [0] check_one_device(): I_MPI_GCC_VERSION 3.4.4 20050721 (Red Hat 3.4.4-2) I_MPI: [0] set_up_devices(): I_MPI_DAPL_PROVIDER = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST_SUFFIX = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_IP_ADDR = NULL I_MPI: [0] set_up_devices(): I_MPI_DAPL_PORT = NULL I_MPI: [0] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so 
I_MPI: [2] I_MPI_dlopen_dat(): I_MPI: [0] my_dlopen(): trying to dlopen: libdat.sotrying to dlopen default -ldat: libdat.so I_MPI: [2] my_dlopen(): trying to dlopen: libdat.so I_MPI: [1] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so I_MPI: [1] my_dlopen(): trying to dlopen: libdat.so I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma I_MPI: [1] MPIDI_CH3I_RDMA_init(): I_MPI: [2] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma will use DAPL provider from registry: OpenIB-cma I_MPI: [0] MPIDI_CH3_Init(): will use rdma configuration I_MPI: [0] MPI_Init: The process (pid=17898) started on raki1 Greetings from process 17898(0) I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration I_MPI: [1] MPI_Init: The process (pid=16216) started on raki2 I_MPI: [2] MPIDI_CH3_Init(): will use rdma configuration I_MPI: [2] MPI_Init: The process (pid=16330) started on raki4 [2:raki4] unexpected DAPL event 4008 from 0:raki1 [1:raki2] unexpected DAPL event 4008 from 0:raki1 rank 2 in job 1 raki1_37392 caused collective abort of all ranks exit status of rank 2: return code 254 rank 1 in job 1 raki1_37392 caused collective abort of all ranks exit status of rank 1: return code 254 From sean.hefty at intel.com Wed Feb 21 09:09:35 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 21 Feb 2007 09:09:35 -0800 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DBEA1F.5090901@voltaire.com> Message-ID: <000101c755db$0f9ec290$8698070a@amr.corp.intel.com> >However, no matter what the SM configures, the core & ipoib code act as >the full pkey is there. This is nice simplification and it works well. Is the problem here really in the librdmacm or in the core/ipoib software? (I looked at the patch, but haven't looked into the full reason why it's needed.) 
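
An illustrative aside on the behaviour quoted above ("the core & ipoib code act as the full pkey is there"): the top bit of a 16-bit P_Key (0x8000) is the membership bit, and the lookup described elsewhere in this thread matches a pkey regardless of that bit. The sketch below is only an assumption-laden illustration of that idea in plain C. The function name and the table are hypothetical; this is not the kernel's ib_find_cached_pkey() and not the librdmacm patch itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit 15 of a P_Key is the membership bit: 1 = full, 0 = partial. */
#define PKEY_MEMBERSHIP_BIT 0x8000u
#define PKEY_BASE_MASK      0x7fffu

/*
 * Hypothetical helper (illustration only): return the index of the first
 * table entry whose low 15 bits equal those of `pkey`, ignoring the
 * membership bit on both sides, or -1 if no entry matches.
 */
static int find_pkey_index_ignoring_membership(const uint16_t *table,
                                               size_t len, uint16_t pkey)
{
    for (size_t i = 0; i < len; i++) {
        if ((table[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK))
            return (int)i;
    }
    return -1;
}
```

With a table whose index 0 holds the partial-membership value 0x7fff, a lookup for 0xffff and a lookup for 0x7fff would both resolve to index 0 under this sketch, which is the kind of abstraction being discussed.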
- Sean From mst at mellanox.co.il Wed Feb 21 09:14:20 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Feb 2007 19:14:20 +0200 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth In-Reply-To: <15ddcffd0702210905v4bddbd06n656679c4985d0bf2@mail.gmail.com> References: <45DC40A9.507@voltaire.com> <20070221132159.GC7711@mellanox.co.il> <15ddcffd0702210905v4bddbd06n656679c4985d0bf2@mail.gmail.com> Message-ID: <20070221171420.GB22672@mellanox.co.il> > Quoting r. Or Gerlitz : > Subject: Re: [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth > > On 2/21/07, Michael S. Tsirkin wrote: > >> Quoting r. Or Gerlitz : > > >> I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on > >> IPOIB_CM_RX_SG pages where here you call dma_unmap_single only $frags > >> times, correct? > > > No. > > OK, lets keep this simple: does the ipoib cm post recv flow number of > calls to dma_map_xxx equals to the ipoib cm recv completion handling > number of calls to dma_unmap_xxx ??? AFAIK yes. 
--
MST

From krause at cup.hp.com Wed Feb 21 09:25:11 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 21 Feb 2007 09:25:11 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <309a667c0702202121p52747748ic891a9d21a02e3d7@mail.gmail.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com> <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net> <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com> <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net> <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com> <309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com> <6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com> <309a667c0702142137p724172f5va93a0ef046a60483@mail.gmail.com> <6.2.0.14.2.20070215071309.0979bed8@esmail.cup.hp.com> <309a667c0702202121p52747748ic891a9d21a02e3d7@mail.gmail.com>
Message-ID: <6.2.0.14.2.20070221092429.02cc6380@esmail.cup.hp.com>

At 09:21 PM 2/20/2007, Devesh Sharma wrote: >On 2/15/07, Michael Krause wrote: >>At 09:37 PM 2/14/2007, Devesh Sharma wrote: >> >On 2/14/07, Michael Krause wrote: >> >>At 05:37 AM 2/13/2007, Devesh Sharma wrote: >> >> >On 2/12/07, Devesh Sharma wrote: >> >> >>On 2/10/07, Tang, Changqing wrote: >> >> >> > > > >> >> >> > > >Not for the receiver, but the sender will be severely slowed >> down by >> >> >> > > >having to wait for the RNR timeouts. >> >> >> > > >> >> >> > > RNR = Receiver Not Ready so by definition, the data flow >> >> >> > > isn't going to >> >> >> > > progress until the receiver is ready to receive data. If a >> >> >> > > receive QP >> >> >> > > enters RNR for a RC, then it is likely not progressing as >> >> >> > > desired.
RNR >> >> >> > > was initially put in place to enable a receiver to create >> >> >> > > back pressure to the sender without causing a fatal error >> >> >> > > condition. It should rarely be entered and therefore should >> >> >> > > have negligible impact on overall performance however when a >> >> >> > > RNR occurs, no forward progress will occur so performance is >> >> >> > > essentially zero. >> >> >> > >> >> >> > Mike: >> >> >> > I still do not quite understand this issue. I have two >> >> >> > situations that have RNR triggered. >> >> >> > >> >> >> > 1. process A and process B is connected with QP. A first post a >> send to >> >> >> > B, B does not post receive. Then A and B are doing a long time >> >> >> > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE >> >> >> > message. Finally B will post a receive. Does the first pending send >> >> in A >> >> >> > block all the later RDMA_WRITE ? >> >> >>According to IBTA spec HCA will process WR entries in strict order in >> >> >>which they are posted so the send will block all WR posted after this >> >> >>send, Until-unless HCA has multiple processing elements, I think even >> >> >>then processing order will be maintained by HCA >> >> >> If not, since RNR is triggered >> >> >> > periodically till B post receive, does it affect the RDMA_WRITE >> >> >> > performance between A and B ? >> >> >> > >> >> >> > 2. extend above to three processes, A connect to B, B connect to C, >> >> so B >> >> >> > has two QPs, but one CQ.A posts a send to B, B does not post >> receive, >> >> >post ordering accross QP is not guaranteed hence presence of same CQ >> >> >or different CQ will not affect any thing. >> >> >> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B >> >> >If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C, >> >I am sorry I have missed that in both cases same DMA channel is in use. >> >> >_may_ affect the performance, since load is on same HCA. 
In case of >> >> >Send/Recv again _may_ affect the performance, with the same reason. >> >> >> >>Seems orthogonal. Any time h/w is shared, multiple flows will have an >> >>impact on one another. That is why we have the different arbitration >> >>mechanisms to enable one to control that impact. >> >Please, can you explain it more clearly? >> >>Most I/O devices are shared by multiple applications / kernel >>subsystems. Hence, the device acts as a serialization point for what goes >>on the wire / link. Sharing = resource contention and in order to add any >>structure to that contention, a number of technologies provide arbitration >>options. In the case of IB, the arbitration is confined to VL arbitration >>where a given data flow is assigned to a VL and that VL is services at some >>particular rate. A number of years ago I wrote up how one might also >>provide QP arbitration (not part of the IBTA specifications) and I >>understand some implementations have incorporated that or a variation of >>the mechanisms into their products. >Thanks mike for a nice explanation. I am sorry for the late reply, >Now I got it, here Chang is trying to find out performance hit due to >RNR NAK, performance hit due to device sharing is any how going to be >there so "load on same HCA" is not the proper explanation. >Am I correct now? Yes. You need to separate RNR NAK performance impacts as distinct from the multiple application sharing impacts. Mike >>In addition to IB link contention, there is also PCI link / bus >>contention. For PCIe, given most designs did not want to waste resources >>on multiple VC, there really isn't any standard arbitration >>mechanism. However, many devices, especially a device like a HCA or a >>RNIC, already have the concept of separate resource domains, e.g. QP, and >>they provide a mechanism to associate how the QP's DMA requests or >>interrupts requests are scheduled to the PCIe link. >> >> >> >> >> > must sends RNR periodically to A, right?. 
So does the pending >> message >> >> >> > from A affects B's overall performance between B and C ? >> >> >But RNR NAK is not for very long time.....possibly this performance >> >> >hit you will not be able to observe even. The moment rnr_counter >> >> >expires connection will be broken! >> >> >> >>Keep in mind the timeout can be infinite. RNR NAK are not expected to be >> >>frequent so their performance impact was considered reasonable. >> >Thanks I missed that. >> >>It is a subtlety within the specification that is easy to miss. >> >>Mike >> >> From vlad at dev.mellanox.co.il Wed Feb 21 09:56:45 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 21 Feb 2007 19:56:45 +0200 Subject: [openib-general] OFED-1.2-20070221-1741.tgz package is available Message-ID: <1172080605.5256.35.camel@vladsk-laptop> New OFED build is available: http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070221-1741.tgz Bugzilla is updated with fixed issues: https://bugs.openfabrics.org/ -- Vladimir Sokolovsky Mellanox Technologies Ltd. From swise at opengridcomputing.com Wed Feb 21 10:02:28 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 21 Feb 2007 12:02:28 -0600 Subject: [openib-general] OFED-1.2-20070221-1741.tgz package is available In-Reply-To: <1172080605.5256.35.camel@vladsk-laptop> References: <1172080605.5256.35.camel@vladsk-laptop> Message-ID: <1172080948.27101.15.camel@stevo-desktop> Hey Vlad, What about bugs: 355 and 357? On Wed, 2007-02-21 at 19:56 +0200, Vladimir Sokolovsky wrote: > New OFED build is available: > > http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070221-1741.tgz > > > Bugzilla is updated with fixed issues: > https://bugs.openfabrics.org/ > > From mst at mellanox.co.il Wed Feb 21 10:14:43 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 21 Feb 2007 20:14:43 +0200 Subject: [openib-general] [ewg] Re: OFED-1.2-20070221-1741.tgz package is available In-Reply-To: <1172080948.27101.15.camel@stevo-desktop> References: <1172080605.5256.35.camel@vladsk-laptop> <1172080948.27101.15.camel@stevo-desktop> Message-ID: <20070221181443.GB27239@mellanox.co.il> Steve, can't you post a patch for 357? Quoting Steve Wise : Subject: [ewg] Re: [openib-general] OFED-1.2-20070221-1741.tgz package is available Hey Vlad, What about bugs: 355 and 357? On Wed, 2007-02-21 at 19:56 +0200, Vladimir Sokolovsky wrote: > New OFED build is available: > > http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070221-1741.tgz > > > Bugzilla is updated with fixed issues: > https://bugs.openfabrics.org/ > > _______________________________________________ ewg mailing list ewg at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg -- MST From vlad at dev.mellanox.co.il Wed Feb 21 10:16:01 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 21 Feb 2007 20:16:01 +0200 Subject: [openib-general] OFED-1.2-20070221-1741.tgz package is available In-Reply-To: <1172080948.27101.15.camel@stevo-desktop> References: <1172080605.5256.35.camel@vladsk-laptop> <1172080948.27101.15.camel@stevo-desktop> Message-ID: <1172081761.5256.45.camel@vladsk-laptop> On Wed, 2007-02-21 at 12:02 -0600, Steve Wise wrote: > Hey Vlad, > > What about bugs: 355 and 357? > > Bug: 355 (problems building modules that depend on OFED 1.2 modules) In order to build kernel modules depending on OFED's modules you need to take Modules.symvers file from /src/openib/Modules.symvers (part of kernel-ib-devel RPM) and copy this to modules subdir and then compile your module. Currently I see that /src/openib/Modules.symvers is empty. I will check this issue. For now you can use the attached script to create Modules.symvers file. 

Bug: 357 (cxgb3 can't be selected on sles9sp3)

cxgb3 driver compilation failed on sles9sp3 in the previous OFED build. It was then disabled in the build_env.sh script in order to prevent OFED installation failure. Did you fix this compilation issue?

--
Vladimir Sokolovsky
Mellanox Technologies Ltd.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: create_Module.symvers.sh
Type: application/x-shellscript
Size: 1031 bytes
Desc: not available
URL:

From swise at opengridcomputing.com Wed Feb 21 10:34:27 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 21 Feb 2007 12:34:27 -0600
Subject: [openib-general] [ewg] Re: OFED-1.2-20070221-1741.tgz package is available
In-Reply-To: <20070221181443.GB27239@mellanox.co.il>
References: <1172080605.5256.35.camel@vladsk-laptop> <1172080948.27101.15.camel@stevo-desktop> <20070221181443.GB27239@mellanox.co.il>
Message-ID: <1172082867.27101.25.camel@stevo-desktop>

On Wed, 2007-02-21 at 20:14 +0200, Michael S. Tsirkin wrote:
> Steve, can't you post a patch for 357?
>

I could, but I'm not sure what is _not_ supported for SLES9SP3. The script currently only allows mthca, sdp, and ipoib. cxgb3 should be allowed. But probably most other drivers too...

I can provide a patch to allow cxgb3 if that's what you want...

Steve.

From robert.j.woodruff at intel.com Wed Feb 21 10:49:45 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Wed, 21 Feb 2007 10:49:45 -0800
Subject: [openib-general] Git on hosting.openfabrics.org server seems broken
Message-ID:

It appears that when I clone anyone's git tree locally on the hosting.openfabrics.org server, it only clones the master branch and I get none of the branches. The only difference in what I do now from what I did before is that the git version on the server is now 1.5.0, and before it was git version 1.4.4.3.

Has anyone tried to do a clone of a git tree on the server lately? Can you see the git branches of the cloned tree with git-branch?

woody

From sashak at voltaire.com Wed Feb 21 11:09:22 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 21 Feb 2007 21:09:22 +0200
Subject: [openib-general] Git on hosting.openfabrics.org server seems broken
In-Reply-To:
References:
Message-ID: <20070221190922.GU27414@sashak.voltaire.com>

On 10:49 Wed 21 Feb , Woodruff, Robert J wrote:
>
> It appears that when I clone anyone's git tree
> locally on the hosting.openfabrics.org server,
> it only clones the master branch and I get none of the branches.
> The only difference in what I do now from what I did before
> is that the git version on the server is now 1.5.0., and before it was
> git version 1.4.4.3.

Default branch layout was slightly changed with 1.5.0. Look at:

http://lkml.org/lkml/2007/2/13/426

(but I think it should be due to your local git version, not?)

> Has anyone tried to do a clone of a git tree on the
> server lately ? Can you see the git branches of the cloned
> tree with git-branch.

git-branch -r

Sasha

From robert.j.woodruff at intel.com Wed Feb 21 11:09:23 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Wed, 21 Feb 2007 11:09:23 -0800
Subject: [openib-general] Git on hosting.openfabrics.org server seems broken
In-Reply-To: <20070221190922.GU27414@sashak.voltaire.com>
Message-ID:

Aaarg!!! Not only is git terse and difficult to use, but once you finally learn the commands, they change them on you in the next version.

From sean.hefty at intel.com Wed Feb 21 11:49:49 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 21 Feb 2007 11:49:49 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <6.2.0.14.2.20070220103929.02953a20@esmail.cup.hp.com>
Message-ID: <000201c755f1$727618d0$8698070a@amr.corp.intel.com>

I sent a message on this topic to the IBTA several days ago, but I am still awaiting details (likely early next week).

>It should not be carried in the CM REQ. The SLID / DLID of the router
>ports should be derived through local subnet SA / SM query. When a CM REQ
>traverses one or more subnets there will be potentially many SLID / DLID
>involved in the communication. Each router should be populating its
>routing tables in order to build the new LRH attached to the GRH / CM REQ
>that it is forwarding to the next hop.

I'm referring to configuration of the QP, not the operation of the routers. To establish a connection, the passive side QP needs to transition from Init to RTR. As part of that transition, the modify QP verb needs as input the Destination LID of its local router. It sounds like you expect the passive side to perform an SA query to obtain its own local routing information, which would essentially invalidate the data carried in the primary and alternate path fields in the CM REQ.

From reading 12.7.11, 13.5.1, and 17.4, I do not believe that such a requirement was expected to be placed on the passive side of a connection.
The initial response I received agreed with this.

>I'd need to go back but the architecture is predicated that the SM and SA
>are strictly local and for security purposes their communication should
>remain local. Higher level management entities built to communicate with
>SM and SA are responsible for cross subnet communications without exposing
>the SA or SM to direct interaction. P_Key and Q_Key management across
>subnets is an example of such communication across subnets that would not
>be exposed to the SA and SM.

My initial thoughts are that this sounds like a good idea. It's not eliminating the need for interacting with a remote SA, so much as it abstracts it to another entity.

My hope is that we can reach an agreement on the CM REQ. Depending on that, we still need to determine whether the existing SA attributes are sufficient to allow forming inter-subnet connections and, if they are, whether such attributes can be obtained.

- Sean

From or.gerlitz at gmail.com Wed Feb 21 12:34:11 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Wed, 21 Feb 2007 22:34:11 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
In-Reply-To: <000101c755db$0f9ec290$8698070a@amr.corp.intel.com>
References: <45DBEA1F.5090901@voltaire.com> <000101c755db$0f9ec290$8698070a@amr.corp.intel.com>
Message-ID: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com>

On 2/21/07, Sean Hefty wrote:
>>However, no matter what the SM configures, the core & ipoib code act as
>>the full pkey is there. This is nice simplification and it works well.

> Is the problem here really in the librdmacm or in the core/ipoib software?

There is no problem. As I have explained over this thread, the ipoib and the core abstract away from the user the actual value of the MSb of the pkey, that is, whether it is a full or partial membership pkey.
IPoIB does it by OR-ing 0x8000 into the pkey it uses, and the core does it in ib_find_cached_pkey(), which, when given a pkey, returns the index of $pkey or of $pkey & 0x7fff, whichever one of them is there. The only missing piece is for librdmacm to play this game as well, and the patch does this. > (I looked at the patch, but haven't looked into the full reason why it's > needed.) Start by checking me: tell the SM to configure 0x7fff instead of 0xffff on one of your nodes as the pkey at index 0, then see that ping works but librdmacm RC utils such as rping or ib_rdma_bw -c do not. Then apply the patch and check again. Or. From sean.hefty at intel.com Wed Feb 21 12:42:38 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 21 Feb 2007 12:42:38 -0800 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com> Message-ID: <000301c755f8$d2f265e0$8698070a@amr.corp.intel.com> >There is no problem. As i have explained over this thread the ipoib >and the core abstract away from the user the actual value of the MSb >of the pkey, that is whether it is a full or partial membership pkey. But *why* does the kernel code do this, and should it? - Sean From swise at opengridcomputing.com Wed Feb 21 12:45:39 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 21 Feb 2007 14:45:39 -0600 Subject: [openib-general] [PATCH 2.6.21] iw_cxgb3: Stop the EP Timer on BAD CLOSE. Message-ID: <1172090739.27101.39.camel@stevo-desktop> Stop the ep timer in ec_status() if the status indicates a bad close. 
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index e5442e3..d00e5dd 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -1635,6 +1635,7 @@ static int ec_status(struct t3cdev *tdev printk(KERN_ERR MOD "%s BAD CLOSE - Aborting tid %u\n", __FUNCTION__, ep->hwtid); + stop_ep_timer(ep); attrs.next_state = IWCH_QP_STATE_ERROR; iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, From or.gerlitz at gmail.com Wed Feb 21 12:45:44 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Wed, 21 Feb 2007 22:45:44 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <1172064021.4380.385825.camel@hal.voltaire.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> <45DB15F5.4090406@voltaire.com> <1171986159.4380.306117.camel@hal.voltaire.com> <45DBEA1F.5090901@voltaire.com> <1172058368.4380.379947.camel@hal.voltaire.com> <45DC3C96.8040100@voltaire.com> <1172064021.4380.385825.camel@hal.voltaire.com> Message-ID: <15ddcffd0702211245w2686b97bhcaf7e86aaa3dedf5@mail.gmail.com> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock wrote: > On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote: > > > I believe it is a spec (compliance) violation for the port to be a > > > partial member and join as a full member. > > Since partial members can't talk among themselves, there is no reason to > > form a multicast group containing --only-- ports that can --not-- talk > > to each other... So if the spec does not allow this (having a partial > > member joining with the full member pkey) - it a spec bug... > I think there are two issues here then: > 1. If this is the case, getting the spec changed to accomodate this use case > 2. I believe that OpenIB code is supposed to be spec compliant. 
If the IPoIB spec does not allow both partial and full members of a partition to share a broadcast domain (e.g. the IPv4 broadcast group associated with the full membership pkey) or any other multicast group, burn it (or at least the relevant section). The OpenIB code is supposed to work and, as was done with the RDMA CM header, the implementation should not wait for the spec to be written or changed. Or. From swise at opengridcomputing.com Wed Feb 21 12:46:40 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 21 Feb 2007 14:46:40 -0600 Subject: [openib-general] [PATCH ofed_1_2] iw_cxgb3: Stop the EP Timer on BAD CLOSE. Message-ID: <1172090800.27101.40.camel@stevo-desktop> Stop the ep timer in ec_status() if the status indicates a bad close. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index e5442e3..d00e5dd 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -1635,6 +1635,7 @@ static int ec_status(struct t3cdev *tdev printk(KERN_ERR MOD "%s BAD CLOSE - Aborting tid %u\n", __FUNCTION__, ep->hwtid); + stop_ep_timer(ep); attrs.next_state = IWCH_QP_STATE_ERROR; iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, From swise at opengridcomputing.com Wed Feb 21 12:48:21 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 21 Feb 2007 14:48:21 -0600 Subject: [openib-general] [PATCH 1/2] ofed_1_2 Fix copyrights in the cxgb3 driver. In-Reply-To: <1171569595.13282.60.camel@stevo-desktop> References: <1171569595.13282.60.camel@stevo-desktop> Message-ID: <1172090901.27101.42.camel@stevo-desktop> Vlad, Please apply this to ofed_1_2. Thanks, Steve. On Thu, 2007-02-15 at 13:59 -0600, Steve Wise wrote: > Fix copyrights in the cxgb3 driver. > > Remove the Open Grid Computing copyright. It shouldn't be there. 
> > Signed-off-by: Steve Wise > --- > > drivers/net/cxgb3/cxgb3_defs.h | 1 - > drivers/net/cxgb3/cxgb3_offload.c | 1 - > drivers/net/cxgb3/cxgb3_offload.h | 1 - > drivers/net/cxgb3/l2t.c | 1 - > drivers/net/cxgb3/l2t.h | 1 - > drivers/net/cxgb3/t3cdev.h | 1 - > 6 files changed, 0 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/cxgb3/cxgb3_defs.h b/drivers/net/cxgb3/cxgb3_defs.h > old mode 100755 > new mode 100644 > index 16e0049..e14862b > --- a/drivers/net/cxgb3/cxgb3_defs.h > +++ b/drivers/net/cxgb3/cxgb3_defs.h > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/net/cxgb3/cxgb3_offload.c b/drivers/net/cxgb3/cxgb3_offload.c > old mode 100755 > new mode 100644 > index c3a02d6..46e9068 > --- a/drivers/net/cxgb3/cxgb3_offload.c > +++ b/drivers/net/cxgb3/cxgb3_offload.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/net/cxgb3/cxgb3_offload.h b/drivers/net/cxgb3/cxgb3_offload.h > old mode 100755 > new mode 100644 > index 0e6beb6..f15446a > --- a/drivers/net/cxgb3/cxgb3_offload.h > +++ b/drivers/net/cxgb3/cxgb3_offload.h > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. 
You may choose to be licensed under the terms of the GNU > diff --git a/drivers/net/cxgb3/l2t.c b/drivers/net/cxgb3/l2t.c > old mode 100755 > new mode 100644 > index 3c0cb85..d660af7 > --- a/drivers/net/cxgb3/l2t.c > +++ b/drivers/net/cxgb3/l2t.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2003-2007 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/net/cxgb3/l2t.h b/drivers/net/cxgb3/l2t.h > old mode 100755 > new mode 100644 > index ba5d2cb..d790013 > --- a/drivers/net/cxgb3/l2t.h > +++ b/drivers/net/cxgb3/l2t.h > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2003-2007 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/net/cxgb3/t3cdev.h b/drivers/net/cxgb3/t3cdev.h > old mode 100755 > new mode 100644 > index 9af3bcd..fa4099b > --- a/drivers/net/cxgb3/t3cdev.h > +++ b/drivers/net/cxgb3/t3cdev.h > @@ -1,6 +1,5 @@ > /* > * Copyright (C) 2006-2007 Chelsio Communications. All rights reserved. > - * Copyright (C) 2006-2007 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. 
You may choose to be licensed under the terms of the GNU > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Wed Feb 21 12:48:37 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 21 Feb 2007 14:48:37 -0600 Subject: [openib-general] [PATCH 2/2] ofed_1_2 Fix copyrights in the iw_cxgb3 driver. In-Reply-To: <1171569621.13282.62.camel@stevo-desktop> References: <1171569621.13282.62.camel@stevo-desktop> Message-ID: <1172090917.27101.44.camel@stevo-desktop> And this one too... Thanks, Steve. On Thu, 2007-02-15 at 14:00 -0600, Steve Wise wrote: > Fix copyrights in the iw_cxgb3 driver. > > Remove the Open Grid Computing copyright. It shouldn't be there. > > Signed-off-by: Steve Wise > --- > > drivers/infiniband/hw/cxgb3/core/cxio_dbg.c | 1 - > drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 1 - > drivers/infiniband/hw/cxgb3/core/cxio_hal.h | 1 - > drivers/infiniband/hw/cxgb3/core/cxio_resource.c | 1 - > drivers/infiniband/hw/cxgb3/core/cxio_resource.h | 1 - > drivers/infiniband/hw/cxgb3/core/cxio_wr.h | 1 - > drivers/infiniband/hw/cxgb3/iwch.c | 1 - > drivers/infiniband/hw/cxgb3/iwch.h | 1 - > drivers/infiniband/hw/cxgb3/iwch_cm.c | 1 - > drivers/infiniband/hw/cxgb3/iwch_cm.h | 1 - > drivers/infiniband/hw/cxgb3/iwch_cq.c | 1 - > drivers/infiniband/hw/cxgb3/iwch_ev.c | 1 - > drivers/infiniband/hw/cxgb3/iwch_mem.c | 1 - > drivers/infiniband/hw/cxgb3/iwch_provider.c | 1 - > drivers/infiniband/hw/cxgb3/iwch_provider.h | 1 - > drivers/infiniband/hw/cxgb3/iwch_qp.c | 1 - > drivers/infiniband/hw/cxgb3/iwch_user.h | 1 - > 17 files changed, 0 insertions(+), 17 deletions(-) > > diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c > index dfaa704..d6b6c97 100644 > --- 
a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c > +++ b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c > index 5e31816..229edd5 100644 > --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c > +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h > index e5e702d..1553bda 100644 > --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h > +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c > index d1d8722..cf78050 100644 > --- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c > +++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. 
You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h > index a6bbe83..a2703a3 100644 > --- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h > +++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h > index 234a084..6c7ac55 100644 > --- a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h > +++ b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c > index 0c95f2c..de44c57 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch.c > +++ b/drivers/infiniband/hw/cxgb3/iwch.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h > index 8b11198..8d9390b 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch.h > +++ b/drivers/infiniband/hw/cxgb3/iwch.h > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. 
All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c > index 3237fc8..21fadbe 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h > index 893f9d0..855f1ef 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_cm.h > +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c > index ff09509..225fcfa 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_cq.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c > index 646f612..f4cd5ec 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_ev.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. 
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c > index 5909ec5..335e9a4 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_mem.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c > index 4a46771..3f64dbf 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h > index d9d94e3..7322773 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h > +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. 
You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c > index 9cc8b5e..e1e35d9 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h > index e8ff061..bf0a2f6 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_user.h > +++ b/drivers/infiniband/hw/cxgb3/iwch_user.h > @@ -1,6 +1,5 @@ > /* > * Copyright (c) 2006 Chelsio, Inc. All rights reserved. > - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From or.gerlitz at gmail.com Wed Feb 21 12:50:26 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Wed, 21 Feb 2007 22:50:26 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <000301c755f8$d2f265e0$8698070a@amr.corp.intel.com> References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com> <000301c755f8$d2f265e0$8698070a@amr.corp.intel.com> Message-ID: <15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com> On 2/21/07, Sean Hefty wrote: > >There is no problem. 
As i have explained over this thread the ipoib > >and the core abstract away from the user the actual value of the MSb > >of the pkey, that is whether it is a full or partial membership pkey. > > But *why* does the kernel code do this, and should it? It does this since it makes life simple and robust. Note that since the HCA validates the pkey in the incoming packet, no matter what the IB SW does, partial members of a partition can't talk to each other. So the approach taken by the core/ipoib code was to just ignore the MSb in places where the code looks for the pkey --index-- and use the full member pkey when forming MGIDs. This seems fine to me. Or. From halr at voltaire.com Wed Feb 21 14:29:21 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Feb 2007 17:29:21 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <15ddcffd0702211245w2686b97bhcaf7e86aaa3dedf5@mail.gmail.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> <45DB15F5.4090406@voltaire.com> <1171986159.4380.306117.camel@hal.voltaire.com> <45DBEA1F.5090901@voltaire.com> <1172058368.4380.379947.camel@hal.voltaire.com> <45DC3C96.8040100@voltaire.com> <1172064021.4380.385825.camel@hal.voltaire.com> <15ddcffd0702211245w2686b97bhcaf7e86aaa3dedf5@mail.gmail.com> Message-ID: <1172096957.4380.418140.camel@hal.voltaire.com> On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote: > On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock wrote: > > On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote: > > > > > I believe it is a spec (compliance) violation for the port to be a > > > > partial member and join as a full member. > > > > Since partial members can't talk among themselves, there is no reason to > > > form a multicast group containing --only-- ports that can --not-- talk > > > to each other... So if the spec does not allow this (having a partial > > > member joining with the full member pkey) - it a spec bug... 
> > > I think there are two issues here then: > > 1. If this is the case, getting the spec changed to accomodate this use case > > 2. I believe that OpenIB code is supposed to be spec compliant. > > If the IPoIB spec does not allow both partial and full members of a > partition to share a broadcast domain (eg the IPv4 broadcast group > associated with the full membership pkey) or any other multicast > group, burn it (or at least the relevant section). I was referring to the IB spec, not an IPoIB RFC. > The OpenIB code supposed to work and as done with the RDMA CM header, > the implementation should not wait for spec to be written or changed. Really ? Maybe I'm mistaken but I didn't think that OpenIB/OpenFabrics wanted to issue code which is not IBA spec compliant. -- Hal > Or. From mshefty at ichips.intel.com Wed Feb 21 14:36:24 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Feb 2007 14:36:24 -0800 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com> References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com> <000301c755f8$d2f265e0$8698070a@amr.corp.intel.com> <15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com> Message-ID: <45DCC968.30208@ichips.intel.com> > It does this since its makes life simple and robust. Is an SM prevented from loading two PKeys into an HCA's PKey table that differ by only the membership bit? I can't think of any reason to do such a thing, but depending on which index was selected could limit which nodes you could communicate with. > Note that since the HCA validates the pkey in the in coming packet, no > matter what the IB SW would do, partial members of a partition can't > talk to each other. So the approach taken by the core/ipoib code was > to just ignore the MSb in places where the code looks for the pkey > --index-- and use the full member pkey when forming MGIDs. 
This seems > fine to me. My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't the one in the search. Can this lead to a QP being configured in such a way that communication with a remote QP would silently fail? I realize that a user could call ib_get_cached_pkey and see if the returned value matches the one in the original search, but this is a non-obvious way to check for a mismatch. I'm not against this patch, but I want to make sure that I understand the issues, so we're not creating a work-around solution. The patch is against the librdmacm, yet there's nothing that I see in the librdmacm that makes me think it's behaving incorrectly. - Sean From tom at opengridcomputing.com Wed Feb 21 14:40:21 2007 From: tom at opengridcomputing.com (Tom Tucker) Date: Wed, 21 Feb 2007 16:40:21 -0600 Subject: [openib-general] OFED-1.2-20070221-1741.tgz package is available In-Reply-To: <1172081761.5256.45.camel@vladsk-laptop> References: <1172080605.5256.35.camel@vladsk-laptop> <1172080948.27101.15.camel@stevo-desktop> <1172081761.5256.45.camel@vladsk-laptop> Message-ID: <1172097621.5994.13.camel@trinity.ogc.int> Vlad: On Wed, 2007-02-21 at 20:16 +0200, Vladimir Sokolovsky wrote: > On Wed, 2007-02-21 at 12:02 -0600, Steve Wise wrote: > > Hey Vlad, > > > > What about bugs: 355 and 357? > > > > > Bug: 355 (problems building modules that depend on OFED 1.2 modules) > > In order to build kernel modules depending on OFED's modules you need to > take Modules.symvers file from /src/openib/Modules.symvers (part > of kernel-ib-devel RPM) and copy this to modules subdir and then compile > your module. Won't this blow away all the version information for the non-IB symbols? > Currently I see that /src/openib/Modules.symvers is empty. I > will check this issue. > > For now you can use the attached script to create Modules.symvers file. > > Bug: 357 (cxgb3 can't be selected on sles9sp3) > > cxgb3 driver compilation failed on sles9sp3 in previous OFED build. 
> Then it was disabled in build_env.sh script in order to prevent OFED > installation failure. > Did you fixed this compilation issue? From mshefty at ichips.intel.com Wed Feb 21 15:05:58 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Feb 2007 15:05:58 -0800 Subject: [openib-general] GetTable path record query not returningDGID=SGID paths In-Reply-To: <1171514817.22446.145890.camel@hal.voltaire.com> References: <000701c75091$c4f59fa0$ff0da8c0@amr.corp.intel.com> <1171514817.22446.145890.camel@hal.voltaire.com> Message-ID: <45DCD056.50108@ichips.intel.com> >>We haven't looked into this in more detail yet. This was our observation while >>testing on a larger (64 node) cluster this morning that we don't have access to >>at the moment. With the local SA cache running, we were surprised to see any >>retries, and when we looked into it more, retries were always for loopback >>connections. Our investigation showed a couple of things. When we pulled our systems off into a small cluster and ran opensm, things were fine. The cache was working as normal, and we did get loopback paths from opensm. On our development cluster, the cache was never getting any path records. It would issue a GetTable query, and the SM would respond. The response had a status of 0 (success), but never returned any path records. I believe that the SM node is running OFED 1.1.1. I don't have the ability to modify the kernel on the larger 64-node cluster that we were testing on to see what is going on there. 
- Sean From halr at voltaire.com Wed Feb 21 14:53:22 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Feb 2007 17:53:22 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DCC968.30208@ichips.intel.com> References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com> <000301c755f8$d2f265e0$8698070a@amr.corp.intel.com> <15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com> <45DCC968.30208@ichips.intel.com> Message-ID: <1172098401.4380.419534.camel@hal.voltaire.com> On Wed, 2007-02-21 at 17:36, Sean Hefty wrote: > > It does this since its makes life simple and robust. > > Is an SM prevented from loading two PKeys into an HCA's PKey table that differ > by only the membership bit? Nope. > I can't think of any reason to do such a thing, Me neither. It would be a configuration error of sorts. > but depending on which index was > selected could limit which nodes you could communicate with. > > Note that since the HCA validates the pkey in the in coming packet, no > > matter what the IB SW would do, partial members of a partition can't > > talk to each other. So the approach taken by the core/ipoib code was > > to just ignore the MSb in places where the code looks for the pkey > > --index-- and use the full member pkey when forming MGIDs. This seems > > fine to me. > > My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't > the one in the search. Can this lead to a QP being configured in such a way > that communication with a remote QP would silently fail? > > I realize that a user could call ib_get_cached_pkey and see if the returned > value matches the one in the original search, but this is a non-obvious way to > check for a mismatch. > > I'm not against this patch, but I want to make sure that I understand the > issues, so we're not creating a work-around solution. 
The patch is against the > librdmacm, yet there's nothing that I see in the librdmacm that makes me think > it's behaving incorrectly. I'm not sure it's this patch in particular but it appears that there may be some non compliant behavior being exercised IMO. -- Hal > - Sean From halr at voltaire.com Wed Feb 21 15:22:34 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Feb 2007 18:22:34 -0500 Subject: [openib-general] GetTable path record query not returningDGID=SGID paths In-Reply-To: <45DCD056.50108@ichips.intel.com> References: <000701c75091$c4f59fa0$ff0da8c0@amr.corp.intel.com> <1171514817.22446.145890.camel@hal.voltaire.com> <45DCD056.50108@ichips.intel.com> Message-ID: <1172100153.4380.421309.camel@hal.voltaire.com> On Wed, 2007-02-21 at 18:05, Sean Hefty wrote: > >>We haven't looked into this in more detail yet. This was our observation while > >>testing on a larger (64 node) cluster this morning that we don't have access to > >>at the moment. With the local SA cache running, we were surprised to see any > >>retries, and when we looked into it more, retries were always for loopback > >>connections. > > Our investigation showed a couple of things. When we pulled our systems off > into a small cluster and ran opensm, things were fine. The cache was working as > normal, and we did get loopback paths from opensm. > > On our development cluster, the cache was never getting any path records. It > would issue a GetTable query, and the SM would respond. The response had a > status of 0 (success), but never returned any path records. I believe that the > SM node is running OFED 1.1.1. I'm unaware of any changes in this area of OpenSM which would cause this but maybe I'm forgetting something. Can you run opensm with -V and send the logs to me ? This should be instructive. -- Hal > I don't have the ability to modify the kernel on the larger 64-node cluster that > we were testing on to see what is going on there. 
> > - Sean From halr at voltaire.com Wed Feb 21 15:32:18 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Feb 2007 18:32:18 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <1172098401.4380.419534.camel@hal.voltaire.com> References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com> <000301c755f8$d2f265e0$8698070a@amr.corp.intel.com> <15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com> <45DCC968.30208@ichips.intel.com> <1172098401.4380.419534.camel@hal.voltaire.com> Message-ID: <1172100738.4380.421860.camel@hal.voltaire.com> On Wed, 2007-02-21 at 17:53, Hal Rosenstock wrote: > On Wed, 2007-02-21 at 17:36, Sean Hefty wrote: > > > It does this since its makes life simple and robust. > > > > Is an SM prevented from loading two PKeys into an HCA's PKey table that differ > > by only the membership bit? > > Nope. > > > I can't think of any reason to do such a thing, > > Me neither. It would be a configuration error of sorts. It is vendor dependent whether the SM would allow this. As Sasha points out, this cannot be done with OpenSM (at least currently). -- Hal > > but depending on which index was > > selected could limit which nodes you could communicate with. > > > > Note that since the HCA validates the pkey in the in coming packet, no > > > matter what the IB SW would do, partial members of a partition can't > > > talk to each other. So the approach taken by the core/ipoib code was > > > to just ignore the MSb in places where the code looks for the pkey > > > --index-- and use the full member pkey when forming MGIDs. This seems > > > fine to me. > > > > My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't > > the one in the search. Can this lead to a QP being configured in such a way > > that communication with a remote QP would silently fail? 
> > > > I realize that a user could call ib_get_cached_pkey and see if the returned > > value matches the one in the original search, but this is a non-obvious way to > > check for a mismatch. > > > > I'm not against this patch, but I want to make sure that I understand the > > issues, so we're not creating a work-around solution. The patch is against the > > librdmacm, yet there's nothing that I see in the librdmacm that makes me think > > it's behaving incorrectly. > > I'm not sure it's this patch in particular but it appears that there may > be some non compliant behavior being exercised IMO. > > -- Hal > > > - Sean > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From changquing.tang at hp.com Wed Feb 21 15:48:59 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Wed, 21 Feb 2007 23:48:59 -0000 Subject: [openib-general] I created a git tree for the libibverbs man pages In-Reply-To: References: <45BF63A1.6090402@dev.mellanox.co.il> <45BF756B.1060500@dev.mellanox.co.il> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84037D91A1@G3W0634.americas.hpqcorp.net> Hi, Roland: What is the Max # of cards OFED driver/library can support on a single node ? Thanks. --CQ > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier > Sent: Tuesday, February 20, 2007 6:12 PM > To: Dotan Barak > Cc: openib-general > Subject: Re: [openib-general] I created a git tree for the > libibverbs man pages > > I merged all these manpages into my libibverbs tree and > pushed the result out to kernel.org. > > Please send any future updates as diffs against the libibverbs tree. 
> > Thanks, > Roland > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > From rdreier at cisco.com Wed Feb 21 16:55:31 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 21 Feb 2007 16:55:31 -0800 Subject: [openib-general] I created a git tree for the libibverbs man pages In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84037D91A1@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Wed, 21 Feb 2007 23:48:59 -0000") References: <45BF63A1.6090402@dev.mellanox.co.il> <45BF756B.1060500@dev.mellanox.co.il> <349DCDA352EACF42A0C49FA6DCEA84037D91A1@G3W0634.americas.hpqcorp.net> Message-ID: > What is the Max # of cards OFED driver/library can support on a > single node ? The lowest limit I know of is the # of device minors available for /dev/infiniband/uverbs files, which is 32. How many devices are you interested in supporting? This limit could probably be increased without too much trouble, but I doubt any realistic system will run into it anyway. - R. From changquing.tang at hp.com Wed Feb 21 17:17:27 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Thu, 22 Feb 2007 01:17:27 -0000 Subject: [openib-general] I created a git tree for the libibverbs man pages In-Reply-To: References: <45BF63A1.6090402@dev.mellanox.co.il> <45BF756B.1060500@dev.mellanox.co.il> <349DCDA352EACF42A0C49FA6DCEA84037D91A1@G3W0634.americas.hpqcorp.net> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84037D9238@G3W0634.americas.hpqcorp.net> Supporting upto 32 cards on a node is big enough for quite a while. I just want to check if only 4 or 8 is supported. Thanks. 
--CQ > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Wednesday, February 21, 2007 6:56 PM > To: Tang, Changqing > Cc: Dotan Barak; openib-general > Subject: Re: [openib-general] I created a git tree for the > libibverbs man pages > > > What is the Max # of cards OFED driver/library can support on a > > single node ? > > The lowest limit I know of is the # of device minors > available for /dev/infiniband/uverbs files, which is 32. How > many devices are you interested in supporting? > > This limit could probably be increased without too much > trouble, but I doubt any realistic system will run into it anyway. > > - R. > From akepner at sgi.com Wed Feb 21 17:21:11 2007 From: akepner at sgi.com (akepner at sgi.com) Date: Wed, 21 Feb 2007 17:21:11 -0800 Subject: [openib-general] [RFC/BUG] DMA vs. CQ race Message-ID: <20070222012111.GB3352@sgi.com> In: http://openib.org/pipermail/openib-general/2006-December/030251.html I described a potential race between DMA and CQ updates on Altix systems. At that time the bug hadn't been observed, but was expected to be possible on "large" NUMA systems. A first-cut at a patch was sent out, some very reasonable objections were raised, and the thread fizzled out. Since that time we've been able to produce the bug, and show that the patch I sent fixes the problem. (OK, the patch I sent with the addition of a small but important patchlet.) The biggest concern with the earlier patch seemed to be backward compatibility. There was a stab at addressing that in http://tinyurl.com/2x3s52, but no commentary. (Too ugly for words?) Any suggestions as to how to proceed? Should I just code something up in order to have a concrete target to discuss? Or are there any new thoughts based on the previous emails? 
-- Arthur From dotanb at dev.mellanox.co.il Wed Feb 21 23:08:35 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 22 Feb 2007 09:08:35 +0200 Subject: [openib-general] I created a git tree for the libibverbs man pages In-Reply-To: References: <45BF63A1.6090402@dev.mellanox.co.il> <45BF756B.1060500@dev.mellanox.co.il> Message-ID: <45DD4173.70701@dev.mellanox.co.il> Roland Dreier wrote: > I merged all these manpages into my libibverbs tree and pushed the > result out to kernel.org. > > Please send any future updates as diffs against the libibverbs tree. > > Thanks, > Roland > That is great news, thanks. Before the OFED 1.2 release I plan to send you a patch to fix some issues. Thanks again, Dotan From sweitzen at cisco.com Wed Feb 21 23:11:50 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 21 Feb 2007 23:11:50 -0800 Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64 Message-ID: I tried both RHEL4 and SLES10 usinstall.sh, and get this. I filed bug 379, anyone else tried ppc64? gcc -DHAVE_CONFIG_H -I. -I. -I.
-I./include/infiniband -I./../libibcommon/include/infiniband -Wall -m64 -g -O2 -MT libibumad_la-umad.lo -MD -MP -MF .deps/libibumad_la-umad.Tpo -c src/umad.c -fPIC -DPIC -o .libs/libibumad_la-umad.o In file included from src/umad.c:50: ./include/infiniband/umad.h:37:31: infiniband/common.h: No such file or directory src/umad.c: In function `port_alloc': src/umad.c:94: warning: implicit declaration of function `IBWARN' src/umad.c: In function `get_port': src/umad.c:160: warning: implicit declaration of function `snprintf' src/umad.c:163: warning: implicit declaration of function `sys_read_uint' src/umad.c:177: warning: implicit declaration of function `sys_read_uint64' src/umad.c:182: warning: implicit declaration of function `sys_read_gid' src/umad.c: In function `get_ca': src/umad.c:354: warning: implicit declaration of function `sys_read_string' src/umad.c:363: warning: implicit declaration of function `sys_read_guid' make[3]: *** [libibumad_la-umad.lo] Error 1 make[3]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/libibumad' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/libibumad' make[1]: *** [all] Error 2 make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/libibumad' make: *** [subdirs] Error 1 make: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management' Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems
From ogerlitz at voltaire.com Wed Feb 21 23:28:48 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 22 Feb 2007 09:28:48 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <1172096957.4380.418140.camel@hal.voltaire.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> <45DB15F5.4090406@voltaire.com> <1171986159.4380.306117.camel@hal.voltaire.com> <45DBEA1F.5090901@voltaire.com> <1172058368.4380.379947.camel@hal.voltaire.com> <45DC3C96.8040100@voltaire.com> <1172064021.4380.385825.camel@hal.voltaire.com> <15ddcffd0702211245w2686b97bhcaf7e86aaa3dedf5@mail.gmail.com> <1172096957.4380.418140.camel@hal.voltaire.com> Message-ID: <45DD4630.3070101@voltaire.com> Hal Rosenstock wrote: > On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote: >> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock wrote: >> If the IPoIB spec does not allow both partial and full members of a >> partition to share a broadcast domain (eg the IPv4 broadcast group >> associated with the full membership pkey) or any other multicast >> group, burn it (or at least the relevant section). > I was referring to the IB spec, not an IPoIB RFC. Can you provide a pointer? >> The OpenIB code supposed to work and as done with the RDMA CM header, >> the implementation should not wait for spec to be written or changed. > Really ? Maybe I'm mistaken but I didn't think that OpenIB/OpenFabrics > wanted to issue code which is not IBA spec compliant. The code resides in the Linux kernel, period. Linux is not under the control of this or that organization, period, period. Linux uses a hierarchical maintainership structure where Roland, Sean and yourself are listed as the maintainers, which means you are able to promote and/or block this or that agenda. Go for it! Or.
From ogerlitz at voltaire.com Thu Feb 22 00:04:14 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 22 Feb 2007 10:04:14 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DCC968.30208@ichips.intel.com> References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com> <000301c755f8$d2f265e0$8698070a@amr.corp.intel.com> <15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com> <45DCC968.30208@ichips.intel.com> Message-ID: <45DD4E7E.50009@voltaire.com> Sean Hefty wrote: >> Note that since the HCA validates the pkey in the in coming packet, no >> matter what the IB SW would do, partial members of a partition can't >> talk to each other. So the approach taken by the core/ipoib code was >> to just ignore the MSb in places where the code looks for the pkey >> --index-- and use the full member pkey when forming MGIDs. This seems >> fine to me. > My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't > the one in the search. Can this lead to a QP being configured in such a way > that communication with a remote QP would silently fail? My understanding is that when an IPoIB broadcast domain contains both partial and full members (*), attempts to communicate between two partial members would silently fail. Is this silence something you think we should work to change? (*) e.g. when you have a bunch of clients and a server, or a bunch of servers, and you don't want to allow --clients-- to communicate among themselves. > I'm not against this patch, but I want to make sure that I understand the > issues, so we're not creating a work-around solution. The patch is against the > librdmacm, yet there's nothing that I see in the librdmacm that makes me think > it's behaving incorrectly. My thinking is that if at the end of this thread we are willing to move forward without changing ib_find_cached_pkey() then this patch should be merged. Or.
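[Editorial note] The membership-bit behaviour debated in this thread can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the kernel's ib_find_cached_pkey() and not code from the patch: the helper names (pkey_base, pkey_is_full_member, pkey_can_communicate, find_pkey_index) are hypothetical, as is the choice to report an inexact match through an out-parameter. It relies only on what the thread itself states: the MSb of the 16-bit pkey is the membership bit, and two partial members of the same partition cannot talk to each other.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 15 of a 16-bit P_Key is the membership bit: 1 = full, 0 = partial. */
#define PKEY_MEMBERSHIP_BIT 0x8000u
#define PKEY_BASE_MASK      0x7fffu

/* Hypothetical helper: the 15-bit partition number without the membership bit. */
static inline uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t)(pkey & PKEY_BASE_MASK);
}

/* Hypothetical helper: true when the membership bit marks a full member. */
static inline bool pkey_is_full_member(uint16_t pkey)
{
    return (pkey & PKEY_MEMBERSHIP_BIT) != 0;
}

/*
 * Two endpoints can exchange packets only if they are in the same partition
 * and at least one of them is a full member. Partial-to-partial traffic is
 * dropped, which is the "silent failure" discussed above.
 */
static inline bool pkey_can_communicate(uint16_t a, uint16_t b)
{
    if (pkey_base(a) != pkey_base(b))
        return false;
    return pkey_is_full_member(a) || pkey_is_full_member(b);
}

/*
 * Hypothetical lookup that ignores the membership bit when searching a pkey
 * table, but tells the caller whether the entry it found is exactly the pkey
 * that was asked for. An exact match is preferred over a base-only match.
 * Entries with a zero base are treated as empty slots (an assumption of this
 * sketch). Returns the index, or -1 if the partition is not in the table.
 */
static inline int find_pkey_index(const uint16_t *table, size_t len,
                                  uint16_t wanted, bool *exact)
{
    int fallback = -1;

    for (size_t i = 0; i < len; i++) {
        if (pkey_base(table[i]) == 0)
            continue;                      /* skip empty slot */
        if (table[i] == wanted) {
            *exact = true;                 /* same partition, same membership */
            return (int)i;
        }
        if (fallback < 0 && pkey_base(table[i]) == pkey_base(wanted))
            fallback = (int)i;             /* same partition, other membership */
    }

    *exact = false;
    return fallback;
}
```

With a table holding only the partial-member entry 0x7fff, a search for 0xffff in this sketch returns that entry's index with *exact set to false, which is the mismatch Sean suggests a caller would otherwise have to detect by reading the pkey back.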
From mst at mellanox.co.il Thu Feb 22 01:00:06 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Feb 2007 11:00:06 +0200 Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64 In-Reply-To: References: Message-ID: <20070222090006.GA9727@mellanox.co.il> > Quoting Scott Weitzenkamp (sweitzen) : > Subject: anyone have OFED 1.2 alpha1 compiling on ppc64 > > I tried both RHEL4 and SLES10 usinstall.sh, and get this. I filed bug 379, > anyone else tried ppc64? Scott, could you pls upload the kernel sources and .config files to staging? If you do, we'll be able to add these to the nightly cross-build environment. -- MST From vlad at lists.openfabrics.org Thu Feb 22 02:26:53 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Thu, 22 Feb 2007 02:26:53 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070222-0200 daily build status Message-ID: <20070222102653.E48CAE6080D@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on
x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.14 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.9-42.ELsmp Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] 
Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once /home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) /home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1062: error: ‘adapter_list_lock’ undeclared (first use in this function) 
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1069: error: ‘adapter_list_lock’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From vlad at lists.openfabrics.org Thu Feb 22 03:16:15 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Thu, 22 Feb 2007 03:16:15 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070222-0251 daily build status Message-ID: <20070222111615.7B4B4E603B1@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on 
x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.12 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.14 Passed on ppc64 with linux-2.6.16 Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function 'sg_dma_len' /home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** 
[_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'add_adapter': /home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1062: error: 'adapter_list_lock' undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'remove_adapter': /home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1069: error: 'adapter_list_lock' undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: 'ADVERTISE_PAUSE_CAP' undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once 
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) /home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: 'ADVERTISE_PAUSE_ASYM' undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From vlad at mellanox.co.il Thu Feb 22 04:09:29 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 22 Feb 2007 14:09:29 +0200 Subject: [openib-general] [PATCH 1/2] ofed_1_2 Fix copyrights in the cxgb3 driver. In-Reply-To: <1171569595.13282.60.camel@stevo-desktop> References: <1171569595.13282.60.camel@stevo-desktop> Message-ID: <1172146169.18306.0.camel@vladsk-laptop> On Thu, 2007-02-15 at 13:59 -0600, Steve Wise wrote: > Fix copyrights in the cxgb3 driver. > > Remove the Open Grid Computing copyright. It shouldn't be there. > > Signed-off-by: Steve Wise > --- Applied. -- Vladimir Sokolovsky Mellanox Technologies Ltd. From vlad at mellanox.co.il Thu Feb 22 04:09:47 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 22 Feb 2007 14:09:47 +0200 Subject: [openib-general] [PATCH 2/2] ofed_1_2 Fix copyrights in the iw_cxgb3 driver. 
In-Reply-To: <1171569621.13282.62.camel@stevo-desktop> References: <1171569621.13282.62.camel@stevo-desktop> Message-ID: <1172146187.18306.2.camel@vladsk-laptop> On Thu, 2007-02-15 at 14:00 -0600, Steve Wise wrote: > Fix copyrights in the iw_cxgb3 driver. > > Remove the Open Grid Computing copyright. It shouldn't be there. > > Signed-off-by: Steve Wise > --- Applied. -- Vladimir Sokolovsky Mellanox Technologies Ltd. From halr at voltaire.com Thu Feb 22 04:04:12 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Feb 2007 07:04:12 -0500 Subject: [openib-general] [ewg] anyone have OFED 1.2 alpha1 compiling on ppc64 In-Reply-To: References: Message-ID: <1172145850.4380.466231.camel@hal.voltaire.com> On Thu, 2007-02-22 at 02:11, Scott Weitzenkamp (sweitzen) wrote: > I tried both RHEL4 and SLES10 usinstall.sh, and get this. I filed bug > 379, anyone else tried ppc64? > > gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include/infiniband > -I./../libibcommon/incl\ > ude/infiniband -Wall -m64 -g -O2 -MT libibumad_la-umad.lo -MD -MP -MF > .deps/lib\ > ibumad_la-umad.Tpo -c src/umad.c -fPIC -DPIC -o > .libs/libibumad_la-umad.o > In file included from src/umad.c:50: > ./include/infiniband/umad.h:37:31: infiniband/common.h: No such file > or > directo\ > ry > src/umad.c: In function `port_alloc': > src/umad.c:94: warning: implicit declaration of function `IBWARN' > src/umad.c: In function `get_port': > src/umad.c:160: warning: implicit declaration of function `snprintf' > src/umad.c:163: warning: implicit declaration of function > `sys_read_uint' > src/umad.c:177: warning: implicit declaration of function > `sys_read_uint64' > src/umad.c:182: warning: implicit declaration of function > `sys_read_gid' > src/umad.c: In function `get_ca': > src/umad.c:354: warning: implicit declaration of function > `sys_read_string' > src/umad.c:363: warning: implicit declaration of function > `sys_read_guid' > make[3]: *** [libibumad_la-umad.lo] Error 1 > make[3]: Leaving directory > 
`/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/m\ > anagement/libibumad' > make[2]: *** [all-recursive] Error 1 > make[2]: Leaving directory > `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/m\ > anagement/libibumad' > make[1]: *** [all] Error 2 > make[1]: Leaving directory > `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/m\ > anagement/libibumad' > make: *** [subdirs] Error 1 > make: Leaving directory > `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/mana\ > gement' That missing header (common.h) is in libibcommon. Somehow, libibcommon is not installed. libibumad depends on libibcommon. Is this a build/install script issue with OFED 1.2 ? Vlad ? -- Hal > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > ______________________________________________________________________ > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From vlad at mellanox.co.il Thu Feb 22 04:13:53 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 22 Feb 2007 14:13:53 +0200 Subject: [openib-general] [PATCH ofed_1_2] iw_cxgb3: Stop the EP Timer on BAD CLOSE. In-Reply-To: <1172090800.27101.40.camel@stevo-desktop> References: <1172090800.27101.40.camel@stevo-desktop> Message-ID: <1172146433.18306.4.camel@vladsk-laptop> On Wed, 2007-02-21 at 14:46 -0600, Steve Wise wrote: > Stop the ep timer in ec_status() if the status indicates a > bad close. 
> > Signed-off-by: Steve Wise > --- > > drivers/infiniband/hw/cxgb3/iwch_cm.c | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c > index e5442e3..d00e5dd 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c > @@ -1635,6 +1635,7 @@ static int ec_status(struct t3cdev *tdev > > printk(KERN_ERR MOD "%s BAD CLOSE - Aborting tid %u\n", > __FUNCTION__, ep->hwtid); > + stop_ep_timer(ep); > attrs.next_state = IWCH_QP_STATE_ERROR; > iwch_modify_qp(ep->com.qp->rhp, > ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, > Applied. -- Vladimir Sokolovsky Mellanox Technologies Ltd. From jackm at dev.mellanox.co.il Thu Feb 22 04:32:00 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 22 Feb 2007 14:32:00 +0200 Subject: [openib-general] libibverbs: can't compile more than once due to man3 symbolic links Message-ID: <200702221432.00546.jackm@dev.mellanox.co.il> The code below was just added to libibverbs/Makefile.am install-data-hook: cd $(DESTDIR)$(mandir)/man3 && \ $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \ .... This creates a problem when re-compiling/re-installing libibverbs -- the "ln -s" ( = $(LN_S) ) fails because the symbolic links still exist in the man/man3 directory. I rummaged around the libtool documentation, and there is no pre-defined macro which does "ln -fs" (which would just overwrite the current links). Any ideas on how to fix this problem cleanly (i.e., without violating spirit of libtool/automake)? 
- Jack From jsquyres at cisco.com Thu Feb 22 04:38:03 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 22 Feb 2007 07:38:03 -0500 Subject: [openib-general] libibverbs: can't compile more than once due to man3 symbolic links In-Reply-To: <200702221432.00546.jackm@dev.mellanox.co.il> References: <200702221432.00546.jackm@dev.mellanox.co.il> Message-ID: Is there a reason not to use man_MANS = ibv_get_async_event.3 .... ? On Feb 22, 2007, at 7:32 AM, Jack Morgenstein wrote: > The code below was just added to libibverbs/Makefile.am > > install-data-hook: > cd $(DESTDIR)$(mandir)/man3 && \ > $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \ > .... > > This creates a problem when re-compiling/re-installing libibverbs -- > the "ln -s" ( = $(LN_S) ) fails because the symbolic links still exist > in the man/man3 directory. > > I rummaged around the libtool documentation, and there is no pre- > defined > macro which does "ln -fs" (which would just overwrite the current > links). > > Any ideas on how to fix this problem cleanly (i.e., without > violating spirit > of libtool/automake)? > > - Jack > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Thu Feb 22 04:40:12 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 22 Feb 2007 14:40:12 +0200 Subject: [openib-general] libibverbs: can't compile more than once due to man3 symbolic links In-Reply-To: <200702221432.00546.jackm@dev.mellanox.co.il> References: <200702221432.00546.jackm@dev.mellanox.co.il> Message-ID: <20070222124012.GD9727@mellanox.co.il> > Quoting Jack Morgenstein : > Subject: libibverbs: can't compile more than once due to man3 symbolic links > > The code below was just added to libibverbs/Makefile.am > > install-data-hook: > cd $(DESTDIR)$(mandir)/man3 && \ > $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \ > .... > > This creates a problem when re-compiling/re-installing libibverbs -- > the "ln -s" ( = $(LN_S) ) fails because the symbolic links still exist > in the man/man3 directory. > > I rummaged around the libtool documentation, and there is no pre-defined > macro which does "ln -fs" (which would just overwrite the current links). > > Any ideas on how to fix this problem cleanly (i.e., without violating spirit > of libtool/automake)? Probably just add $(RM) ibv_ack_async_event.3 && \ -- MST From jsquyres at cisco.com Thu Feb 22 04:49:16 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 22 Feb 2007 07:49:16 -0500 Subject: [openib-general] libibverbs: can't compile more than once due to man3 symbolic links In-Reply-To: References: <200702221432.00546.jackm@dev.mellanox.co.il> Message-ID: <37A54161-C56C-42A5-91C1-D5335A3FC546@cisco.com> Blah -- disregard; I read the mail too quickly and didn't look at the actual Makefile.am to see what you were really asking. FWIW, the "install" app, by default, removes things before copying in the new target. So putting a manual "rm -f" in here, while klunky, has precedent and will make it work. On Feb 22, 2007, at 7:38 AM, Jeff Squyres wrote: > Is there a reason not to use > > man_MANS = ibv_get_async_event.3 .... > > ? 
> > > On Feb 22, 2007, at 7:32 AM, Jack Morgenstein wrote: > >> The code below was just added to libibverbs/Makefile.am >> >> install-data-hook: >> cd $(DESTDIR)$(mandir)/man3 && \ >> $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \ >> .... >> >> This creates a problem when re-compiling/re-installing libibverbs -- >> the "ln -s" ( = $(LN_S) ) fails because the symbolic links still >> exist >> in the man/man3 directory. >> >> I rummaged around the libtool documentation, and there is no pre- >> defined >> macro which does "ln -fs" (which would just overwrite the current >> links). >> >> Any ideas on how to fix this problem cleanly (i.e., without >> violating spirit >> of libtool/automake)? >> >> - Jack >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/ >> openib-general > > > -- > Jeff Squyres > Server Virtualization Business Unit > Cisco Systems > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From jackm at dev.mellanox.co.il Thu Feb 22 04:57:17 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 22 Feb 2007 14:57:17 +0200 Subject: [openib-general] [PATCH] libibverbs: can't compile more than once due to man3 symbolic links In-Reply-To: <200702221432.00546.jackm@dev.mellanox.co.il> References: <200702221432.00546.jackm@dev.mellanox.co.il> Message-ID: <200702221457.17675.jackm@dev.mellanox.co.il> The following patch removes manpage symbolic links so that they may be relinked in the install. Suggested by Michael Tsirkin. 
Signed-off-by: Jack Morgenstein diff --git a/Makefile.am b/Makefile.am index 5d2383e..455041e 100644 --- a/Makefile.am +++ b/Makefile.am @@ -70,6 +70,18 @@ dist-hook: libibverbs.spec install-data-hook: cd $(DESTDIR)$(mandir)/man3 && \ + $(RM) ibv_ack_async_event.3 && \ + $(RM) ibv_ack_cq_events.3 && \ + $(RM) ibv_close_device.3 && \ + $(RM) ibv_dealloc_pd.3 && \ + $(RM) ibv_dereg_mr.3 && \ + $(RM) ibv_destroy_ah.3 && \ + $(RM) ibv_destroy_comp_channel.3 && \ + $(RM) ibv_destroy_cq.3 && \ + $(RM) ibv_destroy_qp.3 && \ + $(RM) ibv_destroy_srq.3 && \ + $(RM) ibv_detach_mcast.3 && \ + $(RM) ibv_free_device_list.3 && \ $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \ $(LN_S) ibv_get_cq_event.3 ibv_ack_cq_events.3 && \ $(LN_S) ibv_open_device.3 ibv_close_device.3 && \ From swise at opengridcomputing.com Thu Feb 22 06:13:17 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 22 Feb 2007 08:13:17 -0600 Subject: [openib-general] [PATCH 0/7] cxgb3 - Chelsio T3 1G/10G driver updates In-Reply-To: <45DD8559.7090106@chelsio.com> References: <45DD8559.7090106@chelsio.com> Message-ID: <1172153597.23995.9.camel@stevo-desktop> Divy, Do these need to be pulled into OFED 1.2 as well? Steve. On Thu, 2007-02-22 at 03:58 -0800, Divy Le Ray wrote: > Jeff, > > I'm sending a series of incremental patches updating > the cxgb3 driver. These patches are built against Linus'git tree. 
> > Cheers, > Divy > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ From halr at voltaire.com Thu Feb 22 06:45:02 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Feb 2007 09:45:02 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DD4630.3070101@voltaire.com> References: <1171984010.4380.304008.camel@hal.voltaire.com> <45DB15F5.4090406@voltaire.com> <1171986159.4380.306117.camel@hal.voltaire.com> <45DBEA1F.5090901@voltaire.com> <1172058368.4380.379947.camel@hal.voltaire.com> <45DC3C96.8040100@voltaire.com> <1172064021.4380.385825.camel@hal.voltaire.com> <15ddcffd0702211245w2686b97bhcaf7e86aaa3dedf5@mail.gmail.com> <1172096957.4380.418140.camel@hal.voltaire.com> <45DD4630.3070101@voltaire.com> Message-ID: <1172155499.4380.475947.camel@hal.voltaire.com> On Thu, 2007-02-22 at 02:28, Or Gerlitz wrote: > Hal Rosenstock wrote: > > On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote: > >> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock wrote: > > >> If the IPoIB spec does not allow both partial and full members of a > >> partition to share a broadcast domain (eg the IPv4 broadcast group > >> associated with the full membership pkey) or any other multicast > >> group, burn it (or at least the relevant section). > > > I was referring to the IB spec, not an IPoIB RFC. > > Can you provide a pointer? See MCMemberRecord:P_Key description in table 210 (p. 908). > >> The OpenIB code supposed to work and as done with the RDMA CM header, > >> the implementation should not wait for spec to be written or changed. > > > Really ? Maybe I'm mistaken but I didn't think that OpenIB/OpenFabrics > > wanted to issue code which is not IBA spec compliant. > > The code resides in the Linux kernel, period. 
Linux is not under the > control of this or that organization, period, period. Linux uses an > hierarchic maintainship structure where Roland, Sean and yourself are > listed as the maintainers, which means you are able to promote and/or > block this or that agenda, go for it! OpenIB claims IBA compliance (currently mostly v1.2) and is there any good reason that we shouldn't continue to adhere to this ? -- Hal > Or. From halr at voltaire.com Thu Feb 22 06:45:56 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Feb 2007 09:45:56 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DD4E7E.50009@voltaire.com> References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com> <000301c755f8$d2f265e0$8698070a@amr.corp.intel.com> <15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com> <45DCC968.30208@ichips.intel.com> <45DD4E7E.50009@voltaire.com> Message-ID: <1172155552.4380.475949.camel@hal.voltaire.com> On Thu, 2007-02-22 at 03:04, Or Gerlitz wrote: > Sean Hefty wrote: > >> Note that since the HCA validates the pkey in the in coming packet, no > >> matter what the IB SW would do, partial members of a partition can't > >> talk to each other. So the approach taken by the core/ipoib code was > >> to just ignore the MSb in places where the code looks for the pkey > >> --index-- and use the full member pkey when forming MGIDs. This seems > >> fine to me. > > > My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't > > the one in the search. Can this lead to a QP being configured in such a way > > that communication with a remote QP would silently fail? 
> > My understanding is that when an IPoIB broadcast domain contains both > partial and full members (*) attempts to communicate between two partial > members would silently fail, An IB multicast group _cannot_ have partial members so this never should get far enough to where two limited members would be unable to communicate. -- Hal > does this silence is something you think we > should work to change? > > (*) eg when you have bunch or clients and a server or bunch of servers > and you don't want to allow --clients-- to communicate among themselves) > > > I'm not against this patch, but I want to make sure that I understand the > > issues, so we're not creating a work-around solution. The patch is against the > > librdmacm, yet there's nothing that I see in the librdmacm that makes me think > > it's behaving incorrectly. > > My thinking is that if in the end of this thread we are willing to move > forward without changing ib_find_cached_pkey() then this patch should be > merged. > > Or. From ogerlitz at voltaire.com Thu Feb 22 07:07:13 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 22 Feb 2007 17:07:13 +0200 (IST) Subject: [openib-general] librdmacm examples not working under OFED 1.2 alpha Message-ID: I have tested RH4 U4 and to some extent also RH5 beta and see the following: under RH4 U4 ============ - rping: addr and route resolution passing, client getting reject on conn req - udaddy: working fine on both UDP and IPOIB port spaces - mckey: not applicable on RH4 U4 till my patch with ip_ib_mc_map is merged under both udaddy and rping librdmacm report: librdmacm: couldn't read ABI version. librdmacm: assuming: 4 under RH5 ========= basically, the same: rping does not work, udaddy works on both port spaces. Also was able to check mckey and it works fine on both port spaces. The ABI error print is not seen. The rping client/server logs are below, Or. 
rping client ============ root at Adi6 ~]# rping -c -v -d -a 193.168.80.175 ipaddr (193.168.80.175) librdmacm: couldn't read ABI version. librdmacm: assuming: 4 created cm_id 0x505f10 cma_event type 0 cma_id 0x505f10 (parent) cma_event type 2 cma_id 0x505f10 (parent) rdma_resolve_addr - rdma_resolve_route successful created pd 0x507830 created channel 0x506260 created cq 0x507880 created qp 0x507990 rping_setup_buffers called on cb 0x505010 allocated & registered buffers... cq_thread started. cq completion failed status 5 wait for CONNECTED state 10 connect error -1 rping_free_buffers called on cb 0x505010 cma_event type 8 cma_id 0x505f10 (parent) cma event 8, error 0 rping server =========== root at Adi5 ~]# rping -s -d -v -S 100 -C 100 verbose size 100 count 100 librdmacm: couldn't read ABI version. librdmacm: assuming: 4 created cm_id 0x505f00 rdma_bind_addr successful rdma_listen From swise at opengridcomputing.com Thu Feb 22 07:12:17 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 22 Feb 2007 09:12:17 -0600 Subject: [openib-general] librdmacm examples not working under OFED 1.2 alpha In-Reply-To: References: Message-ID: <1172157137.26393.3.camel@stevo-desktop> What device? On Thu, 2007-02-22 at 17:07 +0200, Or Gerlitz wrote: > I have tested RH4 U4 and to some extent also RH5 beta and see the following: > > under RH4 U4 > ============ > > - rping: addr and route resolution passing, client getting reject on conn req > > - udaddy: working fine on both UDP and IPOIB port spaces > > - mckey: not applicable on RH4 U4 till my patch with ip_ib_mc_map is merged > > under both udaddy and rping librdmacm report: > > librdmacm: couldn't read ABI version. > librdmacm: assuming: 4 > > under RH5 > ========= > > basically, the same: rping does not work, udaddy works on both port spaces. > Also was able to check mckey and it works fine on both port spaces. > The ABI error print is not seen. > > The rping client/server logs are below, > > Or. 
> > rping client > ============ > > root at Adi6 ~]# rping -c -v -d -a 193.168.80.175 > ipaddr (193.168.80.175) > librdmacm: couldn't read ABI version. > librdmacm: assuming: 4 > created cm_id 0x505f10 > cma_event type 0 cma_id 0x505f10 (parent) > cma_event type 2 cma_id 0x505f10 (parent) > rdma_resolve_addr - rdma_resolve_route successful > created pd 0x507830 > created channel 0x506260 > created cq 0x507880 > created qp 0x507990 > rping_setup_buffers called on cb 0x505010 > allocated & registered buffers... > cq_thread started. > cq completion failed status 5 > wait for CONNECTED state 10 > connect error -1 > rping_free_buffers called on cb 0x505010 > cma_event type 8 cma_id 0x505f10 (parent) > cma event 8, error 0 > > rping server > =========== > root at Adi5 ~]# rping -s -d -v -S 100 -C 100 > verbose > size 100 > count 100 > librdmacm: couldn't read ABI version. > librdmacm: assuming: 4 > created cm_id 0x505f00 > rdma_bind_addr successful > rdma_listen > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ogerlitz at voltaire.com Thu Feb 22 07:13:21 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 22 Feb 2007 17:13:21 +0200 Subject: [openib-general] librdmacm examples not working under OFED 1.2 alpha In-Reply-To: <1172157137.26393.3.camel@stevo-desktop> References: <1172157137.26393.3.camel@stevo-desktop> Message-ID: <45DDB311.30802@voltaire.com> Steve Wise wrote: > What device? mthca From mst at mellanox.co.il Thu Feb 22 07:16:06 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Feb 2007 17:16:06 +0200 Subject: [openib-general] librdmacm examples not working under OFED 1.2 alpha In-Reply-To: References: Message-ID: <20070222151606.GD29559@mellanox.co.il> > > librdmacm: couldn't read ABI version. 
> librdmacm: assuming: 4 I think there was a kernel patch from Woody to address this. Woody? -- MST From vlad at dev.mellanox.co.il Thu Feb 22 08:13:28 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 22 Feb 2007 18:13:28 +0200 Subject: [openib-general] OFED-1.2-20070221-1741.tgz package is available In-Reply-To: <1172097621.5994.13.camel@trinity.ogc.int> References: <1172080605.5256.35.camel@vladsk-laptop> <1172080948.27101.15.camel@stevo-desktop> <1172081761.5256.45.camel@vladsk-laptop> <1172097621.5994.13.camel@trinity.ogc.int> Message-ID: <1172160808.29968.2.camel@vladsk-laptop> On Wed, 2007-02-21 at 16:40 -0600, Tom Tucker wrote: > > Bug: 355 (problems building modules that depend on OFED 1.2 modules) > > > > In order to build kernel modules depending on OFED's modules you need to > > take Modules.symvers file from /src/openib/Modules.symvers (part > > of kernel-ib-devel RPM) and copy this to modules subdir and then compile > > your module. > > Won't this blow away all the version information for the non-IB symbols? > See Documentation/kbuild/modules.txt (under kernel sources): --- 7.3 Symbols from another external module ... Use an extra Module.symvers file When an external module is built, a Module.symvers file is generated containing all exported symbols which are not defined in the kernel. To get access to symbols from module 'bar', one can copy the Module.symvers file from the compilation of the 'bar' module to the directory where the 'foo' module is built. During the module build, kbuild will read the Module.symvers file in the directory of the external module and when the build is finished, a new Module.symvers file is created containing the sum of all symbols defined and not part of the kernel. -- Vladimir Sokolovsky Mellanox Technologies Ltd. 
From mplee at sandia.gov Thu Feb 22 08:44:25 2007 From: mplee at sandia.gov (Lee, Michael Paichi) Date: Thu, 22 Feb 2007 09:44:25 -0700 Subject: [openib-general] Address List Change Now Scheduled for Wednesday, 2/28/2007 In-Reply-To: References: <924EA79E-8FE2-49A5-85AB-84B7749D535C@cisco.com> <20070221153402.GB17761@mellanox.co.il> Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F03BC471E@ES22SNLNT.srn.sandia.gov> The list will now be migrated on Wednesday, 2/28/2007. List address: general at lists.openfabrics.org Updated change-date: Wednesday, 2/28/2007 Michael -----Original Message----- From: Jeff Squyres [mailto:jsquyres at cisco.com] Sent: Wednesday, February 21, 2007 8:09 AM To: Michael S. Tsirkin Cc: OpenFabrics General; Lee, Michael Paichi Subject: Re: Address List Change for Friday, 2/23/2007 On Feb 21, 2007, at 10:34 AM, Michael S. Tsirkin wrote: >> Can you look at the other lists that have migrated for examples? >> (e.g., ewg) > > If I look at other lists, there's no guarantee the rule will catch the > actual message. Can't you just paste in the new address of the list in your existing rules? I must be missing something. >> It may be complex to send an actual example message *before* the list >> moves. > > In this case, maybe the migration can be done in the middle of the > week? I'll let Michael Lee answer; we're currently driving off his goodwill and his schedule. I guess I didn't see why this was complex -- if a few mails get misplaced over the weekend because cutting-n-pasting the new e-mail address into existing rules somehow didn't work, is there a huge problem? 
-- Jeff Squyres Server Virtualization Business Unit Cisco Systems From sweitzen at cisco.com Thu Feb 22 09:36:07 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 22 Feb 2007 09:36:07 -0800 Subject: [openib-general] [ewg] anyone have OFED 1.2 alpha1 compiling on ppc64 In-Reply-To: <1172145850.4380.466231.camel@hal.voltaire.com> References: <1172145850.4380.466231.camel@hal.voltaire.com> Message-ID: > That missing header (common.h) is in libibcommon. Somehow, libibcommon > is not installed. libibumad depends on libibcommon. Is this a > build/install script issue with OFED 1.2 ? Vlad ? > > -- Hal I tried install.sh again, this time telling it to build libibcommon instead of relying on dependencies, and get this: + install -m 0755 /var/tmp/OFED/usr/local/ofed/bin32/mread /var/tmp/OFED/usr/lo\ cal/ofed/bin install: cannot stat `/var/tmp/OFED/usr/local/ofed/bin32/mread': No such file o\ r directory I believe mread has been renamed to mstread. # ls /var/tmp/OFED/usr/local/ofed/bin32 mstflint mstmread mstmwrite mstregdump mstvpd Scott From sean.hefty at intel.com Thu Feb 22 09:47:42 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 22 Feb 2007 09:47:42 -0800 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DD4E7E.50009@voltaire.com> Message-ID: <000201c756a9$8d650c60$8698070a@amr.corp.intel.com> >My understanding is that when an IPoIB broadcast domain contains both >partial and full members (*) attempts to communicate between two partial >members would silently fail, does this silence is something you think we >should work to change? I'm looking at this from a different view than just ipoib multicast groups. For example, can two users of the ib_cm successfully establish a connection, but not actually be able to transfer data between each other? This seems possible, though unlikely. This is the type of silent failure I'm referring to. 
Without this patch, two clients that try to connect using the librdmacm will fail. That failure is reported to the user. With this patch, the connection would be created, but I don't think that it guarantees that communication can actually occur. I don't want to mask a configuration issue. >My thinking is that if in the end of this thread we are willing to move >forward without changing ib_find_cached_pkey() then this patch should be >merged. I'm still unsure about where the cause of this problem lies. It may be that the kernel rdma_cm or rdma_ucm needs to change if we decide the ib_find_cached_pkey is correct. - Sean From sean.hefty at intel.com Thu Feb 22 09:56:04 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 22 Feb 2007 09:56:04 -0800 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <1172155552.4380.475949.camel@hal.voltaire.com> Message-ID: <000301c756aa$b89e0020$8698070a@amr.corp.intel.com> >An IB multicast group _cannot_ have partial members so this never should >get far enough to where two limited members would be unable to >communicate. Can someone help my understanding here? Is ipoib joining a multicast group using the full membership PKey, even if the node that it joins from only has the limited membership PKey configured? And the code in ib_find_cached_pkey helps enable this? - Sean From jackm at dev.mellanox.co.il Thu Feb 22 10:16:43 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 22 Feb 2007 20:16:43 +0200 Subject: [openib-general] [PATCH v2] libibverbs: can't compile more than once due to man3 symbolic links In-Reply-To: <200702221457.17675.jackm@dev.mellanox.co.il> References: <200702221432.00546.jackm@dev.mellanox.co.il> <200702221457.17675.jackm@dev.mellanox.co.il> Message-ID: <200702222016.43559.jackm@dev.mellanox.co.il> Missed 2 lines in the patch. 
Below is the correct patch: --- The following patch removes manpage symbolic links so that they may be relinked in the install. Suggested by Michael Tsirkin. Signed-off-by: Jack Morgenstein diff --git a/Makefile.am b/Makefile.am index 5d2383e..705b184 100644 --- a/Makefile.am +++ b/Makefile.am @@ -70,6 +70,20 @@ dist-hook: libibverbs.spec install-data-hook: cd $(DESTDIR)$(mandir)/man3 && \ + $(RM) ibv_ack_async_event.3 && \ + $(RM) ibv_ack_cq_events.3 && \ + $(RM) ibv_close_device.3 && \ + $(RM) ibv_dealloc_pd.3 && \ + $(RM) ibv_dereg_mr.3 && \ + $(RM) ibv_destroy_ah.3 && \ + $(RM) ibv_destroy_comp_channel.3 && \ + $(RM) ibv_destroy_cq.3 && \ + $(RM) ibv_destroy_qp.3 && \ + $(RM) ibv_destroy_srq.3 && \ + $(RM) ibv_detach_mcast.3 && \ + $(RM) ibv_free_device_list.3 && \ + $(RM) ibv_init_ah_from_wc.3 && \ + $(RM) mult_to_ibv_rate.3 && \ $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \ $(LN_S) ibv_get_cq_event.3 ibv_ack_cq_events.3 && \ $(LN_S) ibv_open_device.3 ibv_close_device.3 && \ From Jesse.Butler at Sun.COM Thu Feb 22 10:27:12 2007 From: Jesse.Butler at Sun.COM (Jesse Butler) Date: Thu, 22 Feb 2007 13:27:12 -0500 Subject: [openib-general] mthca adjust_key() Message-ID: <45DDE080.60507@sun.com> Could anyone tell me why this routine in mthca is necessary? There aren't any comments to explain it; I'm wondering if this is a workaround for Sinai of some kind? 
static inline u32 adjust_key(struct mthca_dev *dev, u32 key)
{
	if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT)
		return ((key << 20) & 0x800000) | (key & 0x7fffff);
	else
		return key;
}

Thanks in advance,
Jesse

From rdreier at cisco.com Thu Feb 22 10:30:00 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 10:30:00 -0800
Subject: [openib-general] [PATCH v2] libibverbs: can't compile more than once due to man3 symbolic links
In-Reply-To: <200702222016.43559.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Thu, 22 Feb 2007 20:16:43 +0200")
References: <200702221432.00546.jackm@dev.mellanox.co.il> <200702221457.17675.jackm@dev.mellanox.co.il> <200702222016.43559.jackm@dev.mellanox.co.il>
Message-ID:

Thanks, I applied this and pushed it out.

From rdreier at cisco.com Thu Feb 22 10:34:16 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 10:34:16 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: <20070222012111.GB3352@sgi.com> (akepner@sgi.com's message of "Wed, 21 Feb 2007 17:21:11 -0800")
References: <20070222012111.GB3352@sgi.com>
Message-ID:

> A first-cut at a patch was sent out, some very reasonable
> objections were raised, and the thread fizzled out.

Sorry, I meant to respond again, but I never got around to it.

> The biggest concern with the earlier patch seemed to be
> backward compatibility. There was a stab at addressing
> that in http://tinyurl.com/2x3s52, but no commentary.
> (Too ugly for words?)

I think you went off into the weeds there, but I'll respond to that
earlier email in detail.

> Any suggestions as to how to proceed? Should I just code
> something up in order to have a concrete target to discuss?
> Or are there any new thoughts based on the previous emails?

I actually have a vague plan for a somewhat cleaner way to get this fix.
For a variety of reasons, I am planning on changing the way the
kernel handles memory registration so that low-level drivers have more
control over what happens. This would allow us to follow Gleb's
suggestion to use register MR to create and map the kernel's buffer
and avoid some of the error path ugliness. So I would prefer to map
the coherent memory that way.

However this will take a while to come to fruition, since it is kind
of a background task for me. How severe is this issue? In other
words, when you produced the problem, was it a synthetic test, or a
workload that someone might actually want to run?

 - R.

From rdreier at cisco.com Thu Feb 22 10:40:35 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 10:40:35 -0800
Subject: [openib-general] mthca adjust_key()
In-Reply-To: <45DDE080.60507@sun.com> (Jesse Butler's message of "Thu, 22 Feb 2007 13:27:12 -0500")
References: <45DDE080.60507@sun.com>
Message-ID:

> Could anyone tell me why this routine in mthca is necessary? There
> aren't any comments to explain it; I'm wondering if this is a workaround
> for Sinai of some kind?
>
> static inline u32 adjust_key(struct mthca_dev *dev, u32 key)
> {
>     if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT)
>         return ((key << 20) & 0x800000) | (key & 0x7fffff);
>     else
>         return key;
> }

It's a performance optimization for Sinai.

 - R.

From akepner at sgi.com Thu Feb 22 10:32:08 2007
From: akepner at sgi.com (akepner at sgi.com)
Date: Thu, 22 Feb 2007 10:32:08 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To:
References: <20070222012111.GB3352@sgi.com>
Message-ID: <20070222183208.GC3352@sgi.com>

On Thu, Feb 22, 2007 at 10:34:16AM -0800, Roland Dreier wrote:
>
> I actually have a vague plan for a somewhat cleaner way to get this
> fix. For a variety of reasons, I am planning on changing the way the
> kernel handles memory registration so that low-level drivers have more
> control over what happens.
> This would allow us to follow Gleb's
> suggestion to use register MR to create and map the kernel's buffer
> and avoid some of the error path ugliness. So I would prefer to map
> the coherent memory that way.

OK, I look forward to seeing what you have in mind.

>
> However this will take a while to come to fruition, since it is kind
> of a background task for me. How severe is this issue? In other
> words, when you produced the problem, was it a synthetic test, or a
> workload that someone might actually want to run?
>

We found this accidentally, running a normal MPI job, on a
"normally sized" machine (i.e., tens, not hundreds of
processors.) It appears to be more easily produced than
we'd expected, and we consider it to be a severe problem.

--
Arthur

From rdreier at cisco.com Thu Feb 22 11:07:05 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 11:07:05 -0800
Subject: [openib-general] [RFC/BUG] libibverbs: DMA vs. CQ race
In-Reply-To: ( akepner@sgi.com's message of "Fri, 2 Feb 2007 13:34:15 -0800 (PST)")
References:
Message-ID:

> Assuming that something along the lines of the previous patch
> is used, we need to address userspace/kernel compatibility.
>
> The existing abi versioning doesn't seem to be exactly what
> we want to use, though, because we want to change a verb's
> semantics to work around a bug. (Changing the abi_version
> may be an inevitable result, though.)
>
> How about adding "semantic flags" to the mthca_* commands
> (mthca_create_cq, etc.)? Userspace could read the contents of
> a new sysfs file which, if found, would indicate the flags
> that the kernel understands. Then it could pass the flags, if
> it chooses, to get the kernel to use the desired semantics.

This is not really the design philosophy that we've used in defining
the user-kernel interfaces for IB verbs.
Rather than having complexity in the kernel to handle both old and new
ways of doing things, the way we've used to handle cases like this is
the following:

 - specify new fixed ABI (in this case, mthca abi_version 2)
 - update library to handle old and new ABI (in this case, update
   libmthca to use mthca kernel abi 1 or 2 depending on what it
   detects at runtime)
 - update kernel to implement new ABI, and remove old ABI from kernel
   (in this case, update kernel mthca driver to abi_version 2)

The net effect of this is that updated userspace works fine with any
kernel, but updating the kernel will require updating userspace
libraries too. However the important point is that once userspace is
updated, it's still possible to boot into old kernels and have things
work without downgrading userspace.

If we really wanted to export some flags from mthca back to libmthca,
I guess it would be possible to bump the abi version and add a flags
field to the response to the alloc_ucontext command, but in this case
I don't see a reason to worry about it.

 - R.
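
[Editor's sketch, not part of the original message.] The library-side step
in the list above (handle ABI 1 or 2 depending on what is detected at
runtime) could look roughly like the C fragment below. This is only an
illustration under stated assumptions: the enum, the macros and
choose_cq_setup() are hypothetical names invented here, not libmthca code,
and how the kernel ABI version is actually obtained is left out.

```c
/* Hypothetical range of kernel ABI versions the library understands. */
#define ABI_MIN_SUPPORTED 1
#define ABI_MAX_SUPPORTED 2

/* Hypothetical code paths a userspace driver library might select. */
enum cq_setup {
    CQ_SETUP_UNSUPPORTED, /* kernel ABI outside the supported range */
    CQ_SETUP_LEGACY,      /* talk to an old kernel (ABI 1) */
    CQ_SETUP_V2           /* talk to an updated kernel (ABI 2) */
};

/*
 * Pick the code path from the ABI version the kernel reported.
 * An updated library keeps working with both old and new kernels;
 * anything outside the known range is refused rather than guessed at.
 */
static enum cq_setup choose_cq_setup(int kernel_abi)
{
    if (kernel_abi < ABI_MIN_SUPPORTED || kernel_abi > ABI_MAX_SUPPORTED)
        return CQ_SETUP_UNSUPPORTED;
    return kernel_abi >= 2 ? CQ_SETUP_V2 : CQ_SETUP_LEGACY;
}
```

With this shape, choose_cq_setup(1) selects the legacy path and
choose_cq_setup(2) the new one, which matches the point that updated
userspace can still boot into an old kernel.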
From bugzilla-daemon at lists.openfabrics.org Thu Feb 22 11:10:00 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Thu, 22 Feb 2007 11:10:00 -0800 (PST) Subject: [openib-general] [Bug 384] New: OFED 1.2 alpha1 ib-bonding won't compile on RHEL4 U3 ppc64 Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=384 Summary: OFED 1.2 alpha1 ib-bonding won't compile on RHEL4 U3 ppc64 Product: OpenFabrics Linux Version: 1.2alpha1 Platform: PPC64 OS/Version: RHEL 4 Status: NEW Severity: normal Priority: P3 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: sweitzen at cisco.com + cd linux/drivers/net/bonding/ ++ pwd + make -C /lib/modules/2.6.9-34.EL/build M=/var/tmp/OFEDRPM/BUILD/ib-bonding-0.\ 9.0/linux/drivers/net/bonding make: Entering directory `/usr/src/kernels/2.6.9-34.EL-ppc64' LD /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bui\ lt-in.o CC [M] /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bon\ d_main.o In file included from /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net\ /bonding/bond_main.c:78: /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: In\ function `bond_set_slave_inactive_flags': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260\ : error: `IFF_SLAVE_INACTIVE' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260\ : error: (Each undeclared identifier is reported only once /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260\ : error: for each function it appears in.) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:262\ : error: `IFF_SLAVE_NEEDARP' undeclared (first use in this function) .... 
-- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From rdreier at cisco.com Thu Feb 22 11:10:27 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Feb 2007 11:10:27 -0800 Subject: [openib-general] [RFC/BUG] DMA vs. CQ race In-Reply-To: <20070222183208.GC3352@sgi.com> (akepner@sgi.com's message of "Thu, 22 Feb 2007 10:32:08 -0800") References: <20070222012111.GB3352@sgi.com> <20070222183208.GC3352@sgi.com> Message-ID: > We found this accidentally, running a normal MPI job, on a > "normally sized" machine (i.e., tens, not hundreds of > processors.) It appears to be more easily produced that > we'd expected, and we consider it to be a severe problem. Hmm, OK. Then I will do my best to make sure we get a fix for this into 2.6.22. - R. From vlad at mellanox.co.il Thu Feb 22 12:28:16 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 22 Feb 2007 22:28:16 +0200 Subject: [openib-general] [ewg] anyone have OFED 1.2 alpha1 compiling on ppc64 Message-ID: <6C2C79E72C305246B504CBA17B5500C922B36D@mtlexch01.mtl.com> Hi Scott, Try OFED-1.2-20070221-1741. This issue was fixed in this package. Regards, Vladimir -----Original Message----- From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Thursday, February 22, 2007 7:36 PM To: Hal Rosenstock Cc: Openfabrics-ewg at openib.org; OPENIB; Vladimir Sokolovsky Subject: RE: [ewg] anyone have OFED 1.2 alpha1 compiling on ppc64 > That missing header (common.h) is in libibcommon. Somehow, libibcommon > is not installed. libibumad depends on libibcommon. Is this a > build/install script issue with OFED 1.2 ? Vlad ? 
> > -- Hal I tried install.sh again, this time telling it to build libibcommon instead of relying on dependencies, and get this: + install -m 0755 /var/tmp/OFED/usr/local/ofed/bin32/mread /var/tmp/OFED/usr/lo\ cal/ofed/bin install: cannot stat `/var/tmp/OFED/usr/local/ofed/bin32/mread': No such file o\ r directory I believe mread has been renamed to mstread. # ls /var/tmp/OFED/usr/local/ofed/bin32 mstflint mstmread mstmwrite mstregdump mstvpd Scott From or.gerlitz at gmail.com Thu Feb 22 13:09:25 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Thu, 22 Feb 2007 23:09:25 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <000301c756aa$b89e0020$8698070a@amr.corp.intel.com> References: <1172155552.4380.475949.camel@hal.voltaire.com> <000301c756aa$b89e0020$8698070a@amr.corp.intel.com> Message-ID: <15ddcffd0702221309q4633a36cg8a7bb5ff69d78776@mail.gmail.com> On 2/22/07, Sean Hefty wrote: > >An IB multicast group _cannot_ have partial members so this never should > >get far enough to where two limited members would be unable to > >communicate. > Can someone help my understanding here? Is ipoib joining a multicast group > using the full membership PKey, even if the node that it joins from only has the > limited membership PKey configured? And the code in ib_find_cached_pkey helps > enable this? Yep. The ipoib create_child function Or-s 0x8000 to the device pkey which was provided by the user. Now, IPoIB uses the device pkey when forming MGIDs and when doing modify qp to init. Indeed the way ib_find_cached_pkey() is implemented, make the latter use trivial. Or. 
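[Editor's sketch, not part of the original message.] The P_Key handling discussed above can be illustrated in a few lines of C. It assumes the usual InfiniBand convention that the top bit (0x8000) of the 16-bit P_Key is the membership bit (set for full, clear for limited) and that a cached-pkey lookup compares only the low 15 bits. All helper names below are hypothetical and are not the kernel's API.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* membership bit: 1 = full, 0 = limited */
#define PKEY_BASE_MASK   0x7fffu /* low 15 bits identify the partition */

/* What "Or-s 0x8000 to the device pkey" amounts to. */
static inline uint16_t pkey_make_full(uint16_t pkey)
{
    return (uint16_t)(pkey | PKEY_FULL_MEMBER);
}

/* Two P_Keys name the same partition if their low 15 bits match. */
static inline bool pkey_same_partition(uint16_t a, uint16_t b)
{
    return (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK);
}

/* Same partition, and at least one side must be a full member. */
static inline bool pkey_can_communicate(uint16_t a, uint16_t b)
{
    return pkey_same_partition(a, b) && ((a | b) & PKEY_FULL_MEMBER) != 0;
}

/*
 * Lookup that ignores the membership bit, in the spirit of the
 * behaviour attributed to ib_find_cached_pkey() in this thread.
 * Returns the table index, or -1 if no entry matches.
 */
static int pkey_find_index(const uint16_t *table, int len, uint16_t pkey)
{
    for (int i = 0; i < len; ++i)
        if (pkey_same_partition(table[i], pkey))
            return i;
    return -1;
}
```

Under these assumptions, a port that only holds the limited key 0x0001 still gets a table hit when asked for 0x8001, which is why the lookup succeeds even though two limited members could not actually exchange traffic.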
From or.gerlitz at gmail.com Thu Feb 22 13:15:36 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Thu, 22 Feb 2007 23:15:36 +0200 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <000201c756a9$8d650c60$8698070a@amr.corp.intel.com> References: <45DD4E7E.50009@voltaire.com> <000201c756a9$8d650c60$8698070a@amr.corp.intel.com> Message-ID: <15ddcffd0702221315h1da6b16cre9ce95e5c65790bc@mail.gmail.com> On 2/22/07, Sean Hefty wrote: > >My understanding is that when an IPoIB broadcast domain contains both > >partial and full members (*) attempts to communicate between two partial > >members would silently fail, does this silence is something you think we > >should work to change? > > I'm looking at this from a different view than just ipoib multicast groups. For > example, can two users of the ib_cm successfully establish a connection, but not > actually be able to transfer data between each other? This seems possible, > though unlikely. This is the type of silent failure I'm referring to. I don't think this is possible since the active CM uses the pkey index of the pkey provided in REQ.path to send the REQ mad, same for the passive CM - it uses the index in its table of REQ.path.pkey. So if the CMs are able to talk over QP1 using this pkey index the CM consumers can talk over their RC (REQ) / UD (SIDR REQ) QPs. And both the CM and its consumers would use the same index - the one returned from the ib_get_cached_pkey Or. From mst at mellanox.co.il Thu Feb 22 13:14:42 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Feb 2007 23:14:42 +0200 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth In-Reply-To: References: <20070220181755.GC11825@mellanox.co.il> Message-ID: <20070222211442.GB9143@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth > > Thanks, queued for 2.6.21. 
With this patch I see small-packet latency > down almost all the way back to what datagram mode gives -- on a pair > of fast woodcrest systems I see latencies for netpipe tcp 1 byte > messages like > > datagram 13.xx > original CM 17.xx > patched CM 14.xx > > so there is still a measurable difference but it is much less now. Hmm. An old system I tried here has a much higher latency, but does not seem to exhibit latency difference between datagram and CM. 1. Is there something special you do when you run the benchmark (msi, taskset, ...)? 2. On a wild guess that the issue here is higher interrupt rate with CM, is there a chance you could test the following patch posted by me earlier? http://www.mail-archive.com/openib-general at openib.org/msg29290.html Thanks, -- MST From rdreier at cisco.com Thu Feb 22 13:21:17 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Feb 2007 13:21:17 -0800 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth In-Reply-To: <20070222211442.GB9143@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 22 Feb 2007 23:14:42 +0200") References: <20070220181755.GC11825@mellanox.co.il> <20070222211442.GB9143@mellanox.co.il> Message-ID: > 1. Is there something special you do when you run the benchmark (msi, taskset, ...)? Yes, I am using MSI-X, and I pin the interrupt handler to one CPU (CPU#0 in my particular case). Then I use taskset to pin the NPtcp process to a CPU in a different package (CPU#2 in my system). BTW with these same systems, I am getting up to ~1150 MB/sec of throughput with DDR mem-free Arbel, as measured with NPtcp. > 2. On a wild guess that the issue here is higher interrupt rate with CM, > is there a chance you could test the following patch posted by me earlier? > http://www.mail-archive.com/openib-general at openib.org/msg29290.html OK, I'll try that when I get a chance. - R. 
From sweitzen at cisco.com Thu Feb 22 13:25:57 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 22 Feb 2007 13:25:57 -0800 Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64 In-Reply-To: <20070222090006.GA9727@mellanox.co.il> References: <20070222090006.GA9727@mellanox.co.il> Message-ID: How do I upload sources? > -----Original Message----- > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Thursday, February 22, 2007 1:00 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Openfabrics-ewg at openib.org; OPENIB > Subject: Re: anyone have OFED 1.2 alpha1 compiling on ppc64 > > > Quoting Scott Weitzenkamp (sweitzen) : > > Subject: anyone have OFED 1.2 alpha1 compiling on ppc64 > > > > I tried both RHEL4 and SLES10 usinstall.sh, and get this. > I filed bug 379, > > anyone else tried ppc64? > > Scott, could pls you upload the kernel sources and .config > files to staging? > If you do, we'll be able to add these to mightly cross-build > environment. > > -- > MST > From rdreier at cisco.com Thu Feb 22 13:34:13 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Feb 2007 13:34:13 -0800 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth In-Reply-To: (Roland Dreier's message of "Thu, 22 Feb 2007 13:21:17 -0800") References: <20070220181755.GC11825@mellanox.co.il> <20070222211442.GB9143@mellanox.co.il> Message-ID: OK, I applied the following patch (I had to change one line of your patch to get it to apply because the small-message changed the context so one chunk didn't apply). Anyway I don't see any difference in small message latency or large message throughput. 
(Actually latency seems slightly worse but I think the change is within my normal variability so I'm don't think the difference is significant) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 2594db2..20d7ad4 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -98,9 +98,9 @@ enum { #define IPOIB_OP_RECV (1ul << 31) #ifdef CONFIG_INFINIBAND_IPOIB_CM -#define IPOIB_CM_OP_SRQ (1ul << 30) +#define IPOIB_OP_CM (1ul << 30) #else -#define IPOIB_CM_OP_SRQ (0) +#define IPOIB_OP_CM (0) #endif /* structs */ @@ -143,7 +143,6 @@ struct ipoib_cm_rx { struct ipoib_cm_tx { struct ib_cm_id *id; - struct ib_cq *cq; struct ib_qp *qp; struct list_head list; struct net_device *dev; @@ -232,6 +231,7 @@ struct ipoib_dev_priv { unsigned tx_tail; struct ib_sge tx_sge; struct ib_send_wr tx_wr; + unsigned tx_outstanding; struct ib_wc ibwc[IPOIB_NUM_WC]; @@ -438,6 +438,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx); void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb, unsigned int mtu); void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc); +void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc); #else struct ipoib_cm_tx; @@ -526,6 +527,9 @@ static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *w { } +static inline void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) +{ +} #endif #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 3484e8b..9515ef6 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -82,7 +82,7 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id) struct ib_recv_wr *bad_wr; int i, ret; - priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ; + priv->cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV; for (i = 0; i < IPOIB_CM_RX_SG; ++i) 
priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i]; @@ -344,7 +344,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space, void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ; + unsigned int wr_id = wc->wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV); struct sk_buff *skb, *newskb; struct ipoib_cm_rx *p; unsigned long flags; @@ -436,7 +436,7 @@ static inline int post_send(struct ipoib_dev_priv *priv, priv->tx_sge.addr = addr; priv->tx_sge.length = len; - priv->tx_wr.wr_id = wr_id; + priv->tx_wr.wr_id = wr_id | IPOIB_OP_CM; return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr); } @@ -487,20 +487,19 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ dev->trans_start = jiffies; ++tx->tx_head; - if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) { + if (++priv->tx_outstanding == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n", tx->qp->qp_num); netif_stop_queue(dev); - set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags); } } } -static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx, - struct ib_wc *wc) +void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned int wr_id = wc->wr_id; + struct ipoib_cm_tx *tx = wc->qp->qp_context; + unsigned int wr_id = wc->wr_id & ~IPOIB_OP_CM; struct ipoib_tx_buf *tx_req; unsigned long flags; @@ -525,11 +524,10 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx spin_lock_irqsave(&priv->tx_lock, flags); ++tx->tx_tail; - if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) && - tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) { - clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags); + if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && + netif_queue_stopped(dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, 
&priv->flags)) netif_wake_queue(dev); - } if (wc->status != IB_WC_SUCCESS && wc->status != IB_WC_WR_FLUSH_ERR) { @@ -552,11 +550,6 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx tx->neigh = NULL; } - /* queue would be re-started anyway when TX is destroyed, - * but it makes sense to do it ASAP here. */ - if (test_and_clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) - netif_wake_queue(dev); - if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { list_move(&tx->list, &priv->cm.reap_list); queue_work(ipoib_workqueue, &priv->cm.reap_task); @@ -570,19 +563,6 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx spin_unlock_irqrestore(&priv->tx_lock, flags); } -static void ipoib_cm_tx_completion(struct ib_cq *cq, void *tx_ptr) -{ - struct ipoib_cm_tx *tx = tx_ptr; - int n, i; - - ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); - do { - n = ib_poll_cq(cq, IPOIB_NUM_WC, tx->ibwc); - for (i = 0; i < n; ++i) - ipoib_cm_handle_tx_wc(tx->dev, tx, tx->ibwc + i); - } while (n == IPOIB_NUM_WC); -} - int ipoib_cm_dev_open(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -702,17 +682,18 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even return 0; } -static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ib_cq *cq) +static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_cm_tx *tx) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_init_attr attr = {}; attr.recv_cq = priv->cq; + attr.send_cq = priv->cq; attr.srq = priv->cm.srq; attr.cap.max_send_wr = ipoib_sendq_size; attr.cap.max_send_sge = 1; attr.sq_sig_type = IB_SIGNAL_ALL_WR; attr.qp_type = IB_QPT_RC; - attr.send_cq = cq; + attr.qp_context = tx; return ib_create_qp(priv->pd, &attr); } @@ -792,21 +773,7 @@ static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn, goto err_tx; } - p->cq = ib_create_cq(priv->ca, ipoib_cm_tx_completion, NULL, p, - 
ipoib_sendq_size + 1); - if (IS_ERR(p->cq)) { - ret = PTR_ERR(p->cq); - ipoib_warn(priv, "failed to allocate tx cq: %d\n", ret); - goto err_cq; - } - - ret = ib_req_notify_cq(p->cq, IB_CQ_NEXT_COMP); - if (ret) { - ipoib_warn(priv, "failed to request completion notification: %d\n", ret); - goto err_req_notify; - } - - p->qp = ipoib_cm_create_tx_qp(p->dev, p->cq); + p->qp = ipoib_cm_create_tx_qp(p->dev, p); if (IS_ERR(p->qp)) { ret = PTR_ERR(p->qp); ipoib_warn(priv, "failed to allocate tx qp: %d\n", ret); @@ -843,12 +810,8 @@ err_modify: err_id: p->id = NULL; ib_destroy_qp(p->qp); -err_req_notify: err_qp: p->qp = NULL; - ib_destroy_cq(p->cq); -err_cq: - p->cq = NULL; err_tx: return ret; } @@ -857,6 +820,7 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) { struct ipoib_dev_priv *priv = netdev_priv(p->dev); struct ipoib_tx_buf *tx_req; + unsigned long flags; ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n", p->qp ? p->qp->qp_num : 0, p->tx_head, p->tx_tail); @@ -867,12 +831,6 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) if (p->qp) ib_destroy_qp(p->qp); - if (p->cq) - ib_destroy_cq(p->cq); - - if (test_bit(IPOIB_FLAG_NETIF_STOPPED, &p->flags)) - netif_wake_queue(p->dev); - if (p->tx_ring) { while ((int) p->tx_tail - (int) p->tx_head < 0) { tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)]; @@ -880,6 +838,12 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p) DMA_TO_DEVICE); dev_kfree_skb_any(tx_req->skb); ++p->tx_tail; + spin_lock_irqsave(&priv->tx_lock, flags); + if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && + netif_queue_stopped(p->dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) + netif_wake_queue(p->dev); + spin_unlock_irqrestore(&priv->tx_lock, flags); } kfree(p->tx_ring); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index f2aa923..19a3d3e 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -266,11 +266,10 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) spin_lock_irqsave(&priv->tx_lock, flags); ++priv->tx_tail; - if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) && - priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) { - clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); + if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && + netif_queue_stopped(dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) netif_wake_queue(dev); - } spin_unlock_irqrestore(&priv->tx_lock, flags); if (wc->status != IB_WC_SUCCESS && @@ -282,12 +281,17 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc) { - if (wc->wr_id & IPOIB_CM_OP_SRQ) - ipoib_cm_handle_rx_wc(dev, wc); - else if (wc->wr_id & IPOIB_OP_RECV) - ipoib_ib_handle_rx_wc(dev, wc); - else - ipoib_ib_handle_tx_wc(dev, wc); + if (wc->wr_id & IPOIB_OP_CM) { + if (wc->wr_id & IPOIB_OP_RECV) + ipoib_cm_handle_rx_wc(dev, wc); + else + ipoib_cm_handle_tx_wc(dev, wc); + } else { + if (wc->wr_id & IPOIB_OP_RECV) + ipoib_ib_handle_rx_wc(dev, wc); + else + ipoib_ib_handle_tx_wc(dev, wc); + } } void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) @@ -370,10 +374,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, address->last_send = priv->tx_head; ++priv->tx_head; - if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) { + if (++priv->tx_outstanding == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); netif_stop_queue(dev); - set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags); } } } @@ -549,6 +552,7 @@ int ipoib_ib_dev_stop(struct net_device *dev) DMA_TO_DEVICE); dev_kfree_skb_any(tx_req->skb); ++priv->tx_tail; + --priv->tx_outstanding; } for (i = 0; i < ipoib_recvq_size; ++i) { diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c 
index 18d27fd..ece1a0c 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -900,7 +900,7 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port) goto out_rx_ring_cleanup; } - /* priv->tx_head & tx_tail are already 0 */ + /* priv->tx_head, tx_tail & tx_outstanding are already 0 */ if (ipoib_ib_dev_init(dev, ca, port)) goto out_tx_ring_cleanup; From mst at mellanox.co.il Thu Feb 22 13:38:05 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Feb 2007 23:38:05 +0200 Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64 In-Reply-To: References: <20070222090006.GA9727@mellanox.co.il> Message-ID: <20070222213805.GC9143@mellanox.co.il> Don't you have an account at ssh.openfabrics.org? If yes, just put kernel sources and the .config under your home directory Quoting r. Scott Weitzenkamp (sweitzen) : Subject: Re: anyone have OFED 1.2 alpha1 compiling on ppc64 How do I upload sources? > -----Original Message----- > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Thursday, February 22, 2007 1:00 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Openfabrics-ewg at openib.org; OPENIB > Subject: Re: anyone have OFED 1.2 alpha1 compiling on ppc64 > > > Quoting Scott Weitzenkamp (sweitzen) : > > Subject: anyone have OFED 1.2 alpha1 compiling on ppc64 > > > > I tried both RHEL4 and SLES10 usinstall.sh, and get this. > I filed bug 379, > > anyone else tried ppc64? > > Scott, could pls you upload the kernel sources and .config > files to staging? > If you do, we'll be able to add these to mightly cross-build > environment. > > -- > MST > _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- MST From mst at mellanox.co.il Thu Feb 22 13:42:24 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 22 Feb 2007 23:42:24 +0200 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth In-Reply-To: References: <20070220181755.GC11825@mellanox.co.il> <20070222211442.GB9143@mellanox.co.il> Message-ID: <20070222214223.GD9143@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth > > OK, I applied the following patch (I had to change one line of your > patch to get it to apply because the small-message changed the context > so one chunk didn't apply). > > Anyway I don't see any difference in small message latency or large > message throughput. (Actually latency seems slightly worse but I > think the change is within my normal variability so I'm don't think > the difference is significant) OK, thanks for testing this. I need to spend more time on reproducing this issue, and profiling. I'll add this to my todo list. -- MST From sweitzen at cisco.com Thu Feb 22 13:53:02 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 22 Feb 2007 13:53:02 -0800 Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64 In-Reply-To: <20070222213805.GC9143@mellanox.co.il> References: <20070222090006.GA9727@mellanox.co.il> <20070222213805.GC9143@mellanox.co.il> Message-ID: > Don't you have an account at ssh.openfabrics.org? Can an admin please give me an account? Scott From mst at mellanox.co.il Thu Feb 22 14:00:18 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 23 Feb 2007 00:00:18 +0200 Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64 In-Reply-To: References: Message-ID: <20070222220018.GB4542@mellanox.co.il> > Quoting Scott Weitzenkamp (sweitzen) : > Subject: RE: anyone have OFED 1.2 alpha1 compiling on ppc64 > > > Don't you have an account at ssh.openfabrics.org? > > Can an admin please give me an account? I'm not an admin but I think you want to post your ssh public key. 
-- MST From mshefty at ichips.intel.com Thu Feb 22 14:18:43 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 22 Feb 2007 14:18:43 -0800 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <15ddcffd0702221309q4633a36cg8a7bb5ff69d78776@mail.gmail.com> References: <1172155552.4380.475949.camel@hal.voltaire.com> <000301c756aa$b89e0020$8698070a@amr.corp.intel.com> <15ddcffd0702221309q4633a36cg8a7bb5ff69d78776@mail.gmail.com> Message-ID: <45DE16C3.5020809@ichips.intel.com> >>Can someone help my understanding here? Is ipoib joining a multicast group >>using the full membership PKey, even if the node that it joins from only has the >>limited membership PKey configured? And the code in ib_find_cached_pkey helps >>enable this? > > Yep. The ipoib create_child function Or-s 0x8000 to the device pkey > which was provided by the user. Now, IPoIB uses the device pkey when > forming MGIDs and when doing modify qp to init. Indeed the way > ib_find_cached_pkey() is implemented, make the latter use trivial. Doesn't this allow ipoib to join a multicast group for which it may not be able to communicate with all members? For the broadcast group, this seems like an error to me. Can ipoib work in such a configuration? If all nodes were assigned a partial membership PKey, none of them could communicate, but no errors would be generated anywhere. Joining a multicast group requires specifying the full membership PKey. I don't see anything in the spec that explicitly prohibits joining the group from a node with only a partial membership PKey, but at first glance, this seems like a subnet configuration issue. Is there some use of this I'm overlooking? 
- Sean From divy at chelsio.com Thu Feb 22 14:21:53 2007 From: divy at chelsio.com (Divy Le Ray) Date: Thu, 22 Feb 2007 14:21:53 -0800 Subject: [openib-general] [PATCH 0/7] cxgb3 - Chelsio T3 1G/10G driver updates In-Reply-To: <1172153597.23995.9.camel@stevo-desktop> References: <45DD8559.7090106@chelsio.com> <1172153597.23995.9.camel@stevo-desktop> Message-ID: <45DE1781.5000407@chelsio.com> Steve Wise wrote: > Divy, > > Do these need to be pulled into OFED 1.2 as well? > Hi Steve, Yes, I believe so. Cheers, Divy > Steve. > > > On Thu, 2007-02-22 at 03:58 -0800, Divy Le Ray wrote: > >> Jeff, >> >> I'm sending a series of incremental patches updating >> the cxgb3 driver. These patches are built against Linus'git tree. >> >> Cheers, >> Divy >> - >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo at vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ >> > > From mst at mellanox.co.il Thu Feb 22 15:19:17 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 23 Feb 2007 01:19:17 +0200 Subject: [openib-general] IPOIB NAPI In-Reply-To: References: Message-ID: <20070222231917.GC9059@mellanox.co.il> > > An API idea: > > how about instead testing missed_events, we add a flag: > > > > IB_CQ_TEST (or a longer name IB_CQ_REPORT_MISSED_EVENTS?) > > and change ib_req_notify_cq to return int which will keep > > the missed_events value, only if this flag is set? > > > > This has 2 advatages > > - Less churn updating all users to new API - they just ignore return value - > > and still almost no overhead for them as they don't set IB_CQ_TEST > > - For all users we have to push less values on stack - note compiler can't > > get rid of them as we are calling function through a pointer > > - For users that do > > missed_events = ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP | IB_CQ_TEST) > > we get the result in register. 
> > Yes, I like this. So ib_req_notify_cq() gets a return value that is > negative if an error occurred, 0 if everything is fine, or positive if > a missed event might have happened. > > I think I prefer the longer name IB_CQ_REPORT_MISSED_EVENTS -- at > least there's a chance at guessing what it means even if you don't > read the documentation. By the way, how about extending the userspace API in a similiar fashion? missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP | IBV_CQ_REPORT_MISSED_EVENTS) -- MST From rdreier at cisco.com Thu Feb 22 15:21:11 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Feb 2007 15:21:11 -0800 Subject: [openib-general] IPOIB NAPI In-Reply-To: <20070222231917.GC9059@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 23 Feb 2007 01:19:17 +0200") References: <20070222231917.GC9059@mellanox.co.il> Message-ID: > By the way, how about extending the userspace API in a similiar > fashion? > > missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP | > IBV_CQ_REPORT_MISSED_EVENTS) It would require a kernel-user ABI bump. Is it worth it? - R. From sean.hefty at intel.com Thu Feb 22 15:35:23 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 22 Feb 2007 15:35:23 -0800 Subject: [openib-general] ipoib & the partial pkey, was: librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DE16C3.5020809@ichips.intel.com> Message-ID: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com> >Doesn't this allow ipoib to join a multicast group for which it may not be able >to communicate with all members? For the broadcast group, this seems like an >error to me. Can ipoib work in such a configuration? If all nodes were >assigned a partial membership PKey, none of them could communicate, but no >errors would be generated anywhere. I looked into this more... RFC 4391 states (middle of page 5): For a node to join a partition, one of its ports must be assigned the relevant P_Key by the SM [RFC4392]. 
Jumping to RFC 4392 (top of page 4): at the time of creating an IB multicast group, multiple values such as the P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc. have to be specified. These values should be such that all potential members of the IB multicast group are able to communicate with one another when using them. and page 14: Note that this IB_join to the broadcast group is a FullMember join. If any of the ports or the switches linking the port to the rest of the IPoIB subnet cannot support the parameters (e.g., path MTU or P_Key) associated with the broadcast group, then the IB_join request will fail and the requesting port will not become part of the IPoIB subnet My initial interpretation of these statements lead me to believe that pkey check in ib_find_cached_pkey should not mask out the upper bit, which would prevent ipoib from joining a multicast group until it has been configured with the full membership pkey for the broadcast group. Does this seem reasonable? - Sean From mst at mellanox.co.il Thu Feb 22 15:46:24 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 23 Feb 2007 01:46:24 +0200 Subject: [openib-general] IPOIB NAPI In-Reply-To: References: Message-ID: <20070222234624.GB4447@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: IPOIB NAPI > > > By the way, how about extending the userspace API in a similiar > > fashion? > > > > missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP | > > IBV_CQ_REPORT_MISSED_EVENTS) > > It would require a kernel-user ABI bump. Is it worth it? I hear some people asking for it: I imagine reasons are same as NAPI - race-free, clean API to switch from polling to event mode - rather than a minor optimization. -- MST From mst at mellanox.co.il Thu Feb 22 15:57:24 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Fri, 23 Feb 2007 01:57:24 +0200 Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter Message-ID: <20070222235724.GC4447@mellanox.co.il> GCC seems to be unable to propogate constants across calls to htonl. So it turns out to be worth the while to replace htonl with a hand-written macro in case of constant parameter. Signed-off-by: Michael S. Tsirkin Signed-off-by: Ishai Rabinovitz --- Roland, I'm looking at micro-optimizing libmthca/mthca some more. The following optimization is minor, but it seems quite safe. What do you think? Tested with gcc 4.0.3. diff --git a/src/cq.c b/src/cq.c index 0aeb7a9..9428f74 100644 --- a/src/cq.c +++ b/src/cq.c @@ -275,7 +275,7 @@ static int handle_error_cqe(struct mthca_cq *cq, * doorbell count field. In that case we always free the CQE. */ if (mthca_is_memfree(cq->ibv_cq.context) || - !(new_wqe & htonl(0x3f)) || (!cqe->db_cnt && dbd)) + !(new_wqe & CONSTANT_HTONL(0x3f)) || (!cqe->db_cnt && dbd)) return 0; cqe->db_cnt = htons(ntohs(cqe->db_cnt) - dbd); diff --git a/src/mthca.h b/src/mthca.h index 1f31bc3..798029f 100644 --- a/src/mthca.h +++ b/src/mthca.h @@ -112,6 +112,20 @@ enum { MTHCA_OPCODE_INVALID = 0xff }; +/* GCC does not seem to be able to do constant propogation + * across htonl/ntohl calls */ +#if __BYTE_ORDER == __LITTLE_ENDIAN +#define CONSTANT_HTONL(x) \ + ((((unsigned)x) >> 24) | \ + ((((unsigned)x) >> 8) & 0xff00) | \ + ((((unsigned)x) << 8) & 0xff0000) | \ + (((unsigned)x) << 24)) +#elif __BYTE_ORDER == __BIG_ENDIAN +#define CONSTANT_HTONL(x) (x) +#else +#define CONSTANT_HTONL(x) htonl(x) +#endif + struct mthca_ah_page; struct mthca_device { diff --git a/src/qp.c b/src/qp.c index f2483e9..85d3385 100644 --- a/src/qp.c +++ b/src/qp.c @@ -138,10 +138,10 @@ int mthca_tavor_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, ((struct mthca_next_seg *) wqe)->ee_nds = 0; ((struct mthca_next_seg *) wqe)->flags = ((wr->send_flags & IBV_SEND_SIGNALED) ? 
- htonl(MTHCA_NEXT_CQ_UPDATE) : 0) | + CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE) : 0) | ((wr->send_flags & IBV_SEND_SOLICITED) ? - htonl(MTHCA_NEXT_SOLICIT) : 0) | - htonl(1); + CONSTANT_HTONL(MTHCA_NEXT_SOLICIT) : 0) | + CONSTANT_HTONL(1); if (wr->opcode == IBV_WR_SEND_WITH_IMM || wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM) ((struct mthca_next_seg *) wqe)->imm = wr->imm_data; @@ -359,9 +359,9 @@ int mthca_tavor_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr, ((struct mthca_next_seg *) wqe)->nda_op = 0; ((struct mthca_next_seg *) wqe)->ee_nds = - htonl(MTHCA_NEXT_DBD); + CONSTANT_HTONL(MTHCA_NEXT_DBD); ((struct mthca_next_seg *) wqe)->flags = - htonl(MTHCA_NEXT_CQ_UPDATE); + CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE); wqe += sizeof (struct mthca_next_seg); size = sizeof (struct mthca_next_seg) / 16; @@ -505,10 +505,10 @@ int mthca_arbel_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, ((struct mthca_next_seg *) wqe)->flags = ((wr->send_flags & IBV_SEND_SIGNALED) ? - htonl(MTHCA_NEXT_CQ_UPDATE) : 0) | + CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE) : 0) | ((wr->send_flags & IBV_SEND_SOLICITED) ? 
- htonl(MTHCA_NEXT_SOLICIT) : 0) | - htonl(1); + CONSTANT_HTONL(MTHCA_NEXT_SOLICIT) : 0) | + CONSTANT_HTONL(1); if (wr->opcode == IBV_WR_SEND_WITH_IMM || wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM) ((struct mthca_next_seg *) wqe)->imm = wr->imm_data; @@ -750,7 +750,7 @@ int mthca_arbel_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr, if (i < qp->rq.max_gs) { ((struct mthca_data_seg *) wqe)->byte_count = 0; - ((struct mthca_data_seg *) wqe)->lkey = htonl(MTHCA_INVAL_LKEY); + ((struct mthca_data_seg *) wqe)->lkey = CONSTANT_HTONL(MTHCA_INVAL_LKEY); ((struct mthca_data_seg *) wqe)->addr = 0; } @@ -872,7 +872,7 @@ int mthca_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap, for (scatter = (void *) (next + 1); (void *) scatter < (void *) next + (1 << qp->rq.wqe_shift); ++scatter) - scatter->lkey = htonl(MTHCA_INVAL_LKEY); + scatter->lkey = CONSTANT_HTONL(MTHCA_INVAL_LKEY); } for (i = 0; i < qp->sq.max; ++i) { @@ -956,10 +956,10 @@ int mthca_free_err_wqe(struct mthca_qp *qp, int is_send, else next = get_recv_wqe(qp, index); - *dbd = !!(next->ee_nds & htonl(MTHCA_NEXT_DBD)); - if (next->ee_nds & htonl(0x3f)) - *new_wqe = (next->nda_op & htonl(~0x3f)) | - (next->ee_nds & htonl(0x3f)); + *dbd = !!(next->ee_nds & CONSTANT_HTONL(MTHCA_NEXT_DBD)); + if (next->ee_nds & CONSTANT_HTONL(0x3f)) + *new_wqe = (next->nda_op & CONSTANT_HTONL(~0x3f)) | + (next->ee_nds & CONSTANT_HTONL(0x3f)); else *new_wqe = 0; diff --git a/src/srq.c b/src/srq.c index f9fc006..e27c8dc 100644 --- a/src/srq.c +++ b/src/srq.c @@ -142,7 +142,7 @@ int mthca_tavor_post_srq_recv(struct ibv_srq *ibsrq, if (i < srq->max_gs) { ((struct mthca_data_seg *) wqe)->byte_count = 0; - ((struct mthca_data_seg *) wqe)->lkey = htonl(MTHCA_INVAL_LKEY); + ((struct mthca_data_seg *) wqe)->lkey = CONSTANT_HTONL(MTHCA_INVAL_LKEY); ((struct mthca_data_seg *) wqe)->addr = 0; } @@ -150,7 +150,7 @@ int mthca_tavor_post_srq_recv(struct ibv_srq *ibsrq, htonl((ind << srq->wqe_shift) | 1); wmb(); ((struct mthca_next_seg *) 
prev_wqe)->ee_nds = - htonl(MTHCA_NEXT_DBD); + CONSTANT_HTONL(MTHCA_NEXT_DBD); srq->wrid[ind] = wr->wr_id; srq->first_free = next_ind; @@ -247,7 +247,7 @@ int mthca_arbel_post_srq_recv(struct ibv_srq *ibsrq, if (i < srq->max_gs) { ((struct mthca_data_seg *) wqe)->byte_count = 0; - ((struct mthca_data_seg *) wqe)->lkey = htonl(MTHCA_INVAL_LKEY); + ((struct mthca_data_seg *) wqe)->lkey = CONSTANT_HTONL(MTHCA_INVAL_LKEY); ((struct mthca_data_seg *) wqe)->addr = 0; } @@ -313,7 +313,7 @@ int mthca_alloc_srq_buf(struct ibv_pd *pd, struct ibv_srq_attr *attr, for (scatter = wqe + sizeof (struct mthca_next_seg); (void *) scatter < wqe + (1 << srq->wqe_shift); ++scatter) - scatter->lkey = htonl(MTHCA_INVAL_LKEY); + scatter->lkey = CONSTANT_HTONL(MTHCA_INVAL_LKEY); } srq->first_free = 0; -- MST From sean.hefty at intel.com Thu Feb 22 16:59:07 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 22 Feb 2007 16:59:07 -0800 Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland Message-ID: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com> Roland, Please consider the following minor fixes for 2.6.21: rdma_cm: remove unused node_guid from cma_device structure. ib_cm: remove ca_guid from cm_device structure. rdma_cm: request reversible paths only. ib_core: Set hop limit in ib_init_ah_from_wc correctly. The patches are in git.openfabrics.org/~shefty/rdma-dev.git, for-roland branch, which is based on 2.6.21-rc1. Signed-off-by: Sean Hefty --- commit 28e218621d36cf9da42f07af08775769eb289fc0 Author: Sean Hefty Date: Thu Feb 22 11:37:44 2007 -0800 rdma_cm: remove unused node_guid from cma_device structure. 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index bb27ce9..d441815 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -77,7 +77,6 @@ static int next_port; struct cma_device { struct list_head list; struct ib_device *device; - __be64 node_guid; struct completion comp; atomic_t refcount; struct list_head id_list; @@ -2674,7 +2673,6 @@ static void cma_add_one(struct ib_device *device) return; cma_dev->device = device; - cma_dev->node_guid = device->node_guid; init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); commit 6de97f2a3373357d720b1653dfc0aac6d40b7506 Author: Sean Hefty Date: Thu Feb 22 11:37:38 2007 -0800 ib_cm: remove ca_guid from cm_device structure. The cm_device references an ib_device, which contains the node_guid. diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index d446998..842cd0b 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -88,7 +88,6 @@ struct cm_port { struct cm_device { struct list_head list; struct ib_device *device; - __be64 ca_guid; struct cm_port port[0]; }; @@ -739,8 +738,8 @@ retest: ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg); spin_unlock_irqrestore(&cm_id_priv->lock, flags); ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT, - &cm_id_priv->av.port->cm_dev->ca_guid, - sizeof cm_id_priv->av.port->cm_dev->ca_guid, + &cm_id_priv->id.device->node_guid, + sizeof cm_id_priv->id.device->node_guid, NULL, 0); break; case IB_CM_REQ_RCVD: @@ -883,7 +882,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, req_msg->local_comm_id = cm_id_priv->id.local_id; req_msg->service_id = param->service_id; - req_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid; + req_msg->local_ca_guid = cm_id_priv->id.device->node_guid; cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num)); cm_req_set_resp_res(req_msg, param->responder_resources); cm_req_set_init_depth(req_msg, param->initiator_depth); @@ -1442,7 
+1441,7 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg, cm_rep_set_flow_ctrl(rep_msg, param->flow_control); cm_rep_set_rnr_retry_count(rep_msg, param->rnr_retry_count); cm_rep_set_srq(rep_msg, param->srq); - rep_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid; + rep_msg->local_ca_guid = cm_id_priv->id.device->node_guid; if (param->private_data && param->private_data_len) memcpy(rep_msg->private_data, param->private_data, @@ -3385,7 +3384,6 @@ static void cm_add_one(struct ib_device *device) return; cm_dev->device = device; - cm_dev->ca_guid = device->node_guid; set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask); for (i = 1; i <= device->phys_port_cnt; i++) { commit 87680047dd09ca4a4e8ec575dad215c92cf45ed3 Author: Sean Hefty Date: Wed Feb 21 16:40:44 2007 -0800 rdma_cm: request reversible paths only The rdma_cm requires that path records be reversible. Set the reversible bit when issuing an path record query. diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index f8d69b3..bb27ce9 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1492,11 +1492,13 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms, ib_addr_get_dgid(addr, &path_rec.dgid); path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr)); path_rec.numb_path = 1; + path_rec.reversible = 1; id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device, id_priv->id.port_num, &path_rec, IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | - IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH, + IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_REVERSIBLE, timeout_ms, GFP_KERNEL, cma_query_handler, work, &id_priv->query); commit 30947e5b7db42184d66746ac1187d4abbf89018d Author: Sean Hefty Date: Wed Feb 21 16:37:31 2007 -0800 ib_core: Set hop limit in ib_init_ah_from_wc correctly. The hop_limit value in the ah_attr should be 0xFF, not the value read from the received GRH (which should be 0). 
See 13.5.4.4 in the 1.2 IB spec. diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 8b5dd36..ccdf93d 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -167,7 +167,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc, ah_attr->grh.sgid_index = (u8) gid_index; flow_class = be32_to_cpu(grh->version_tclass_flow); ah_attr->grh.flow_label = flow_class & 0xFFFFF; - ah_attr->grh.hop_limit = grh->hop_limit; + ah_attr->grh.hop_limit = 0xFF; ah_attr->grh.traffic_class = (flow_class >> 20) & 0xFF; } return 0; From rdreier at cisco.com Thu Feb 22 17:10:40 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Feb 2007 17:10:40 -0800 Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland In-Reply-To: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com> (Sean Hefty's message of "Thu, 22 Feb 2007 16:59:07 -0800") References: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com> Message-ID: These all look fine, I'll queue them up. > Signed-off-by: Sean Hefty I notice that the actual patches you committed don't have your sign-off in the git changelog. I assume this is a mistake so I'll add it back in... From rdreier at cisco.com Thu Feb 22 17:15:09 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Feb 2007 17:15:09 -0800 Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland In-Reply-To: (Roland Dreier's message of "Thu, 22 Feb 2007 17:10:40 -0800") References: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com> Message-ID: > I notice that the actual patches you committed don't have your > sign-off in the git changelog. I assume this is a mistake so I'll add > it back in... which means I can't just pull your branch. But that's OK, still doing git format-patch, edit patches, git am is pretty easy. 
From rdreier at cisco.com Thu Feb 22 17:51:30 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Feb 2007 17:51:30 -0800 Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland In-Reply-To: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com> (Sean Hefty's message of "Thu, 22 Feb 2007 16:59:07 -0800") References: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com> Message-ID: > The patches are in git.openfabrics.org/~shefty/rdma-dev.git, > for-roland branch, which is based on 2.6.21-rc1. One other request: please include a URL that I can just copy and paste, so I don't actually have to read and parse complete sentences. Something like: the patches are in git://git.openfabrics.org/~shefty/rdma-dev.git for-roland - R. From rdreier at cisco.com Thu Feb 22 17:55:36 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Feb 2007 17:55:36 -0800 Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland In-Reply-To: (Roland Dreier's message of "Thu, 22 Feb 2007 17:51:30 -0800") References: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com> Message-ID: Anyway, all 4 queued up in my for-2.6.21 branch From sean.hefty at intel.com Thu Feb 22 21:10:32 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 22 Feb 2007 21:10:32 -0800 Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland In-Reply-To: Message-ID: <000401c75708$f1cf2d70$bcd5180a@amr.corp.intel.com> >the patches are in > > git://git.openfabrics.org/~shefty/rdma-dev.git for-roland I will do that in the future. And yes, the sign off line was just a mistake. Thanks for fixing that. - Sean From rdreier at cisco.com Thu Feb 22 22:09:28 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Feb 2007 22:09:28 -0800 Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter In-Reply-To: <20070222235724.GC4447@mellanox.co.il> (Michael S. 
Tsirkin's message of "Fri, 23 Feb 2007 01:57:24 +0200") References: <20070222235724.GC4447@mellanox.co.il> Message-ID: > GCC seems to be unable to propogate constants across calls to htonl. > So it turns out to be worth the while to replace htonl with > a hand-written macro in case of constant parameter. I'm wondering why this helps you. On my system (which has Debian's old glibc 2.3.6, certainly nothing particularly fancy), I see in my : /* Get machine dependent optimized versions of byte swapping functions. */ #include #ifdef __OPTIMIZE__ /* We can optimize calls to the conversion functions. Either nothing has to be done or we are using directly the byte-swapping functions which often can be inlined. */ # if __BYTE_ORDER == __BIG_ENDIAN //... # else # if __BYTE_ORDER == __LITTLE_ENDIAN # define ntohl(x) __bswap_32 (x) and so on (and gcc defines __OPTIMIZE__ if you pass it any -O flag including -Os). And in I have /* Swap bytes in 32 bit value. */ #define __bswap_constant_32(x) \ ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >> 8) | \ (((x) & 0x0000ff00) << 8) | (((x) & 0x000000ff) << 24)) and variations of __bswap_32() that look roughly like # define __bswap_32(x) \ (__extension__ \ ({ register unsigned int __v, __x = (x); \ if (__builtin_constant_p (__x)) \ __v = __bswap_constant_32 (__x); \ else \ and so on. (The point of all this being that for constants, htonl() should expand to roughly the same thing as your CONSTANT_HTONL() -- the only difference is that you don't have the & for the << 24 and >> 24 parts, which I guess just has the potential to bite us if someone did something like CONSTANT_HTONL(1L) on a 64-bit system). As a quick test I compiled the code #include enum { Y = 5 }; uint32_t foo(uint32_t x) { return x | htonl(Y); } with gcc -c -O and the disassembly of foo() looks like 0000000000000000 : 0: 89 f8 mov %edi,%eax 2: 0d 00 00 00 05 or $0x5000000,%eax 7: c3 retq and so everything works exactly the way we would want. 
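[Illustrative sketch, not part of the original message.] The point made above, that for constants htonl() should expand to roughly the same thing as a hand-written swap macro, can be checked with a few lines of C. The names SWAB32_CONST, host_is_little_endian and expected_htonl are hypothetical and exist only for this example; the macro keeps the & masks and casts its argument to uint32_t, so an argument such as 1L on a 64-bit system cannot leak high bits. This is a minimal sketch under those assumptions, not the libmthca or glibc code.

```c
#include <arpa/inet.h>  /* htonl */
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: byte-swap a 32-bit constant, masking every term. */
#define SWAB32_CONST(x) \
	((uint32_t)((((uint32_t)(x) & 0xff000000u) >> 24) | \
		    (((uint32_t)(x) & 0x00ff0000u) >>  8) | \
		    (((uint32_t)(x) & 0x0000ff00u) <<  8) | \
		    (((uint32_t)(x) & 0x000000ffu) << 24)))

/* Returns 1 on a little-endian host, 0 otherwise (runtime probe). */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const unsigned char *) &probe == 1;
}

/* What htonl(x) should yield on this host, computed without htonl(). */
static uint32_t expected_htonl(uint32_t x)
{
	return host_is_little_endian() ? SWAB32_CONST(x) : x;
}
```

Comparing htonl(c) against expected_htonl(c) for the constants in the patch (1, 0x3f, ~0x3f, the MTHCA_NEXT_* bits) is a quick way to confirm the two agree on a given system; whether the compiler folds either form at compile time still has to be checked in the disassembly, as done above.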
(32-bit i386 also just does or with a constant too). In fact for libmthca I just checked that the preprocessor output of places like the following (which your patch converts) ((wr->send_flags & IBV_SEND_SIGNALED) ? htonl(MTHCA_NEXT_CQ_UPDATE) : 0) | is ((wr->send_flags & IBV_SEND_SIGNALED) ? (__extension__ ({ register unsigned int __v, __x = (MTHCA_NEXT_CQ_UPDATE); if (__builtin_constant_p (__x)) __v = ((((__x) & 0xff000000) >> 24) | (((__x) & 0x00ff0000) >> 8) | (((__x) & 0x0000ff00) << 8) | (((__x) & 0x000000ff) << 24)); else __asm__ ("bswap %0" : "=r" (__v) : "0" (__x)); __v; })) : 0) | And if I compare the generated assembly for libmthca with and without your patch (on both x86-64 and i386), I don't see any significant difference (the size is exactly the same, I just see things like the compiler using eax and edx in the opposite order and trivial things like that). So what is different in your setup that causes this patch to make a difference for you? (BTW, one thing I did notice while looking at the i386 assembly is that one micro-optimization that might make sense to use something like __attribute__((regparm(3))) for internal function calls within libibverbs and libmthca on i386, since otherwise we waste instructions pushing stuff on the stack for no reason other than compliance with the crufty old i386 ABI. Something like a FASTCALL macro in perhaps... if anyone really cares about 32-bit i386 performance any more) - R. 
From jgunthorpe at obsidianresearch.com Thu Feb 22 23:00:55 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Fri, 23 Feb 2007 00:00:55 -0700 Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter In-Reply-To: References: <20070222235724.GC4447@mellanox.co.il> Message-ID: <20070223070055.GC25553@obsidianresearch.com> On Thu, Feb 22, 2007 at 10:09:28PM -0800, Roland Dreier wrote: > (BTW, one thing I did notice while looking at the i386 assembly is > that one micro-optimization that might make sense to use something > like __attribute__((regparm(3))) for internal function calls within > libibverbs and libmthca on i386, since otherwise we waste instructions > pushing stuff on the stack for no reason other than compliance with > the crufty old i386 ABI. Something like a FASTCALL macro in > perhaps... if anyone really cares about 32-bit > i386 performance any more) Newer gccs have the -fwhole-program --combine options that address this and more. One of the things that happens is that all internal functions are made 'static' and all compilation units are optimized in one go. gcc will optimize the calling convention and a lot of other things for static functions. That should provide an across-the-board micro-improvement even on x86-64. 
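[Illustrative sketch, not part of the original message.] The FASTCALL idea quoted above could look roughly like the following. The macro body uses GCC's regparm function attribute, which only applies to 32-bit x86, so it expands to nothing elsewhere. The function name pick_flags and its parameters are hypothetical, invented purely for this example; this is an assumption about how such a macro might be written, not code from libibverbs or libmthca.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: pass up to three integer arguments in registers
 * on 32-bit x86; a no-op on every other target. */
#if defined(__GNUC__) && defined(__i386__)
#define FASTCALL __attribute__((regparm(3)))
#else
#define FASTCALL
#endif

/* The attribute must appear on the declaration seen by callers as well
 * as on the definition, or the two sides disagree on the convention. */
FASTCALL uint32_t pick_flags(uint32_t send_flags, uint32_t test_bit,
			     uint32_t result_bit);

/* Hypothetical internal helper: returns result_bit if test_bit is set
 * in send_flags, else 0. */
FASTCALL uint32_t pick_flags(uint32_t send_flags, uint32_t test_bit,
			     uint32_t result_bit)
{
	return (send_flags & test_bit) ? result_bit : 0;
}
```

Because the attribute changes the calling convention, it is only safe on functions that are never called from outside the library; exported entry points have to keep the platform ABI.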
Jason From vlad at lists.openfabrics.org Fri Feb 23 02:28:23 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Fri, 23 Feb 2007 02:28:23 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070223-0200 daily build status Message-ID: <20070223102823.7EAFEE607F3@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.19 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.13 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ppc64 with linux-2.6.14 
Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once /home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) 
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From mst at 
mellanox.co.il Fri Feb 23 03:24:15 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 23 Feb 2007 13:24:15 +0200 Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter In-Reply-To: References: <20070222235724.GC4447@mellanox.co.il> Message-ID: <20070223112415.GB4415@mellanox.co.il> > So what is different in your setup that causes this patch to make a > difference for you? Hmm. I agree it is somewhat strange. Below is a simple test that attempts to compare htonl, CONSTANT_HTONL, and an array-driven implementation. The code line is taken directly from htonl. Could you compile and run it please? I see: $ gcc -O2 1.c $ ./a.out test1 122396.00 usec test2 10517799.00 usec test3 104099.00 usec which seems to imply CONSTANT_HTONL is much faster. Ideas? ------------------------------- #include #include #include #include #define SIZE 255 enum ibv_send_flags { IBV_SEND_FENCE = 1 << 0, IBV_SEND_SIGNALED = 1 << 1, IBV_SEND_SOLICITED = 1 << 2, IBV_SEND_INLINE = 1 << 3 }; enum { MTHCA_NEXT_DBD = 1 << 7, MTHCA_NEXT_FENCE = 1 << 6, MTHCA_NEXT_CQ_UPDATE = 1 << 3, MTHCA_NEXT_EVENT_GEN = 1 << 2, MTHCA_NEXT_SOLICIT = 1 << 1, }; int ar[SIZE]; void init_ar() { ar[0]=htonl(1); ar[IBV_SEND_SIGNALED]=htonl(MTHCA_NEXT_CQ_UPDATE|1);; ar[IBV_SEND_SOLICITED]=htonl(MTHCA_NEXT_SOLICIT|1);; ar[IBV_SEND_SIGNALED|IBV_SEND_SOLICITED]=htonl(MTHCA_NEXT_CQ_UPDATE|MTHCA_NEXT_SOLICIT|1);; } int test1(int x) { return ar[x & (IBV_SEND_SIGNALED | IBV_SEND_SOLICITED)]; } int test2(int x) { return ((x & IBV_SEND_SIGNALED) ? htonl(MTHCA_NEXT_CQ_UPDATE) : 0) | ((x & IBV_SEND_SOLICITED) ? htonl(MTHCA_NEXT_SOLICIT) : 0) | htonl(1); } #if __BYTE_ORDER == __LITTLE_ENDIAN #define CONSTANT_HTONL(x) \ ((x >> 24) | ((x >> 8) & 0xff00) | ((x << 8) & 0xff0000) | (x << 24)) #elif __BYTE_ORDER == __BIG_ENDIAN #define CONSTANT_HTONL(x) (x) #else #define CONSTANT_HTONL(x) htonl(x) #endif int test3(int x) { return ((x & IBV_SEND_SIGNALED) ? 
CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE) : 0) | ((x & IBV_SEND_SOLICITED) ? CONSTANT_HTONL(MTHCA_NEXT_SOLICIT) : 0) | CONSTANT_HTONL(1); } struct timeval start, end; void timestart(void) { if (gettimeofday(&start, NULL)) { perror("gettimeofday"); return; } } void timeend(void) { if (gettimeofday(&end, NULL)) { perror("gettimeofday"); return; } { float usec = (end.tv_sec - start.tv_sec) * 1000000 + (end.tv_usec - start.tv_usec); printf("%.2f usec\n", usec); } } main() { int i; init_ar(); printf("test1\n"); timestart(); for (i=0; i<100000000; ++i) { (void) test1(IBV_SEND_SIGNALED); (void) test1(0); (void) test1(IBV_SEND_SIGNALED | IBV_SEND_SOLICITED); (void) test1(IBV_SEND_SOLICITED); } timeend(); printf("test2\n"); timestart(); for (i=0; i<100000000; ++i) { (void) test2(IBV_SEND_SIGNALED); (void) test2(0); (void) test2(IBV_SEND_SIGNALED | IBV_SEND_SOLICITED); (void) test2(IBV_SEND_SOLICITED); } timeend(); printf("test3\n"); timestart(); for (i=0; i<100000000; ++i) { (void) test3(IBV_SEND_SIGNALED); (void) test3(0); (void) test3(IBV_SEND_SIGNALED | IBV_SEND_SOLICITED); (void) test3(IBV_SEND_SOLICITED); } timeend(); } -- MST From mst at mellanox.co.il Fri Feb 23 03:36:43 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 23 Feb 2007 13:36:43 +0200 Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter In-Reply-To: <20070223112415.GB4415@mellanox.co.il> References: <20070222235724.GC4447@mellanox.co.il> <20070223112415.GB4415@mellanox.co.il> Message-ID: <20070223113643.GC4415@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: Re: [PATCH] libmthca: optimize calls to htonl with constant parameter > > > So what is different in your setup that causes this patch to make a > > difference for you? > > Hmm. I agree it is somewhat strange. > > Below is a simple test that attempts to compare htonl, CONSTANT_HTONL, > and an array-driven implementation. The code line is taken directly from htonl. 
> Could you compile and run it please? OK, this was stupid, the test was missing #include so htonl was expanded by a gcc intrinsic which seems to work worse than the macro tricks present in netinet/in.h. I guess this include got killed on the test system somehow, and this explains why I saw a difference in libmthca. -- MST From halr at voltaire.com Fri Feb 23 03:49:04 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Feb 2007 06:49:04 -0500 Subject: [openib-general] ipoib & the partial pkey, was: librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <1172230425.4102.1248.camel@hal.voltaire.com> References: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com> <1172230425.4102.1248.camel@hal.voltaire.com> Message-ID: <1172231343.4102.2202.camel@hal.voltaire.com> On Thu, 2007-02-22 at 18:35, Sean Hefty wrote: > >Doesn't this allow ipoib to join a multicast group for which it may not be able > >to communicate with all members? For the broadcast group, this seems like an > >error to me. Can ipoib work in such a configuration? If all nodes were > >assigned a partial membership PKey, none of them could communicate, but no > >errors would be generated anywhere. > > I looked into this more... > > RFC 4391 states (middle of page 5): > > For a node to join a partition, one of its ports must be assigned the relevant > P_Key by the SM [RFC4392]. > > Jumping to RFC 4392 (top of page 4): > > at the time of creating an IB multicast group, multiple values such as the > P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc. have to be > specified. These values should be such that all potential members of the IB > multicast group are able to communicate with one another when using them. Seems to me that for P_Key this would mean full membership. > and page 14: > > Note that this IB_join to the broadcast group is a FullMember join. FullMember here is referring to MCMemberRecord:JoinState rather than partition membership. 
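[Illustrative sketch, not part of the original message.] The question in this thread is whether a P_Key lookup should ignore the membership bit. The C fragment below shows the two behaviours side by side: a lookup that masks out the upper bit (0x8000, the bit the thread says ipoib ORs into the device pkey) and a strict lookup that requires an exact match. The names PKEY_FULL_MEMBER, PKEY_BASE_MASK and find_pkey, and the plain array standing in for the cached P_Key table, are hypothetical; this is not the kernel's ib_find_cached_pkey.

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* upper bit: full membership */
#define PKEY_BASE_MASK   0x7fffu /* low 15 bits: partition number */

/*
 * Hypothetical lookup over a P_Key table.
 * ignore_membership != 0: compare only the low 15 bits, so a limited
 *   (partial) entry matches a request for the full-membership key.
 * ignore_membership == 0: require an exact 16-bit match.
 * Returns the table index, or -1 if no entry matches.
 */
static int find_pkey(const uint16_t *table, int len, uint16_t pkey,
		     int ignore_membership)
{
	uint16_t mask = ignore_membership ? PKEY_BASE_MASK : 0xffffu;
	int i;

	for (i = 0; i < len; ++i)
		if ((table[i] & mask) == (pkey & mask))
			return i;

	return -1;
}
```

With a table holding only the limited key 0x0001, a masked search for 0x8001 succeeds while a strict search fails, which is the difference between a port silently using a partition it has only limited membership in and the lookup reporting that the full-membership key is not configured.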
-- Hal > If any of > the ports or the switches linking the port to the rest of the IPoIB subnet > cannot support the parameters (e.g., path MTU or P_Key) associated with the > broadcast group, then the IB_join request will fail and the requesting port will > not become part of the IPoIB subnet > > My initial interpretation of these statements lead me to believe that pkey check > in ib_find_cached_pkey should not mask out the upper bit, which would prevent > ipoib from joining a multicast group until it has been configured with the full > membership pkey for the broadcast group. Does this seem reasonable? > > - Sean From halr at voltaire.com Fri Feb 23 04:13:59 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Feb 2007 07:13:59 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <1172230422.4102.1246.camel@hal.voltaire.com> References: <1172155552.4380.475949.camel@hal.voltaire.com> <000301c756aa$b89e0020$8698070a@amr.corp.intel.com> <15ddcffd0702221309q4633a36cg8a7bb5ff69d78776@mail.gmail.com> <45DE16C3.5020809@ichips.intel.com> <1172230422.4102.1246.camel@hal.voltaire.com> Message-ID: <1172232836.4102.3709.camel@hal.voltaire.com> On Thu, 2007-02-22 at 17:18, Sean Hefty wrote: > >>Can someone help my understanding here? Is ipoib joining a multicast group > >>using the full membership PKey, even if the node that it joins from only has the > >>limited membership PKey configured? And the code in ib_find_cached_pkey helps > >>enable this? > > > > Yep. The ipoib create_child function Or-s 0x8000 to the device pkey > > which was provided by the user. Now, IPoIB uses the device pkey when > > forming MGIDs and when doing modify qp to init. Indeed the way > > ib_find_cached_pkey() is implemented, make the latter use trivial. > > Doesn't this allow ipoib to join a multicast group for which it may not be able > to communicate with all members? 
Yes, if the join were to succeed, which appears to me to be noncompliant behavior. > For the broadcast group, this seems like an error to me. Why for just the broadcast group? Isn't it any IPoIB MC group for which this would be done? (See below as to what the IBA spec says). > Can ipoib work in such a configuration? If all nodes were > assigned a partial membership PKey, none of them could communicate, but no > errors would be generated anywhere. > > Joining a multicast group requires specifying the full membership PKey. I don't > see anything in the spec that explicitly prohibits joining the group from a node > with only a partial membership PKey, What about the description of P_Key in MCMemberRecord (table 210 on p. 908 which is compliance) which states: "All members of the multicast group shall have full membership in the partition indicated by the partition key." -- Hal > but at first glance, this seems like a > subnet configuration issue. Is there some use of this I'm overlooking? > > - Sean From halr at voltaire.com Fri Feb 23 03:33:45 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Feb 2007 06:33:45 -0500 Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey In-Reply-To: <45DE16C3.5020809@ichips.intel.com> References: <1172155552.4380.475949.camel@hal.voltaire.com> <000301c756aa$b89e0020$8698070a@amr.corp.intel.com> <15ddcffd0702221309q4633a36cg8a7bb5ff69d78776@mail.gmail.com> <45DE16C3.5020809@ichips.intel.com> Message-ID: <1172230422.4102.1246.camel@hal.voltaire.com> On Thu, 2007-02-22 at 17:18, Sean Hefty wrote: > >>Can someone help my understanding here? Is ipoib joining a multicast group > >>using the full membership PKey, even if the node that it joins from only has the > >>limited membership PKey configured? And the code in ib_find_cached_pkey helps > >>enable this? > > > > Yep. The ipoib create_child function Or-s 0x8000 to the device pkey > > which was provided by the user. 
Now, IPoIB uses the device pkey when
> > forming MGIDs and when doing modify qp to init. Indeed the way
> > ib_find_cached_pkey() is implemented, make the latter use trivial.
>
> Doesn't this allow ipoib to join a multicast group for which it may not be able
> to communicate with all members?

Yes, if the join were to succeed, which appears to be noncompliant behavior.

> For the broadcast group, this seems like an error to me.

Why for just the broadcast group? Isn't it any IPoIB MC group for which this would be done? (See below as to what the IBA spec says.)

> Can ipoib work in such a configuration? If all nodes were
> assigned a partial membership PKey, none of them could communicate, but no
> errors would be generated anywhere.
>
> Joining a multicast group requires specifying the full membership PKey. I don't
> see anything in the spec that explicitly prohibits joining the group from a node
> with only a partial membership PKey,

What about the description of P_Key in MCMemberRecord (table 210 on p. 908 which is compliance) which states: "All members of the multicast group shall have full membership in the partition indicated by the partition key."

-- Hal

> but at first glance, this seems like a
> subnet configuration issue. Is there some use of this I'm overlooking?
>
> - Sean

From rdreier at cisco.com Fri Feb 23 07:32:51 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 23 Feb 2007 07:32:51 -0800
Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter
References: <20070222235724.GC4447@mellanox.co.il> <20070223070055.GC25553@obsidianresearch.com>
Message-ID:

> Newer gccs have the -fwhole-program --combine options that address
> this and more. One of the things that happens is that all internal
> functions are made 'static' and all compilation units are optimized in
> one go.

Good point... but is there any sane way to use that feature with automake and libtool?
I know that the autotools are a pain but I really don't want to reimplement the useful stuff they give us, and I don't know of any really practical replacement... - R. From sean.hefty at intel.com Fri Feb 23 12:15:09 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 23 Feb 2007 12:15:09 -0800 Subject: [openib-general] [PATCH] for OFED 1.2 Message-ID: <000001c75787$50ff0440$ff0da8c0@amr.corp.intel.com> I would like these fixes in OFED 1.2 as well. What git tree / branch do I generate a patch against? - Sean --- rdma_cm: remove unused node_guid from cma_device structure. ib_cm: remove ca_guid from cm_device structure. rdma_cm: request reversible paths only. ib_core: Set hop limit in ib_init_ah_from_wc correctly. The patches are in: git://git.openfabrics.org/~shefty/rdma-dev.git for-roland (sign-off line was added to the actual commit messages) Signed-off-by: Sean Hefty --- commit 28e218621d36cf9da42f07af08775769eb289fc0 Author: Sean Hefty Date: Thu Feb 22 11:37:44 2007 -0800 rdma_cm: remove unused node_guid from cma_device structure. diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index bb27ce9..d441815 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -77,7 +77,6 @@ static int next_port; struct cma_device { struct list_head list; struct ib_device *device; - __be64 node_guid; struct completion comp; atomic_t refcount; struct list_head id_list; @@ -2674,7 +2673,6 @@ static void cma_add_one(struct ib_device *device) return; cma_dev->device = device; - cma_dev->node_guid = device->node_guid; init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); commit 6de97f2a3373357d720b1653dfc0aac6d40b7506 Author: Sean Hefty Date: Thu Feb 22 11:37:38 2007 -0800 ib_cm: remove ca_guid from cm_device structure. The cm_device references an ib_device, which contains the node_guid. 
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index d446998..842cd0b 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -88,7 +88,6 @@ struct cm_port { struct cm_device { struct list_head list; struct ib_device *device; - __be64 ca_guid; struct cm_port port[0]; }; @@ -739,8 +738,8 @@ retest: ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg); spin_unlock_irqrestore(&cm_id_priv->lock, flags); ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT, - &cm_id_priv->av.port->cm_dev->ca_guid, - sizeof cm_id_priv->av.port->cm_dev->ca_guid, + &cm_id_priv->id.device->node_guid, + sizeof cm_id_priv->id.device->node_guid, NULL, 0); break; case IB_CM_REQ_RCVD: @@ -883,7 +882,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, req_msg->local_comm_id = cm_id_priv->id.local_id; req_msg->service_id = param->service_id; - req_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid; + req_msg->local_ca_guid = cm_id_priv->id.device->node_guid; cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num)); cm_req_set_resp_res(req_msg, param->responder_resources); cm_req_set_init_depth(req_msg, param->initiator_depth); @@ -1442,7 +1441,7 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg, cm_rep_set_flow_ctrl(rep_msg, param->flow_control); cm_rep_set_rnr_retry_count(rep_msg, param->rnr_retry_count); cm_rep_set_srq(rep_msg, param->srq); - rep_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid; + rep_msg->local_ca_guid = cm_id_priv->id.device->node_guid; if (param->private_data && param->private_data_len) memcpy(rep_msg->private_data, param->private_data, @@ -3385,7 +3384,6 @@ static void cm_add_one(struct ib_device *device) return; cm_dev->device = device; - cm_dev->ca_guid = device->node_guid; set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask); for (i = 1; i <= device->phys_port_cnt; i++) { commit 87680047dd09ca4a4e8ec575dad215c92cf45ed3 Author: Sean Hefty Date: Wed Feb 21 16:40:44 2007 -0800 rdma_cm: request 
reversible paths only The rdma_cm requires that path records be reversible. Set the reversible bit when issuing an path record query. diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index f8d69b3..bb27ce9 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1492,11 +1492,13 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms, ib_addr_get_dgid(addr, &path_rec.dgid); path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr)); path_rec.numb_path = 1; + path_rec.reversible = 1; id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device, id_priv->id.port_num, &path_rec, IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | - IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH, + IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_REVERSIBLE, timeout_ms, GFP_KERNEL, cma_query_handler, work, &id_priv->query); commit 30947e5b7db42184d66746ac1187d4abbf89018d Author: Sean Hefty Date: Wed Feb 21 16:37:31 2007 -0800 ib_core: Set hop limit in ib_init_ah_from_wc correctly. The hop_limit value in the ah_attr should be 0xFF, not the value read from the received GRH (which should be 0). See 13.5.4.4 in the 1.2 IB spec. 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 8b5dd36..ccdf93d 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -167,7 +167,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc, ah_attr->grh.sgid_index = (u8) gid_index; flow_class = be32_to_cpu(grh->version_tclass_flow); ah_attr->grh.flow_label = flow_class & 0xFFFFF; - ah_attr->grh.hop_limit = grh->hop_limit; + ah_attr->grh.hop_limit = 0xFF; ah_attr->grh.traffic_class = (flow_class >> 20) & 0xFF; } return 0; From rdreier at cisco.com Fri Feb 23 13:11:34 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 23 Feb 2007 13:11:34 -0800 Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: cleanups In-Reply-To: <1172068305.21243.2.camel@stevo-desktop> (Steve Wise's message of "Wed, 21 Feb 2007 08:31:45 -0600") References: <20070220000211.GZ13958@stusta.de> <1171982587.2101.0.camel@stevo-desktop> <20070221105249.GC13958@stusta.de> <1172068305.21243.2.camel@stevo-desktop> Message-ID: thanks, queued for 2.6.21 From rdreier at cisco.com Fri Feb 23 13:13:00 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 23 Feb 2007 13:13:00 -0800 Subject: [openib-general] [PATCH 2.6.21] iw_cxgb3: Stop the EP Timer on BAD CLOSE. In-Reply-To: <1172090739.27101.39.camel@stevo-desktop> (Steve Wise's message of "Wed, 21 Feb 2007 14:45:39 -0600") References: <1172090739.27101.39.camel@stevo-desktop> Message-ID: thanks, queued for 2.6.21 From arlin.r.davis at intel.com Fri Feb 23 15:06:09 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 23 Feb 2007 15:06:09 -0800 Subject: [openib-general] [PATCH] uDAPL - include dapltest and dtest in build Message-ID: <000001c7579f$347974a0$ff0da8c0@amr.corp.intel.com> This uDAPL patch adds both dapltest and dtest utilities, including manual pages, to the DAPL project build. The dapltest required some modifications to build on x86_64. James, please review. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com diff --git a/Makefile.am b/Makefile.am index 1190f20..e2bf4dc 100644 --- a/Makefile.am +++ b/Makefile.am @@ -179,7 +179,9 @@ libdatinclude_HEADERS = dat/include/dat/dat.h \ dat/include/dat/udat.h \ dat/include/dat/udat_redirection.h \ dat/include/dat/udat_vendor_specific.h - + +man_MANS = man/dtest.1 man/dapltest.1 + EXTRA_DIST = dat/common/dat_dictionary.h \ dat/common/dat_dr.h \ dat/common/dat_init.h \ @@ -228,8 +230,10 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ dat/udat/libdat.map \ doc/dat.conf \ dapl/udapl/libdaplcma.map \ - dapl/udapl/libdaplscm.map \ - libdat.spec.in + libdat.spec.in \ + $(man_MANS) dist-hook: libdat.spec cp libdat.spec $(distdir) + +SUBDIRS = . test/dtest test/dapltest diff --git a/configure.in b/configure.in index bf5ec09..324bfa1 100644 --- a/configure.in +++ b/configure.in @@ -1,11 +1,11 @@ dnl Process this file with autoconf to produce a configure script. AC_PREREQ(2.57) -AC_INIT(dapl, 1.2.0, dapl-devel at lists.sourceforge.net) +AC_INIT(dapl, 1.2.1, openib-general at openib.org) AC_CONFIG_SRCDIR([dat/udat/udat.c]) AC_CONFIG_AUX_DIR(config) AM_CONFIG_HEADER(config.h) -AM_INIT_AUTOMAKE(dapl, 1.2.0) +AM_INIT_AUTOMAKE(dapl, 1.2.1) AM_PROG_LIBTOOL @@ -60,5 +60,6 @@ AC_CACHE_CHECK(whether this is an RHEL system, ac_cv_rhel, fi) AM_CONDITIONAL(OS_RHEL, test "$ac_cv_rhel" = "yes") -AC_CONFIG_FILES([Makefile libdat.spec]) +AC_CONFIG_FILES([Makefile test/dtest/Makefile test/dapltest/Makefile libdat.spec]) + AC_OUTPUT diff --git a/man/dapltest.1 b/man/dapltest.1 new file mode 100644 index 0000000..8ff4493 --- /dev/null +++ b/man/dapltest.1 @@ -0,0 +1,390 @@ +." 
Text automatically generated by txt2man +.TH dapltest 1 "February 23, 2007" "uDAPL 1.2" "USER COMMANDS" + +.SH NAME +\fB +\fBdapltest \fP- test for the Direct Access Programming Library (DAPL) +\fB +.SH DESCRIPTION + +Dapltest is a set of tests developed to exercise, characterize, +and verify the DAPL interfaces during development and porting. +At least two instantiations of the test must be run. One acts +as the server, fielding requests and spawning server-side test +threads as needed. Other client invocations connect to the server +and issue test requests. The server side of the test, once invoked, +listens continuously for client connection requests, until quit or +killed. Upon receipt of a connection request, the connection is +established, the server and client sides swap version numbers to +verify that they are able to communicate, and the client sends +the test request to the server. If the version numbers match, +and the test request is well-formed, the server spawns the threads +needed to run the test before awaiting further connections. +.SH USAGE + +dapltest [ -f script_file_name ] +[ -T S|Q|T|P|L ] [ -D device_name ] [ -d ] [ -R HT|LL|EC|PM|BE ] +.PP +With no arguments, dapltest runs as a server using default values, +and loops accepting requests from clients. + +The -f option allows all arguments to be placed in a file, to ease +test automation. 
+ +The following arguments are common to all tests: +.TP +.B +[ -T S|Q|T|P|L ] +Test function to be performed: +.RS +.TP +.B +S +- server loop +.TP +.B +Q +- quit, client requests that server +wait for any outstanding tests to +complete, then clean up and exit +.TP +.B +T +- transaction test, transfers data between +client and server +.TP +.B +P +- performance test, times DTO operations +.TP +.B +L +- limit test, exhausts various resources, +runs in client w/o server interaction +Default: S +.RE +.TP +.B +[ -D device_name ] +Specifies the interface adapter name as documented in +the /etc/dat.conf static configuration file. This name +corresponds to the provider library to open. +Default: none +.TP +.B +[ -d ] +Enables extra debug verbosity, primarily tracing +of the various DAPL operations as they progress. +Repeating this parameter increases debug spew. +Errors encountered result in the test spewing some +explanatory text and stopping; this flag provides +more detail about what lead up to the error. +Default: zero +.TP +.B +[ -R BE ] +Indicate the quality of service (QoS) desired. +Choices are: +.RS +.TP +.B +HT +- high throughput +.TP +.B +LL +- low latency +.TP +.B +EC +- economy (neither HT nor LL) +.TP +.B +PM +- premium +.TP +.B +BE +- best effort +Default: BE +.RE +.RE +.PP +.B +Usage - Quit test client +.PP +.nf +.fam C + dapltest [Common_Args] [ -s server_name ] + + Quit testing (-T Q) connects to the server to ask it to clean up and + exit (after it waits for any outstanding test runs to complete). + In addition to being more polite than simply killing the server, + this test exercises the DAPL object teardown code paths. + There is only one argument other than those supported by all tests: + + -s server_name Specifies the name of the server interface. + No default. 
+ + +.fam T +.fi +.B +Usage - Transaction test client +.PP +.nf +.fam C + dapltest [Common_Args] [ -s server_name ] + [ -t threads ] [ -w endpoints ] [ -i iterations ] [ -Q ] + [ -V ] [ -P ] OPclient OPserver [ op3, + + Transaction testing (-T T) transfers a variable amount of data between + client and server. The data transfer can be described as a sequence of + individual operations; that entire sequence is transferred 'iterations' + times by each thread over all of its endpoint(s). + + The following parameters determine the behavior of the transaction test: + + -s server_name Specifies the name or IP address of the server interface. + No default. + + [ -t threads ] Specify the number of threads to be used. + Default: 1 + + [ -w endpoints ] Specify the number of connected endpoints per thread. + Default: 1 + + [ -i iterations ] Specify the number of times the entire sequence + of data transfers will be made over each endpoint. + Default: 1000 + + [ -Q ] Funnel completion events into a CNO. + Default: use EVDs + + [ -V ] Validate the data being transferred. + Default: ignore the data + + [ -P ] Turn on DTO completion polling + Default: off + + OP1 OP2 [ OP3, \.\.\. ] + A single transaction (OPx) consists of: + + server|client Indicates who initiates the + data transfer. + + SR|RR|RW Indicates the type of transfer: + SR send/recv + RR RDMA read + RW RDMA write + Defaults: none + + [ seg_size [ num_segs ] ] + Indicates the amount and format + of the data to be transferred. + Default: 4096 1 + (i.e., 1 4KB buffer) + + [ -f ] For SR transfers only, indicates + that a client's send transfer + completion should be reaped when + the next recv completion is reaped. + Sends and receives must be paired + (one client, one server, and in that + order) for this option to be used. + + Restrictions: + + Due to the flow control algorithm used by the transaction test, there + must be at least one SR OP for both the client and the server. 
+
+ Requesting data validation (-V) causes the test to automatically append
+ three OPs to those specified. These additional operations provide
+ synchronization points during each iteration, at which all user-specified
+ transaction buffers are checked. These three appended operations satisfy
+ the "one SR in each direction" requirement.
+
+ The transaction OP list is printed out if -d is supplied.
+
+.fam T
+.fi
+.B
+Usage - Performance test client
+.PP
+.nf
+.fam C
+ dapltest [Common_Args] -s server_name [ -m p|b ]
+ [ -i iterations ] [ -p pipeline ] OP
+
+ Performance testing (-T P) times the transfer of an operation.
+ The operation is posted 'iterations' times.
+
+ The following parameters determine the behavior of the performance test:
+
+ -s server_name Specifies the name or IP address of the server interface.
+ No default.
+
+ -m b|p Used to choose either blocking (b) or polling (p)
+ Default: blocking (b)
+
+ [ -i iterations ] Specify the number of times the entire sequence
+ of data transfers will be made over each endpoint.
+ Default: 1000
+
+ [ -p pipeline ] Specify the pipeline length; valid arguments are in
+ the range [0,MAX_SEND_DTOS]. If a value greater than
+ MAX_SEND_DTOS is requested the value will be
+ adjusted down to MAX_SEND_DTOS.
+ Default: MAX_SEND_DTOS
+
+ OP Specifies the operation as follows:
+
+ RR|RW Indicates the type of transfer:
+ RR RDMA read
+ RW RDMA write
+ Defaults: none
+
+ [ seg_size [ num_segs ] ]
+ Indicates the amount and format
+ of the data to be transferred.
+ Default: 4096 1
+ (i.e., 1 4KB buffer)
+.fam T
+.RE
+.RE
+.PP
+.B
+Usage - Limit test client
+.PP
+.nf
+.fam C
+ Limit testing (-T L) neither requires nor connects to any server
+ instance. The client runs one or more tests which attempt to
+ exhaust various resources to determine DAPL limits and exercise
+ DAPL error paths. If no arguments are given, all tests are run.
+
+ Limit testing creates the sequence of DAT objects needed to
+ move data back and forth, attempting to find the limits supported
+ for the DAPL object requested. For example, if the LMR creation
+ limit is being examined, the test will create a set of
+ {IA, PZ, CNO, EVD, EP} before trying to run dat_lmr_create() to
+ failure using that set of DAPL objects. The 'width' parameter
+ can be used to control how many of these parallel DAPL object
+ sets are created before beating upon the requested constructor.
+ Use of -m limits the number of dat_*_create() calls that will
+ be attempted, which can be helpful if the DAPL in use supports
+ essentially unlimited numbers of some objects.
+
+ The limit test arguments are:
+
+ [ -m maximum ] Specify the maximum number of dat_*_create()
+ attempts.
+ Default: run to object creation failure
+
+ [ -w width ] Specify the number of DAPL object sets to
+ create while initializing.
+ Default: 1
+
+ [ limit_ia ] Attempt to exhaust dat_ia_open()
+
+ [ limit_pz ] Attempt to exhaust dat_pz_create()
+
+ [ limit_cno ] Attempt to exhaust dat_cno_create()
+
+ [ limit_evd ] Attempt to exhaust dat_evd_create()
+
+ [ limit_ep ] Attempt to exhaust dat_ep_create()
+
+ [ limit_rsp ] Attempt to exhaust dat_rsp_create()
+
+ [ limit_psp ] Attempt to exhaust dat_psp_create()
+
+ [ limit_lmr ] Attempt to exhaust dat_lmr_create(4KB)
+
+ [ limit_rpost ] Attempt to exhaust dat_ep_post_recv(4KB)
+
+ [ limit_size_lmr ] Probe maximum size dat_lmr_create()
+
+.nf
+.fam C
+ Default: run all tests
+
+
+.fam T
+.fi
+.SH EXAMPLES
+
+dapltest -T S -d -D OpenIB-cma
+.PP
+.nf
+.fam C
+ Starts a server process with debug verbosity.
+
+.fam T
+.fi
+dapltest -T T -d -s host1-ib0 -D OpenIB-cma -i 100 client SR 4096 2 server SR 4096 2
+.PP
+.nf
+.fam C
+ Runs a transaction test, with both sides
+ sending one buffer with two 4KB segments,
+ one hundred times.
+ +.fam T +.fi +dapltest -T P -d -s host1-ib0 -D OpenIB-cma -i 100 SR 4096 2 +.PP +.nf +.fam C + Runs a performance test, with the client + sending one buffer with two 4KB segments, + one hundred times. + +.fam T +.fi +dapltest -T Q -s host1-ib0 -D OpenIB-cma +.PP +.nf +.fam C + Asks the server to clean up and exit. + +.fam T +.fi +dapltest -T L -D OpenIB-cma -d -w 16 -m 1000 +.PP +.nf +.fam C + Runs all of the limit tests, setting up + 16 complete sets of DAPL objects, and + creating at most a thousand instances + when trying to exhaust resources. + +.fam T +.fi +dapltest -T T -V -d -t 2 -w 4 -i 55555 -s linux3 -D OpenIB-cma +client RW 4096 1 server RW 2048 4 +client SR 1024 4 server SR 4096 2 +client SR 1024 3 -f server SR 2048 1 -f +.PP +.nf +.fam C + Runs a more complicated transaction test, + with two thread using four EPs each, + sending a more complicated buffer pattern + for a larger number of iterations, + validating the data received. + + +.fam T +.fi +.RE +.TP +.B +BUGS +(and To Do List) +.PP +.nf +.fam C + Use of CNOs (-Q) is not yet supported. + + Further limit tests could be added. diff --git a/man/dtest.1 b/man/dtest.1 new file mode 100755 index 0000000..1e227e5 --- /dev/null +++ b/man/dtest.1 @@ -0,0 +1,78 @@ +.TH dtest 1 "February 23, 2007" "uDAPL 1.2" "USER COMMANDS" + +.SH NAME +dtest \- simple uDAPL send/receive and RDMA test + +.SH SYNOPSIS +.B dtest +[\-P provider] [\-b buf size] [\-B burst count][\-v] [\-c] [\-p] [\-d]\fB [-s]\fR + +.B dtest +[\-P provider] [\-b buf size] [\-B burst count][\-v] [\-c] [\-p] [\-d]\fB [-h HOSTNAME]\fR + +.SH DESCRIPTION +.PP +dtest is a simple test used to exercise and verify the uDAPL interfaces. +At least two instantiations of the test must be run. One acts as the server +and the other the client. The server side of the test, once invoked listens +for connection requests, until timing out or killed. 
Upon receipt of a
+connection request, the connection is established, the server and client
+sides exchange information necessary to perform RDMA writes and reads.
+
+.SH OPTIONS
+
+.PP
+.TP
+\fB\-P\fR=\fIPROVIDER\fR
+use \fIPROVIDER\fR to specify uDAPL interface using /etc/dat.conf (default OpenIB-cma)
+.TP
+\fB\-b\fR=\fIBUFFER_SIZE\fR
+use buffer size \fIBUFFER_SIZE\fR for RDMA (default 64)
+.TP
+\fB\-B\fR=\fIBURST_COUNT\fR
+use burst count \fIBURST_COUNT\fR for iterations (default 10)
+.TP
+\fB\-v\fR, verbose output (default off)
+.TP
+\fB\-c\fR, use consumer notification events (default off)
+.TP
+\fB\-p\fR, use polling (default wait for event)
+.TP
+\fB\-d\fR, delay in seconds before close (default off)
+.TP
+\fB\-s\fR, run as server (default - run as server)
+.TP
+\fB\-h\fR=\fIHOSTNAME\fR
+use \fIHOSTNAME\fR to specify server hostname or IP address (default - none)
+
+.SH EXAMPLES
+
+dtest -P OpenIB-cma -v -s
+.PP
+.nf
+.fam C
+ Starts a server process with debug verbosity using provider OpenIB-cma.
+
+.fam T
+.fi
+dtest -P OpenIB-cma -h server1-ib0
+.PP
+.nf
+.fam C
+ Starts a client process, using OpenIB-cma provider to connect to hostname server1-ib0.
+ +.fam T + +.SH SEE ALSO +.BR dapltest(1) + +.SH AUTHORS +.TP +Arlin Davis +.RI < ardavis at ichips.intel.com > + +.SH BUGS + + + + diff --git a/test/dapltest/Makefile.am b/test/dapltest/Makefile.am new file mode 100755 index 0000000..0c83924 --- /dev/null +++ b/test/dapltest/Makefile.am @@ -0,0 +1,56 @@ +INCLUDES = -I include \ + -I mdep/linux + +bin_PROGRAMS = dapltest + +dapltest_SOURCES = \ + cmd/dapl_main.c \ + cmd/dapl_params.c \ + cmd/dapl_fft_cmd.c \ + cmd/dapl_getopt.c \ + cmd/dapl_limit_cmd.c \ + cmd/dapl_netaddr.c \ + cmd/dapl_performance_cmd.c \ + cmd/dapl_qos_util.c \ + cmd/dapl_quit_cmd.c \ + cmd/dapl_server_cmd.c \ + cmd/dapl_transaction_cmd.c \ + test/dapl_bpool.c \ + test/dapl_client.c \ + test/dapl_client_info.c \ + test/dapl_cnxn.c \ + test/dapl_execute.c \ + test/dapl_fft_connmgt.c \ + test/dapl_fft_endpoint.c \ + test/dapl_fft_hwconn.c \ + test/dapl_fft_mem.c \ + test/dapl_fft_pz.c \ + test/dapl_fft_queryinfo.c \ + test/dapl_fft_test.c \ + test/dapl_fft_util.c \ + test/dapl_limit.c \ + test/dapl_memlist.c \ + test/dapl_performance_client.c \ + test/dapl_performance_server.c \ + test/dapl_performance_stats.c \ + test/dapl_performance_util.c \ + test/dapl_quit_util.c \ + test/dapl_server.c \ + test/dapl_server_info.c \ + test/dapl_test_data.c \ + test/dapl_test_util.c \ + test/dapl_thread.c \ + test/dapl_transaction_stats.c \ + test/dapl_transaction_test.c \ + test/dapl_transaction_util.c \ + test/dapl_util.c \ + common/dapl_endian.c \ + common/dapl_global.c \ + common/dapl_performance_cmd_util.c \ + common/dapl_quit_cmd_util.c \ + common/dapl_transaction_cmd_util.c \ + udapl/udapl_tdep.c \ + mdep/linux/dapl_mdep_user.c + +dapltest_LDADD = $(srcdir)/../../dat/udat/libdat.la +dapltest_LDFLAGS = -lpthread diff --git a/test/dapltest/configure.in b/test/dapltest/configure.in new file mode 100755 index 0000000..ebdd59d --- /dev/null +++ b/test/dapltest/configure.in @@ -0,0 +1,26 @@ +dnl Process this file with autoconf to produce a configure script. 
+
+AC_PREREQ(2.57)
+AC_INIT(dapltest, 1.2.1, dapl-devel at lists.sourceforge.net)
+AC_CONFIG_SRCDIR([$top_srcdir/dapl/test/dapltest/cmd/dapl_main.c])
+AC_CONFIG_AUX_DIR(config)
+AM_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE(dapltest, 1.2.1)
+
+AM_PROG_LIBTOOL
+
+dnl Checks for programs
+AC_PROG_CC
+
+dnl Checks for libraries
+if test "$disable_libcheck" != "yes"
+then
+AC_CHECK_LIB(pthread, pthread_attr_init, [],
+ AC_MSG_ERROR([pthread_attr_init() not found, dapltest requires pthreads]))
+fi
+
+dnl Checks for header files.
+
+AC_CONFIG_FILES([Makefile])
+
+AC_OUTPUT
diff --git a/test/dapltest/mdep/linux/dapl_mdep_user.h b/test/dapltest/mdep/linux/dapl_mdep_user.h
index 981783d..c05dd30 100644
--- a/test/dapltest/mdep/linux/dapl_mdep_user.h
+++ b/test/dapltest/mdep/linux/dapl_mdep_user.h
@@ -138,10 +138,16 @@ DT_Mdep_GetTimeStamp ( void )
 } while (tbu0 != tbu1);
 return (((unsigned long long)tbu0) << 32) | tbl;
 #else
+#if defined(__x86_64__)
+ unsigned int __a,__d;
+ asm volatile("rdtsc" : "=a" (__a), "=d" (__d));
+ return ((unsigned long)__a) | (((unsigned long)__d)<<32);
+#else
 #error "Non-Pentium and Non-PPC Linux - unimplemented"
 #endif
 #endif
 #endif
+#endif
 }

 /*
diff --git a/test/dtest/Makefile.am b/test/dtest/Makefile.am
new file mode 100755
index 0000000..ac9837b
--- /dev/null
+++ b/test/dtest/Makefile.am
@@ -0,0 +1,4 @@
+bin_PROGRAMS = dtest
+dtest_SOURCES = dtest.c
+dtest_LDADD = $(srcdir)/../../dat/udat/libdat.la
+
diff --git a/test/dtest/configure.in b/test/dtest/configure.in
new file mode 100755
index 0000000..822df5e
--- /dev/null
+++ b/test/dtest/configure.in
@@ -0,0 +1,21 @@
+dnl Process this file with autoconf to produce a configure script.
+ +AC_PREREQ(2.57) +AC_INIT(dtest, 1.2.1, dapl-devel at lists.sourceforge.net) +AC_CONFIG_SRCDIR([$top_srcdir/dapl/test/dtest/dtest.c]) +AC_CONFIG_AUX_DIR(config) +AM_CONFIG_HEADER(config.h) +AM_INIT_AUTOMAKE(dtest, 1.2.1) + +AM_PROG_LIBTOOL + +dnl Checks for programs +AC_PROG_CC + +dnl Checks for libraries + +dnl Checks for header files. + +AC_CONFIG_FILES([Makefile]) + +AC_OUTPUT diff --git a/test/dtest/makefile b/test/dtest/makefile deleted file mode 100644 index 858d77f..0000000 --- a/test/dtest/makefile +++ /dev/null @@ -1,16 +0,0 @@ -CC = gcc -CFLAGS = -O2 -g - -DAT_INC = ../../dat/include -DAT_LIB = /usr/local/lib - -all: dtest - -clean: - rm -f *.o;touch *.c;rm -f dtest - -dtest: ./dtest.c - $(CC) $(CFLAGS) ./dtest.c -o dtest \ - -DDAPL_PROVIDER='"OpenIB-cma"' \ - -I $(DAT_INC) -L $(DAT_LIB) -ldat - From vlad at lists.openfabrics.org Sat Feb 24 02:28:03 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sat, 24 Feb 2007 02:28:03 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070224-0200 daily build status Message-ID: <20070224102804.37709E607FD@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.13 Passed on 
x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.12 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** 
[_module_/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once 
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) /home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From cppbala at yahoo.com Sat Feb 24 06:33:08 2007 From: cppbala at yahoo.com (Bala) Date: Sat, 24 Feb 2007 06:33:08 -0800 (PST) Subject: [openib-general] ib0 interface up but can't ping Message-ID: <327816.12892.qm@web35105.mail.mud.yahoo.com> Hi All, we are trying to setup IB on HP c-class blade system(x86_64) having 4X DDR IB Mezzanine HCA with CentOS-4. 1. Downloaded ofed-1.1 2. build the RPMS and installed two machines 3. set the IP in icfg-ib0 and rebooted 4. can see the ib0 interface up on both the machines 5. but can't ping the other machine using IB interface pls let me know what we are missing or what is required to ping the other machines using IB?? also we have tried with RHEL-4 and ofed-1.1 after installing the RPMS and rebooting the servers, we can see the interface but can't set the IP it always complains that other machines using the IP. pls let us know how we can over come this error thanks in advance, -bala- ____________________________________________________________________________________ Want to start your own business? Learn how on Yahoo! Small Business. 
http://smallbusiness.yahoo.com/r-index From halr at voltaire.com Sat Feb 24 07:28:50 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 24 Feb 2007 10:28:50 -0500 Subject: [openib-general] ib0 interface up but can't ping In-Reply-To: <327816.12892.qm@web35105.mail.mud.yahoo.com> References: <327816.12892.qm@web35105.mail.mud.yahoo.com> Message-ID: <1172330920.4102.100648.camel@hal.voltaire.com> On Sat, 2007-02-24 at 09:33, Bala wrote: > Hi All, > we are trying to setup IB on HP c-class blade > system(x86_64) having 4X DDR IB Mezzanine HCA with > CentOS-4. > > 1. Downloaded ofed-1.1 > 2. build the RPMS and installed two machines > 3. set the IP in icfg-ib0 and rebooted > 4. can see the ib0 interface up on both the machines > 5. but can't ping the other machine using IB > interface > > pls let me know what we are missing or what is > required > to ping the other machines using IB?? > > also we have tried with RHEL-4 and ofed-1.1 after > installing the RPMS and rebooting the servers, we can > see the interface but can't set the IP it always > complains that other machines using the IP. > > pls let us know how we can over come this error Are the ports in active state ? -- Hal > thanks in advance, > -bala- 
From dotanb at dev.mellanox.co.il Sat Feb 24 09:54:02 2007 From: dotanb at dev.mellanox.co.il (dotanb at dev.mellanox.co.il) Date: Sat, 24 Feb 2007 19:54:02 +0200 (IST) Subject: [openib-general] ib0 interface up but can't ping In-Reply-To: <1172330920.4102.100648.camel@hal.voltaire.com> References: <327816.12892.qm@web35105.mail.mud.yahoo.com> <1172330920.4102.100648.camel@hal.voltaire.com> Message-ID: <2199.85.65.223.188.1172339642.squirrel@dev.mellanox.co.il> > On Sat, 2007-02-24 at 09:33, Bala wrote: >> Hi All, >> we are trying to setup IB on HP c-class blade >> system(x86_64) having 4X DDR IB Mezzanine HCA with >> CentOS-4. >> >> 1. Downloaded ofed-1.1 >> 2. build the RPMS and installed two machines >> 3. set the IP in icfg-ib0 and rebooted >> 4. can see the ib0 interface up on both the machines >> 5. but can't ping the other machine using IB >> interface >> >> pls let me know what we are missing or what is >> required >> to ping the other machines using IB?? >> >> also we have tried with RHEL-4 and ofed-1.1 after >> installing the RPMS and rebooting the servers, we can >> see the interface but can't set the IP it always >> complains that other machines using the IP. >> >> pls let us know how we can over come this error > > Are the ports in active state ? > OpenSM (or any other SM) must be active in order to use IPoIB. 
Dotan From sashak at voltaire.com Sat Feb 24 12:13:42 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 24 Feb 2007 22:13:42 +0200 Subject: [openib-general] [PATCH] opensm: updn performance improvements Message-ID: <20070224201342.GB9147@sashak.voltaire.com> There are various performance improvements for the up/down routing engine: - updn_node object which is referenced by switch's priv pointer - ranking for switches only - replace time consuming cl_list by cl_qlist - reuse already collected up/down related information (in updn_node structure) instead of rediscovering - eliminate many inner loops - mask time consuming logging - eliminate using two lists with BFS - minor cleanups Now up/down looks 5-6 times faster. Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_updn.c | 743 +++++++++++++++---------------------------- 1 files changed, 257 insertions(+), 486 deletions(-) diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c index 8b86958..e8282f4 100644 --- a/osm/opensm/osm_ucast_updn.c +++ b/osm/opensm/osm_ucast_updn.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2007 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2006 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -40,8 +40,6 @@ * * Environment: * Linux User Mode - * - * $Revision: 1.0 $ */ #if HAVE_CONFIG_H @@ -61,25 +59,10 @@ /* direction */ typedef enum _updn_switch_dir { - UP = 0, - DOWN + UP = 0, + DOWN } updn_switch_dir_t; -/* This enum respresent available states in the UPDN algorithm */ -typedef enum _updn_state -{ - UPDN_INIT = 0, - UPDN_RANK, - UPDN_MIN_HOP_CALC, -} updn_state_t; - -/* Rank value of this node */ -typedef struct _updn_rank -{ - cl_map_item_t map_item; - uint8_t rank; -} updn_rank_t; - /* Histogram element - the number of occurences of the same hop value */ typedef struct _updn_hist { @@ -87,12 +70,6 @@ typedef struct _updn_hist uint32_t bar_value; } updn_hist_t; -typedef struct _updn_next_step -{ - updn_switch_dir_t state; - osm_switch_t *p_sw; -} updn_next_step_t; - /* guids list */ typedef struct _updn_input { @@ -100,17 +77,26 @@ typedef struct _updn_input uint64_t *guid_list; } updn_input_t; +struct updn_node { + cl_list_item_t list; + osm_switch_t *sw; + updn_switch_dir_t dir; + unsigned rank; + unsigned is_root; + unsigned visited; +}; + /* updn structure */ typedef struct _updn { - updn_state_t state; boolean_t auto_detect_root_nodes; - cl_qmap_t guid_rank_tbl; updn_input_t updn_ucast_reg_inputs; - cl_list_t * p_root_nodes; + cl_list_t *p_root_nodes; osm_opensm_t *p_osm; } updn_t; +#define NOISE_L(log, fmt, arg...) 
+ /* ///////////////////////////////// */ /* Statics */ /* ///////////////////////////////// */ @@ -122,27 +108,17 @@ static void __osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); remote ports */ static updn_switch_dir_t __updn_get_dir( - IN updn_t *p_updn, - IN uint8_t cur_rank, - IN uint8_t rem_rank, + IN unsigned cur_rank, + IN unsigned rem_rank, IN uint64_t cur_guid, - IN uint64_t rem_guid ) + IN uint64_t rem_guid, + IN unsigned cur_is_root, + IN unsigned rem_is_root ) { - uint32_t i = 0, max_num_guids = p_updn->updn_ucast_reg_inputs.num_guids; - uint64_t *p_guid = p_updn->updn_ucast_reg_inputs.guid_list; - boolean_t cur_is_root = FALSE, rem_is_root = FALSE; - /* HACK: comes to solve root nodes connection, in a classic subnet root nodes do not connect - directly, but in case they are we assign to root node an UP direction to allow UPDN discover + directly, but in case they are we assign to root node an UP direction to allow UPDN to discover the subnet correctly (and not from the point of view of the last root node). 
*/ - for ( i = 0; i < max_num_guids; i++ ) - { - if (cur_guid == p_guid[i]) - cur_is_root = TRUE; - if (rem_guid == p_guid[i]) - rem_is_root = TRUE; - } if (cur_is_root && rem_is_root) return UP; @@ -162,58 +138,18 @@ __updn_get_dir( /********************************************************************** **********************************************************************/ -/* This function creates a new element of updn_next_step_t type then return its - pointer, Null if malloc has failed */ -static updn_next_step_t* -__updn_create_updn_next_step_t( - IN updn_switch_dir_t state, - IN osm_switch_t* const p_sw ) -{ - updn_next_step_t *p_next_step; - - p_next_step = (updn_next_step_t*) malloc(sizeof(*p_next_step)); - if (p_next_step) - { - memset(p_next_step, 0, sizeof(*p_next_step)); - p_next_step->state = state; - p_next_step->p_sw = p_sw; - } - - return p_next_step; -} - -/********************************************************************** - **********************************************************************/ -/* This function updates an element in the qmap list by guid index and rank value */ +/* This function updates rank value for a node */ /* Return 0 if no need to further update 1 if brought a new value */ static int __updn_update_rank( - IN cl_qmap_t *p_guid_rank_tbl, - IN ib_net64_t guid, - IN uint8_t rank ) + IN struct updn_node *u, + IN unsigned rank ) { - updn_rank_t *p_updn_rank; - - p_updn_rank = (updn_rank_t*) cl_qmap_get(p_guid_rank_tbl, guid); - if (p_updn_rank == (updn_rank_t*) cl_qmap_end(p_guid_rank_tbl)) + if (u->rank > rank) { - p_updn_rank = (updn_rank_t*) malloc(sizeof(updn_rank_t)); - - CL_ASSERT(p_updn_rank); - - p_updn_rank->rank = rank; - - cl_qmap_insert(p_guid_rank_tbl, guid, &p_updn_rank->map_item); + u->rank = rank; return 1; } - else - { - if (p_updn_rank->rank > rank) - { - p_updn_rank->rank = rank; - return 1; - } - } return 0; } @@ -223,20 +159,18 @@ __updn_update_rank( 
**********************************************************************/ static int __updn_bfs_by_node( - IN updn_t *p_updn, + IN osm_log_t *p_log, IN osm_subn_t *p_subn, - IN osm_port_t *p_port, - IN cl_qmap_t *p_guid_rank_tbl ) + IN osm_port_t *p_port ) { /* Init local vars */ osm_switch_t *p_self_node = NULL; uint8_t pn, pn_rem; osm_physp_t *p_physp, *p_remote_physp; - cl_list_t *p_currList, *p_nextList; + cl_qlist_t list; uint16_t root_lid; - updn_next_step_t *p_updn_switch, *p_tmp; + struct updn_node *u; updn_switch_dir_t next_dir, current_dir; - osm_log_t *p_log = &p_updn->p_osm->log; OSM_LOG_ENTER( p_log, __updn_bfs_by_node ); @@ -248,21 +182,6 @@ __updn_bfs_by_node( return 1; } - /* Init the list pointers */ - p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); - if (!p_nextList) - { - osm_log( p_log, OSM_LOG_ERROR, - "__updn_bfs_by_node: ERR AA14: " - "No memory for p_nextList\n" ); - OSM_LOG_EXIT( p_log ); - return 1; - } - - cl_list_construct( p_nextList ); - cl_list_init( p_nextList, 10 ); - p_currList = p_nextList; - /* The Root BFS - lid */ root_lid = cl_ntoh16(osm_physp_get_base_lid( p_physp )); /* printf ("-V- BFS through lid : 0x%x\n", root_lid); */ @@ -273,7 +192,7 @@ __updn_bfs_by_node( if (p_port->p_node->sw) { p_self_node = p_port->p_node->sw; - /* Update its Min Hop Table */ + /* Update it's Min Hop Table */ osm_log( p_log, OSM_LOG_DEBUG, "__updn_bfs_by_node: " "Update Min Hop Table of GUID 0x%" PRIx64 "\n", @@ -282,7 +201,7 @@ __updn_bfs_by_node( } else { - /* This is a CA or router - need to take its remote port */ + /* This is a CA or router - need to take it's remote port */ p_remote_physp = p_physp->p_remote_physp; /* make sure that the following occur: @@ -304,7 +223,7 @@ __updn_bfs_by_node( else { p_self_node = p_remote_physp->p_node->sw; - /* Update its Min Hop Table */ + /* Update it's Min Hop Table */ /* NOTE : Check if there is a function which prints the Min Hop Table */ osm_log( p_log, OSM_LOG_DEBUG, "__updn_bfs_by_node: " @@ 
-322,201 +241,111 @@ __updn_bfs_by_node( "Starting from switch - port GUID 0x%" PRIx64 "\n", cl_ntoh64(p_self_node->p_node->node_info.port_guid) ); - /* Update list with the updn_next_step_t new element */ - /* NOTE : When inserting an item which is a pointer to a struct, does remove - action also free its memory */ - if (!(p_tmp=__updn_create_updn_next_step_t(UP, p_self_node))) - { - osm_log( p_log, OSM_LOG_ERROR, - "__updn_bfs_by_node: ERR AA08: " - "Could not create updn_next_step_t\n" ); - return 1; - } + /* Update current list with the new element */ + u = p_self_node->priv; + u->dir = UP; - cl_list_insert_tail(p_currList, p_tmp); + cl_qlist_init(&list); + cl_qlist_insert_tail(&list, &u->list); /* BFS the list till no next element */ - osm_log( p_log, OSM_LOG_VERBOSE, - "__updn_bfs_by_node: " - "BFS the subnet [\n" ); - - while (!cl_is_list_empty(p_currList)) + while (!cl_is_qlist_empty(&list)) { - osm_log( p_log, OSM_LOG_DEBUG, + ib_net64_t remote_guid, current_guid; + + NOISE_L( p_log, OSM_LOG_DEBUG, "__updn_bfs_by_node: " "Starting a new iteration with %zu elements in current list\n", - cl_list_count(p_currList) ); - /* Init the switch directed list */ - p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); - if (!p_nextList) - { - osm_log( p_log, OSM_LOG_ERROR, - "__updn_bfs_by_node: ERR AA15: " - "No memory for p_nextList\n" ); - OSM_LOG_EXIT( p_log ); - return 1; - } + cl_qlist_count(&list) ); - cl_list_construct( p_nextList ); - cl_list_init( p_nextList, 10 ); - /* Go over all current list items till it's empty */ - /* printf ("-V- In inner while\n"); */ - p_updn_switch = (updn_next_step_t*)cl_list_remove_head( p_currList ); - /* While there is a pointer to updn struct we continue to BFS */ - while (p_updn_switch) + u = (struct updn_node *)cl_qlist_remove_head(&list); + u->visited = 0; /* cleanup */ + current_dir = u->dir; + current_guid = osm_node_get_node_guid(u->sw->p_node); + NOISE_L( p_log, OSM_LOG_DEBUG, + "__updn_bfs_by_node: " + "Visiting port 
GUID 0x%" PRIx64 "\n", + cl_ntoh64(current_guid) ); + /* Go over all ports of the switch and find unvisited remote nodes */ + for ( pn = 0; pn < osm_switch_get_num_ports(u->sw); pn++ ) { - current_dir = p_updn_switch->state; - osm_log( p_log, OSM_LOG_DEBUG, + osm_node_t *p_remote_node; + struct updn_node *rem_u; + uint8_t current_min_hop, remote_min_hop, set_hop_return_value; + osm_switch_t *p_remote_sw; + + p_remote_node = osm_node_get_remote_node(u->sw->p_node, pn, &pn_rem); + /* If no remote node OR remote node is not a SWITCH + continue to next pn */ + if( !p_remote_node || !p_remote_node->sw ) + continue; + /* Fetch remote guid only after validation of remote node */ + remote_guid = osm_node_get_node_guid(p_remote_node); + p_remote_sw = p_remote_node->sw; + rem_u = p_remote_sw->priv; + /* Decide which direction to mark it (UP/DOWN) */ + next_dir = __updn_get_dir(u->rank, rem_u->rank, + current_guid, remote_guid, + u->is_root, rem_u->is_root); + + NOISE_L( p_log, OSM_LOG_DEBUG, "__updn_bfs_by_node: " - "Visiting port GUID 0x%" PRIx64 "\n", - cl_ntoh64(p_updn_switch->p_sw->p_node->node_info.port_guid) ); - /* Go over all ports of the switch and find unvisited remote nodes */ - for ( pn = 0; pn < osm_switch_get_num_ports(p_updn_switch->p_sw); pn++ ) + "move from 0x%016" PRIx64 " rank: %u " + "to 0x%016" PRIx64" rank: %u\n", + cl_ntoh64(current_guid), u->rank, + cl_ntoh64(remote_guid), rem_u->rank ); + /* Check if this is a legal step : the only illegal step is going + from DOWN to UP */ + if ((current_dir == DOWN) && (next_dir == UP)) { - /* printf("-V- Inner for in port num 0x%X\n", pn); */ - osm_node_t *p_remote_node; - cl_list_iterator_t updn_switch_iterator; - boolean_t HasVisited = FALSE; - ib_net64_t remote_guid,current_guid; - updn_rank_t *p_rem_rank, *p_cur_rank; - uint8_t current_min_hop, remote_min_hop, set_hop_return_value; - osm_switch_t *p_remote_sw; - - current_guid = osm_node_get_node_guid(p_updn_switch->p_sw->p_node); - p_remote_node = 
osm_node_get_remote_node( p_updn_switch->p_sw->p_node, - pn, &pn_rem ); - /* If no remote node OR remote node is not a SWITCH - continue to next pn */ - if( !p_remote_node || - (osm_node_get_type(p_remote_node) != IB_NODE_TYPE_SWITCH) ) - continue; - /* Fetch remote guid only after validation of remote node */ - remote_guid = osm_node_get_node_guid(p_remote_node); - /* printf ("-V- Current guid : 0x%" PRIx64 " Remote guid : 0x%" PRIx64 "\n", */ - /* cl_ntoh64(current_guid), cl_ntoh64(remote_guid)); */ - p_remote_sw = p_remote_node->sw; - p_rem_rank = (updn_rank_t*)cl_qmap_get(p_guid_rank_tbl, remote_guid); - p_cur_rank = (updn_rank_t*)cl_qmap_get(p_guid_rank_tbl, current_guid); - /* Decide which direction to mark it (UP/DOWN) */ - next_dir = __updn_get_dir (p_updn, p_cur_rank->rank, p_rem_rank->rank, - current_guid, remote_guid); - osm_log( p_log, OSM_LOG_DEBUG, "__updn_bfs_by_node: " - "move from 0x%016" PRIx64 " rank: %u " - "to 0x%016" PRIx64" rank: %u\n", - cl_ntoh64(current_guid), p_cur_rank->rank, - cl_ntoh64(remote_guid), p_rem_rank->rank ); - /* Check if this is a legal step : the only illegal step is going - from DOWN to UP */ - if ((current_dir == DOWN) && (next_dir == UP)) + "Avoiding move from 0x%016" PRIx64 " to 0x%016" PRIx64"\n", + cl_ntoh64(current_guid), cl_ntoh64(remote_guid) ); + /* Illegal step */ + continue; + } + /* Set MinHop value for the current lid */ + current_min_hop = osm_switch_get_least_hops(u->sw, root_lid); + /* Check hop count if better insert into NextState list && update + the remote node Min Hop Table */ + remote_min_hop = osm_switch_get_hop_count(p_remote_sw, root_lid, pn_rem); + if (current_min_hop + 1 < remote_min_hop) + { + NOISE_L( p_log, OSM_LOG_DEBUG, + "__updn_bfs_by_node (less): " + "Setting Min Hop Table of switch: 0x%" PRIx64 + "\n\t\tCurrent hop count is: %d, next hop count: %d" + "\n\tlid to set: 0x%x" + "\n\tport number: 0x%X" + "\n\thops number: %d\n", + cl_ntoh64(remote_guid), remote_min_hop,current_min_hop + 1, 
+ root_lid, pn_rem, current_min_hop + 1 ); + set_hop_return_value = osm_switch_set_hops(p_remote_sw, root_lid, pn_rem, current_min_hop + 1); + if (set_hop_return_value) { - osm_log( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node: " - "Avoiding move from 0x%016" PRIx64 " to 0x%016" PRIx64"\n", - cl_ntoh64(current_guid), cl_ntoh64(remote_guid) ); - /* Illegal step */ - continue; + osm_log( p_log, OSM_LOG_ERROR, + "__updn_bfs_by_node (less) ERR AA01: " + "Invalid value returned from set min hop is: %d\n", + set_hop_return_value ); } - /* Set MinHop value for the current lid */ - current_min_hop = osm_switch_get_least_hops(p_updn_switch->p_sw,root_lid); - /* Check hop count if better insert into NextState list && update - the remote node Min Hop Table */ - remote_min_hop = osm_switch_get_hop_count(p_remote_sw, root_lid, pn_rem); - if (current_min_hop + 1 < remote_min_hop) - { - osm_log( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node (less): " - "Setting Min Hop Table of switch: 0x%" PRIx64 - "\n\t\tCurrent hop count is: %d, next hop count: %d" - "\n\tlid to set: 0x%x" - "\n\tport number: 0x%X" - " \n\thops number: %d\n", - cl_ntoh64(remote_guid), remote_min_hop,current_min_hop + 1, - root_lid, pn_rem, current_min_hop + 1 ); - set_hop_return_value = osm_switch_set_hops(p_remote_sw, root_lid, pn_rem, current_min_hop + 1); - if (set_hop_return_value) - { - osm_log( p_log, OSM_LOG_ERROR, - "__updn_bfs_by_node (less) ERR AA01: " - "Invalid value returned from set min hop is: %d\n", - set_hop_return_value ); - } - /* Check if remote port is allready has been visited */ - updn_switch_iterator = cl_list_head(p_nextList); - while( updn_switch_iterator != cl_list_end(p_nextList) ) - { - updn_next_step_t *p_updn; - p_updn = (updn_next_step_t*)cl_list_obj(updn_switch_iterator); - /* Mark HasVisited only if: - 1. Same node guid - 2. 
Same direction - */ - if ((p_updn->p_sw->p_node == p_remote_node) && (p_updn->state == next_dir)) - HasVisited = TRUE; - updn_switch_iterator = cl_list_next(updn_switch_iterator); - } - if (!HasVisited) - { - /* Insert updn_switch item into the next list */ - if(!(p_tmp=__updn_create_updn_next_step_t(next_dir, p_remote_sw))) - { - osm_log( p_log, OSM_LOG_ERROR, - "__updn_bfs_by_node: ERR AA11: " - "Could not create updn_next_step_t\n" ); - return 1; - } - osm_log( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node: " - "Inserting new element to the next list: guid=0x%" PRIx64 " %s\n", - cl_ntoh64(p_tmp->p_sw->p_node->node_info.port_guid), - (p_tmp->state == UP ? "UP" : "DOWN") - ); - cl_list_insert_tail(p_nextList, p_tmp); - } - /* If the same value only update entry - at the min hop table */ - } else if (current_min_hop + 1 == osm_switch_get_hop_count(p_remote_sw, - root_lid, - pn_rem)) + /* Check if remote port has already been visited */ + if (!rem_u->visited) { - osm_log( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node (equal): " - "Setting Min Hop Table of switch: 0x%" PRIx64 - "\n\t\tCurrent hop count is: %d, next hop count: %d" - "\n\tlid to set: 0x%x" - "\n\tport number: 0x%X" - "\n\thops number: %d\n", - cl_ntoh64(remote_guid), - osm_switch_get_hop_count(p_remote_sw, root_lid, pn_rem), - current_min_hop + 1, root_lid, pn_rem, current_min_hop + 1 ); - set_hop_return_value = osm_switch_set_hops(p_remote_sw, root_lid, pn_rem, current_min_hop + 1); - - if (set_hop_return_value) - { - osm_log( p_log, OSM_LOG_ERROR, - "__updn_bfs_by_node (less) ERR AA12: " - "Invalid value returned from set min hop is: %d\n", - set_hop_return_value ); - } + /* Insert updn_switch item into the next list */ + rem_u->dir = next_dir; + rem_u->visited = 1; + NOISE_L( p_log, OSM_LOG_DEBUG, + "__updn_bfs_by_node: " + "Inserting new element to the next list: guid=0x%" PRIx64 " %s\n", + cl_ntoh64(rem_u->sw->p_node->node_info.port_guid), + (rem_u->dir == UP ? 
"UP" : "DOWN")); + cl_qlist_insert_tail(&list, &rem_u->list); } } - free (p_updn_switch); - p_updn_switch = (updn_next_step_t*)cl_list_remove_head( p_currList ); } - /* Cleanup p_currList */ - cl_list_destroy( p_currList ); - free (p_currList); - - /* Reassign p_currList to p_nextList */ - p_currList = p_nextList; } - /* Cleanup p_currList - Had the pointer to cl_list_t */ - cl_list_destroy( p_currList ); - free (p_currList); - osm_log( p_log, OSM_LOG_VERBOSE, - "__updn_bfs_by_node: " - "BFS the subnet ]\n" ); OSM_LOG_EXIT( p_log ); return 0; } @@ -527,23 +356,8 @@ static void updn_destroy( IN updn_t* const p_updn ) { - cl_map_item_t *p_map_item; uint64_t *p_guid_list_item; - /* Destroy the updn struct */ - p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl); - while( p_map_item != cl_qmap_end( &p_updn->guid_rank_tbl )) - { - osm_log ( &p_updn->p_osm->log, OSM_LOG_DEBUG, - "updn_destroy: " - "guid = 0x%" PRIx64 " rank = %u\n", - cl_ntoh64(cl_qmap_key(p_map_item)), - ((updn_rank_t *)p_map_item)->rank ); - cl_qmap_remove_item( &p_updn->guid_rank_tbl, p_map_item ); - free( (updn_rank_t *)p_map_item); - p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl ); - } - /* free the array of guids */ if (p_updn->updn_ucast_reg_inputs.guid_list) free(p_updn->updn_ucast_reg_inputs.guid_list); @@ -592,8 +406,6 @@ updn_init( OSM_LOG_ENTER( &p_osm->log, updn_init ); p_updn->p_osm = p_osm; - p_updn->state = UPDN_INIT; - cl_qmap_init( &p_updn->guid_rank_tbl ); p_list = (cl_list_t*)malloc(sizeof(cl_list_t)); if (!p_list) { @@ -691,171 +503,99 @@ updn_subn_rank( IN updn_t* p_updn ) { /* Init local vars */ - osm_port_t *p_root_port = NULL; - uint16_t tbl_size; + osm_switch_t *p_sw; uint8_t rank = base_rank; - osm_physp_t *p_physp, *p_remote_physp, *p_physp_temp; - cl_list_t *p_currList,*p_nextList; + osm_physp_t *p_physp, *p_remote_physp; + cl_qlist_t list; cl_status_t did_cause_update; + struct updn_node *u, *remote_u; uint8_t num_ports, port_num; osm_log_t *p_log = &p_updn->p_osm->log; 
OSM_LOG_ENTER( p_log, updn_subn_rank ); - osm_log( p_log, OSM_LOG_VERBOSE, - "updn_subn_rank: " - "Ranking starts from GUID 0x%" PRIx64 "\n", root_guid ); - - /* Init the list pointers */ - p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); - if (!p_nextList) + p_sw = osm_get_switch_by_guid(&p_updn->p_osm->subn, cl_hton64(root_guid)); + if(!p_sw) { osm_log( p_log, OSM_LOG_ERROR, - "updn_subn_rank: ERR AA15: " - "No memory for p_nextList\n" ); + "updn_subn_rank: ERR AA05: " + "Wrong switch GUID 0x%" PRIx64 "\n", root_guid ); OSM_LOG_EXIT( p_log ); return 1; } - cl_list_construct( p_nextList ); - cl_list_init( p_nextList, 10 ); - p_currList = p_nextList; + osm_log( p_log, OSM_LOG_VERBOSE, + "updn_subn_rank: " + "Ranking starts from GUID 0x%" PRIx64 "\n", root_guid ); - /* Check valid subnet & guid */ - tbl_size = (uint16_t)(cl_qmap_count(&p_updn->p_osm->subn.port_guid_tbl)); - if (tbl_size == 0) - { - osm_log( p_log, OSM_LOG_ERROR, - "updn_subn_rank: ERR AA04: " - "Port guid table is empty, cannot perform ranking\n" ); - OSM_LOG_EXIT( p_log ); - return 1; - } + u = p_sw->priv; + u->is_root = 1; - p_root_port = (osm_port_t*) cl_qmap_get(&p_updn->p_osm->subn.port_guid_tbl, - cl_ntoh64(root_guid)); - if( p_root_port == (osm_port_t*)cl_qmap_end( &p_updn->p_osm->subn.port_guid_tbl ) ) - { - osm_log( p_log, OSM_LOG_ERROR, - "updn_subn_rank: ERR AA05: " - "Wrong guid value: 0x%" PRIx64 "\n", root_guid ); - OSM_LOG_EXIT( p_log ); - return 1; - } - - /* Rank the first chosen guid anyway since its the base rank */ + /* Rank the first guid chosen anyway since it's the base rank */ osm_log( p_log, OSM_LOG_DEBUG, "updn_subn_rank: " "Ranking port GUID 0x%" PRIx64 "\n", root_guid ); - __updn_update_rank(&p_updn->guid_rank_tbl, cl_ntoh64(root_guid), rank); - /* - HACK: We are assuming SM is running on HCA, so when getting the default - port we'll get the port connected to the rest of the subnet. If SM is - running on SWITCH - we should try to get a dr path from all switch ports. 
- */ - p_physp = osm_port_get_default_phys_ptr( p_root_port ); - CL_ASSERT( p_physp ); - CL_ASSERT( osm_physp_is_valid( p_physp ) ); - /* We can safely add the node to the list */ - cl_list_insert_tail(p_nextList, p_physp); - /* Assign pointer to the list for BFS */ - p_currList = p_nextList; - - /* BFS the list till its empty */ - osm_log( p_log, OSM_LOG_VERBOSE, - "updn_subn_rank: " - "BFS the subnet [\n" ); + __updn_update_rank(u, rank); + + cl_qlist_init(&list); + cl_qlist_insert_tail(&list, &u->list); - while (!cl_is_list_empty(p_currList)) + /* BFS the list till it's empty */ + while (!cl_is_qlist_empty(&list)) { rank++; - p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); - if (!p_nextList) - { - osm_log( p_log, OSM_LOG_ERROR, - "updn_subn_rank: ERR AA16: " - "No memory for p_nextList\n" ); - OSM_LOG_EXIT( p_log ); - return 1; - } - cl_list_construct( p_nextList ); - cl_list_init( p_nextList, 10 ); - p_physp = (osm_physp_t*)cl_list_remove_head( p_currList ); - /* Go over all remote nodes and rank them (if not allready visited) till - no elemtent in the list p_currList */ - while ( p_physp != NULL ) + u = (struct updn_node *)cl_qlist_remove_head(&list); + /* Go over all remote nodes and rank them (if not already visited) */ + p_sw = u->sw; + num_ports = osm_switch_get_num_ports(p_sw); + osm_log( p_log, OSM_LOG_DEBUG, + "updn_subn_rank: " + "Handling switch GUID 0x%" PRIx64 "\n", + cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)) ); + for (port_num = 1; port_num < num_ports; port_num++) { - num_ports = osm_node_get_num_physp( p_physp->p_node ); - osm_log( p_log, OSM_LOG_DEBUG, - "updn_subn_rank: " - "Handling port GUID 0x%" PRIx64 "\n", - cl_ntoh64(p_physp->port_guid) ); - for (port_num = 1; port_num < num_ports; port_num++) + ib_net64_t port_guid; + + /* Current port fetched in order to get remote side */ + p_physp = osm_node_get_physp_ptr( p_sw->p_node, port_num ); + p_remote_physp = p_physp->p_remote_physp; + + /* + make sure that all the following occur on 
p_remote_physp: + 1. The port isn't NULL + 2. The port is a valid port + 3. It is a switch + */ + if ( p_remote_physp && + osm_physp_is_valid( p_remote_physp ) && + p_remote_physp->p_node->sw ) { - ib_net64_t port_guid; - - /* Current port fetched in order to get remote side */ - p_physp_temp = osm_node_get_physp_ptr( p_physp->p_node, port_num ); - p_remote_physp = p_physp_temp->p_remote_physp; - - /* - make sure that all the following occur on p_remote_physp: - 1. The port isn't NULL - 2. The port is a valid port - */ - if ( p_remote_physp && - osm_physp_is_valid ( p_remote_physp )) - { - port_guid = p_remote_physp->port_guid; - osm_log( p_log, OSM_LOG_DEBUG, - "updn_subn_rank: " - "Visiting remote port GUID 0x%" PRIx64 "\n", - cl_ntoh64(port_guid) ); - /* Was it visited ? - Only if the pointer equal to cl_qmap_end its not - found in the list */ - osm_log( p_log, OSM_LOG_DEBUG, - "updn_subn_rank: " - "Ranking port GUID 0x%" PRIx64 "\n", cl_ntoh64(port_guid) ); - did_cause_update = __updn_update_rank(&p_updn->guid_rank_tbl, port_guid, rank); - - osm_log( p_log, OSM_LOG_VERBOSE, - "updn_subn_rank: " - "Rank of port GUID 0x%" PRIx64 " = %u\n", cl_ntoh64(port_guid), - ((updn_rank_t*)cl_qmap_get(&p_updn->guid_rank_tbl, port_guid))->rank - ); - - if (did_cause_update) - { - cl_list_insert_tail(p_nextList, p_remote_physp); - } - } + remote_u = p_remote_physp->p_node->sw->priv; + port_guid = p_remote_physp->port_guid; + NOISE_L( p_log, OSM_LOG_DEBUG, + "updn_subn_rank: " + "Ranking port GUID 0x%" PRIx64 "\n", cl_ntoh64(port_guid) ); + did_cause_update = __updn_update_rank(remote_u, rank); + + osm_log( p_log, OSM_LOG_DEBUG, + "updn_subn_rank: " + "Rank of port GUID 0x%" PRIx64 " = %u\n", + cl_ntoh64(port_guid), + remote_u->rank ); + + if (did_cause_update) + cl_qlist_insert_tail(&list, &remote_u->list); } - /* Propagte through the next item in the p_currList */ - p_physp = (osm_physp_t*)cl_list_remove_head( p_currList ); } - /* First free the allocation of cl_list pointer 
then reallocate */ - cl_list_destroy( p_currList ); - free(p_currList); - /* p_currList is empty - need to assign it to p_nextList */ - p_currList = p_nextList; } - osm_log( p_log, OSM_LOG_VERBOSE, - "updn_subn_rank: " - "BFS the subnet ]\n" ); - - cl_list_destroy( p_currList ); - free(p_currList); - /* Print Summary of ranking */ osm_log( p_log, OSM_LOG_VERBOSE, "updn_subn_rank: " "Rank Info :\n\t Root Guid = 0x%" PRIx64 "\n\t Max Node Rank = %d\n", - cl_ntoh64(p_root_port->guid), rank ); - p_updn->state = UPDN_RANK; + root_guid, rank ); OSM_LOG_EXIT( p_log ); return 0; } @@ -875,25 +615,6 @@ __osm_subn_set_up_down_min_hop_table( OSM_LOG_ENTER( p_log, __osm_subn_set_up_down_min_hop_table ); - if (p_updn->state == UPDN_INIT) - { - osm_log( p_log, OSM_LOG_ERROR, - "__osm_subn_set_up_down_min_hop_table: ERR AA06: " - "Calculating Min Hop only allowed after ranking\n" ); - OSM_LOG_EXIT( p_log ); - return 1; - } - - /* Check if its a non switched subnet .. */ - if ( cl_is_qmap_empty( &p_subn->sw_guid_tbl ) ) - { - osm_log( p_log, OSM_LOG_ERROR, - "__osm_subn_set_up_down_min_hop_table: ERR AA10: " - "This is a non switched subnet, cannot perform UPDN algorithm\n" ); - OSM_LOG_EXIT( p_log ); - return 1; - } - /* Go over all the switches in the subnet - for each init their Min Hop Table */ osm_log( p_log, OSM_LOG_VERBOSE, @@ -927,8 +648,7 @@ __osm_subn_set_up_down_min_hop_table( "__osm_subn_set_up_down_min_hop_table: " "BFS through port GUID 0x%" PRIx64 "\n", cl_ntoh64(port_guid) ); - if(__updn_bfs_by_node(p_updn, p_subn, p_port, - &p_updn->guid_rank_tbl)) + if(__updn_bfs_by_node(p_log, p_subn, p_port)) { OSM_LOG_EXIT( p_log ); return 1; @@ -952,7 +672,6 @@ __osm_subn_calc_up_down_min_hop_table( IN updn_t* p_updn ) { uint8_t idx = 0; - cl_map_item_t *p_map_item; int status; OSM_LOG_ENTER( &p_updn->p_osm->log, osm_subn_calc_up_down_min_hop_table ); @@ -965,7 +684,18 @@ __osm_subn_calc_up_down_min_hop_table( osm_log( &p_updn->p_osm->log, OSM_LOG_ERROR, 
"__osm_subn_calc_up_down_min_hop_table: ERR AA0A: " "No guids were given or number of guids is 0\n" ); - return 1; + status = -1; + goto _exit; + } + + /* Check if it's not a switched subnet */ + if ( cl_is_qmap_empty( &p_updn->p_osm->subn.sw_guid_tbl ) ) + { + osm_log( &p_updn->p_osm->log, OSM_LOG_ERROR, + "__osm_subn_calc_up_down_min_hop_table: ERR AA0B: " + "This is not a switched subnet, cannot perform UPDN algorithm\n" ); + status = -1; + goto _exit; } for (idx = 0; idx < num_guids; idx++) @@ -980,27 +710,16 @@ __osm_subn_calc_up_down_min_hop_table( status = __osm_subn_set_up_down_min_hop_table(p_updn); - /* Cleanup updn rank tbl */ - p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl); - while( p_map_item != cl_qmap_end( &p_updn->guid_rank_tbl )) - { - osm_log( &p_updn->p_osm->log, OSM_LOG_DEBUG, - "__osm_subn_calc_up_down_min_hop_table: " - "guid = 0x%" PRIx64 " rank = %u\n", - cl_ntoh64(cl_qmap_key(p_map_item)), - ((updn_rank_t *)p_map_item)->rank ); - cl_qmap_remove_item( &p_updn->guid_rank_tbl, p_map_item); - free( (updn_rank_t *)p_map_item); - p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl); - } - + _exit: OSM_LOG_EXIT( &p_updn->p_osm->log ); return status; } /********************************************************************** **********************************************************************/ -static void expand_lid_matrices_for_lmc(osm_subn_t *p_subn) +static void +expand_lid_matrices_for_lmc( + osm_subn_t *p_subn ) { cl_map_item_t *p_next_port, *p_next_sw; osm_port_t *p_port; @@ -1009,7 +728,8 @@ static void expand_lid_matrices_for_lmc(osm_subn_t *p_subn) uint8_t port, num_ports; p_next_port = cl_qmap_head( &p_subn->port_guid_tbl ); - while (p_next_port != cl_qmap_end(&p_subn->port_guid_tbl)) { + while (p_next_port != cl_qmap_end(&p_subn->port_guid_tbl)) + { p_port = (osm_port_t *)p_next_port; p_next_port = cl_qmap_next(p_next_port); if (p_port->p_node->sw && @@ -1019,7 +739,8 @@ static void expand_lid_matrices_for_lmc(osm_subn_t *p_subn) if 
(!min_lid || min_lid == max_lid) continue; p_next_sw = cl_qmap_head(&p_subn->sw_guid_tbl); - while (p_next_sw != cl_qmap_end(&p_subn->sw_guid_tbl)) { + while (p_next_sw != cl_qmap_end(&p_subn->sw_guid_tbl)) + { p_sw = (osm_switch_t *)p_next_sw; p_next_sw = cl_qmap_next(p_next_sw); num_ports = osm_switch_get_num_ports(p_sw); @@ -1034,20 +755,62 @@ static void expand_lid_matrices_for_lmc(osm_subn_t *p_subn) /********************************************************************** **********************************************************************/ +static struct updn_node * +create_updn_node( + osm_switch_t *sw ) +{ + struct updn_node *u; + + u = malloc(sizeof(*u)); + if (!u) + return NULL; + memset(u, 0, sizeof(*u)); + u->sw = sw; + u->rank = 0xffffffff; + return u; +} + +static void +delete_updn_node( + struct updn_node *u ) +{ + u->sw->priv = NULL; + free(u); +} + +/********************************************************************** + **********************************************************************/ /* UPDN callback function */ static int __osm_updn_call( void *ctx ) { updn_t *p_updn = ctx; + cl_map_item_t *p_item; + osm_switch_t *p_sw; OSM_LOG_ENTER( &p_updn->p_osm->log, __osm_updn_call ); + p_item = cl_qmap_head(&p_updn->p_osm->subn.sw_guid_tbl); + while(p_item != cl_qmap_end(&p_updn->p_osm->subn.sw_guid_tbl)) + { + p_sw = (osm_switch_t *)p_item; + p_item = cl_qmap_next(p_item); + p_sw->priv = create_updn_node(p_sw); + if (!p_sw->priv) + { + osm_log( &(p_updn->p_osm->log), OSM_LOG_ERROR, + "__osm_updn_call: ERR AA0C: " + " cannot create updn node\n" ); + OSM_LOG_EXIT( &p_updn->p_osm->log ); + return -1; + } + } + /* First auto detect root nodes - if required */ if ( p_updn->auto_detect_root_nodes ) { osm_ucast_mgr_build_lid_matrices( &p_updn->p_osm->sm.ucast_mgr ); - /* printf ("-V- b4 osm_updn_find_root_nodes_by_min_hop\n"); */ __osm_updn_find_root_nodes_by_min_hop( p_updn ); } /* printf ("-V- after osm_updn_find_root_nodes_by_min_hop\n"); */ @@ 
-1066,8 +829,16 @@ __osm_updn_call( else osm_log( &p_updn->p_osm->log, OSM_LOG_INFO, "__osm_updn_call: " - "disable UPDN algorithm, no root nodes were found\n" ); + "disabling UPDN algorithm, no root nodes were found\n" ); + p_item = cl_qmap_head(&p_updn->p_osm->subn.sw_guid_tbl); + while(p_item != cl_qmap_end(&p_updn->p_osm->subn.sw_guid_tbl)) + { + p_sw = (osm_switch_t *)p_item; + p_item = cl_qmap_next(p_item); + delete_updn_node(p_sw->priv); + } + OSM_LOG_EXIT( &p_updn->p_osm->log ); return 0; } @@ -1137,7 +908,7 @@ __osm_updn_find_root_nodes_by_min_hop( osm_log( &p_osm->log, OSM_LOG_DEBUG, "__osm_updn_find_root_nodes_by_min_hop: " - "current number of ports in the subnet is %d\n", + "Current number of ports in the subnet is %d\n", cl_qmap_count(&p_osm->subn.port_guid_tbl) ); /* Init the required vars */ cl_qmap_init( &min_hop_hist ); @@ -1159,7 +930,7 @@ __osm_updn_find_root_nodes_by_min_hop( /* Find the Maximum number of CAs (and routers) for histogram normalization */ osm_log( &p_osm->log, OSM_LOG_VERBOSE, "__osm_updn_find_root_nodes_by_min_hop: " - "Find the number of CAs and store them in cl_list\n" ); + "Finding the number of CAs and storing them in cl_map\n" ); p_next_port = (osm_port_t*)cl_qmap_head( &p_osm->subn.port_guid_tbl ); while( p_next_port != (osm_port_t*)cl_qmap_end( &p_osm->subn.port_guid_tbl ) ) { p_port = p_next_port; @@ -1177,13 +948,13 @@ __osm_updn_find_root_nodes_by_min_hop( cl_map_insert( &ca_by_lid_map, self_lid_ho, (void *)0x1); osm_log( &p_osm->log, OSM_LOG_DEBUG, "__osm_updn_find_root_nodes_by_min_hop: " - "Inserting into array GUID 0x%" PRIx64 ", Lid: 0x%X\n", + "Inserting GUID 0x%" PRIx64 ", Lid: 0x%X into array\n", cl_ntoh64(osm_port_get_guid(p_port)), self_lid_ho ); } } osm_log( &p_osm->log, OSM_LOG_DEBUG, "__osm_updn_find_root_nodes_by_min_hop: " - "Found %u CA, %u SW in the subnet\n", numCas, numSws ); + "Found %u CAs, %u SWs in the subnet\n", numCas, numSws ); p_next_sw = (osm_switch_t*)cl_qmap_head( &p_osm->subn.sw_guid_tbl 
); osm_log( &p_osm->log, OSM_LOG_VERBOSE, "__osm_updn_find_root_nodes_by_min_hop: " @@ -1201,7 +972,7 @@ __osm_updn_find_root_nodes_by_min_hop( p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); /* Clear Min Hop Table && FWD Tbls - This should caused opensm to - rebuild its FWD tables , post setting Min Hop Tables */ + rebuild it's FWD tables, post setting Min Hop Tables */ max_lid_ho = osm_switch_get_max_lid_ho(p_sw); /* Get base lid of switch by retrieving port 0 lid of node pointer */ self_lid_ho = cl_ntoh16( osm_node_get_base_lid( p_sw->p_node, 0 ) ); @@ -1285,7 +1056,7 @@ __osm_updn_find_root_nodes_by_min_hop( numHopBarsOverThd1, numHopBarsOverThd2 ); } - /* destroy the qmap table and all its content - no longer needed */ + /* destroy the qmap table and all it's content - no longer needed */ osm_log( &p_osm->log, OSM_LOG_DEBUG, "__osm_updn_find_root_nodes_by_min_hop: " "Cleanup: delete histogram " -- 1.5.0.1.26.gf5a92 From mst at mellanox.co.il Sat Feb 24 21:38:20 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 25 Feb 2007 07:38:20 +0200 Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter In-Reply-To: References: <20070222235724.GC4447@mellanox.co.il> <20070223070055.GC25553@obsidianresearch.com> Message-ID: <20070225053820.GB6177@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH] libmthca: optimize calls to htonl with constant parameter > > > Newer gccs have the -fwhole-program --combine options that address > > this and more. One of the things that happens is that all internal > > functions are made 'static' and all compilation units are optimized in > > one go. > > Good point... but is there any sane way to use that feature with > automake and libtool? I know that the autotools are a pain but I > really don't want to reimplement the useful stuff they give us, and I > don't know of any really practical replacement... 
Once KDE4 is out, I expect that most systems will start shipping cmake. Maybe it'll be practical to switch to that then. -- MST From vlad at mellanox.co.il Sun Feb 25 01:00:57 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 25 Feb 2007 11:00:57 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <000001c75787$50ff0440$ff0da8c0@amr.corp.intel.com> References: <000001c75787$50ff0440$ff0da8c0@amr.corp.intel.com> Message-ID: <1172394057.12388.3.camel@vladsk-laptop> On Fri, 2007-02-23 at 12:15 -0800, Sean Hefty wrote: > I would like these fixes in OFED 1.2 as well. What git tree / branch do I > generate a patch against? > > - Sean git://git.openfabrics.org/~vlad/ofed_1_2/.git branch: ofed_1_2 - Vladimir > > --- > > rdma_cm: remove unused node_guid from cma_device structure. > ib_cm: remove ca_guid from cm_device structure. > rdma_cm: request reversible paths only. > ib_core: Set hop limit in ib_init_ah_from_wc correctly. > > The patches are in: > > git://git.openfabrics.org/~shefty/rdma-dev.git for-roland > > (sign-off line was added to the actual commit messages) > > Signed-off-by: Sean Hefty > --- > commit 28e218621d36cf9da42f07af08775769eb289fc0 > Author: Sean Hefty > Date: Thu Feb 22 11:37:44 2007 -0800 > > rdma_cm: remove unused node_guid from cma_device structure. 
> > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index bb27ce9..d441815 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -77,7 +77,6 @@ static int next_port; > struct cma_device { > struct list_head list; > struct ib_device *device; > - __be64 node_guid; > struct completion comp; > atomic_t refcount; > struct list_head id_list; > @@ -2674,7 +2673,6 @@ static void cma_add_one(struct ib_device *device) > return; > > cma_dev->device = device; > - cma_dev->node_guid = device->node_guid; > > init_completion(&cma_dev->comp); > atomic_set(&cma_dev->refcount, 1); > > commit 6de97f2a3373357d720b1653dfc0aac6d40b7506 > Author: Sean Hefty > Date: Thu Feb 22 11:37:38 2007 -0800 > > ib_cm: remove ca_guid from cm_device structure. > > The cm_device references an ib_device, which contains the node_guid. > > diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c > index d446998..842cd0b 100644 > --- a/drivers/infiniband/core/cm.c > +++ b/drivers/infiniband/core/cm.c > @@ -88,7 +88,6 @@ struct cm_port { > struct cm_device { > struct list_head list; > struct ib_device *device; > - __be64 ca_guid; > struct cm_port port[0]; > }; > > @@ -739,8 +738,8 @@ retest: > ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg); > spin_unlock_irqrestore(&cm_id_priv->lock, flags); > ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT, > - &cm_id_priv->av.port->cm_dev->ca_guid, > - sizeof cm_id_priv->av.port->cm_dev->ca_guid, > + &cm_id_priv->id.device->node_guid, > + sizeof cm_id_priv->id.device->node_guid, > NULL, 0); > break; > case IB_CM_REQ_RCVD: > @@ -883,7 +882,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, > > req_msg->local_comm_id = cm_id_priv->id.local_id; > req_msg->service_id = param->service_id; > - req_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid; > + req_msg->local_ca_guid = cm_id_priv->id.device->node_guid; > cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num)); > 
cm_req_set_resp_res(req_msg, param->responder_resources); > cm_req_set_init_depth(req_msg, param->initiator_depth); > @@ -1442,7 +1441,7 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg, > cm_rep_set_flow_ctrl(rep_msg, param->flow_control); > cm_rep_set_rnr_retry_count(rep_msg, param->rnr_retry_count); > cm_rep_set_srq(rep_msg, param->srq); > - rep_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid; > + rep_msg->local_ca_guid = cm_id_priv->id.device->node_guid; > > if (param->private_data && param->private_data_len) > memcpy(rep_msg->private_data, param->private_data, > @@ -3385,7 +3384,6 @@ static void cm_add_one(struct ib_device *device) > return; > > cm_dev->device = device; > - cm_dev->ca_guid = device->node_guid; > > set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask); > for (i = 1; i <= device->phys_port_cnt; i++) { > > commit 87680047dd09ca4a4e8ec575dad215c92cf45ed3 > Author: Sean Hefty > Date: Wed Feb 21 16:40:44 2007 -0800 > > rdma_cm: request reversible paths only > > The rdma_cm requires that path records be reversible. Set the reversible > bit when issuing an path record query. 
> > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index f8d69b3..bb27ce9 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -1492,11 +1492,13 @@ static int cma_query_ib_route(struct rdma_id_private > *id_priv, int timeout_ms, > ib_addr_get_dgid(addr, &path_rec.dgid); > path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr)); > path_rec.numb_path = 1; > + path_rec.reversible = 1; > > id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device, > id_priv->id.port_num, &path_rec, > IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | > - IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH, > + IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH | > + IB_SA_PATH_REC_REVERSIBLE, > timeout_ms, GFP_KERNEL, > cma_query_handler, work, &id_priv->query); > > > commit 30947e5b7db42184d66746ac1187d4abbf89018d > Author: Sean Hefty > Date: Wed Feb 21 16:37:31 2007 -0800 > > ib_core: Set hop limit in ib_init_ah_from_wc correctly. > > The hop_limit value in the ah_attr should be 0xFF, not the value read > from the received GRH (which should be 0). See 13.5.4.4 in the 1.2 IB spec. 
> > diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c > index 8b5dd36..ccdf93d 100644 > --- a/drivers/infiniband/core/verbs.c > +++ b/drivers/infiniband/core/verbs.c > @@ -167,7 +167,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 > port_num, struct ib_wc *wc, > ah_attr->grh.sgid_index = (u8) gid_index; > flow_class = be32_to_cpu(grh->version_tclass_flow); > ah_attr->grh.flow_label = flow_class & 0xFFFFF; > - ah_attr->grh.hop_limit = grh->hop_limit; > + ah_attr->grh.hop_limit = 0xFF; > ah_attr->grh.traffic_class = (flow_class >> 20) & 0xFF; > } > return 0; 
From vlad at lists.openfabrics.org Sun Feb 25 02:26:07 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sun, 25 Feb 2007 02:26:07 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070225-0200 daily build status Message-ID: <20070225102608.251A3E607F6@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod --with-rds-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: Build failed on ia64 with linux-2.6.19 Log: 
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/drivers/infiniband/core/addr.c:62: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.19' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.18 Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/drivers/infiniband/core/addr.c:62: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check] Error 2 make[1]: Leaving directory 
`/home/vlad/kernel.org/ia64/linux-2.6.18' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.12 Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.12' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.13 Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.13' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.15 Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.15' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.14 Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.14' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.17 Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.17' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.16 Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/drivers/infiniband/core/addr.c: 
At top level: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’ /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 
---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’: 
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.5-7.244-smp Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit': /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:467: error: implicit declaration of function 'proto_unregister' /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init': /home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:516: error: implicit declaration of function 'proto_register' make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From ogerlitz at voltaire.com Sun Feb 25 02:48:30 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 25 Feb 
2007 12:48:30 +0200 Subject: [openib-general] ipoib & the partial pkey In-Reply-To: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com> References: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com> Message-ID: <45E1697E.6050007@voltaire.com> Sean Hefty wrote: > I looked into this more... > RFC 4391 states (middle of page 5): > For a node to join a partition, one of its ports must be assigned the relevant > P_Key by the SM [RFC4392]. > Jumping to RFC 4392 (top of page 4): Just to make sure we agree on the quote: it is from section 4 of RFC 4392 (page 14), e.g. in http://www.ietf.org/rfc/rfc4392.txt > at the time of creating an IB multicast group, multiple values such as the > P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc. have to be > specified. These values should be such that all potential members of the IB > multicast group are able to communicate with one another when using them. OK, I suggest removing this spec limitation, as it does not allow the use case of a server using a partition in which inter-client communication is not allowed. Actually, since it prevents people from using partial-membership partitioning with IPoIB (every IPoIB device needs to join the broadcast group), it is probably a spec bug rather than a deliberate limitation. A simple real-life example is an I/O target: the system admin wants IB block and/or file storage traffic to use a partition, but does not want initiators to communicate among themselves on this partition. To achieve that, the SM is configured to assign the partial pkey to the initiator nodes and the full pkey to the target ports. The current implementation of IPoIB and the core supports that perfectly (and transparently...). Or. From mst at mellanox.co.il Sun Feb 25 04:22:11 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 25 Feb 2007 14:22:11 +0200 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth In-Reply-To: References: Message-ID: <20070225122211.GD5331@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth > > OK, I applied the following patch (I had to change one line of your > patch to get it to apply because the small-message changed the context > so one chunk didn't apply). > > Anyway I don't see any difference in small message latency or large > message throughput. (Actually latency seems slightly worse but I > think the change is within my normal variability so I don't think > the difference is significant) OK. I wonder whether unrolling the loop in skb_put_frags might be helpful. Could you please try the following? Does this affect latency for you? (I don't see any difference between UD and CM, either with or without this patch.) Try to improve small message latency some more by unrolling more loops. Signed-off-by: Michael S. 
Tsirkin --- diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index a389854..a8895b4 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -311,38 +311,6 @@ static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id, return 0; } } -/* Adjust length of skb with fragments to match received data */ -static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space, - unsigned int length, struct sk_buff *toskb) -{ - int i, num_frags; - unsigned int size; - - /* put header into skb */ - size = min(length, hdr_space); - skb->tail += size; - skb->len += size; - length -= size; - - num_frags = skb_shinfo(skb)->nr_frags; - for (i = 0; i < num_frags; i++) { - skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; - - if (length == 0) { - /* don't need this page */ - skb_fill_page_desc(toskb, i, frag->page, 0, PAGE_SIZE); - --skb_shinfo(skb)->nr_frags; - } else { - size = min(length, (unsigned) PAGE_SIZE); - - frag->size = size; - skb->data_len += size; - skb->truesize += size; - skb->len += size; - length -= size; - } - } -} void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) { @@ -352,7 +320,7 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) struct ipoib_cm_rx *p; unsigned long flags; u64 mapping[IPOIB_CM_RX_SG]; - int frags; + unsigned head_size, frag_size, frags; ipoib_dbg_data(priv, "cm recv completion: id %d, op %d, status: %d\n", wr_id, wc->opcode, wc->status); @@ -388,8 +356,9 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) } } - frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len, - (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE; + head_size = min(wc->byte_len, (unsigned)IPOIB_CM_HEAD_SIZE); + frag_size = wc->byte_len - head_size; + frags = PAGE_ALIGN(frag_size) / PAGE_SIZE; newskb = ipoib_cm_alloc_rx_skb(dev, wr_id, frags, mapping); if (unlikely(!newskb)) { @@ -408,7 +377,18 @@ void ipoib_cm_handle_rx_wc(struct net_device 
*dev, struct ib_wc *wc) ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", wc->byte_len, wc->slid); - skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newskb); + memcpy(&skb_shinfo(newskb)->frags[frags], &skb_shinfo(skb)->frags[frags], + (IPOIB_CM_RX_SG - 1 - frags) * sizeof(skb_frag_t)); + skb_shinfo(newskb)->nr_frags = IPOIB_CM_RX_SG - 1; + + skb_shinfo(skb)->nr_frags = frags; + skb->tail += head_size; + skb->len += wc->byte_len; + skb->data_len += frag_size; + skb->truesize += frag_size; + if (frags) + skb_shinfo(skb)->frags[frags - 1].size = + (frag_size - 1) % PAGE_SIZE + 1; skb->protocol = ((struct ipoib_header *) skb->data)->proto; skb->mac.raw = skb->data; -- MST From dotanb at dev.mellanox.co.il Sun Feb 25 06:03:28 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Sun, 25 Feb 2007 16:03:28 +0200 Subject: [openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path() In-Reply-To: References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com> <1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il> Message-ID: <45E19730.7010008@dev.mellanox.co.il> Hi and sorry about the delay in the reply. Roland Dreier wrote: > > In issue number 296 that i opened several months ago in the Bugzilla, i > > reported about two missing attributes: the first one is the static_rate, > > and the second one is the src_path_bits which is not being filled right. > > The patch I posted fixes the static rate, right? > > You'll need to explain what you mean about src_path_bits, because at > first glance the code looks OK to me. 
> Here is the code that handles the src_path_bits: int ib_init_ah_from_path(struct ib_device *device, u8 port_num, struct ib_sa_path_rec *rec, struct ib_ah_attr *ah_attr) { int ret; u16 gid_index; memset(ah_attr, 0, sizeof *ah_attr); ah_attr->dlid = be16_to_cpu(rec->dlid); ah_attr->sl = rec->sl; ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f; I have a feeling that this function doesn't handle the src_path_bits as it should, because it ignores the LMC value of the SLID: I think that if the LMC is < 8, wrong bits may be set in the src_path_bits. I think no one has noticed any failure in this code so far because not many users use LMC > 0 in their subnet, and most of the code that calls this function uses it with the base port LID. Thanks, Dotan From kliteyn at dev.mellanox.co.il Sun Feb 25 06:23:30 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 25 Feb 2007 16:23:30 +0200 Subject: [openib-general] [PATCH] osm: Flushing log file after OSM_SYS_LOG message Message-ID: <45E19BE2.2070704@dev.mellanox.co.il> Hi Hal, The OSM log should be flushed when an OSM_SYS_LOG message is printed. We had this once, but somehow it has disappeared. This fix has to go to both trunk and 1.2. 
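The intended flush condition can be sketched in isolation as follows. This is only an illustration under stated assumptions: `needs_flush`, `LOG_ERROR` and `LOG_SYS` are hypothetical names, and the bit values are assumed rather than taken from osm_log.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, for illustration only; the real flags live in osm_log.h. */
#define LOG_ERROR 0x01
#define LOG_SYS   0x80

/*
 * Decide whether the log stream should be flushed after a message:
 * either the log is configured to always flush, or the message
 * carries the error bit or the syslog bit.
 */
static bool needs_flush(bool always_flush, uint8_t verbosity)
{
	return always_flush || (verbosity & (LOG_ERROR | LOG_SYS)) != 0;
}
```

Testing both bits with one mask is equivalent to testing each bit separately and OR-ing the results.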
Thanks, --Yevgeny Signed-off-by: Yevgeny Kliteynik --- osm/opensm/osm_log.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c index d76031d..f95ed85 100644 --- a/osm/opensm/osm_log.c +++ b/osm/opensm/osm_log.c @@ -204,7 +204,8 @@ osm_log( #endif /* flush log */ - if (ret > 0 && (p_log->flush || (verbosity & OSM_LOG_ERROR)) && + if ( ret > 0 && + (p_log->flush || (verbosity & OSM_LOG_ERROR) || (verbosity & OSM_LOG_SYS)) && fflush( p_log->out_port ) < 0) ret = -1; -- 1.4.4.1.GIT From vlad at lists.openfabrics.org Sun Feb 25 08:06:32 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sun, 25 Feb 2007 08:06:32 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070225-0736 daily build status Message-ID: <20070225160632.A0A9AE603C8@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod --with-rds-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.12 Passed on x86_64 
with linux-2.6.9-42.ELsmp Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'add_adapter': /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: 'adapter_list_lock' undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'remove_adapter': /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: 'adapter_list_lock' undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 
---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: 'ADVERTISE_PAUSE_CAP' undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: 'ADVERTISE_PAUSE_ASYM' undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.5-7.244-smp Log: /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit': /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:467: error: implicit declaration of function 'proto_unregister' /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init': /home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:516: error: implicit declaration of function 'proto_register' make[3]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From mst at mellanox.co.il Sun Feb 25 09:08:16 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 25 Feb 2007 19:08:16 +0200 Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package In-Reply-To: References: Message-ID: <20070225170816.GA18630@mellanox.co.il> > Quoting r. Woodruff, Robert J : > Subject: Re: [openfabrics-ewg] new OFED 1.2 package > > I am also still seeing the issue with the rdma_cm abi_version on RedHat > EL4-U3, > bug number, 347. The bug report contains the patch that should fix this. OK, here's a somewhat cleaned-up patch. However, I have a question: should not the module cleanup function remove ucma_class and class_attr_abi_version that were created at module initialization? 
---- diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index e2e8d32..e9e024e 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -847,13 +847,12 @@ static struct miscdevice ucma_misc = { .fops = &ucma_fops, }; -static ssize_t show_abi_version(struct device *dev, - struct device_attribute *attr, - char *buf) +static struct class *ucma_class; +static ssize_t show_abi_version(struct class *class_dev, char *buf) { return sprintf(buf, "%d\n", RDMA_USER_CM_ABI_VERSION); } -static DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL); +static CLASS_ATTR(abi_version, S_IRUGO, show_abi_version, NULL); static int __init ucma_init(void) { @@ -863,7 +862,13 @@ static int __init ucma_init(void) if (ret) return ret; - ret = device_create_file(ucma_misc.this_device, &dev_attr_abi_version); + ucma_class = class_create(THIS_MODULE, "infiniband_ucma"); + if (IS_ERR(ucma_class)) { + printk(KERN_ERR "rdma_ucm: couldn't create class infiniband_ucma\n"); + goto err; + } + + ret = class_create_file(ucma_class, &class_attr_abi_version); if (ret) { printk(KERN_ERR "rdma_ucm: couldn't create abi_version attr\n"); goto err; @@ -876,7 +881,6 @@ err: static void __exit ucma_cleanup(void) { - device_remove_file(ucma_misc.this_device, &dev_attr_abi_version); misc_deregister(&ucma_misc); idr_destroy(&ctx_idr); } -- MST From sashak at voltaire.com Sun Feb 25 11:58:37 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 25 Feb 2007 21:58:37 +0200 Subject: [openib-general] [PATCH] osm: Flushing log file after OSM_SYS_LOG message In-Reply-To: <45E19BE2.2070704@dev.mellanox.co.il> References: <45E19BE2.2070704@dev.mellanox.co.il> Message-ID: <20070225195837.GC11957@sashak.voltaire.com> On 16:23 Sun 25 Feb , Yevgeny Kliteynik wrote: > Hi Hal, > > OSM log should be flushed when OSM_SYS_LOG message is > printed. We had this once, but somehow it has disappeared. > > This fix has to go both to trunk and to 1.2. 
> > Thanks, > > --Yevgeny > > Signed-off-by: Yevgeny Kliteynik > --- > osm/opensm/osm_log.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c > index d76031d..f95ed85 100644 > --- a/osm/opensm/osm_log.c > +++ b/osm/opensm/osm_log.c > @@ -204,7 +204,8 @@ osm_log( > #endif > > /* flush log */ > - if (ret > 0 && (p_log->flush || (verbosity & OSM_LOG_ERROR)) && > + if ( ret > 0 && > + (p_log->flush || (verbosity & OSM_LOG_ERROR) || (verbosity & OSM_LOG_SYS)) && verbosity & (OSM_LOG_ERROR|OSM_LOG_SYS)? Sasha From sashak at voltaire.com Sun Feb 25 13:48:45 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 25 Feb 2007 23:48:45 +0200 Subject: [openib-general] [PATCH] opensm: faster min hops Message-ID: <20070225214845.GF11957@sashak.voltaire.com> After analyzing gprof output, I noticed that the current lmx (the switch's LID matrix) implementation is extremely slow. This simple reimplementation as a hops matrix makes the LID matrix build process two times faster. 
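As a rough illustration of the data layout the patch below introduces, here is a minimal user-space sketch. It assumes a simplified `struct hops_table` in place of the real `osm_switch_t`; the `hops_table_*` names and the `NO_PATH` value are hypothetical stand-ins, not OpenSM API. Each LID gets a lazily allocated row of per-port hop counts, and slot 0 of the row caches the smallest hop count seen so far.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NO_PATH 0xff	/* stand-in for OSM_NO_PATH; value assumed */

/* Hypothetical, simplified stand-in for the hops fields of osm_switch_t. */
struct hops_table {
	uint16_t max_lid;	/* highest LID that may be indexed */
	unsigned num_ports;	/* entries per row, slot 0 included */
	uint8_t **hops;		/* hops[lid][port]; rows allocated on first use */
};

static int hops_table_init(struct hops_table *t, uint16_t max_lid,
			   unsigned num_ports)
{
	/* calloc leaves every row pointer NULL, meaning "no path known yet" */
	t->hops = calloc((size_t)max_lid + 1, sizeof(t->hops[0]));
	if (!t->hops)
		return -1;
	t->max_lid = max_lid;
	t->num_ports = num_ports;
	return 0;
}

/* Record a hop count; slot 0 of the row keeps the minimum over all ports. */
static int hops_table_set(struct hops_table *t, uint16_t lid, uint8_t port,
			  uint8_t num_hops)
{
	if (lid > t->max_lid || port >= t->num_ports)
		return -1;
	if (!t->hops[lid]) {
		t->hops[lid] = malloc(t->num_ports);
		if (!t->hops[lid])
			return -1;
		memset(t->hops[lid], NO_PATH, t->num_ports);
	}
	t->hops[lid][port] = num_hops;
	if (t->hops[lid][0] > num_hops)
		t->hops[lid][0] = num_hops;
	return 0;
}

/* Hop count to a LID through a given port, or NO_PATH if unknown. */
static uint8_t hops_table_get(const struct hops_table *t, uint16_t lid,
			      uint8_t port)
{
	if (lid > t->max_lid || port >= t->num_ports || !t->hops[lid])
		return NO_PATH;
	return t->hops[lid][port];
}

/* Least hops to a LID through any port: a single read of slot 0. */
static uint8_t hops_table_least(const struct hops_table *t, uint16_t lid)
{
	return hops_table_get(t, lid, 0);
}

static void hops_table_free(struct hops_table *t)
{
	size_t i;

	for (i = 0; i <= t->max_lid; i++)
		free(t->hops[i]);	/* free(NULL) is a no-op */
	free(t->hops);
	t->hops = NULL;
}
```

With this layout a least-hops query is one array access, and rows for LIDs that are never set cost only a NULL pointer.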
Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_port_profile.h | 1 - osm/include/opensm/osm_router.h | 1 - osm/include/opensm/osm_switch.h | 182 ++++++++------------------------- osm/opensm/osm_switch.c | 115 ++++++++++++++------- osm/opensm/osm_ucast_ftree.c | 3 - osm/opensm/osm_ucast_mgr.c | 16 +-- osm/opensm/osm_ucast_updn.c | 2 +- 7 files changed, 124 insertions(+), 196 deletions(-) diff --git a/osm/include/opensm/osm_port_profile.h b/osm/include/opensm/osm_port_profile.h index 952393d..a07b057 100644 --- a/osm/include/opensm/osm_port_profile.h +++ b/osm/include/opensm/osm_port_profile.h @@ -55,7 +55,6 @@ #include #include #include -#include #include #include diff --git a/osm/include/opensm/osm_router.h b/osm/include/opensm/osm_router.h index 168ce77..63c7566 100644 --- a/osm/include/opensm/osm_router.h +++ b/osm/include/opensm/osm_router.h @@ -52,7 +52,6 @@ #include #include #include -#include #include #include #include diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h index 053b18a..19381f8 100644 --- a/osm/include/opensm/osm_switch.h +++ b/osm/include/opensm/osm_switch.h @@ -53,7 +53,6 @@ #include #include #include -#include #include #include #include @@ -105,10 +104,12 @@ typedef struct _osm_switch cl_map_item_t map_item; osm_node_t *p_node; ib_switch_info_t switch_info; - osm_fwd_tbl_t fwd_tbl; - osm_lid_matrix_t lmx; uint16_t max_lid_ho; + unsigned num_ports; + unsigned num_hops; + uint8_t **hops; osm_port_profile_t *p_prof; + osm_fwd_tbl_t fwd_tbl; osm_mcast_tbl_t mcast_tbl; uint32_t discovery_count; void *priv; @@ -124,19 +125,25 @@ typedef struct _osm_switch * switch_info * IBA defined SwitchInfo structure for this switch. * -* fwd_tbl -* This switch's forwarding table. +* max_lid_ho +* Max LID that is accessible from this switch. +* +* num_ports +* Number of ports for this switch. * -* lmx +* num_hops +* Size of hops table for this switch. 
+* +* hops * LID Matrix for this switch containing the hop count * to every LID from every port. * -* max_lid_ho -* Max LID that is accessible from this switch. -* -* p_pro +* p_prof * Pointer to array of Port Profile objects for this switch. * +* fwd_tbl +* This switch's forwarding table. +* * mcast_tbl * Multicast forwarding table for this switch. * @@ -149,70 +156,9 @@ typedef struct _osm_switch * Switch object *********/ -/****f* OpenSM: Switch/osm_switch_construct +/****f* OpenSM: Switch/osm_switch_delete * NAME -* osm_switch_construct -* -* DESCRIPTION -* This function constructs a Switch object. -* -* SYNOPSIS -*/ -void -osm_switch_construct( - IN osm_switch_t* const p_sw ); -/* -* PARAMETERS -* p_sw -* [in] Pointer to a Switch object to construct. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* Allows calling osm_switch_init, and osm_switch_destroy. -* -* Calling osm_switch_construct is a prerequisite to calling any other -* method except osm_switch_init. -* -* SEE ALSO -* Switch object, osm_switch_init, osm_switch_destroy -*********/ - -/****f* OpenSM: Switch/osm_switch_destroy -* NAME -* osm_switch_destroy -* -* DESCRIPTION -* The osm_switch_destroy function destroys the object, releasing -* all resources. -* -* SYNOPSIS -*/ -void -osm_switch_destroy( - IN osm_switch_t* const p_sw ); -/* -* PARAMETERS -* p_sw -* [in] Pointer to the object to destroy. -* -* RETURN VALUE -* None. -* -* NOTES -* Performs any necessary cleanup of the specified object. -* Further operations should not be attempted on the destroyed object. -* This function should only be called after a call to osm_switch_construct -* or osm_switch_init. -* -* SEE ALSO -* Switch object, osm_switch_construct, osm_switch_init -*********/ - -/****f* OpenSM: Switch/osm_switch_destroy -* NAME -* osm_switch_destroy +* osm_switch_delete * * DESCRIPTION * Destroys and deallocates the object. 
@@ -236,42 +182,6 @@ osm_switch_delete( * Switch object, osm_switch_construct, osm_switch_init *********/ -/****f* OpenSM: Switch/osm_switch_init -* NAME -* osm_switch_init -* -* DESCRIPTION -* The osm_switch_init function initializes a Switch object for use. -* -* SYNOPSIS -*/ -ib_api_status_t -osm_switch_init( - IN osm_switch_t* const p_sw, - IN osm_node_t* const p_node, - IN const osm_madw_t* const p_madw ); -/* -* PARAMETERS -* p_sw -* [in] Pointer to an osm_switch_t object to initialize. -* -* p_node -* [in] Pointer to the node object of this switch -* -* p_madw -* [in] Pointer to the MAD Wrapper containing the switch's -* SwitchInfo attribute. -* -* RETURN VALUES -* IB_SUCCESS if the Switch object was initialized successfully. -* -* NOTES -* Allows calling other node methods. -* -* SEE ALSO -* Switch object, osm_switch_construct, osm_switch_destroy -*********/ - /****f* OpenSM: Switch/osm_switch_new * NAME * osm_switch_new @@ -317,8 +227,9 @@ static inline boolean_t osm_switch_is_leaf_lid( IN const osm_switch_t* const p_sw, IN const uint16_t lid_ho ) -{ - return( osm_lid_matrix_get_least_hops( &p_sw->lmx, lid_ho ) <= 1 ); +{ + return (lid_ho > p_sw->max_lid_ho || !p_sw->hops[lid_ho]) ? FALSE : + (p_sw->hops[lid_ho][0] <= 1); } /* * PARAMETERS @@ -353,7 +264,8 @@ osm_switch_get_hop_count( IN const uint16_t lid_ho, IN const uint8_t port_num ) { - return( osm_lid_matrix_get( &p_sw->lmx, lid_ho, port_num ) ); + return (lid_ho > p_sw->max_lid_ho || !p_sw->hops[lid_ho]) ? 
+ OSM_NO_PATH : p_sw->hops[lid_ho][port_num]; } /* * PARAMETERS @@ -411,15 +323,12 @@ osm_switch_get_fwd_tbl_ptr( * * SYNOPSIS */ -static inline cl_status_t +cl_status_t osm_switch_set_hops( IN osm_switch_t* const p_sw, IN const uint16_t lid_ho, IN const uint8_t port_num, - IN const uint8_t num_hops ) -{ - return( osm_lid_matrix_set( &p_sw->lmx, lid_ho, port_num, num_hops ) ); -} + IN const uint8_t num_hops ); /* * PARAMETERS * p_sw @@ -442,35 +351,23 @@ osm_switch_set_hops( * SEE ALSO *********/ -/****f* OpenSM: Switch/osm_switch_set_min_lid_size +/****f* OpenSM: Switch/osm_switch_hops_clear * NAME -* osm_switch_set_min_lid_size +* osm_switch_hops_clear * * DESCRIPTION -* Sets the size of the switch's routing table to at least accomodate the -* specified LID value (host ordered) +* Cleanup existing hops tables (lid matrix) * * SYNOPSIS */ -static inline cl_status_t -osm_switch_set_min_lid_size( - IN osm_switch_t* const p_sw, - IN const uint16_t lid_ho ) -{ - return( osm_lid_matrix_set_min_lid_size( &p_sw->lmx, lid_ho ) ); -} +void +osm_switch_hops_clear( + IN osm_switch_t *p_sw ); /* * PARAMETERS * p_sw * [in] Pointer to a Switch object. * -* lid_ho -* [in] LID value (host order) for which to set the count. -* -* RETURN VALUES -* Sets the size of the switch's routing table to at least accomodate the -* specified LID value (host ordered) -* * NOTES * * SEE ALSO @@ -491,7 +388,8 @@ osm_switch_get_least_hops( IN const osm_switch_t* const p_sw, IN const uint16_t lid_ho ) { - return( osm_lid_matrix_get_least_hops( &p_sw->lmx, lid_ho ) ); + return (lid_ho > p_sw->max_lid_ho || !p_sw->hops[lid_ho]) ? 
+ OSM_NO_PATH : p_sw->hops[lid_ho][0]; } /* * PARAMETERS @@ -768,9 +666,7 @@ static inline uint16_t osm_switch_get_max_lid_ho( IN const osm_switch_t* const p_sw ) { - if (p_sw->max_lid_ho != 0) - return p_sw->max_lid_ho; - return( osm_lid_matrix_get_max_lid_ho( &p_sw->lmx ) ); + return p_sw->max_lid_ho; } /* * PARAMETERS @@ -799,7 +695,7 @@ static inline uint8_t osm_switch_get_num_ports( IN const osm_switch_t* const p_sw ) { - return( osm_lid_matrix_get_num_ports( &p_sw->lmx ) ); + return p_sw->num_ports; } /* * PARAMETERS @@ -1348,12 +1244,16 @@ osm_switch_path_count_get( */ void osm_switch_prepare_path_rebuild( - IN osm_switch_t* const p_sw ); + IN osm_switch_t* p_sw, + IN uint16_t max_lids ); /* * PARAMETERS * p_sw * [in] Pointer to the Switch object. * +* max_lids +* [in] Max number of lids in the subnet. +* * RETURN VALUE * None. * diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c index 8e7728b..7c57398 100644 --- a/osm/opensm/osm_switch.c +++ b/osm/opensm/osm_switch.c @@ -55,20 +55,34 @@ #include #include +cl_status_t /********************************************************************** **********************************************************************/ -void -osm_switch_construct( - IN osm_switch_t* const p_sw ) +osm_switch_set_hops( + IN osm_switch_t* const p_sw, + IN const uint16_t lid_ho, + IN const uint8_t port_num, + IN const uint8_t num_hops ) { - CL_ASSERT( p_sw ); - memset( p_sw, 0, sizeof(*p_sw) ); - osm_lid_matrix_construct( &p_sw->lmx ); + if (lid_ho > p_sw->max_lid_ho) + return -1; + if (!p_sw->hops[lid_ho]) { + p_sw->hops[lid_ho] = malloc(p_sw->num_ports); + if (!p_sw->hops[lid_ho]) + return -1; + memset(p_sw->hops[lid_ho], 0xff, p_sw->num_ports); + } + + p_sw->hops[lid_ho][port_num] = num_hops; + if (p_sw->hops[lid_ho][0] > num_hops) + p_sw->hops[lid_ho][0] = num_hops; + + return 0; } /********************************************************************** **********************************************************************/ 
-ib_api_status_t +static ib_api_status_t osm_switch_init( IN osm_switch_t* const p_sw, IN osm_node_t* const p_node, @@ -80,12 +94,6 @@ osm_switch_init( uint8_t num_ports; uint32_t port_num; - CL_ASSERT( p_sw ); - CL_ASSERT( p_madw ); - CL_ASSERT( p_node ); - - osm_switch_construct( p_sw ); - p_smp = osm_madw_get_smp_ptr( p_madw ); p_si = (ib_switch_info_t*)ib_smp_get_payload_ptr( p_smp ); num_ports = osm_node_get_num_physp( p_node ); @@ -94,10 +102,7 @@ osm_switch_init( p_sw->p_node = p_node; p_sw->switch_info = *p_si; - - status = osm_lid_matrix_init( &p_sw->lmx, num_ports ); - if( status != IB_SUCCESS ) - goto Exit; + p_sw->num_ports = num_ports; status = osm_fwd_tbl_init( &p_sw->fwd_tbl, p_si ); if( status != IB_SUCCESS ) @@ -127,23 +132,20 @@ osm_switch_init( /********************************************************************** **********************************************************************/ void -osm_switch_destroy( - IN osm_switch_t* const p_sw ) +osm_switch_delete( + IN OUT osm_switch_t** const pp_sw ) { - /* free memory to avoid leaks */ + osm_switch_t *p_sw = *pp_sw; + unsigned i; osm_mcast_tbl_destroy( &p_sw->mcast_tbl ); free( p_sw->p_prof ); osm_fwd_tbl_destroy( &p_sw->fwd_tbl ); - osm_lid_matrix_destroy( &p_sw->lmx ); -} - -/********************************************************************** - **********************************************************************/ -void -osm_switch_delete( - IN OUT osm_switch_t** const pp_sw ) -{ - osm_switch_destroy( *pp_sw ); + if (p_sw->hops) { + for (i = 0 ; i < p_sw->num_hops ; i++) + if (p_sw->hops[i]) + free(p_sw->hops[i]); + free(p_sw->hops); + } free( *pp_sw ); *pp_sw = NULL; } @@ -158,6 +160,9 @@ osm_switch_new( ib_api_status_t status; osm_switch_t *p_sw; + CL_ASSERT( p_madw ); + CL_ASSERT( p_node ); + p_sw = (osm_switch_t*)malloc( sizeof(*p_sw) ); if( p_sw ) { @@ -322,6 +327,9 @@ osm_switch_recommend_path( } } + if (osm_node_get_base_lid(p_sw->p_node, 0) == cl_hton16(lid_ho)) + return 0; + /* 
This algorithm selects a port based on a static load balanced selection across equal hop-count ports. @@ -337,7 +345,7 @@ osm_switch_recommend_path( */ /* port number starts with zero and num_ports is 1 + num phys ports */ - for ( port_num = 0; port_num < num_ports; port_num++ ) + for ( port_num = 1; port_num < num_ports; port_num++ ) { if ( osm_switch_get_hop_count( p_sw, lid_ho, port_num ) == least_hops) { @@ -466,16 +474,45 @@ osm_switch_recommend_path( /********************************************************************** **********************************************************************/ void -osm_switch_prepare_path_rebuild( - IN osm_switch_t* const p_sw ) +osm_switch_hops_clear( + IN osm_switch_t *p_sw ) { - uint8_t port_num; - uint8_t num_ports; + unsigned i; + for (i = 0 ; i < p_sw->num_hops ; i++) + if (p_sw->hops[i]) + memset(p_sw->hops[i], 0xff, p_sw->num_ports); +} - num_ports = osm_switch_get_num_ports( p_sw ); - osm_lid_matrix_clear( &p_sw->lmx ); - for( port_num = 0; port_num < num_ports; port_num++ ) - osm_port_prof_construct( &p_sw->p_prof[port_num] ); +/********************************************************************** + **********************************************************************/ +void +osm_switch_prepare_path_rebuild( + IN osm_switch_t* p_sw, + IN uint16_t max_lids ) +{ + unsigned i; + + for( i = 0; i < p_sw->num_ports; i++ ) + osm_port_prof_construct( &p_sw->p_prof[i] ); + if (!p_sw->hops) { + p_sw->hops = malloc((max_lids + 1)*sizeof(p_sw->hops[0])); + if (!p_sw->hops) + return; + memset(p_sw->hops, 0, (max_lids + 1)*sizeof(p_sw->hops[0])); + p_sw->num_hops = max_lids + 1; + } + else if (max_lids + 1 > p_sw->num_hops) { + uint8_t **old_hops = p_sw->hops; + p_sw->hops = malloc((max_lids + 1)*sizeof(p_sw->hops[0])); + if (!p_sw->hops) + return; + memcpy(p_sw->hops, old_hops, p_sw->num_hops*sizeof(p_sw->hops[0])); + memset(p_sw->hops + p_sw->num_hops, 0, + (max_lids + 1 - p_sw->num_hops)*sizeof(p_sw->hops[0])); + 
p_sw->num_hops = max_lids + 1; + free(old_hops); + } + p_sw->max_lid_ho = max_lids; } /********************************************************************** diff --git a/osm/opensm/osm_ucast_ftree.c b/osm/opensm/osm_ucast_ftree.c index 21aa4a8..61db1d7 100644 --- a/osm/opensm/osm_ucast_ftree.c +++ b/osm/opensm/osm_ucast_ftree.c @@ -782,9 +782,6 @@ __osm_ftree_sw_set_hops( IN uint8_t port_num, IN uint8_t hops) { - /* make sure the lid matrix has enough room */ - osm_switch_set_min_lid_size(p_sw->p_osm_sw, max_lid_ho); - /* set local min hop table(LID) */ return osm_switch_set_hops(p_sw->p_osm_sw, lid_ho, diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index 306c795..93cafae 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -407,11 +407,13 @@ static void __osm_ucast_mgr_dump_tables(osm_ucast_mgr_t *p_mgr) Starting a rebuild, so notify the switch so it can clear tables, etc... **********************************************************************/ static void -__osm_ucast_mgr_clean_switch( +__osm_ucast_mgr_setup_switch( IN cl_map_item_t* const p_map_item, - IN void* context ) + IN void* cxt ) { - osm_switch_prepare_path_rebuild((osm_switch_t *)p_map_item); + uint16_t lids = cl_ptr_vector_get_size(&((osm_subn_t *)cxt)->port_lid_tbl); + osm_switch_prepare_path_rebuild((osm_switch_t *)p_map_item, + lids ? lids - 1 : 0); } /********************************************************************** @@ -519,12 +521,6 @@ __osm_ucast_mgr_process_neighbor( */ max_lid_ho = osm_switch_get_max_lid_ho( p_remote_sw ); - /* - Make sure the local lid matrix has enough room to hold - all the LID info coming from the remote LID matrix. 
- */ - osm_switch_set_min_lid_size( p_sw, max_lid_ho ); - hops = OSM_NO_PATH; for( lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++ ) { @@ -1221,7 +1217,7 @@ osm_ucast_mgr_process( goto Exit; p_mgr->any_change = FALSE; - cl_qmap_apply_func(p_sw_guid_tbl, __osm_ucast_mgr_clean_switch, NULL); + cl_qmap_apply_func(p_sw_guid_tbl, __osm_ucast_mgr_setup_switch, p_mgr->p_subn); if (!p_routing_eng->build_lid_matrices || p_routing_eng->build_lid_matrices(p_routing_eng->context) != 0) diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c index e8282f4..950bcb4 100644 --- a/osm/opensm/osm_ucast_updn.c +++ b/osm/opensm/osm_ucast_updn.c @@ -627,7 +627,7 @@ __osm_subn_set_up_down_min_hop_table( p_sw = p_next_sw; p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); /* Clear Min Hop Table */ - osm_lid_matrix_clear(&(p_sw->lmx)); + osm_switch_hops_clear(p_sw); } osm_log( p_log, OSM_LOG_VERBOSE, -- 1.5.0.1.26.gf5a92 From sashak at voltaire.com Sun Feb 25 14:19:43 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 26 Feb 2007 00:19:43 +0200 Subject: [openib-general] [PATCH] opensm: remove osm_matrix.* files In-Reply-To: <20070225214845.GF11957@sashak.voltaire.com> References: <20070225214845.GF11957@sashak.voltaire.com> Message-ID: <20070225221943.GG11957@sashak.voltaire.com> Following the previously submitted min hops reimplementation, this removes the now-unused osm_matrix.* files. 
Signed-off-by: Sasha Khapyorsky --- osm/include/Makefile.am | 1 - osm/include/opensm/osm_matrix.h | 456 --------------------------------------- osm/opensm/Makefile.am | 2 +- osm/opensm/osm_matrix.c | 156 ------------- 4 files changed, 1 insertions(+), 614 deletions(-) delete mode 100644 osm/include/opensm/osm_matrix.h delete mode 100644 osm/opensm/osm_matrix.c diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am index cf1b0e7..57b5296 100644 --- a/osm/include/Makefile.am +++ b/osm/include/Makefile.am @@ -17,7 +17,6 @@ EXTRA_DIST = \ $(srcdir)/opensm/osm_madw.h \ $(srcdir)/opensm/osm_subnet.h \ $(srcdir)/opensm/osm_sweep_fail_ctrl.h \ - $(srcdir)/opensm/osm_matrix.h \ $(srcdir)/opensm/osm_sa_lft_record.h \ $(srcdir)/opensm/osm_sa_mft_record.h \ $(srcdir)/opensm/osm_resp.h \ diff --git a/osm/include/opensm/osm_matrix.h b/osm/include/opensm/osm_matrix.h deleted file mode 100644 index 65db20a..0000000 --- a/osm/include/opensm/osm_matrix.h +++ /dev/null @@ -1,456 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. 
- * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/* - * Abstract: - * Declaration of osm_lid_matrix_t. - * This object represents a two dimensional array of port numbers - * and LID values. - * This object is part of the OpenSM family of objects. - * - * Environment: - * Linux User Mode - * - * $Revision: 1.5 $ - */ - -#ifndef _OSM_MATRIX_H_ -#define _OSM_MATRIX_H_ - -#include -#include -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/****h* OpenSM/LID Matrix -* NAME -* LID Matrix -* -* DESCRIPTION -* The LID Matrix object encapsulates the information needed by the -* OpenSM to manage fabric routes. It is a two dimensional array -* index by LID value and Port Number. Each element contains the -* number of hops from that Port Number to the LID. -* Every Switch object contains a LID Matrix. -* -* The LID Matrix is not thread safe, thus callers must provide -* serialization. -* -* This object should be treated as opaque and should be -* manipulated only through the provided functions. 
-* -* AUTHOR -* Steve King, Intel -* -*********/ - -/****s* OpenSM: LID Matrix/osm_lid_matrix_t -* NAME -* osm_lid_matrix_t -* -* DESCRIPTION -* -* The LID Matrix object encapsulates the information needed by the -* OpenSM to manage fabric routes. It is a two dimensional array -* indexed by LID value and Port Number. Each element contains the -* number of hops from that Port Number to the LID. -* Every Switch object contains a LID Matrix. -* -* The LID Matrix is not thread safe, thus callers must provide -* serialization. -* -* The num_ports index into the matrix serves a special purpose, in that it -* contains the shortest hop path for that LID through any port. -* -* This object should be treated as opaque and should be -* manipulated only through the provided functions. -* -* SYNOPSIS -*/ -typedef struct _osm_lid_matrix_t -{ - cl_vector_t lid_vec; - uint8_t num_ports; -} osm_lid_matrix_t; -/* -* FIELDS -* lid_vec -* Vector (indexed by LID) of port arrays (indexed by port number) -* -* num_ports -* Number of ports at each entry in the LID vector. -* -* SEE ALSO -*********/ - -/****f* OpenSM: LID Matrix/osm_lid_matrix_construct -* NAME -* osm_lid_matrix_construct -* -* DESCRIPTION -* This function constructs a LID Matrix object. -* -* SYNOPSIS -*/ -static inline void -osm_lid_matrix_construct( - IN osm_lid_matrix_t* const p_lmx ) -{ - p_lmx->num_ports = 0; - cl_vector_construct( &p_lmx->lid_vec ); -} -/* -* PARAMETERS -* p_lmx -* [in] Pointer to a LID Matrix object to construct. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* Allows calling osm_lid_matrix_init, osm_lid_matrix_destroy -* -* Calling osm_lid_matrix_construct is a prerequisite to calling any other -* method except osm_lid_matrix_init. 
-* -* SEE ALSO -* LID Matrix object, osm_lid_matrix_init, osm_lid_matrix_destroy -*********/ - -/****f* OpenSM: LID Matrix/osm_lid_matrix_destroy -* NAME -* osm_lid_matrix_destroy -* -* DESCRIPTION -* The osm_lid_matrix_destroy function destroys a node, releasing -* all resources. -* -* SYNOPSIS -*/ -void osm_lid_matrix_destroy( - IN osm_lid_matrix_t* const p_lmx ); -/* -* PARAMETERS -* p_lmx -* [in] Pointer to a LID Matrix object to destroy. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* Performs any necessary cleanup of the specified LID Matrix object. -* Further operations should not be attempted on the destroyed object. -* This function should only be called after a call to osm_lid_matrix_construct or -* osm_lid_matrix_init. -* -* SEE ALSO -* LID Matrix object, osm_lid_matrix_construct, osm_lid_matrix_init -*********/ - -/****f* OpenSM: LID Matrix/osm_lid_matrix_init -* NAME -* osm_lid_matrix_init -* -* DESCRIPTION -* Initializes a LID Matrix object for use. -* -* SYNOPSIS -*/ -ib_api_status_t -osm_lid_matrix_init( - IN osm_lid_matrix_t* const p_lmx, - IN const uint8_t num_ports ); -/* -* PARAMETERS -* p_lmx -* [in] Pointer to an osm_lid_matrix_t object to initialize. -* -* num_ports -* [in] Number of ports at each LID index. This value is fixed -* at initialization time. -* -* RETURN VALUES -* IB_SUCCESS on success -* -* NOTES -* -* SEE ALSO -*********/ - -/****f* OpenSM: LID Matrix/osm_lid_matrix_get -* NAME -* osm_lid_matrix_get -* -* DESCRIPTION -* Returns the hop count at the specified LID/Port intersection. 
-* -* SYNOPSIS -*/ -static inline uint8_t -osm_lid_matrix_get( - IN const osm_lid_matrix_t* const p_lmx, - IN const uint16_t lid_ho, - IN const uint8_t port_num ) -{ - CL_ASSERT( port_num < p_lmx->num_ports ); - - if ( lid_ho >= cl_vector_get_size( &p_lmx->lid_vec ) ) - return OSM_NO_PATH; - - return( ((uint8_t *)cl_vector_get_ptr( - &p_lmx->lid_vec, lid_ho ))[port_num] ); -} -/* -* PARAMETERS -* p_lmx -* [in] Pointer to an osm_lid_matrix_t object. -* -* lid_ho -* [in] LID value (host order) for which to return the hop count -* -* port_num -* [in] Port number in the switch -* -* RETURN VALUES -* Returns the hop count at the specified LID/Port intersection. -* -* NOTES -* -* SEE ALSO -*********/ - -/****f* OpenSM: LID Matrix/osm_lid_matrix_get_max_lid_ho -* NAME -* osm_lid_matrix_get_max_lid_ho -* -* DESCRIPTION -* Returns the maximum LID (host order) value contained -* in the matrix. -* -* SYNOPSIS -*/ -static inline uint16_t -osm_lid_matrix_get_max_lid_ho( - IN const osm_lid_matrix_t* const p_lmx ) -{ - return cl_vector_get_size( &p_lmx->lid_vec ) ? - (uint16_t)(cl_vector_get_size( &p_lmx->lid_vec ) - 1) : 0; -} -/* -* PARAMETERS -* p_lmx -* [in] Pointer to an osm_lid_matrix_t object. -* -* RETURN VALUES -* Returns the maximum LID (host order) value contained -* in the matrix. -* -* NOTES -* -* SEE ALSO -*********/ - -/****f* OpenSM: LID Matrix/osm_lid_matrix_get_num_ports -* NAME -* osm_lid_matrix_get_num_ports -* -* DESCRIPTION -* Returns the number of ports in this lid matrix. -* -* SYNOPSIS -*/ -static inline uint8_t -osm_lid_matrix_get_num_ports( - IN const osm_lid_matrix_t* const p_lmx ) -{ - return( p_lmx->num_ports ); -} -/* -* PARAMETERS -* p_lmx -* [in] Pointer to an osm_lid_matrix_t object. -* -* RETURN VALUES -* Returns the number of ports in this lid matrix. 
-* -* NOTES -* -* SEE ALSO -*********/ - -/****f* OpenSM: LID Matrix/osm_lid_matrix_get_least_hops -* NAME -* osm_lid_matrix_get_least_hops -* -* DESCRIPTION -* Returns the least number of hops for specified lid -* -* SYNOPSIS -*/ -static inline uint8_t -osm_lid_matrix_get_least_hops( - IN const osm_lid_matrix_t* const p_lmx, - IN const uint16_t lid_ho ) -{ - if( lid_ho > osm_lid_matrix_get_max_lid_ho( p_lmx ) ) - return( OSM_NO_PATH ); - - return( ((uint8_t *)cl_vector_get_ptr( - &p_lmx->lid_vec, lid_ho ))[p_lmx->num_ports] ); -} -/* -* PARAMETERS -* p_lmx -* [in] Pointer to an osm_lid_matrix_t object. -* -* lid_ho -* [in] LID (host order) for which to retrieve the shortest hop count. -* -* RETURN VALUES -* Returns the least number of hops for specified lid -* -* NOTES -* -* SEE ALSO -*********/ - -/****f* OpenSM: LID Matrix/osm_lid_matrix_set -* NAME -* osm_lid_matrix_set -* -* DESCRIPTION -* Sets the hop count at the specified LID/Port intersection. -* -* SYNOPSIS -*/ -cl_status_t -osm_lid_matrix_set( - IN osm_lid_matrix_t* const p_lmx, - IN const uint16_t lid_ho, - IN const uint8_t port_num, - IN const uint8_t val ); -/* -* PARAMETERS -* p_lmx -* [in] Pointer to an osm_lid_matrix_t object. -* -* lid_ho -* [in] LID value (host order) to index into the vector. -* -* port_num -* [in] port number index into the vector entry. -* -* val -* [in] value (number of hops) to assign to this entry. -* -* RETURN VALUES -* Returns the hop count at the specified LID/Port intersection. 
-* -* NOTES -* -* SEE ALSO -*********/ - -/****f* OpenSM: LID Matrix/osm_lid_matrix_set_min_lid_size -* NAME -* osm_lid_matrix_set_min_lid_size -* -* DESCRIPTION -* Sets the size of the matrix to at least accomodate the -* specified LID value (host ordered) -* -* SYNOPSIS -*/ -static inline cl_status_t -osm_lid_matrix_set_min_lid_size( - IN osm_lid_matrix_t* const p_lmx, - IN const uint16_t lid_ho ) -{ - return( cl_vector_set_min_size( &p_lmx->lid_vec, lid_ho + 1 ) ); -} -/* -* PARAMETERS -* p_lmx -* [in] Pointer to an osm_lid_matrix_t object. -* -* lid_ho -* [in] Minimum LID value (host order) to accomodate. -* -* RETURN VALUES -* Sets the size of the matrix to at least accomodate the -* specified LID value (host ordered) -* -* NOTES -* -* SEE ALSO -*********/ - -/****f* OpenSM: LID Matrix/osm_lid_matrix_clear -* NAME -* osm_lid_matrix_clear -* -* DESCRIPTION -* Clears a LID Matrix object in anticipation of a rebuild. -* -* SYNOPSIS -*/ -void -osm_lid_matrix_clear( - IN osm_lid_matrix_t* const p_lmx ); -/* -* PARAMETERS -* p_lmx -* [in] Pointer to an osm_lid_matrix_t object to clear. -* -* RETURN VALUES -* None. -* -* NOTES -* -* SEE ALSO -*********/ - -END_C_DECLS - -#endif /* _OSM_MATRIX_H_ */ diff --git a/osm/opensm/Makefile.am b/osm/opensm/Makefile.am index 15af336..01e1423 100644 --- a/osm/opensm/Makefile.am +++ b/osm/opensm/Makefile.am @@ -31,7 +31,7 @@ opensm_SOURCES = main.c osm_console.c osm_db_files.c \ osm_db_pack.c osm_drop_mgr.c osm_fwd_tbl.c \ osm_inform.c osm_lid_mgr.c osm_lin_fwd_rcv.c \ osm_lin_fwd_tbl.c osm_link_mgr.c \ - osm_matrix.c osm_mcast_fwd_rcv.c \ + osm_mcast_fwd_rcv.c \ osm_mcast_mgr.c osm_mcast_tbl.c osm_mcm_info.c \ osm_mcm_port.c osm_mtree.c osm_multicast.c osm_node.c \ osm_node_desc_rcv.c osm_node_info_rcv.c \ diff --git a/osm/opensm/osm_matrix.c b/osm/opensm/osm_matrix.c deleted file mode 100644 index 7202922..0000000 --- a/osm/opensm/osm_matrix.c +++ /dev/null @@ -1,156 +0,0 @@ -/* - * Copyright (c) 2004-2006 Voltaire, Inc. 
All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/* - * Abstract: - * Implementation of osm_lid_matrix_t. - * This file implements the LID Matrix object. 
- * - * Environment: - * Linux User Mode - * - * $Revision: 1.7 $ - */ - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include - -/********************************************************************** - **********************************************************************/ -void -osm_lid_matrix_destroy( - IN osm_lid_matrix_t* const p_lmx ) -{ - cl_vector_destroy( &p_lmx->lid_vec ); -} - -/********************************************************************** - Initializer function called by cl_vector -**********************************************************************/ -cl_status_t -__osm_lid_matrix_vec_init( - IN void* const p_elem, - IN void* context ) -{ - osm_lid_matrix_t* const p_lmx = (osm_lid_matrix_t*)context; - - memset( p_elem, OSM_NO_PATH, p_lmx->num_ports + 1); - return( CL_SUCCESS ); -} - -/********************************************************************** - Initializer function called by cl_vector -**********************************************************************/ -void -__osm_lid_matrix_vec_clear( - IN const size_t index, - IN void* const p_elem, - IN void* context ) -{ - osm_lid_matrix_t* const p_lmx = (osm_lid_matrix_t*)context; - - UNUSED_PARAM( index ); - memset( p_elem, OSM_NO_PATH, p_lmx->num_ports + 1); -} - -/********************************************************************** - **********************************************************************/ -void -osm_lid_matrix_clear( - IN osm_lid_matrix_t* const p_lmx ) -{ - cl_vector_apply_func( &p_lmx->lid_vec, - __osm_lid_matrix_vec_clear, p_lmx ); -} - -/********************************************************************** - **********************************************************************/ -ib_api_status_t -osm_lid_matrix_init( - IN osm_lid_matrix_t* const p_lmx, - IN const uint8_t num_ports ) -{ - cl_vector_t *p_vec; - cl_status_t status; - - CL_ASSERT( p_lmx ); - CL_ASSERT( num_ports ); - - p_lmx->num_ports = num_ports; - - p_vec = 
&p_lmx->lid_vec; - /* - Initialize the vector for the number of ports plus an - extra entry to hold the "least-hops" count for that LID. - */ - status = cl_vector_init( p_vec, - 0, /* min_size, */ - 1, /* grow_size */ - sizeof(uint8_t)*(num_ports + 1), /* element size */ - __osm_lid_matrix_vec_init, /* init function */ - NULL, /* destory func */ - p_lmx /* context */ - ); - - return( status ); -} - -/********************************************************************** - **********************************************************************/ -cl_status_t -osm_lid_matrix_set( - IN osm_lid_matrix_t* const p_lmx, - IN const uint16_t lid_ho, - IN const uint8_t port_num, - IN const uint8_t val ) -{ - uint8_t *p_port_array; - cl_status_t status; - - CL_ASSERT( port_num < p_lmx->num_ports ); - status = cl_vector_set_min_size( &p_lmx->lid_vec, lid_ho + 1 ); - if( status == CL_SUCCESS ) - { - p_port_array = (uint8_t *)cl_vector_get_ptr( &p_lmx->lid_vec, lid_ho ); - p_port_array[port_num] = val; - if( p_port_array[p_lmx->num_ports] > val ) - p_port_array[p_lmx->num_ports] = val; - } - return( status ); -} -- 1.5.0.1.26.gf5a92 From sashak at voltaire.com Sun Feb 25 15:47:05 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 26 Feb 2007 01:47:05 +0200 Subject: [openib-general] [PATCH] opensm: remove some unneeded osm_switch functions In-Reply-To: <20070225214845.GF11957@sashak.voltaire.com> References: <20070225214845.GF11957@sashak.voltaire.com> Message-ID: <20070225234705.GH11957@sashak.voltaire.com> Following introduced simplification this patch removes single field access functions from osm_switch. 
Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_switch.h | 176 --------------------------------------- osm/opensm/osm_mcast_mgr.c | 27 ++---- osm/opensm/osm_mtree.c | 4 +- osm/opensm/osm_node_info_rcv.c | 2 +- osm/opensm/osm_state_mgr.c | 4 +- osm/opensm/osm_sw_info_rcv.c | 14 ++-- osm/opensm/osm_switch.c | 6 +- osm/opensm/osm_ucast_file.c | 11 +-- osm/opensm/osm_ucast_ftree.c | 41 +++++----- osm/opensm/osm_ucast_lash.c | 4 +- osm/opensm/osm_ucast_mgr.c | 34 ++++---- osm/opensm/osm_ucast_updn.c | 8 +- 12 files changed, 72 insertions(+), 259 deletions(-) diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h index 19381f8..4e0d46d 100644 --- a/osm/include/opensm/osm_switch.h +++ b/osm/include/opensm/osm_switch.h @@ -623,93 +623,6 @@ osm_switch_get_max_block_id_in_use( * Switch object *********/ -/****f* OpenSM: Switch/osm_switch_get_node_ptr -* NAME -* osm_switch_get_node_ptr -* -* DESCRIPTION -* Returns a pointer to the Node object for this switch. -* -* SYNOPSIS -*/ -static inline osm_node_t* -osm_switch_get_node_ptr( - IN const osm_switch_t* const p_sw ) -{ - return( p_sw->p_node ); -} -/* -* PARAMETERS -* p_sw -* [in] Pointer to an osm_switch_t object. -* -* RETURN VALUES -* Returns a pointer to the Node object for this switch. -* -* NOTES -* -* SEE ALSO -* Switch object -*********/ - -/****f* OpenSM: Switch/osm_switch_get_max_lid_ho -* NAME -* osm_switch_get_max_lid_ho -* -* DESCRIPTION -* Returns the maximum LID (host order) value contained -* in the switch routing tables. -* -* SYNOPSIS -*/ -static inline uint16_t -osm_switch_get_max_lid_ho( - IN const osm_switch_t* const p_sw ) -{ - return p_sw->max_lid_ho; -} -/* -* PARAMETERS -* p_sw -* [in] Pointer to a switch object. -* -* RETURN VALUES -* Returns the maximum LID (host order) value contained -* in the switch routing tables. 
-* -* NOTES -* -* SEE ALSO -*********/ - -/****f* OpenSM: Switch/osm_switch_get_num_ports -* NAME -* osm_switch_get_num_ports -* -* DESCRIPTION -* Returns the number of ports in this switch. -* -* SYNOPSIS -*/ -static inline uint8_t -osm_switch_get_num_ports( - IN const osm_switch_t* const p_sw ) -{ - return p_sw->num_ports; -} -/* -* PARAMETERS -* p_sw -* [in] Pointer to an osm_switch_t object. -* -* RETURN VALUES -* Returns the number of ports in this switch. -* -* NOTES -* -* SEE ALSO -*********/ - /****f* OpenSM: Switch/osm_switch_get_fwd_tbl_block * NAME * osm_switch_get_fwd_tbl_block @@ -1330,95 +1243,6 @@ osm_switch_is_in_mcast_tree( * SEE ALSO *********/ -/****f* OpenSM: Node/osm_switch_discovery_count_get -* NAME -* osm_switch_discovery_count_get -* -* DESCRIPTION -* Returns a pointer to the physical port object at the -* specified local port number. -* -* SYNOPSIS -*/ -static inline uint32_t -osm_switch_discovery_count_get( - IN const osm_switch_t* const p_switch ) -{ - return( p_switch->discovery_count ); -} -/* -* PARAMETERS -* p_switch -* [in] Pointer to an osm_switch_t object. -* -* RETURN VALUES -* Returns the discovery count for this node. -* -* NOTES -* -* SEE ALSO -* Node object -*********/ - -/****f* OpenSM: Node/osm_switch_discovery_count_reset -* NAME -* osm_switch_discovery_count_reset -* -* DESCRIPTION -* Resets the discovery count for this node to zero. -* This operation should be performed at the start of a sweep. -* -* SYNOPSIS -*/ -static inline void -osm_switch_discovery_count_reset( - IN osm_switch_t* const p_switch ) -{ - p_switch->discovery_count = 0; -} -/* -* PARAMETERS -* p_switch -* [in] Pointer to an osm_switch_t object. -* -* RETURN VALUES -* None. -* -* NOTES -* -* SEE ALSO -* Node object -*********/ - -/****f* OpenSM: Node/osm_switch_discovery_count_inc -* NAME -* osm_switch_discovery_count_inc -* -* DESCRIPTION -* Increments the discovery count for this node. 
-* -* SYNOPSIS -*/ -static inline void -osm_switch_discovery_count_inc( - IN osm_switch_t* const p_switch ) -{ - p_switch->discovery_count++; -} -/* -* PARAMETERS -* p_switch -* [in] Pointer to an osm_switch_t object. -* -* RETURN VALUES -* None. -* -* NOTES -* -* SEE ALSO -* Node object -*********/ - END_C_DECLS #endif /* _OSM_SWITCH_H_ */ diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c index a5ad024..cf8ae7d 100644 --- a/osm/opensm/osm_mcast_mgr.c +++ b/osm/opensm/osm_mcast_mgr.c @@ -319,9 +319,7 @@ __osm_mcast_mgr_find_optimal_switch( if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) { - sw_guid_ho = cl_ntoh64( osm_node_get_node_guid( - osm_switch_get_node_ptr( p_sw ) ) ); - + sw_guid_ho = cl_ntoh64( osm_node_get_node_guid(p_sw->p_node) ); osm_log( p_mgr->p_log, OSM_LOG_DEBUG, "__osm_mcast_mgr_find_optimal_switch: " "Switch 0x%016" PRIx64 ", hops = %f\n", @@ -339,9 +337,7 @@ __osm_mcast_mgr_find_optimal_switch( { if( p_best_sw ) { - sw_guid_ho = cl_ntoh64( osm_node_get_node_guid( - osm_switch_get_node_ptr( p_best_sw ) ) ); - + sw_guid_ho = cl_ntoh64( osm_node_get_node_guid(p_best_sw->p_node) ); osm_log( p_mgr->p_log, OSM_LOG_VERBOSE, "__osm_mcast_mgr_find_optimal_switch: " "Best switch is 0x%" PRIx64 ", hops = %f\n", @@ -459,7 +455,7 @@ __osm_mcast_mgr_set_tbl( CL_ASSERT( p_sw ); - p_node = osm_switch_get_node_ptr( p_sw ); + p_node = p_sw->p_node; CL_ASSERT( p_node ); @@ -571,9 +567,7 @@ __osm_mcast_mgr_subdivide( multicast and the multicast tree must branch at this switch. 
*/ - uint64_t node_guid_ho = cl_ntoh64( osm_node_get_node_guid( - osm_switch_get_node_ptr( p_sw ) ) ); - + uint64_t node_guid_ho = cl_ntoh64( osm_node_get_node_guid(p_sw->p_node) ); osm_log( p_mgr->p_log, OSM_LOG_ERROR, "__osm_mcast_mgr_subdivide: ERR 0A03: " "Error routing MLID 0x%X through switch 0x%" PRIx64 "\n" @@ -587,9 +581,7 @@ __osm_mcast_mgr_subdivide( if( port_num > array_size ) { - uint64_t node_guid_ho = cl_ntoh64( osm_node_get_node_guid( - osm_switch_get_node_ptr( p_sw ) ) ); - + uint64_t node_guid_ho = cl_ntoh64( osm_node_get_node_guid(p_sw->p_node) ); osm_log( p_mgr->p_log, OSM_LOG_ERROR, "__osm_mcast_mgr_subdivide: ERR 0A04: " "Error routing MLID 0x%X through switch 0x%" PRIx64 "\n" @@ -669,7 +661,7 @@ __osm_mcast_mgr_branch( CL_ASSERT( p_list ); CL_ASSERT( p_max_depth ); - node_guid = osm_node_get_node_guid( osm_switch_get_node_ptr( p_sw ) ); + node_guid = osm_node_get_node_guid( p_sw->p_node ); node_guid_ho = cl_ntoh64( node_guid ); mlid_ho = cl_ntoh16( osm_mgrp_get_mlid( p_mgrp ) ); @@ -823,7 +815,7 @@ __osm_mcast_mgr_branch( needed to add the port to the table */ continue; - p_node = osm_switch_get_node_ptr( p_sw ); + p_node = p_sw->p_node; p_remote_node = osm_node_get_remote_node( p_node, i, NULL ); if( osm_node_get_type( p_remote_node ) == IB_NODE_TYPE_SWITCH ) @@ -1033,8 +1025,7 @@ osm_mcast_mgr_set_table( osm_log( p_mgr->p_log, OSM_LOG_VERBOSE, "osm_mcast_mgr_set_table: " "Configuring MLID 0x%X on switch 0x%" PRIx64 "\n", - mlid_ho, osm_node_get_node_guid( - osm_switch_get_node_ptr( p_sw ) ) ); + mlid_ho, osm_node_get_node_guid(p_sw->p_node) ); } /* @@ -1389,7 +1380,7 @@ mcast_mgr_dump_sw_routes( if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) goto Exit; - p_node = osm_switch_get_node_ptr( p_sw ); + p_node = p_sw->p_node; p_tbl = osm_switch_get_mcast_tbl_ptr( p_sw ); diff --git a/osm/opensm/osm_mtree.c b/osm/opensm/osm_mtree.c index a98df2f..14bfa36 100644 --- a/osm/opensm/osm_mtree.c +++ b/osm/opensm/osm_mtree.c @@ -68,7 +68,7 @@ 
osm_mtree_node_init( osm_mtree_node_construct( p_mtn ); p_mtn->p_sw = (osm_switch_t*)p_sw; - p_mtn->max_children = osm_switch_get_num_ports( p_sw ); + p_mtn->max_children = p_sw->num_ports; for( i = 0; i < p_mtn->max_children; i++ ) p_mtn->child_array[i] = NULL; @@ -83,7 +83,7 @@ osm_mtree_node_new( osm_mtree_node_t *p_mtn; p_mtn = malloc( sizeof(osm_mtree_node_t) + - sizeof(void*) * (osm_switch_get_num_ports( p_sw ) - 1) ); + sizeof(void*) * (p_sw->num_ports - 1) ); if( p_mtn != NULL ) osm_mtree_node_init( p_mtn, p_sw ); diff --git a/osm/opensm/osm_node_info_rcv.c b/osm/opensm/osm_node_info_rcv.c index 5cbd3b7..3053df5 100644 --- a/osm/opensm/osm_node_info_rcv.c +++ b/osm/opensm/osm_node_info_rcv.c @@ -657,7 +657,7 @@ __osm_ni_rcv_process_existing_switch( else { /* Make sure we have SwitchInfo on this node */ - if( !p_node->sw || osm_switch_discovery_count_get( p_node->sw ) == 0 ) + if( !p_node->sw || p_node->sw->discovery_count == 0 ) { /* we don't have the SwitchInfo - retry to get it */ osm_log( p_rcv->p_log, OSM_LOG_DEBUG, diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index 2905857..61de8d2 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -566,7 +566,7 @@ __osm_state_mgr_reset_switch_count( cl_ntoh64( osm_node_get_node_guid( p_sw->p_node ) ) ); } - osm_switch_discovery_count_reset( p_sw ); + p_sw->discovery_count = 0; } /********************************************************************** @@ -585,7 +585,7 @@ __osm_state_mgr_get_sw_info( OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_get_sw_info ); - p_node = osm_switch_get_node_ptr( p_sw ); + p_node = p_sw->p_node; p_dr_path = osm_node_get_any_dr_path_ptr( p_node ); memset( &context, 0, sizeof( context ) ); diff --git a/osm/opensm/osm_sw_info_rcv.c b/osm/opensm/osm_sw_info_rcv.c index fe3fe9f..013a724 100644 --- a/osm/opensm/osm_sw_info_rcv.c +++ b/osm/opensm/osm_sw_info_rcv.c @@ -82,7 +82,7 @@ __osm_si_rcv_get_port_info( CL_ASSERT( p_sw ); - p_node = 
osm_switch_get_node_ptr( p_sw ); + p_node = p_sw->p_node; p_smp = osm_madw_get_smp_ptr( p_madw ); CL_ASSERT( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ); @@ -154,7 +154,7 @@ __osm_si_rcv_get_fwd_tbl( CL_ASSERT( p_sw ); - p_node = osm_switch_get_node_ptr( p_sw ); + p_node = p_sw->p_node; CL_ASSERT( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ); @@ -223,7 +223,7 @@ __osm_si_rcv_get_mcast_fwd_tbl( CL_ASSERT( p_sw ); - p_node = osm_switch_get_node_ptr( p_sw ); + p_node = p_sw->p_node; CL_ASSERT( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ); @@ -393,7 +393,7 @@ __osm_si_rcv_process_new( info we just received. */ osm_switch_set_switch_info( p_sw, p_si ); - osm_switch_discovery_count_inc( p_sw ); + p_sw->discovery_count++; /* Get the PortInfo attribute for every port. @@ -505,14 +505,14 @@ __osm_si_rcv_process_existing( This is a heavy sweep. Get information regardless of the state change bit. */ - osm_switch_discovery_count_inc( p_sw ); + p_sw->discovery_count++; osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, "__osm_si_rcv_process_existing: " "discovery_count is:%u\n", - osm_switch_discovery_count_get( p_sw ) ); + p_sw->discovery_count ); /* If this is the first discovery - then get the port_info */ - if ( osm_switch_discovery_count_get( p_sw ) == 1 ) + if ( p_sw->discovery_count == 1 ) __osm_si_rcv_get_port_info( p_rcv, p_sw, p_madw ); else { diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c index 7c57398..6db8add 100644 --- a/osm/opensm/osm_switch.c +++ b/osm/opensm/osm_switch.c @@ -195,7 +195,7 @@ osm_switch_get_fwd_tbl_block( CL_ASSERT( p_block ); p_tbl = osm_switch_get_fwd_tbl_ptr( p_sw ); - max_lid_ho = osm_switch_get_max_lid_ho( p_sw ); + max_lid_ho = p_sw->max_lid_ho; lids_per_block = osm_fwd_tbl_get_lids_per_block( &p_sw->fwd_tbl ); base_lid_ho = (uint16_t)(block_id * lids_per_block); @@ -278,7 +278,7 @@ osm_switch_recommend_path( CL_ASSERT( lid_ho > 0 ); - num_ports = osm_switch_get_num_ports( p_sw ); + num_ports = p_sw->num_ports; 
least_hops = osm_switch_get_least_hops( p_sw, lid_ho ); if ( least_hops == OSM_NO_PATH ) @@ -532,7 +532,7 @@ osm_switch_recommend_mcast_path( CL_ASSERT( lid_ho > 0 ); CL_ASSERT( mlid_ho >= IB_LID_MCAST_START_HO ); - num_ports = osm_switch_get_num_ports( p_sw ); + num_ports = p_sw->num_ports; /* If the user wants us to ignore existing multicast routes, diff --git a/osm/opensm/osm_ucast_file.c b/osm/opensm/osm_ucast_file.c index a623a26..4de4c02 100644 --- a/osm/opensm/osm_ucast_file.c +++ b/osm/opensm/osm_ucast_file.c @@ -93,8 +93,8 @@ static void add_path(osm_opensm_t * p_osm, osm_log(&p_osm->log, OSM_LOG_VERBOSE, "add_path: LID collision is detected on switch " "0x016%" PRIx64 ", will overwrite LID 0x%x entry\n", - cl_ntoh64(osm_node_get_node_guid - (osm_switch_get_node_ptr(p_sw))), new_lid); + cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)), + new_lid); } p_osm->sm.ucast_mgr.lft_buf[new_lid] = port_num; @@ -106,8 +106,7 @@ static void add_path(osm_opensm_t * p_osm, "add_path: route 0x%04x(was 0x%04x) %u 0x%016" PRIx64 " is added to switch 0x%016" PRIx64 "\n", new_lid, lid, port_num, cl_ntoh64(port_guid), - cl_ntoh64(osm_node_get_node_guid - (osm_switch_get_node_ptr(p_sw)))); + cl_ntoh64(osm_node_get_node_guid(p_sw->p_node))); } static void add_lid_hops(osm_opensm_t *p_osm, osm_switch_t *p_sw, @@ -118,8 +117,8 @@ static void add_lid_hops(osm_opensm_t *p_osm, osm_switch_t *p_sw, uint8_t i; new_lid = guid ? 
remap_lid(p_osm, lid, guid) : lid; - if (len > osm_switch_get_num_ports(p_sw)) - len = osm_switch_get_num_ports(p_sw); + if (len > p_sw->num_ports) + len = p_sw->num_ports; for (i = 0 ; i < len ; i++) osm_switch_set_hops(p_sw, lid, i, hops[i]); diff --git a/osm/opensm/osm_ucast_ftree.c b/osm/opensm/osm_ucast_ftree.c index 61db1d7..ac8302b 100644 --- a/osm/opensm/osm_ucast_ftree.c +++ b/osm/opensm/osm_ucast_ftree.c @@ -579,7 +579,7 @@ __osm_ftree_sw_create( uint8_t ports_num; /* make sure that the switch has ports */ - if (osm_switch_get_num_ports(p_osm_sw) == 1) + if (p_osm_sw->num_ports == 1) return NULL; p_sw = (ftree_sw_t *)malloc(sizeof(ftree_sw_t)); @@ -591,9 +591,9 @@ __osm_ftree_sw_create( p_sw->rank = 0xFF; __osm_ftree_tuple_init(p_sw->tuple); - p_sw->base_lid = osm_node_get_base_lid(osm_switch_get_node_ptr(p_sw->p_osm_sw),0); + p_sw->base_lid = osm_node_get_base_lid(p_sw->p_osm_sw->p_node,0); - ports_num = osm_node_get_num_physp(osm_switch_get_node_ptr(p_sw->p_osm_sw)); + ports_num = osm_node_get_num_physp(p_sw->p_osm_sw->p_node); p_sw->down_port_groups = (ftree_port_group_t **) malloc(ports_num * sizeof(ftree_port_group_t *)); p_sw->up_port_groups = @@ -657,7 +657,7 @@ __osm_ftree_sw_dump( "__osm_ftree_sw_dump: " "Switch index: %s, GUID: 0x%016" PRIx64 ", Ports: %u DOWN, %u UP\n", __osm_ftree_tuple_to_str(p_sw->tuple), - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), p_sw->down_port_groups_num, p_sw->up_port_groups_num); @@ -1214,7 +1214,7 @@ __osm_ftree_fabric_dump_general_info( osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, "__osm_ftree_fabric_dump_general_info: " " GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n", - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), cl_ntoh16(p_sw->base_lid), __osm_ftree_tuple_to_str(p_sw->tuple)); } @@ -1228,8 +1228,7 @@ 
__osm_ftree_fabric_dump_general_info( "__osm_ftree_fabric_dump_general_info: " " GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n", cl_ntoh64(osm_node_get_node_guid( - osm_switch_get_node_ptr( - p_ftree->leaf_switches[i]->p_osm_sw))), + p_ftree->leaf_switches[i]->p_osm_sw->p_node)), cl_ntoh16(p_ftree->leaf_switches[i]->base_lid), __osm_ftree_tuple_to_str(p_ftree->leaf_switches[i]->tuple)); } @@ -1443,7 +1442,7 @@ __osm_ftree_fabric_make_indexing( p_sw->rank, __osm_ftree_tuple_to_str(p_sw->tuple), cl_ntoh16(p_sw->base_lid), - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw)))); + cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node))); /* * Now run BFS and assign indexes to all switches @@ -1617,11 +1616,11 @@ __osm_ftree_fabric_validate_topology( "ERR AB09: Different number of upward port groups on switches:\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n", - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(reference_sw_arr[p_sw->rank]->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), reference_sw_arr[p_sw->rank]->up_port_groups_num, - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), cl_ntoh16(p_sw->base_lid), __osm_ftree_tuple_to_str(p_sw->tuple), p_sw->up_port_groups_num); @@ -1638,11 +1637,11 @@ __osm_ftree_fabric_validate_topology( "ERR AB0A: Different number of downward port groups on switches:\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n", - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(reference_sw_arr[p_sw->rank]->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), 
cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), reference_sw_arr[p_sw->rank]->down_port_groups_num, - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), cl_ntoh16(p_sw->base_lid), __osm_ftree_tuple_to_str(p_sw->tuple), p_sw->down_port_groups_num); @@ -1663,11 +1662,11 @@ __osm_ftree_fabric_validate_topology( "ERR AB0B: Different number of ports in an upward port group on switches:\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n", - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(reference_sw_arr[p_sw->rank]->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), cl_ptr_vector_get_size(&p_ref_group->ports), - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), cl_ntoh16(p_sw->base_lid), __osm_ftree_tuple_to_str(p_sw->tuple), cl_ptr_vector_get_size(&p_group->ports)); @@ -1691,11 +1690,11 @@ __osm_ftree_fabric_validate_topology( "ERR AB0C: Different number of ports in an downward port group on switches:\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n", - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(reference_sw_arr[p_sw->rank]->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), cl_ptr_vector_get_size(&p_ref_group->ports), - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), 
cl_ntoh16(p_sw->base_lid), __osm_ftree_tuple_to_str(p_sw->tuple), cl_ptr_vector_get_size(&p_group->ports)); @@ -2439,7 +2438,7 @@ __osm_ftree_rank_from_switch( p_sw = p_sw_tbl_element->p_sw; __osm_ftree_sw_tbl_element_destroy(p_sw_tbl_element); - p_node = osm_switch_get_node_ptr(p_sw->p_osm_sw); + p_node = p_sw->p_osm_sw->p_node; /* note: skipping port 0 on switches */ for (i = 1; i < osm_node_get_num_physp(p_node); i++) @@ -2550,7 +2549,7 @@ __osm_ftree_rank_switches_from_hca( " - Switch guid: 0x%016" PRIx64 "\n" " - Switch LID : 0x%x\n", cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)), - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), cl_ntoh16(p_sw->base_lid)); __osm_ftree_rank_from_switch(p_ftree, p_sw); } @@ -2672,7 +2671,7 @@ __osm_ftree_fabric_construct_sw_ports( { ftree_hca_t * p_remote_hca; ftree_sw_t * p_remote_sw; - osm_node_t * p_node = osm_switch_get_node_ptr(p_sw->p_osm_sw); + osm_node_t * p_node = p_sw->p_osm_sw->p_node; osm_node_t * p_remote_node; ib_net16_t remote_base_lid; uint8_t remote_node_type; @@ -2740,10 +2739,10 @@ __osm_ftree_fabric_construct_sw_ports( " GUID 0x%016" PRIx64 ", LID 0x%x, rank %u\n", p_sw->rank, p_remote_sw->rank, - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), cl_ntoh16(p_sw->base_lid), p_sw->rank, - cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_remote_sw->p_osm_sw))), + cl_ntoh64(osm_node_get_node_guid(p_remote_sw->p_osm_sw->p_node)), cl_ntoh16(p_remote_sw->base_lid), p_remote_sw->rank); res = -1; diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c index f7ce5cd..2ce334a 100644 --- a/osm/opensm/osm_ucast_lash.c +++ b/osm/opensm/osm_ucast_lash.c @@ -1172,7 +1172,7 @@ static void populate_fwd_tbls(lash_t *p_lash) p_sw = p_next_sw; p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); - max_lid_ho = 
osm_switch_get_max_lid_ho(p_sw); + max_lid_ho = p_sw->max_lid_ho; current_guid = p_sw->p_node->node_info.port_guid; sw = p_sw->priv; @@ -1223,7 +1223,7 @@ static void print_fwd_table(IN const osm_switch_t *p_sw) uint16_t max_lid_ho, lid_ho; uint64_t switch_guid = osm_lash_get_switch_guid(p_sw); - max_lid_ho = osm_switch_get_max_lid_ho(p_sw); + max_lid_ho = p_sw->max_lid_ho; printf("FWDTBL: 0x%016" PRIx64 " max LID 0x%04X\n", cl_ntoh64(switch_guid), max_lid_ho); // starting at 1, not 0. Assuming no LID with an ID of 0 diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index 93cafae..15dda55 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -190,8 +190,8 @@ __osm_ucast_mgr_dump_path_distribution( OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_path_distribution ); - p_node = osm_switch_get_node_ptr( p_sw ); - num_ports = osm_switch_get_num_ports( p_sw ); + p_node = p_sw->p_node; + num_ports = p_sw->num_ports; osm_log_printf( p_mgr->p_log, OSM_LOG_DEBUG, "__osm_ucast_mgr_dump_path_distribution: " @@ -260,9 +260,9 @@ __osm_ucast_mgr_dump_ucast_routes( OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_ucast_routes ); - p_node = osm_switch_get_node_ptr( p_sw ); + p_node = p_sw->p_node; - max_lid_ho = osm_switch_get_max_lid_ho( p_sw ); + max_lid_ho = p_sw->max_lid_ho; fprintf( file, "__osm_ucast_mgr_dump_ucast_routes: " "Switch 0x%016" PRIx64 "\n" @@ -325,9 +325,9 @@ ucast_mgr_dump_lid_matrix(cl_map_item_t *p_map_item, void *cxt) osm_switch_t* p_sw = (osm_switch_t *)p_map_item; osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr; FILE *file = ((struct ucast_mgr_dump_context *)cxt)->file; - osm_node_t *p_node = osm_switch_get_node_ptr(p_sw); - unsigned max_lid = osm_switch_get_max_lid_ho(p_sw); - unsigned max_port = osm_switch_get_num_ports(p_sw); + osm_node_t *p_node = p_sw->p_node; + unsigned max_lid = p_sw->max_lid_ho; + unsigned max_port = p_sw->num_ports; uint16_t lid; uint8_t port; @@ -356,9 +356,9 @@ 
ucast_mgr_dump_lfts(cl_map_item_t *p_map_item, void *cxt) osm_switch_t* p_sw = (osm_switch_t *)p_map_item; osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr; FILE *file = ((struct ucast_mgr_dump_context *)cxt)->file; - osm_node_t *p_node = osm_switch_get_node_ptr(p_sw); - unsigned max_lid = osm_switch_get_max_lid_ho(p_sw); - unsigned max_port = osm_switch_get_num_ports(p_sw); + osm_node_t *p_node = p_sw->p_node; + unsigned max_lid = p_sw->max_lid_ho; + unsigned max_port = p_sw->num_ports; uint16_t lid; uint8_t port; @@ -496,8 +496,8 @@ __osm_ucast_mgr_process_neighbor( CL_ASSERT( port_num ); CL_ASSERT( remote_port_num ); - p_node = osm_switch_get_node_ptr( p_sw ); - p_remote_node = osm_switch_get_node_ptr( p_remote_sw ); + p_node = p_sw->p_node; + p_remote_node = p_remote_sw->p_node; CL_ASSERT( p_node ); CL_ASSERT( p_remote_node ); @@ -519,7 +519,7 @@ __osm_ucast_mgr_process_neighbor( /* Iterate through all the LIDs in the neighbor switch. */ - max_lid_ho = osm_switch_get_max_lid_ho( p_remote_sw ); + max_lid_ho = p_remote_sw->max_lid_ho; hops = OSM_NO_PATH; for( lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++ ) @@ -773,7 +773,7 @@ __osm_ucast_mgr_process_port( */ CL_ASSERT( max_lid_ho < osm_switch_get_fwd_tbl_size( p_sw ) ); - node_guid = osm_node_get_node_guid(osm_switch_get_node_ptr( p_sw ) ); + node_guid = osm_node_get_node_guid( p_sw->p_node ); /* The lid matrix contains the number of hops to each @@ -887,7 +887,7 @@ osm_ucast_mgr_set_fwd_table( CL_ASSERT( p_sw ); - p_node = osm_switch_get_node_ptr( p_sw ); + p_node = p_sw->p_node; CL_ASSERT( p_node ); @@ -899,7 +899,7 @@ osm_ucast_mgr_set_fwd_table( Set the top of the unicast forwarding table. 
*/ si = p_sw->switch_info; - lin_top = cl_hton16( osm_switch_get_max_lid_ho( p_sw ) ); + lin_top = cl_hton16( p_sw->max_lid_ho ); if (lin_top != si.lin_top) { set_swinfo_require = TRUE; @@ -927,7 +927,7 @@ osm_ucast_mgr_set_fwd_table( osm_log( p_mgr->p_log, OSM_LOG_DEBUG, "osm_ucast_mgr_set_fwd_table: " "Setting switch FT top to LID 0x%X\n", - osm_switch_get_max_lid_ho( p_sw ) ); + p_sw->max_lid_ho ); } context.si_context.light_sweep = FALSE; diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c index 950bcb4..05b7347 100644 --- a/osm/opensm/osm_ucast_updn.c +++ b/osm/opensm/osm_ucast_updn.c @@ -267,7 +267,7 @@ __updn_bfs_by_node( "Visiting port GUID 0x%" PRIx64 "\n", cl_ntoh64(current_guid) ); /* Go over all ports of the switch and find unvisited remote nodes */ - for ( pn = 0; pn < osm_switch_get_num_ports(u->sw); pn++ ) + for ( pn = 1; pn < u->sw->num_ports; pn++ ) { osm_node_t *p_remote_node; struct updn_node *rem_u; @@ -549,7 +549,7 @@ updn_subn_rank( u = (struct updn_node *)cl_qlist_remove_head(&list); /* Go over all remote nodes and rank them (if not already visited) */ p_sw = u->sw; - num_ports = osm_switch_get_num_ports(p_sw); + num_ports = p_sw->num_ports; osm_log( p_log, OSM_LOG_DEBUG, "updn_subn_rank: " "Handling switch GUID 0x%" PRIx64 "\n", @@ -743,7 +743,7 @@ expand_lid_matrices_for_lmc( { p_sw = (osm_switch_t *)p_next_sw; p_next_sw = cl_qmap_next(p_next_sw); - num_ports = osm_switch_get_num_ports(p_sw); + num_ports = p_sw->num_ports; for (port = 0; port < num_ports; port++) { hops = osm_switch_get_hop_count(p_sw, min_lid, port); for (lid = min_lid + 1 ; lid <= max_lid; lid++) @@ -973,7 +973,7 @@ __osm_updn_find_root_nodes_by_min_hop( /* Clear Min Hop Table && FWD Tbls - This should caused opensm to rebuild it's FWD tables, post setting Min Hop Tables */ - max_lid_ho = osm_switch_get_max_lid_ho(p_sw); + max_lid_ho = p_sw->max_lid_ho; /* Get base lid of switch by retrieving port 0 lid of node pointer */ self_lid_ho = cl_ntoh16( 
osm_node_get_base_lid( p_sw->p_node, 0 ) ); osm_log( &p_osm->log, OSM_LOG_DEBUG, -- 1.5.0.1.26.gf5a92 From kliteyn at dev.mellanox.co.il Sun Feb 25 22:20:56 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 26 Feb 2007 08:20:56 +0200 Subject: [openib-general] [PATCH] osm: Flushing log file after OSM_SYS_LOG message In-Reply-To: <20070225195837.GC11957@sashak.voltaire.com> References: <45E19BE2.2070704@dev.mellanox.co.il> <20070225195837.GC11957@sashak.voltaire.com> Message-ID: <45E27C48.8010700@dev.mellanox.co.il> Sasha Khapyorsky wrote: > On 16:23 Sun 25 Feb , Yevgeny Kliteynik wrote: >> Hi Hal, >> >> OSM log should be flushed when OSM_SYS_LOG message is >> printed. We had this once, but somehow it has disappeared. >> >> This fix has to go both to trunk and to 1.2. >> >> Thanks, >> >> --Yevgeny >> >> Signed-off-by: Yevgeny Kliteynik >> --- >> osm/opensm/osm_log.c | 3 ++- >> 1 files changed, 2 insertions(+), 1 deletions(-) >> >> diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c >> index d76031d..f95ed85 100644 >> --- a/osm/opensm/osm_log.c >> +++ b/osm/opensm/osm_log.c >> @@ -204,7 +204,8 @@ osm_log( >> #endif >> >> /* flush log */ >> - if (ret > 0 && (p_log->flush || (verbosity & OSM_LOG_ERROR)) && >> + if ( ret > 0 && >> + (p_log->flush || (verbosity & OSM_LOG_ERROR) || (verbosity & OSM_LOG_SYS)) && > > verbosity & (OSM_LOG_ERROR|OSM_LOG_SYS)? Sure - why not -- Yevgeny > Sasha > From sweitzen at cisco.com Sun Feb 25 23:34:04 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 25 Feb 2007 23:34:04 -0800 Subject: [openib-general] bugs filed for problems compiling OFED 1.2 alpha1 Message-ID: Please fix these bugs for beta. I've compiled for RHEL4 and SLES10 on x86_64, i686, ia64, and ppc64. I compiled all MPIs with GNU, Intel, and PGI compilers. 
* 380 OFED 1.2 alpha1 gcc MVAPICH won't compile on RHEL4 IA64 * 381 OFED 1.2 alpha1 MVAPICH2 won't compile on RHEL4 IA64 with Intel compiler * 382 OFED 1.2 alpha1 mpitests won't compile with Intel compiler for Open MPI (RHEL4 IA64) * 383 OFED 1.2 alpha1 core/addr.c won't compile on SLES10 IA64 * 384 OFED 1.2 alpha1 ib-bonding won't compile on RHEL4 U3 ppc64 * 386 OFED 1.2 alpha1 gcc MVAPICH2 won't compile on RHEL4 ppc64 (add -m64) * 387 OFED 1.2 alpha1 Open MPI won't compile on SLES10 ppc64 Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems From sweitzen at cisco.com Sun Feb 25 23:37:36 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 25 Feb 2007 23:37:36 -0800 Subject: [openib-general] bugs filed for OFED 1.2 alpha1 MPI compiler support Message-ID: Please fix these bugs for beta. I've compiled for RHEL4 and SLES10 on x86_64, i686, ia64, and ppc64. I compiled all MPIs with GNU, Intel, and PGI compilers, and tried compiling and running C, C++, Fortran 77, and Fortran 90 programs with each combo. * 370 OFED 1.2 alpha1 MVAPICH does not have Intel Fortran support * 372 MVAPICH2 GNU mpif90 uses PGI not GNU compiler * 373 MVAPICH2 Intel mpif90 does not include -rpath like mpif77 does * 374 MVAPICH2 PGI mpif90 link failure: undefined reference ..Dm_mpi * 375 Open MPI PGI C++ failure at runtime Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Sun Feb 25 23:47:39 2007 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 26 Feb 2007 09:47:39 +0200 Subject: [openib-general] bugs filed for problems compiling OFED 1.2 alpha1 In-Reply-To: References: Message-ID: <20070226074739.GA27677@mellanox.co.il> > Quoting Scott Weitzenkamp (sweitzen) : > Subject: bugs filed for problems compiling OFED 1.2 alpha1 > > Please fix these bugs for beta. > Scott, you have assigned all bugs to bugzilla at openib.org. To have the bugs resolved, please assign them to maintainers of appropriate module. For example, bonding module owner is Moni Shoua , so I think bug 384 should be assigned to him. -- MST From bugzilla-daemon at lists.openfabrics.org Sun Feb 25 23:49:20 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Sun, 25 Feb 2007 23:49:20 -0800 (PST) Subject: [openib-general] [Bug 384] OFED 1.2 alpha1 ib-bonding won't compile on RHEL4 U3 ppc64 In-Reply-To: Message-ID: <20070226074920.7BFACE6080D@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=384 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |monis at voltaire.com ------- Comment #1 from sweitzen at cisco.com 2007-02-25 23:49 ------- Scott, you have assigned all bugs to bugzilla at openib.org. To have the bugs resolved, please assign them to maintainers of appropriate module. For example, bonding module owner is Moni Shoua , so I think bug 384 should be assigned to him. -- MST -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
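For reference, the flush condition from the osm_log.c patch earlier in this archive (Yevgeny Kliteynik's "[PATCH] osm: Flushing log file after OSM_SYS_LOG message"), together with the single-mask form Sasha Khapyorsky suggested in review, might be sketched as below. This is only an illustration under stated assumptions: the DEMO_LOG_* constants and the function names are hypothetical stand-ins, the real OSM_LOG_* values are not shown in the thread, and only the part of the condition quoted in the patch is modeled (the original expression continues after a trailing &&).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for OpenSM's log-level bits. The real
 * OSM_LOG_* values are defined in osm_log.h and are not shown in
 * this thread, so the numbers below are assumptions. */
#define DEMO_LOG_ERROR   0x01u
#define DEMO_LOG_VERBOSE 0x04u
#define DEMO_LOG_SYS     0x80u

/* Condition as written in the posted patch: two separate mask tests. */
static bool flush_needed_patch(int ret, bool flush, unsigned verbosity)
{
    return ret > 0 &&
           (flush || (verbosity & DEMO_LOG_ERROR) || (verbosity & DEMO_LOG_SYS));
}

/* Condition as suggested in review: one test against the OR of both bits. */
static bool flush_needed_review(int ret, bool flush, unsigned verbosity)
{
    return ret > 0 &&
           (flush || (verbosity & (DEMO_LOG_ERROR | DEMO_LOG_SYS)));
}

/* Checks that the two forms agree for every verbosity value up to the
 * OR of the demo bits, with and without the always-flush setting. */
static void demo_check_equivalence(void)
{
    const unsigned all = DEMO_LOG_ERROR | DEMO_LOG_VERBOSE | DEMO_LOG_SYS;
    for (unsigned v = 0; v <= all; v++) {
        for (int ret = 0; ret <= 1; ret++) {
            assert(flush_needed_patch(ret, false, v) ==
                   flush_needed_review(ret, false, v));
            assert(flush_needed_patch(ret, true, v) ==
                   flush_needed_review(ret, true, v));
        }
    }
}
```

Both forms test whether either bit is set, so they should behave identically; the single-mask version simply avoids repeating the verbosity operand.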
From sweitzen at cisco.com Sun Feb 25 23:50:42 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 25 Feb 2007 23:50:42 -0800 Subject: [openib-general] bugs filed for problems compiling OFED 1.2 alpha1 In-Reply-To: <20070226074739.GA27677@mellanox.co.il> References: <20070226074739.GA27677@mellanox.co.il> Message-ID: > Scott, you have assigned all bugs to bugzilla at openib.org. > To have the bugs resolved, please assign them to maintainers of > appropriate module. Not sure what you mean by "all", only 384 was not assigned to a specific person. Scott From mst at mellanox.co.il Sun Feb 25 23:50:59 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Feb 2007 09:50:59 +0200 Subject: [openib-general] bugs filed for problems compiling OFED 1.2 alpha1 In-Reply-To: References: Message-ID: <20070226075059.GB27677@mellanox.co.il> > Quoting Scott Weitzenkamp (sweitzen) : > Subject: bugs filed for problems compiling OFED 1.2 alpha1 > > Please fix these bugs for beta. > > I've compiled for RHEL4 and SLES10 on x86_64, i686, ia64, and ppc64. I > compiled all MPIs with GNU, Intel, and PGI compilers. > > • 380 OFED 1.2 alpha1 gcc MVAPICH won't compile on RHEL4 IA64 > • 381 OFED 1.2 alpha1 MVAPICH2 won't compile on RHEL4 IA64 with Intel > compiler > • 382 OFED 1.2 alpha1 mpitests won't compile with Intel compiler for Open > MPI (RHEL4 IA64) > • 383 OFED 1.2 alpha1 core/addr.c won't compile on SLES10 IA64 > • 384 OFED 1.2 alpha1 ib-bonding won't compile on RHEL4 U3 ppc64 > • 386 OFED 1.2 alpha1 gcc MVAPICH2 won't compile on RHEL4 ppc64 (add > -m64) > • 387 OFED 1.2 alpha1 Open MPI won't compile on SLES10 ppc64 Some of these might be fixed in recent nightly builds. Specifically I know 383 was fixed yesterday. Please check this and let us know. -- MST From mst at mellanox.co.il Sun Feb 25 23:53:03 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 26 Feb 2007 09:53:03 +0200 Subject: [openib-general] bugs filed for problems compiling OFED 1.2 alpha1 In-Reply-To: References: <20070226074739.GA27677@mellanox.co.il> Message-ID: <20070226075303.GC27677@mellanox.co.il> > Quoting Scott Weitzenkamp (sweitzen) : > Subject: RE: bugs filed for problems compiling OFED 1.2 alpha1 > > > > Scott, you have assigned all bugs to bugzilla at openib.org. > > To have the bugs resolved, please assign them to maintainers of > > appropriate module. > > Not sure what you mean by "all", only 384 was not assigned to a specific > person. Correct. Sorry about that. -- MST From vlad at lists.openfabrics.org Mon Feb 26 02:26:28 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Mon, 26 Feb 2007 02:26:28 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070226-0200 daily build status Message-ID: <20070226102629.CDC67E6080A@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod --with-vnic-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: Build failed on i686 with linux-2.6.12 Build failed on i686 with linux-2.6.13 Build failed on i686 with linux-2.6.14 Build failed on powerpc with linux-2.6.19 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: 
error: implicit declaration of function ‘vmalloc’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.19' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on powerpc with linux-2.6.18 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband] Error 2 
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.18' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.19 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.19' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on powerpc with linux-2.6.15 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast 
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.15' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on powerpc with linux-2.6.13 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.13' make: *** [kernel] Error 2 
---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.18 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.18' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.19 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.19' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.12 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.12' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.18 Log: 
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.18' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on powerpc with linux-2.6.17 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.17' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.13 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.13' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on powerpc with linux-2.6.14 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’ 
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.14' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.13 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:443: error: ‘struct class_device’ has no member named ‘parent’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c: In function ‘setup_path_class_files’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:645: error: ‘struct class_device’ has no member named ‘parent’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband] Error 2 make[1]: *** 
[_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.13' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.12 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:443: error: ‘struct class_device’ has no member named ‘parent’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c: In function ‘setup_path_class_files’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:645: error: ‘struct class_device’ has no member named ‘parent’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.12' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.12 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: 
format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.12' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.15 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.15' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build 
failed on ia64 with linux-2.6.14 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.14' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.15 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.15' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.17 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.17' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.16 Log: 
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.16' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on powerpc with linux-2.6.16 Log: Build failed on x86_64 with linux-2.6.14 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:443: error: ‘struct class_device’ has no member named ‘parent’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c: In function ‘setup_path_class_files’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:645: error: ‘struct class_device’ has no member named ‘parent’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1 make[3]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.14' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.16' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on powerpc with linux-2.6.12 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’ 
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.12' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.14 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband] Error 2 
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.14' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.16 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.13 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: 
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.13' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.17 Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check] Error 2 make[1]: Leaving directory 
`/home/vlad/kernel.org/ppc64/linux-2.6.17' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.16.21-0.8-smp Log: In file included from /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:33: include/linux/parser.h:34: error: expected declaration specifiers or ‘...’ before ‘u64’ include/linux/parser.h:35: error: expected declaration specifiers or ‘...’ before ‘s64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-smp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.16.21-0.8-smp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.5-7.244-smp Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:443: error: 'struct class_device' has no member named 'parent' /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c: In function 'setup_path_class_files': /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:645: error: 'struct class_device' has no member named 'parent' make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1 make[3]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:443: error: ‘struct class_device’ has no member named ‘parent’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c: In function ‘setup_path_class_files’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:645: error: ‘struct class_device’ has no member named ‘parent’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’: 
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From kliteyn at dev.mellanox.co.il Mon Feb 26 03:20:06 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 26 Feb 2007 13:20:06 +0200 Subject: [openib-general] [PATCH] osm: trivial data type change to remove compilation warning Message-ID: <45E2C266.5000503@dev.mellanox.co.il> Hi Hal Trivial data type change to remove compilation warning. Please apply to the trunk and to the 1.2 branch. Thanks. Signed-off-by: Yevgeny Kliteynik --- osm/opensm/osm_ucast_updn.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c index 8b86958..70ae10f 100644 --- a/osm/opensm/osm_ucast_updn.c +++ b/osm/opensm/osm_ucast_updn.c @@ -1005,8 +1005,8 @@ static void expand_lid_matrices_for_lmc( cl_map_item_t *p_next_port, *p_next_sw; osm_port_t *p_port; osm_switch_t *p_sw; - uint16_t lid, min_lid, max_lid, hops; - uint8_t port, num_ports; + uint16_t lid, min_lid, max_lid; + uint8_t port, num_ports, hops; p_next_port = cl_qmap_head( &p_subn->port_guid_tbl ); while (p_next_port != cl_qmap_end(&p_subn->port_guid_tbl)) { -- 1.4.4.1.GIT From halr at voltaire.com Mon Feb 26 05:45:11 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Feb 2007 08:45:11 -0500 Subject: [openib-general] [PATCH] osm: Flushing log file after OSM_SYS_LOG message In-Reply-To: <45E19BE2.2070704@dev.mellanox.co.il> References: <45E19BE2.2070704@dev.mellanox.co.il> Message-ID: 
<1172497508.4102.267757.camel@hal.voltaire.com>

On Sun, 2007-02-25 at 09:23, Yevgeny Kliteynik wrote:
> Hi Hal,
>
> OSM log should be flushed when OSM_SYS_LOG message is
> printed. We had this once, but somehow it has disappeared.
>
> This fix has to go both to trunk and to 1.2.
>
> Thanks,
>
> --Yevgeny
>
> Signed-off-by: Yevgeny Kliteynik

Thanks. Applied (to both master and ofed_1_2).

-- Hal

From halr at voltaire.com Mon Feb 26 05:55:49 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 08:55:49 -0500
Subject: [openib-general] [PATCH] opensm: updn performance improvements
In-Reply-To: <20070224201342.GB9147@sashak.voltaire.com>
References: <20070224201342.GB9147@sashak.voltaire.com>
Message-ID: <1172498135.4102.268407.camel@hal.voltaire.com>

On Sat, 2007-02-24 at 15:13, Sasha Khapyorsky wrote:
> There are various performance improvements for up/down routing engine:
> - updn_node object which is referenced by switch's priv pointer
> - ranking for switches only
> - replace time consuming cl_list by cl_qlist
> - reuse already collected up/down related information (in updn_node
>   structure) instead of rediscovering
> - eliminate many inner loops
> - mask time consuming logging
> - eliminate using two lists with BFS
> - minor cleanups
>
> Now up/down looks 5-6 times faster.

Nice work!

> Signed-off-by: Sasha Khapyorsky

Thanks. Applied (to master only at least for right now; will get to
ofed_1_2 in a bit).

-- Hal

From dy.manju at gmail.com Mon Feb 26 06:35:37 2007
From: dy.manju at gmail.com (manju y)
Date: Mon, 26 Feb 2007 20:05:37 +0530
Subject: [openib-general] How to enable fast registration.
Message-ID:

Hi,

Can anyone suggest how to enable the fast registration bit when creating a queue pair?

Thanks,
manju

From tziporet at mellanox.co.il Mon Feb 26 07:10:59 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 26 Feb 2007 17:10:59 +0200
Subject: [openib-general] reminder: OFED 1.2 coordination meeting today (Monday Feb-26) at 9amPST
Message-ID: <45E2F883.7000001@mellanox.co.il>

Hi all,

I wish to remind you that we have the OFED 1.2 coordination meeting today (Monday Feb-26) at 9am PST.

Agenda:
1. Status update toward beta next week

Tziporet

Bridge info:
Meeting ID: 2106670
Meeting Password:
Global Access Numbers: http://cisco.com/en/US/about/doing_business/conferencing/index.html
US/Canada: +1.866.432.9903
United Kingdom: +44.20.8824.0117
India: +91.80.4103.3979
Germany: +49.619.6773.9002
Japan: +81.3.5763.9394
China: +86.10.8515.5666
for world-wide access numbers see: http://openib.org/pipermail/openib-general/2007-January/031282.html
_______________________________________________

From vlad at dev.mellanox.co.il Mon Feb 26 07:07:45 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 26 Feb 2007 17:07:45 +0200
Subject: [openib-general] HOWTO check ofa_kernel build from your git tree
Message-ID: <1172502465.21382.44.camel@vladsk-laptop>

On ssh.openfabrics.org:

Run

    env git_url=/home/mst/scm/ofed_1_2_devel.git git_branch=ofed_1_2 \
        CHECK_LOCAL=yes \
        CHECK_KERNEL_ORG=yes \
        CHECK_CROSS=yes /home/vlad/scripts/build_ofa_kernel.sh

--
Vladimir Sokolovsky
Mellanox Technologies Ltd.
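
A note on the `%llx` warnings that recur in the vnic_control.c build logs earlier in this digest: as the ia64 logs show, u64 is 'long unsigned int' on that architecture, so passing a u64 or __be64 directly to a %llx conversion draws the warning gcc reports. The usual remedy is to convert big-endian values to host order and then cast explicitly to unsigned long long. The following is a minimal userspace sketch of that pattern, not the actual vnic fix; be64_decode and format_u64_hex are hypothetical names chosen for this illustration, and be64_decode merely stands in for the kernel's be64_to_cpu().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: decode a 64-bit big-endian (wire order) value from
 * 8 bytes. Works the same on little- and big-endian hosts. In kernel code
 * the equivalent step would be be64_to_cpu() on a __be64 field.
 */
static uint64_t be64_decode(const unsigned char bytes[8])
{
    uint64_t value = 0;
    int i;

    for (i = 0; i < 8; i++)
        value = (value << 8) | bytes[i];
    return value;
}

/*
 * Hypothetical helper: format a 64-bit value as hex. The explicit cast
 * makes the argument type match %llx on every architecture, including
 * those where a 64-bit integer is plain 'unsigned long'.
 */
static int format_u64_hex(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%llx", (unsigned long long)value);
}
```

The same two steps (byte-order conversion first, cast second) would apply to a printk() call; whether that is the right change for lines 1736 and 1751 of vnic_control.c depends on code not shown in the logs.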
From halr at voltaire.com Mon Feb 26 07:23:58 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 10:23:58 -0500
Subject: [openib-general] ipoib & the partial pkey
In-Reply-To: <45E1697E.6050007@voltaire.com>
References: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com> <45E1697E.6050007@voltaire.com>
Message-ID: <1172503433.4102.273563.camel@hal.voltaire.com>

On Sun, 2007-02-25 at 05:48, Or Gerlitz wrote:
> Sean Hefty wrote:
> > I looked into this more...
> > RFC 4391 states (middle of page 5):
> > For a node to join a partition, one of its ports must be assigned the relevant
> > P_Key by the SM [RFC4392].
> >
> > Jumping to RFC 4392 (top of page 4):
>
> Just to have us agree on the quote, it is from section 4 of rfc 4392
> (page 14) eg in http://www.ietf.org/rfc/rfc4392.txt
>
> > at the time of creating an IB multicast group, multiple values such as the
> > P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc. have to be
> > specified. These values should be such that all potential members of the IB
> > multicast group are able to communicate with one another when using them.
>
> OK, I suggest to remove this spec limitation,

IMO you would need to get the IB spec changed first in order to do this.

> as it does not allow the
> use case of a server using a partition for which inter-client
> communication is not allowed.
> Actually since it does not let people use partial membership
> partitioning with IPoIB as every ipoib device needs to join the
> broadcast group, it is probably a spec bug and not a limitation done on
> purpose.

I'm pretty sure this was done on purpose (a conscious choice) as it is
based on what the IBA spec requires. The flip side of this approach is
the partial connectivity issues which Sean mentioned and this will be
reported as SM failures (e.g. more support issues).

> A simple real-life example is I/O target, the system admin wants IB > block and/or file storage traffic to use a partition, but he does not > want initiators to communicate among themselves on this partition. > > To achieve that the SM is configured to assign the partial pkey to the > initiator nodes and the full pkey to the target ports. > > The current implementation of IPoIB and core perfectly (and > transparently...) supports that. and is currently non compliant in its behavior. -- Hal > Or. > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ogerlitz at voltaire.com Mon Feb 26 07:37:38 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 26 Feb 2007 17:37:38 +0200 Subject: [openib-general] ipoib & the partial pkey In-Reply-To: <1172503433.4102.273563.camel@hal.voltaire.com> References: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com> <45E1697E.6050007@voltaire.com> <1172503433.4102.273563.camel@hal.voltaire.com> Message-ID: <45E2FEC2.6010708@voltaire.com> Hal Rosenstock wrote: > On Sun, 2007-02-25 at 05:48, Or Gerlitz wrote: >> Just to have us agree on the quote, it is from section 4 of rfc 4392 >> (page 14) eg in http://www.ietf.org/rfc/rfc4392.txt >>> at the time of creating an IB multicast group, multiple values such as the >>> P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc. have to be >>> specified. These values should be such that all potential members of the IB >>> multicast group are able to communicate with one another when using them. >> OK, I suggest to remove this spec limitation, > IMO you would need to get the IB spec changed first in order to do this. do you refers to this? > What about the description og P_Key in MCMemberRecord (table 210 on p. 
> 908 which is compliance) which states:
>
> "All members of the multicast group shall have full membership in the
> partition indicated by the partition key."

if yes, indeed, this also has to be changed.

Or.

From halr at voltaire.com Mon Feb 26 08:25:07 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 11:25:07 -0500
Subject: [openib-general] ipoib & the partial pkey
In-Reply-To: <45E2FEC2.6010708@voltaire.com>
References: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com> <45E1697E.6050007@voltaire.com> <1172503433.4102.273563.camel@hal.voltaire.com> <45E2FEC2.6010708@voltaire.com>
Message-ID: <1172507101.4102.277140.camel@hal.voltaire.com>

On Mon, 2007-02-26 at 10:37, Or Gerlitz wrote:
> Hal Rosenstock wrote:
> > On Sun, 2007-02-25 at 05:48, Or Gerlitz wrote:
>
> >> Just to have us agree on the quote, it is from section 4 of rfc 4392
> >> (page 14) eg in http://www.ietf.org/rfc/rfc4392.txt
>
> >>> at the time of creating an IB multicast group, multiple values such as the
> >>> P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc. have to be
> >>> specified. These values should be such that all potential members of the IB
> >>> multicast group are able to communicate with one another when using them.
>
> >> OK, I suggest to remove this spec limitation,
>
> > IMO you would need to get the IB spec changed first in order to do this.
>
> do you refer to this?
>
> > What about the description of P_Key in MCMemberRecord (table 210 on p.
> > 908 which is compliance) which states:
> >
> > "All members of the multicast group shall have full membership in the
> > partition indicated by the partition key."
>
> if yes, indeed, this also has to be changed.

Yes, for one. There may be others; I didn't look exhaustively at the spec for this.

-- Hal

> Or.
> From rdreier at cisco.com Mon Feb 26 08:42:20 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 26 Feb 2007 08:42:20 -0800 Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth In-Reply-To: <20070225122211.GD5331@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 25 Feb 2007 14:22:11 +0200") References: <20070225122211.GD5331@mellanox.co.il> Message-ID: nope, doesn't seem to make a difference. From sweitzen at cisco.com Mon Feb 26 08:49:59 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 26 Feb 2007 08:49:59 -0800 Subject: [openib-general] bugs filed for problems compiling OFED 1.2 alpha1 In-Reply-To: <20070226075059.GB27677@mellanox.co.il> References: <20070226075059.GB27677@mellanox.co.il> Message-ID: > Some of these might be fixed in recent nightly builds. > Specifically I know 383 was fixed yesterday. Please check > this and let us know. Thanks, what is the URL for the nightly builds? Scott From vlad at dev.mellanox.co.il Mon Feb 26 08:59:22 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 26 Feb 2007 18:59:22 +0200 Subject: [openib-general] [ewg] RE: bugs filed for problems compiling OFED 1.2 alpha1 In-Reply-To: References: <20070226075059.GB27677@mellanox.co.il> Message-ID: <1172509162.21382.45.camel@vladsk-laptop> On Mon, 2007-02-26 at 08:49 -0800, Scott Weitzenkamp (sweitzen) wrote: > > Some of these might be fixed in recent nightly builds. > > Specifically I know 383 was fixed yesterday. Please check > > this and let us know. > > Thanks, what is the URL for the nightly builds? > > Scott > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg http://www.openfabrics.org/builds/ofa_1_2_kernel/ The latest: http://www.openfabrics.org/builds/ofa_1_2_kernel/ofa_1_2_kernel-20070226-0405.tgz -- Vladimir Sokolovsky Mellanox Technologies Ltd. 
From sweitzen at cisco.com Mon Feb 26 09:00:49 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 26 Feb 2007 09:00:49 -0800
Subject: [openib-general] [ewg] RE: bugs filed for problems compiling OFED 1.2 alpha1
In-Reply-To: <1172509162.21382.45.camel@vladsk-laptop>
References: <20070226075059.GB27677@mellanox.co.il> <1172509162.21382.45.camel@vladsk-laptop>
Message-ID:

I want a full OFED build, please. This was agreed to in one of the OFED bi-weekly calls.

Scott

> -----Original Message-----
> From: Vladimir Sokolovsky [mailto:vlad at dev.mellanox.co.il]
> Sent: Monday, February 26, 2007 8:59 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Michael S. Tsirkin; Openfabrics-ewg at openib.org; OPENIB
> Subject: Re: [ewg] RE: bugs filed for problems compiling OFED 1.2 alpha1
>
> On Mon, 2007-02-26 at 08:49 -0800, Scott Weitzenkamp (sweitzen) wrote:
> > > Some of these might be fixed in recent nightly builds.
> > > Specifically I know 383 was fixed yesterday. Please check
> > > this and let us know.
> >
> > Thanks, what is the URL for the nightly builds?
> >
> > Scott
> >
> > _______________________________________________
> > ewg mailing list
> > ewg at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>
> http://www.openfabrics.org/builds/ofa_1_2_kernel/
>
> The latest:
> http://www.openfabrics.org/builds/ofa_1_2_kernel/ofa_1_2_kernel-20070226-0405.tgz
>
>
> --
> Vladimir Sokolovsky
> Mellanox Technologies Ltd.
>

From jsquyres at cisco.com Mon Feb 26 09:05:30 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Mon, 26 Feb 2007 12:05:30 -0500
Subject: [openib-general] Fwd: Address List Change Now Scheduled for Wednesday, 2/28/2007
References: <3D84A59A1AD3584DA02AEAD240E8863F03BC471E@ES22SNLNT.srn.sandia.gov>
Message-ID:

FYI.

In case you missed it the Nth time: THIS LIST IS CHANGING ON WEDNESDAY 2/28/2007 (2 days from now). Really. For sure this time. Trust me. Honest.

Please update your addressbooks!
Begin forwarded message: > From: "Lee, Michael Paichi" > Date: February 22, 2007 11:44:25 AM EST > To: "Jeff Squyres" , "Michael S. Tsirkin" > > Cc: "OpenFabrics General" > Subject: Address List Change Now Scheduled for Wednesday, 2/28/2007 > > The list will now be migrated on Wednesday, 2/28/2007. > > List address: general at lists.openfabrics.org > Updated change-date: Wednesday, 2/28/2007 > > Michael -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From monis at voltaire.com Mon Feb 26 09:07:30 2007 From: monis at voltaire.com (Moni Shoua) Date: Mon, 26 Feb 2007 19:07:30 +0200 Subject: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB Message-ID: <45E313D2.70909@voltaire.com> Hi, This post follows a previous one, regarding required changes to IPoIB to enable it to work with bonding. Please find it here: http://openib.org/pipermail/openib-general/2007-February/032598.html This patch version adds fixes to the comments from Michael Tsirkin from the last post. IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb->dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Combing these two flows, there is a hole if some code at ipoib (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev is an ipoib device so for example netdev_priv(n->dev) would be of type struct ipoib_dev_priv. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one. 
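The relationship described above can be modelled in plain C. This is a simplified userspace sketch, not the kernel code from the patch. The struct and function names ending in _model are hypothetical stand-ins for struct neighbour, struct ipoib_neigh, ipoib_neigh_alloc() and the to_ipoib_neigh() slot. It only shows why the private entry has to remember its own device once the neighbour's dev may point at the bonding master.

```c
#include <assert.h>
#include <stdlib.h>

struct net_device { const char *name; };

struct ipoib_neigh_model;

/* Stand-in for struct neighbour: created by the net stack. */
struct neighbour_model {
    struct net_device *dev;           /* under bonding this is the master (bond0) */
    struct ipoib_neigh_model *ipoib;  /* stand-in for the to_ipoib_neigh() slot */
};

/* Stand-in for struct ipoib_neigh: created on demand in the tx flow. */
struct ipoib_neigh_model {
    struct neighbour_model *neighbour;
    struct net_device *dev;           /* the IPoIB (slave) device that allocated it */
};

/* Allocate the private entry and record which device it belongs to. */
struct ipoib_neigh_model *neigh_alloc_model(struct neighbour_model *n,
                                            struct net_device *dev)
{
    struct ipoib_neigh_model *neigh;

    assert(n != NULL && dev != NULL);
    neigh = malloc(sizeof *neigh);
    if (!neigh)
        return NULL;
    neigh->neighbour = n;
    neigh->dev = dev;
    n->ipoib = neigh;
    return neigh;
}

/* Device whose private data a destructor may safely use, or NULL if none. */
struct net_device *neigh_owner_model(const struct neighbour_model *n)
{
    return n->ipoib ? n->ipoib->dev : NULL;
}
```

In this model a destructor that looked at n->dev would reach bond0, whose private data is not an IPoIB structure, while neigh_owner_model() returns the slave device recorded at allocation time.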
In addition, if an IPoIB device is removed before bonding is unloaded it may cause bond0 neighbours (neighbours that point to bond0) to exist after the IPoIB device no longer exist. This is why a neighbour cleanup is required during device cleanup. This cleanup scans the arp cache and the ndisc cache to find there neighbours of bond0 which refer also to the relevant ibX. Also, when ib_ipoib module is unloaded, the neighbour destructor must be set to NULL because the neighbour function is in ib_ipoib. For this neigh table cleanup, it is required to export the symbol nd_tbl just like the symbol arp_tbl is. During my tests I found that when running 1. modprobe -r ib_mthca (to delete IPoIB interfaces) 2. ping somewhere on the subnet of bond0 I get this stack dump (which ends with kernel death) [] skb_under_panic+0x5c/0x60 [] :ib_ipoib:ipoib_hard_header+0xa6/0xc0 [] arp_create+0x120/0x226 [] arp_send+0x25/0x3b [] arp_solicit+0x186/0x195 [] neigh_timer_handler+0x2b5/0x309 [] neigh_timer_handler+0x0/0x309 [] run_timer_softirq+0x130/0x19e [] __do_softirq+0x55/0xc3 [] call_softirq+0x1c/0x28 [] do_softirq+0x2c/0x7d [] smp_apic_timer_interrupt+0x57/0x6a [] mwait_idle+0x0/0x45 [] apic_timer_interrupt+0x66/0x70 [] mwait_idle+0x42/0x45 [] cpu_idle+0x8b/0xae [] start_secondary+0x47f/0x48f The only way I found to avoid this (for now) is to check skb headroom in ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB operation and it seems to solve my problem. However, I would be happy to hear what others think of this last issue. I would really appreciate comments. 
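The headroom check mentioned above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: struct buf_model, headroom_model and push_checked_model are hypothetical names that mimic the idea behind skb_headroom() and skb_push(), where headroom is the space between the start of the allocation and the start of the current data.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the part of an skb that matters here. */
struct buf_model {
    unsigned char *head; /* start of the allocation */
    unsigned char *data; /* start of the current payload */
};

/* Bytes available in front of the payload for prepending headers. */
size_t headroom_model(const struct buf_model *b)
{
    return (size_t)(b->data - b->head);
}

/*
 * Prepend len bytes if there is room. Returns the new start of data, or
 * NULL (buffer untouched) when the push would run past the allocation,
 * which is the case that ends in skb_under_panic in the trace above.
 */
unsigned char *push_checked_model(struct buf_model *b, size_t len)
{
    assert(b->data >= b->head);
    if (headroom_model(b) < len)
        return NULL;
    b->data -= len;
    return b->data;
}
```

The design choice is the same as in the patch: refuse the operation and return an error to the caller instead of moving the data pointer in front of the buffer.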
thanks -MoniS ------------------------------------------------------------------------------ diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 07deee8..31bc6d8 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -216,6 +216,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_head list; }; @@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 705eb1d..0e3953e 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -48,6 +48,8 @@ #include #include #include +#include +#include #define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff) @@ -70,6 +72,7 @@ module_param_named(debug_level, ipoib_de MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0"); #endif +static int ipoib_at_exit = 0; struct ipoib_path_iter { struct net_device *dev; struct ipoib_path path; @@ -490,7 +493,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb->dst->neighbour); + neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (!neigh) { ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -735,6 +738,9 @@ static int ipoib_hard_header(struct sk_b { struct ipoib_header *header; + if (skb_headroom(skb) < sizeof *header) { + return -1; + } header = (struct ipoib_header *) skb_push(skb, sizeof *header); header->proto = htons(type); @@ -746,8 +752,11 @@ static int ipoib_hard_header(struct 
sk_b * figure out where to send the packet later. */ if ((!skb->dst || !skb->dst->neighbour) && daddr) { - struct ipoib_pseudoheader *phdr = - (struct ipoib_pseudoheader *) skb_push(skb, sizeof *phdr); + struct ipoib_pseudoheader *phdr = NULL; + if (skb_headroom(skb) < sizeof *phdr) { + return -1; + } + phdr = (struct ipoib_pseudoheader *) skb_push(skb, sizeof *phdr); memcpy(phdr->hwaddr, daddr, INFINIBAND_ALEN); } @@ -769,32 +778,69 @@ static void ipoib_set_mcast_list(struct static void ipoib_neigh_destructor(struct neighbour *n) { struct ipoib_neigh *neigh; - struct ipoib_dev_priv *priv = netdev_priv(n->dev); + struct ipoib_dev_priv *priv; unsigned long flags; struct ipoib_ah *ah = NULL; - ipoib_dbg(priv, - "neigh_destructor for %06x " IPOIB_GID_FMT "\n", - IPOIB_QPN(n->ha), - IPOIB_GID_RAW_ARG(n->ha + 4)); - - spin_lock_irqsave(&priv->lock, flags); neigh = *to_ipoib_neigh(n); if (neigh) { + priv = netdev_priv(neigh->dev); + ipoib_dbg(priv, + "neigh_destructor for %06x " IPOIB_GID_FMT "\n", + IPOIB_QPN(n->ha), + IPOIB_GID_RAW_ARG(n->ha + 4)); + + spin_lock_irqsave(&priv->lock, flags); if (neigh->ah) ah = neigh->ah; list_del(&neigh->list); ipoib_neigh_free(n->dev, neigh); + spin_unlock_irqrestore(&priv->lock, flags); } - - spin_unlock_irqrestore(&priv->lock, flags); - if (ah) ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +static void ipoib_neigh_tbl_cleanup_master(struct neigh_table *tbl, + struct net_device* master, + struct net_device* slave) +{ + int i; + struct ipoib_neigh *neigh; + + write_lock_bh(&tbl->lock); + for (i = 0; i <= tbl->hash_mask; i++) { + struct neighbour *n, **np; + + np = &tbl->hash_buckets[i]; + while ((n = *np) != NULL) { + write_lock(&n->lock); + if (n->dev == master) { + neigh = *to_ipoib_neigh(n); + if (neigh && (neigh->dev == slave)){ + if (ipoib_at_exit) + n->parms->neigh_destructor = NULL; + ipoib_neigh_destructor(n); + } + } + write_unlock(&n->lock); + np = &n->next; + } + } + 
write_unlock_bh(&tbl->lock); +} + +static void ipoib_neigh_cleanup_by_master(struct net_device* master,struct net_device* slave){ + netif_stop_queue(slave); + if (master) { + ipoib_neigh_tbl_cleanup_master(&arp_tbl,master, slave); + ipoib_neigh_tbl_cleanup_master(&nd_tbl,master, slave); + } +} + +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) { struct ipoib_neigh *neigh; @@ -803,6 +849,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh->neighbour = neighbour; + neigh->dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(&neigh->queue); @@ -874,6 +921,7 @@ void ipoib_dev_cleanup(struct net_device /* Delete any child interfaces first */ list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) { + ipoib_neigh_cleanup_by_master(cpriv->dev->master, cpriv->dev); unregister_netdev(cpriv->dev); ipoib_dev_cleanup(cpriv->dev); free_netdev(cpriv->dev); @@ -1159,6 +1207,7 @@ static void ipoib_remove_one(struct ib_d ib_unregister_event_handler(&priv->event_handler); flush_scheduled_work(); + ipoib_neigh_cleanup_by_master(priv->dev->master, priv->dev); unregister_netdev(priv->dev); ipoib_dev_cleanup(priv->dev); free_netdev(priv->dev); @@ -1217,6 +1266,8 @@ err_fs: static void __exit ipoib_cleanup_module(void) { + ipoib_at_exit = 1; + ib_unregister_client(&ipoib_client); ib_sa_unregister_client(&ipoib_sa_client); ipoib_unregister_debugfs(); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index b04b72c..a41a949 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -774,7 +774,7 @@ out: if (skb->dst && skb->dst->neighbour && !*to_ipoib_neigh(skb->dst->neighbour)) { - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (neigh) { kref_get(&mcast->ah->ref); diff --git 
a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 6a9f616..557be98 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -153,6 +153,7 @@ struct neigh_table nd_tbl = { .gc_thresh2 = 512, .gc_thresh3 = 1024, }; +EXPORT_SYMBOL(nd_tbl); /* ND options */ struct ndisc_options { From vlad at dev.mellanox.co.il Mon Feb 26 09:24:00 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 26 Feb 2007 19:24:00 +0200 Subject: [openib-general] [ewg] RE: bugs filed for problems compiling OFED 1.2 alpha1 In-Reply-To: References: <20070226075059.GB27677@mellanox.co.il> <1172509162.21382.45.camel@vladsk-laptop> Message-ID: <1172510640.21382.48.camel@vladsk-laptop> On Mon, 2007-02-26 at 09:00 -0800, Scott Weitzenkamp (sweitzen) wrote: > I want a full OFED build, please. This was agreed to in one of the OFED > bi-weekly calls. > > Scott > http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070226-1758.tgz Regards, Vladimir From mshefty at ichips.intel.com Mon Feb 26 09:46:50 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Feb 2007 09:46:50 -0800 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <1172394057.12388.3.camel@vladsk-laptop> References: <000001c75787$50ff0440$ff0da8c0@amr.corp.intel.com> <1172394057.12388.3.camel@vladsk-laptop> Message-ID: <45E31D0A.20400@ichips.intel.com> Vladimir Sokolovsky wrote: > On Fri, 2007-02-23 at 12:15 -0800, Sean Hefty wrote: > > I would like these fixes in OFED 1.2 as well. What git tree / branch > do I > > generate a patch against? 
> > > > - Sean > > git://git.openfabrics.org/~vlad/ofed_1_2/.git > branch: ofed_1_2 Can you try pulling from: git://git.openfabrics.org/~shefty/ofed_1_2.git ofed_1_2 - Sean From mshefty at ichips.intel.com Mon Feb 26 10:15:41 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Feb 2007 10:15:41 -0800 Subject: [openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path() In-Reply-To: <45E19730.7010008@dev.mellanox.co.il> References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com> <1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il> <45E19730.7010008@dev.mellanox.co.il> Message-ID: <45E323CD.3080800@ichips.intel.com> > int ib_init_ah_from_path(struct ib_device *device, u8 port_num, > struct ib_sa_path_rec *rec, struct ib_ah_attr > *ah_attr) > { > int ret; > u16 gid_index; > > memset(ah_attr, 0, sizeof *ah_attr); > ah_attr->dlid = be16_to_cpu(rec->dlid); > ah_attr->sl = rec->sl; > ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f; I'm not sure about the '& 0x7f', but... > I have a feeling that this function doesn't handle the src_path_bits as > it should because > it doesn't care what is the LMC value of the slid (i think that if the > LMC is < 8) wrong bits > may be set in the src_path_bits. Wouldn't the function simply include the port's base LID in the source path bits? I would think that the LMC would mask out those bits in the address vector before ANDing the base LID back in to form the SLID. But even if the bits weren't masked out, ANDing the source path bits with the base LID should produce the same result. If I'm not seeing this correctly, can you describe the problem more? 
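The question about LMC and the source path bits can be made concrete with a short sketch. This is an illustration only, not the code of ib_init_ah_from_path(). The function names are hypothetical, and it assumes the usual InfiniBand meaning of LMC: a port with LID mask control value lmc owns 2^lmc consecutive LIDs, and the low lmc bits of a LID are its path bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: path bits of a source LID for a given LMC (0..7). */
uint8_t src_path_bits_model(uint16_t slid, uint8_t lmc)
{
    assert(lmc <= 7); /* LMC is a 3-bit field */
    return (uint8_t)(slid & ((1u << lmc) - 1u));
}

/* Hypothetical helper: rebuild the SLID from the port's base LID and path bits. */
uint16_t slid_from_path_bits_model(uint16_t base_lid, uint8_t path_bits,
                                   uint8_t lmc)
{
    uint16_t mask;

    assert(lmc <= 7);
    mask = (uint16_t)((1u << lmc) - 1u);
    /* Keep the base LID's upper bits, take only lmc low bits from path_bits. */
    return (uint16_t)((base_lid & ~mask) | (path_bits & mask));
}
```

With lmc = 2 and slid = 0x0013, only the two low bits (value 3) are path bits, while a fixed 0x7f mask keeps 0x13. Whether the extra bits matter depends on whether the consumer of src_path_bits masks them with the LMC again, which is the point under discussion.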
- Sean

From or.gerlitz at gmail.com Mon Feb 26 11:05:48 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Mon, 26 Feb 2007 21:05:48 +0200
Subject: [openib-general] failure to create an FMR mapping 1K pages on memfree
In-Reply-To: <15ddcffd0702261104x6df977b6g9e4ca0071c8489ad@mail.gmail.com>
References: <15ddcffd0702261104x6df977b6g9e4ca0071c8489ad@mail.gmail.com>
Message-ID: <15ddcffd0702261105s377ad165h7bfe258f69ede152@mail.gmail.com>

oops - I forgot to CC openib-general.

On 2/26/07, Or Gerlitz wrote:
> Hi Roland,
>
> I have got a report of a failure to create an FMR mapping 1K pages (that is
> 4MB) on memfree.
>
> I don't have the exact details (i.e. whether Arbel/Sinai, what FW, etc.)
> nor which exact check fails in
> mthca_fmr_alloc, but what's clear is that the latter function returns
> -ENOMEM when attr.max_pages is 1024 and it works fine when
> attr.max_pages is 256.
>
> Is this failure clear to you? If yes, is a HW or FW limit being
> hit, or is it a driver design issue?
>
> Or.
>

From bugzilla-daemon at lists.openfabrics.org Mon Feb 26 12:14:54 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 26 Feb 2007 12:14:54 -0800 (PST)
Subject: [openib-general] [Bug 390] New: perftools don't work on alpha1
Message-ID:

https://bugs.openfabrics.org/show_bug.cgi?id=390

Summary: perftools don't work on alpha1
Product: OpenFabrics Linux
Version: 1.2alpha1
Platform: Other
OS/Version: Other
Status: NEW
Severity: blocker
Priority: P1
Component: Verbs
AssignedTo: bugzilla at openib.org
ReportedBy: swise at opengridcomputing.com
CC: mst at mellanox.co.il

There is no correct component so I assigned it to Verbs. But ib_rdma_bw --cma doesn't seem to work.
It just exits immediately after displaying the params: [mpi at r1-iw ~]$ /usr/local/ofed/bin/ib_rdma_bw --cma 5915: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 | [mpi at r1-iw ~]$ /usr/local/ofed/bin/ib_rdma_bw --cma 5916: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 | [mpi at r1-iw ~]$ /usr/local/ofed/bin/ib_rdma_bw --cma --iters=10 5917: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=10 | duplex=0 | cma=1 | -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Mon Feb 26 12:15:18 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Mon, 26 Feb 2007 12:15:18 -0800 (PST) Subject: [openib-general] [Bug 390] perftools don't work on alpha1 In-Reply-To: Message-ID: <20070226201518.2E5C3E60803@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=390 ------- Comment #1 from swise at opengridcomputing.com 2007-02-26 12:15 ------- ib_rdma_lat works fine. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sean.hefty at intel.com Mon Feb 26 12:17:30 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 26 Feb 2007 12:17:30 -0800 Subject: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey In-Reply-To: <1172507101.4102.277140.camel@hal.voltaire.com> Message-ID: <000201c759e3$24828410$55d8180a@amr.corp.intel.com> I think the following patch would make ipoib spec compliant. ib_find_cached_pkey is called by ib_cm, rdma_cm, ib_srp, and ib_ipoib. 
I'm not certain what this change would do to SRP, but the ib_cm and rdma_cm look okay, given that non-reversible paths aren't supported yet anyway. -- ib_find_cached_pkey masks off the upper-bit of the PKey when searching for a match. The upper bit indicates partial or full membership. Ignoring the upper bit can result in a full membership PKey matching with a partial membership PKey. For ipoib, this can result in joining a multicast group that disallows communication between all members. Signed-off-by: Sean Hefty --- drivers/infiniband/core/cache.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index 558c9a0..6f366c3 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -179,7 +179,7 @@ int ib_find_cached_pkey(struct ib_device *device, *index = -1; for (i = 0; i < cache->table_len; ++i) - if ((cache->table[i] & 0x7fff) == (pkey & 0x7fff)) { + if (cache->table[i] == pkey) { *index = i; ret = 0; break; -- 1.4.4.3 From mst at mellanox.co.il Mon Feb 26 13:01:11 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Feb 2007 23:01:11 +0200 Subject: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45E313D2.70909@voltaire.com> References: <45E313D2.70909@voltaire.com> Message-ID: <20070226210111.GC12919@mellanox.co.il> > > During my tests I found that when running > > 1. modprobe -r ib_mthca (to delete IPoIB interfaces) > 2. 
ping somewhere on the subnet of bond0 > > I get this stack dump (which ends with kernel death) > [] skb_under_panic+0x5c/0x60 > [] :ib_ipoib:ipoib_hard_header+0xa6/0xc0 > [] arp_create+0x120/0x226 > [] arp_send+0x25/0x3b > [] arp_solicit+0x186/0x195 > [] neigh_timer_handler+0x2b5/0x309 > [] neigh_timer_handler+0x0/0x309 > [] run_timer_softirq+0x130/0x19e > [] __do_softirq+0x55/0xc3 > [] call_softirq+0x1c/0x28 > [] do_softirq+0x2c/0x7d > [] smp_apic_timer_interrupt+0x57/0x6a > [] mwait_idle+0x0/0x45 > [] apic_timer_interrupt+0x66/0x70 > [] mwait_idle+0x42/0x45 > [] cpu_idle+0x8b/0xae > [] start_secondary+0x47f/0x48f > > The only way I found to avoid this (for now) is to check skb headroom in > ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB > operation and it seems to solve my problem. However, I would be happy to hear what > others think of this last issue. This seems to mean that hard_header_len is not copied from slave to master device. Right? Maybe that's what needs to be fixed. -- MST From rdreier at cisco.com Mon Feb 26 13:05:30 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 26 Feb 2007 13:05:30 -0800 Subject: [openib-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get various post-rc1 cleanups and fixes: Adrian Bunk (2): IB/mthca: Make 2 functions static RDMA/cxgb3: cleanups Michael S. 
Tsirkin (1): IPoIB/cm: Improve small message bandwidth Roland Dreier (3): IPoIB: Remove unused local_rate tracking IB/uverbs: Return correct error for invalid PD in register MR IPoIB: Correct debugging output when path record lookup fails Sean Hefty (4): IB/core: Set hop limit in ib_init_ah_from_wc correctly RDMA/cma: Request reversible paths only IB/cm: Remove ca_guid from cm_device structure RDMA/cma: Remove unused node_guid from cma_device structure Steve Wise (1): RDMA/cxgb3: Stop the EP Timer on BAD CLOSE drivers/infiniband/core/cm.c | 10 ++--- drivers/infiniband/core/cma.c | 6 ++-- drivers/infiniband/core/uverbs_cmd.c | 4 ++- drivers/infiniband/core/verbs.c | 2 +- drivers/infiniband/hw/cxgb3/Makefile | 1 - drivers/infiniband/hw/cxgb3/cxio_hal.c | 31 +++++----------- drivers/infiniband/hw/cxgb3/cxio_hal.h | 5 --- drivers/infiniband/hw/cxgb3/cxio_resource.c | 14 +------ drivers/infiniband/hw/cxgb3/iwch_cm.c | 6 ++-- drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 +- drivers/infiniband/hw/cxgb3/iwch_provider.h | 1 - drivers/infiniband/hw/cxgb3/iwch_qp.c | 29 +++++++-------- drivers/infiniband/hw/mthca/mthca_mr.c | 10 +++-- drivers/infiniband/ulp/ipoib/ipoib.h | 1 - drivers/infiniband/ulp/ipoib/ipoib_cm.c | 46 ++++++++++++++---------- drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 8 ++--- 17 files changed, 76 insertions(+), 102 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index d446998..842cd0b 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -88,7 +88,6 @@ struct cm_port { struct cm_device { struct list_head list; struct ib_device *device; - __be64 ca_guid; struct cm_port port[0]; }; @@ -739,8 +738,8 @@ retest: ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg); spin_unlock_irqrestore(&cm_id_priv->lock, flags); ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT, - &cm_id_priv->av.port->cm_dev->ca_guid, - sizeof 
cm_id_priv->av.port->cm_dev->ca_guid, + &cm_id_priv->id.device->node_guid, + sizeof cm_id_priv->id.device->node_guid, NULL, 0); break; case IB_CM_REQ_RCVD: @@ -883,7 +882,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, req_msg->local_comm_id = cm_id_priv->id.local_id; req_msg->service_id = param->service_id; - req_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid; + req_msg->local_ca_guid = cm_id_priv->id.device->node_guid; cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num)); cm_req_set_resp_res(req_msg, param->responder_resources); cm_req_set_init_depth(req_msg, param->initiator_depth); @@ -1442,7 +1441,7 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg, cm_rep_set_flow_ctrl(rep_msg, param->flow_control); cm_rep_set_rnr_retry_count(rep_msg, param->rnr_retry_count); cm_rep_set_srq(rep_msg, param->srq); - rep_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid; + rep_msg->local_ca_guid = cm_id_priv->id.device->node_guid; if (param->private_data && param->private_data_len) memcpy(rep_msg->private_data, param->private_data, @@ -3385,7 +3384,6 @@ static void cm_add_one(struct ib_device *device) return; cm_dev->device = device; - cm_dev->ca_guid = device->node_guid; set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask); for (i = 1; i <= device->phys_port_cnt; i++) { diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index f8d69b3..d441815 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -77,7 +77,6 @@ static int next_port; struct cma_device { struct list_head list; struct ib_device *device; - __be64 node_guid; struct completion comp; atomic_t refcount; struct list_head id_list; @@ -1492,11 +1491,13 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms, ib_addr_get_dgid(addr, &path_rec.dgid); path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr)); path_rec.numb_path = 1; + path_rec.reversible = 1; id_priv->query_id = ib_sa_path_rec_get(&sa_client, 
id_priv->id.device, id_priv->id.port_num, &path_rec, IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | - IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH, + IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_REVERSIBLE, timeout_ms, GFP_KERNEL, cma_query_handler, work, &id_priv->query); @@ -2672,7 +2673,6 @@ static void cma_add_one(struct ib_device *device) return; cma_dev->device = device; - cma_dev->node_guid = device->node_guid; init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index df1efbc..4fd75af 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -622,8 +622,10 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file, obj->umem.virt_base = cmd.hca_va; pd = idr_read_pd(cmd.pd_handle, file->ucontext); - if (!pd) + if (!pd) { + ret = -EINVAL; goto err_release; + } mr = pd->device->reg_user_mr(pd, &obj->umem, cmd.access_flags, &udata); if (IS_ERR(mr)) { diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 8b5dd36..ccdf93d 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -167,7 +167,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc, ah_attr->grh.sgid_index = (u8) gid_index; flow_class = be32_to_cpu(grh->version_tclass_flow); ah_attr->grh.flow_label = flow_class & 0xFFFFF; - ah_attr->grh.hop_limit = grh->hop_limit; + ah_attr->grh.hop_limit = 0xFF; ah_attr->grh.traffic_class = (flow_class >> 20) & 0xFF; } return 0; diff --git a/drivers/infiniband/hw/cxgb3/Makefile b/drivers/infiniband/hw/cxgb3/Makefile index 0e110f3..36b9898 100644 --- a/drivers/infiniband/hw/cxgb3/Makefile +++ b/drivers/infiniband/hw/cxgb3/Makefile @@ -8,5 +8,4 @@ iw_cxgb3-y := iwch_cm.o iwch_ev.o iwch_cq.o iwch_qp.o iwch_mem.o \ ifdef CONFIG_INFINIBAND_CXGB3_DEBUG EXTRA_CFLAGS += -DDEBUG -iw_cxgb3-y += cxio_dbg.o endif diff --git 
a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index 114ac3b..d737c73 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -45,7 +45,7 @@ static LIST_HEAD(rdev_list); static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL; -static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) +static struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) { struct cxio_rdev *rdev; @@ -55,8 +55,7 @@ static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) return NULL; } -static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev - *tdev) +static struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev) { struct cxio_rdev *rdev; @@ -118,7 +117,7 @@ int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq, return 0; } -static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) +static int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) { struct rdma_cq_setup setup; setup.id = cqid; @@ -130,7 +129,7 @@ static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid) return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup)); } -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) +static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) { u64 sge_cmd; struct t3_modify_qp_wr *wqe; @@ -425,7 +424,7 @@ void cxio_flush_hw_cq(struct t3_cq *cq) } } -static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) +static int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq) { if (CQE_OPCODE(*cqe) == T3_TERMINATE) return 0; @@ -760,17 +759,6 @@ ret: return err; } -/* IN : stag key, pdid, pbl_size - * Out: stag index, actaul pbl_size, and pbl_addr allocated. 
- */ -int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid, - enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr) -{ - *stag = T3_STAG_UNSET; - return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR, - perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr)); -} - int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid, enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, u8 page_size, __be64 *pbl, u32 *pbl_size, @@ -1029,7 +1017,7 @@ void __exit cxio_hal_exit(void) cxio_hal_destroy_rhdl_resource(); } -static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) +static void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) { struct t3_swsq *sqp; __u32 ptr = wq->sq_rptr; @@ -1058,9 +1046,8 @@ static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq) break; } -static inline void create_read_req_cqe(struct t3_wq *wq, - struct t3_cqe *hw_cqe, - struct t3_cqe *read_cqe) +static void create_read_req_cqe(struct t3_wq *wq, struct t3_cqe *hw_cqe, + struct t3_cqe *read_cqe) { read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr; read_cqe->len = wq->oldest_read->read_len; @@ -1073,7 +1060,7 @@ static inline void create_read_req_cqe(struct t3_wq *wq, /* * Return a ptr to the next read wr in the SWSQ or NULL. 
*/ -static inline void advance_oldest_read(struct t3_wq *wq) +static void advance_oldest_read(struct t3_wq *wq) { u32 rptr = wq->oldest_read - wq->sq + 1; diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h index 8ab04a7..99543d6 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.h +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h @@ -143,7 +143,6 @@ int cxio_rdev_open(struct cxio_rdev *rdev); void cxio_rdev_close(struct cxio_rdev *rdev); int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq, enum t3_cq_opcode op, u32 credit); -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid); int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq); int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq); int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq); @@ -154,8 +153,6 @@ int cxio_create_qp(struct cxio_rdev *rdev, u32 kernel_domain, struct t3_wq *wq, int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq, struct cxio_ucontext *uctx); int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode); -int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid, - enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr); int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid, enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len, u8 page_size, __be64 *pbl, u32 *pbl_size, @@ -171,8 +168,6 @@ int cxio_deallocate_window(struct cxio_rdev *rdev, u32 stag); int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr); void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb); void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb); -u32 cxio_hal_get_rhdl(void); -void cxio_hal_put_rhdl(u32 rhdl); u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp); void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid); int __init cxio_hal_init(void); diff --git a/drivers/infiniband/hw/cxgb3/cxio_resource.c b/drivers/infiniband/hw/cxgb3/cxio_resource.c index 
65bf577..d3095ae 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_resource.c +++ b/drivers/infiniband/hw/cxgb3/cxio_resource.c @@ -179,7 +179,7 @@ tpt_err: /* * returns 0 if no resource available */ -static inline u32 cxio_hal_get_resource(struct kfifo *fifo) +static u32 cxio_hal_get_resource(struct kfifo *fifo) { u32 entry; if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32))) @@ -188,21 +188,11 @@ static inline u32 cxio_hal_get_resource(struct kfifo *fifo) return 0; /* fifo emptry */ } -static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry) +static void cxio_hal_put_resource(struct kfifo *fifo, u32 entry) { BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0); } -u32 cxio_hal_get_rhdl(void) -{ - return cxio_hal_get_resource(rhdl_fifo); -} - -void cxio_hal_put_rhdl(u32 rhdl) -{ - cxio_hal_put_resource(rhdl_fifo, rhdl); -} - u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp) { return cxio_hal_get_resource(rscp->tpt_fifo); diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index e5442e3..b21fde8 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -209,8 +209,7 @@ static enum iwch_ep_state state_read(struct iwch_ep_common *epc) return state; } -static inline void __state_set(struct iwch_ep_common *epc, - enum iwch_ep_state new) +static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new) { epc->state = new; } @@ -1459,7 +1458,7 @@ static int peer_close(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) /* * Returns whether an ABORT_REQ_RSS message is a negative advice. 
*/ -static inline int is_neg_adv_abort(unsigned int status) +static int is_neg_adv_abort(unsigned int status) { return status == CPL_ERR_RTX_NEG_ADVICE || status == CPL_ERR_PERSIST_NEG_ADVICE; @@ -1635,6 +1634,7 @@ static int ec_status(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) printk(KERN_ERR MOD "%s BAD CLOSE - Aborting tid %u\n", __FUNCTION__, ep->hwtid); + stop_ep_timer(ep); attrs.next_state = IWCH_QP_STATE_ERROR; iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 2aef122..9947a14 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -948,7 +948,7 @@ void iwch_qp_rem_ref(struct ib_qp *qp) wake_up(&(to_iwch_qp(qp)->wait)); } -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) +static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) { PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, qpn); return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn); diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index 2af3e93..de0fe1b 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -178,7 +178,6 @@ static inline struct iwch_qp *to_iwch_qp(struct ib_qp *ibqp) void iwch_qp_add_ref(struct ib_qp *qp); void iwch_qp_rem_ref(struct ib_qp *qp); -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn); struct iwch_ucontext { struct ib_ucontext ibucontext; diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 4dda2f6..9ea00cc 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -36,8 +36,8 @@ #define NO_SUPPORT -1 -static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, - u8 * flit_cnt) +static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, + 
u8 * flit_cnt) { int i; u32 plen; @@ -96,8 +96,8 @@ static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, return 0; } -static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, - u8 *flit_cnt) +static int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, + u8 *flit_cnt) { int i; u32 plen; @@ -137,8 +137,8 @@ static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, return 0; } -static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, - u8 *flit_cnt) +static int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, + u8 *flit_cnt) { if (wr->num_sge > 1) return -EINVAL; @@ -158,9 +158,8 @@ static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, /* * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now. */ -static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp, - struct ib_sge *sg_list, u32 num_sgle, - u32 * pbl_addr, u8 * page_size) +static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, + u32 num_sgle, u32 * pbl_addr, u8 * page_size) { int i; struct iwch_mr *mhp; @@ -206,9 +205,8 @@ static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp, return 0; } -static inline int iwch_build_rdma_recv(struct iwch_dev *rhp, - union t3_wr *wqe, - struct ib_recv_wr *wr) +static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe, + struct ib_recv_wr *wr) { int i, err = 0; u32 pbl_addr[4]; @@ -473,8 +471,7 @@ int iwch_bind_mw(struct ib_qp *qp, return err; } -static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, - int tagged) +static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged) { switch (t3err) { case TPT_ERR_STAG: @@ -672,7 +669,7 @@ static void __flush_qp(struct iwch_qp *qhp, unsigned long *flag) spin_lock_irqsave(&qhp->lock, *flag); } -static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag) +static void flush_qp(struct iwch_qp 
*qhp, unsigned long *flag) { if (t3b_device(qhp->rhp)) cxio_set_wq_in_error(&qhp->wq); @@ -684,7 +681,7 @@ static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag) /* * Return non zero if at least one RECV was pre-posted. */ -static inline int rqes_posted(struct iwch_qp *qhp) +static int rqes_posted(struct iwch_qp *qhp) { return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV; } diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c index 6037dd3..8e4846b 100644 --- a/drivers/infiniband/hw/mthca/mthca_mr.c +++ b/drivers/infiniband/hw/mthca/mthca_mr.c @@ -310,8 +310,9 @@ int mthca_write_mtt_size(struct mthca_dev *dev) return mthca_is_memfree(dev) ? (PAGE_SIZE / sizeof (u64)) : 0x7ffffff; } -void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt, - int start_index, u64 *buffer_list, int list_len) +static void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, + struct mthca_mtt *mtt, int start_index, + u64 *buffer_list, int list_len) { u64 __iomem *mtts; int i; @@ -323,8 +324,9 @@ void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt, mtts + i); } -void mthca_arbel_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt, - int start_index, u64 *buffer_list, int list_len) +static void mthca_arbel_write_mtt_seg(struct mthca_dev *dev, + struct mthca_mtt *mtt, int start_index, + u64 *buffer_list, int list_len) { __be64 *mtts; dma_addr_t dma_handle; diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 2594db2..fd55826 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -219,7 +219,6 @@ struct ipoib_dev_priv { union ib_gid local_gid; u16 local_lid; - u8 local_rate; unsigned int admin_mtu; unsigned int mcast_mtu; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 4d59682..3484e8b 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -65,14 +65,14 @@ struct ipoib_cm_id { static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event); -static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, +static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags, u64 mapping[IPOIB_CM_RX_SG]) { int i; ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE); - for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i) + for (i = 0; i < frags; ++i) ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); } @@ -90,7 +90,8 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id) ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr); if (unlikely(ret)) { ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret); - ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[id].mapping); + ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, + priv->cm.srq_ring[id].mapping); dev_kfree_skb_any(priv->cm.srq_ring[id].skb); priv->cm.srq_ring[id].skb = NULL; } @@ -98,8 +99,8 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id) return ret; } -static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, - u64 mapping[IPOIB_CM_RX_SG]) +static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, int frags, + u64 mapping[IPOIB_CM_RX_SG]) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct sk_buff *skb; @@ -107,7 +108,7 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12); if (unlikely(!skb)) - return -ENOMEM; + return NULL; /* * IPoIB adds a 4 byte header. 
So we need 12 more bytes to align the @@ -119,10 +120,10 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, DMA_FROM_DEVICE); if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) { dev_kfree_skb_any(skb); - return -EIO; + return NULL; } - for (i = 0; i < IPOIB_CM_RX_SG - 1; i++) { + for (i = 0; i < frags; i++) { struct page *page = alloc_page(GFP_ATOMIC); if (!page) @@ -136,7 +137,7 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, } priv->cm.srq_ring[id].skb = skb; - return 0; + return skb; partial_error: @@ -146,7 +147,7 @@ partial_error: ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); dev_kfree_skb_any(skb); - return -ENOMEM; + return NULL; } static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev, @@ -309,7 +310,7 @@ static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id, } /* Adjust length of skb with fragments to match received data */ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space, - unsigned int length) + unsigned int length, struct sk_buff *toskb) { int i, num_frags; unsigned int size; @@ -326,7 +327,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space, if (length == 0) { /* don't need this page */ - __free_page(frag->page); + skb_fill_page_desc(toskb, i, frag->page, 0, PAGE_SIZE); --skb_shinfo(skb)->nr_frags; } else { size = min(length, (unsigned) PAGE_SIZE); @@ -344,10 +345,11 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ; - struct sk_buff *skb; + struct sk_buff *skb, *newskb; struct ipoib_cm_rx *p; unsigned long flags; u64 mapping[IPOIB_CM_RX_SG]; + int frags; ipoib_dbg_data(priv, "cm recv completion: id %d, op %d, status: %d\n", wr_id, wc->opcode, wc->status); @@ -383,7 +385,11 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) } } - if (unlikely(ipoib_cm_alloc_rx_skb(dev, wr_id, 
mapping))) { + frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len, + (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE; + + newskb = ipoib_cm_alloc_rx_skb(dev, wr_id, frags, mapping); + if (unlikely(!newskb)) { /* * If we can't allocate a new RX buffer, dump * this packet and reuse the old buffer. @@ -393,13 +399,13 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) goto repost; } - ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[wr_id].mapping); - memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, sizeof mapping); + ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping); + memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping); ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", wc->byte_len, wc->slid); - skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len); + skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newskb); skb->protocol = ((struct ipoib_header *) skb->data)->proto; skb->mac.raw = skb->data; @@ -1193,7 +1199,8 @@ int ipoib_cm_dev_init(struct net_device *dev) priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG; for (i = 0; i < ipoib_recvq_size; ++i) { - if (ipoib_cm_alloc_rx_skb(dev, i, priv->cm.srq_ring[i].mapping)) { + if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1, + priv->cm.srq_ring[i].mapping)) { ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); ipoib_cm_dev_cleanup(dev); return -ENOMEM; @@ -1228,7 +1235,8 @@ void ipoib_cm_dev_cleanup(struct net_device *dev) return; for (i = 0; i < ipoib_recvq_size; ++i) if (priv->cm.srq_ring[i].skb) { - ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[i].mapping); + ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, + priv->cm.srq_ring[i].mapping); dev_kfree_skb_any(priv->cm.srq_ring[i].skb); priv->cm.srq_ring[i].skb = NULL; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 18d27fd..f9dbc6f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ 
-385,7 +385,7 @@ static void path_rec_completion(int status,
        struct sk_buff *skb;
        unsigned long flags;

-       if (pathrec)
+       if (!status)
                ipoib_dbg(priv, "PathRec LID 0x%04x for GID " IPOIB_GID_FMT "\n",
                          be16_to_cpu(pathrec->dlid), IPOIB_GID_ARG(pathrec->dgid));
        else
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index b303ce6..bb2e3d5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -527,11 +527,9 @@ void ipoib_mcast_join_task(struct work_struct *work)
        {
                struct ib_port_attr attr;

-               if (!ib_query_port(priv->ca, priv->port, &attr)) {
-                       priv->local_lid = attr.lid;
-                       priv->local_rate = attr.active_speed *
-                               ib_width_enum_to_int(attr.active_width);
-               } else
+               if (!ib_query_port(priv->ca, priv->port, &attr))
+                       priv->local_lid = attr.lid;
+               else
                        ipoib_warn(priv, "ib_query_port failed\n");
        }

From Ashish.Batwara at lsi.com Mon Feb 26 13:04:41 2007
From: Ashish.Batwara at lsi.com (Batwara, Ashish)
Date: Mon, 26 Feb 2007 14:04:41 -0700
Subject: [openib-general] opensm issue
Message-ID: <01B9E81EECACE94DBBD0A556E768FB8A013A9361@NAMAIL2.ad.lsil.com>

Hi,
I am trying to bring up opensm, but it is not letting me. When I look at
/var/log/messages, I see that it becomes UP for a moment and then goes
down again. Look for " SUBNET UP " in the logs below. Does anyone know
what the problem is? I am using OFED-1.1.1 with patches from almost 1
month ago.
Thanks
Ashish

Feb 26 14:38:37 p49 run_srp_daemon[7640]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:38:37 p49 run_srp_daemon[7642]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:38:46 p49 OpenSM[7433]: SM port is down
Feb 26 14:38:53 p49 run_srp_daemon[7653]: starting srp_daemon: [HCA=mthca0] [port=2]
Feb 26 14:38:53 p49 run_srp_daemon[7658]: starting srp_daemon: [HCA=mthca0] [port=1]
Feb 26 14:38:56 p49 OpenSM[7433]: SM port is down
Feb 26 14:38:56 p49 run_srp_daemon[7675]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:38:56 p49 run_srp_daemon[7680]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:39:06 p49 OpenSM[7433]: SM port is down
Feb 26 14:39:26 p49 last message repeated 2 times
Feb 26 14:39:26 p49 run_srp_daemon[7691]: starting srp_daemon: [HCA=mthca0] [port=1]
Feb 26 14:39:26 p49 run_srp_daemon[7692]: starting srp_daemon: [HCA=mthca0] [port=2]
Feb 26 14:39:29 p49 run_srp_daemon[7715]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:39:29 p49 run_srp_daemon[7716]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:39:36 p49 OpenSM[7433]: SM port is down
Feb 26 14:39:56 p49 last message repeated 2 times
Feb 26 14:39:59 p49 run_srp_daemon[7728]: starting srp_daemon: [HCA=mthca0] [port=1]
Feb 26 14:39:59 p49 run_srp_daemon[7727]: starting srp_daemon: [HCA=mthca0] [port=2]
Feb 26 14:40:02 p49 run_srp_daemon[7752]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:40:02 p49 run_srp_daemon[7751]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:40:06 p49 OpenSM[7433]: SM port is down
Feb 26 14:40:26 p49 last message repeated 2 times
Feb 26 14:40:32 p49 run_srp_daemon[7791]: starting srp_daemon: [HCA=mthca0] [port=2]
Feb 26 14:40:32 p49 run_srp_daemon[7792]: starting srp_daemon: [HCA=mthca0] [port=1]
Feb 26 14:40:35 p49 run_srp_daemon[7812]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:40:35 p49 run_srp_daemon[7817]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:40:36 p49 OpenSM[7433]: SM port is down
Feb 26 14:40:46 p49 OpenSM[7433]: SM port is down
Feb 26 14:40:56 p49 OpenSM[7433]: Entering MASTER state
Feb 26 14:40:56 p49 OpenSM[7433]: SUBNET UP
Feb 26 14:41:05 p49 run_srp_daemon[7823]: starting srp_daemon: [HCA=mthca0] [port=1]
Feb 26 14:41:05 p49 run_srp_daemon[7832]: starting srp_daemon: [HCA=mthca0] [port=2]
Feb 26 14:41:06 p49 OpenSM[7433]: SM port is down
Feb 26 14:41:08 p49 run_srp_daemon[7847]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:41:14 p49 run_srp_daemon[7853]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:41:16 p49 OpenSM[7433]: SM port is down

From halr at voltaire.com Mon Feb 26 13:25:28 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 16:25:28 -0500
Subject: [openib-general] opensm issue
In-Reply-To: <01B9E81EECACE94DBBD0A556E768FB8A013A9361@NAMAIL2.ad.lsil.com>
References: <01B9E81EECACE94DBBD0A556E768FB8A013A9361@NAMAIL2.ad.lsil.com>
Message-ID: <1172525125.4102.295158.camel@hal.voltaire.com>

Hi Ashish,

On Mon, 2007-02-26 at 16:04, Batwara, Ashish wrote:
> Hi,
> I am trying to bring up opensm, but it is not letting me. When I look at
> /var/log/messages, I see that it becomes UP for a moment and then goes
> down again. Look for " SUBNET UP " in the logs below. Does anyone know
> what the problem is? I am using OFED-1.1.1 with patches from almost 1
> month ago.
>
> Thanks
> Ashish
>
>
> Feb 26 14:38:37 p49 run_srp_daemon[7640]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:38:37 p49 run_srp_daemon[7642]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:38:46 p49 OpenSM[7433]: SM port is down
> Feb 26 14:38:53 p49 run_srp_daemon[7653]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Feb 26 14:38:53 p49 run_srp_daemon[7658]: starting srp_daemon:
> [HCA=mthca0] [port=1]
> Feb 26 14:38:56 p49 OpenSM[7433]: SM port is down
> Feb 26 14:38:56 p49 run_srp_daemon[7675]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:38:56 p49 run_srp_daemon[7680]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:39:06 p49 OpenSM[7433]: SM port is down
> Feb 26 14:39:26 p49 last message repeated 2 times
> Feb 26 14:39:26 p49 run_srp_daemon[7691]: starting srp_daemon:
> [HCA=mthca0] [port=1]
> Feb 26 14:39:26 p49 run_srp_daemon[7692]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Feb 26 14:39:29 p49 run_srp_daemon[7715]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:39:29 p49 run_srp_daemon[7716]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:39:36 p49 OpenSM[7433]: SM port is down
> Feb 26 14:39:56 p49 last message repeated 2 times
> Feb 26 14:39:59 p49 run_srp_daemon[7728]: starting srp_daemon:
> [HCA=mthca0] [port=1]
> Feb 26 14:39:59 p49 run_srp_daemon[7727]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Feb 26 14:40:02 p49 run_srp_daemon[7752]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:40:02 p49 run_srp_daemon[7751]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:40:06 p49 OpenSM[7433]: SM port is down
> Feb 26 14:40:26 p49 last message repeated 2 times
> Feb 26 14:40:32 p49 run_srp_daemon[7791]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Feb 26 14:40:32 p49 run_srp_daemon[7792]: starting srp_daemon:
> [HCA=mthca0] [port=1]
> Feb 26 14:40:35 p49 run_srp_daemon[7812]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:40:35 p49 run_srp_daemon[7817]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:40:36 p49 OpenSM[7433]: SM port is down
> Feb 26 14:40:46 p49 OpenSM[7433]: SM port is down
> Feb 26 14:40:56 p49 OpenSM[7433]: Entering MASTER state
> Feb 26 14:40:56 p49 OpenSM[7433]: SUBNET UP
> Feb 26 14:41:05 p49 run_srp_daemon[7823]: starting srp_daemon:
> [HCA=mthca0] [port=1]
> Feb 26 14:41:05 p49 run_srp_daemon[7832]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Feb 26 14:41:06 p49 OpenSM[7433]: SM port is down
> Feb 26 14:41:08 p49 run_srp_daemon[7847]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:41:14 p49 run_srp_daemon[7853]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:41:16 p49 OpenSM[7433]: SM port is down

It appears your SM port to some switch (?) is losing physical
connectivity. Try a different (known good) cable.

-- Hal

> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From xma at us.ibm.com Mon Feb 26 14:00:52 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 26 Feb 2007 14:00:52 -0800
Subject: [openib-general] ib0 interface up but can't ping
In-Reply-To: <327816.12892.qm@web35105.mail.mud.yahoo.com>
Message-ID:

If your subnet already has an SM running, please look at the ifconfig
output. If the interface ib0 is UP but not RUNNING, you can't ping since
the carrier is not ON. Also look at /var/log/messages to see whether
there are any errors.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From xma at us.ibm.com Mon Feb 26 14:04:53 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 26 Feb 2007 14:04:53 -0800
Subject: [openib-general] IPOIB NAPI
In-Reply-To:
Message-ID:

Roland,

Yes. It would be good to reduce number of interrupts by changing all upper
layer protocols to use:

poll CQ
notify CQ, rotting packet notification
poll again

instead of
notify CQ
poll CQ

If possible this can be in OFED-1.2?

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rdreier at cisco.com Mon Feb 26 14:09:48 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Feb 2007 14:09:48 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: (Shirley Ma's message of "Mon, 26 Feb 2007 14:08:48 -0800")
References:
Message-ID:

> That would be great. We hit a similar problem in our cluster test -- data
> corruption because of this race.

On what platform?

- R.

From xma at us.ibm.com Mon Feb 26 14:08:48 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 26 Feb 2007 14:08:48 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To:
Message-ID:

> Hmm, OK. Then I will do my best to make sure we get a fix for this
> into 2.6.22.

That would be great. We hit a similar problem in our cluster test -- data
corruption because of this race.

Thanks
Shirley Ma
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From xma at us.ibm.com Mon Feb 26 14:20:56 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 26 Feb 2007 14:20:56 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To:
Message-ID:

Roland Dreier wrote on 02/26/2007 02:09:48 PM:

> > That would be great. We hit a similar problem in our cluster test -- data
> > corruption because of this race.
>
> On what platform?
>
> - R.

On our cell blade + PCI-e Mellanox.
Thanks
Shirley Ma
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rdreier at cisco.com Mon Feb 26 14:27:42 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Feb 2007 14:27:42 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: (Shirley Ma's message of "Mon, 26 Feb 2007 14:20:56 -0800")
References:
Message-ID:

> On our cell blade + PCI-e Mellanox.

I don't see anything in arch/powerpc that looks like
dma_alloc_coherent() will do anything other than allocate some memory
and map it with DMA_BIDIRECTIONAL. So how does this altix fix help in
your situation? Am I misreading the Cell IOMMU code?

- R.

From rdreier at cisco.com Mon Feb 26 14:36:26 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Feb 2007 14:36:26 -0800
Subject: [openib-general] IPOIB NAPI
In-Reply-To: (Shirley Ma's message of "Mon, 26 Feb 2007 14:04:53 -0800")
References:
Message-ID:

> Yes. It would be good to reduce number of interrupts by changing all upper
> layer protocols to use:
>
> poll CQ
> notify CQ, rotting packet notification
> poll again
>
> instead of
> notify CQ
> poll CQ
>
> If possible this can be in OFED-1.2?

No way, it's way too late at this point to change the kernel-user ABI,
let alone change all ULPs.

- R.

From halr at voltaire.com Mon Feb 26 14:47:58 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 17:47:58 -0500
Subject: [openib-general] [PATCH] opensm: faster min hops
In-Reply-To: <20070225214845.GF11957@sashak.voltaire.com>
References: <20070225214845.GF11957@sashak.voltaire.com>
Message-ID: <1172530075.4102.299979.camel@hal.voltaire.com>

On Sun, 2007-02-25 at 16:48, Sasha Khapyorsky wrote:
> After gprof output analyzing, I noticed that current lmx (switch's lid
> matrix) implementation is extremely slow. This simple hops matrix
> reimplementation makes lid matrices build process two times faster.

Excellent!

> Signed-off-by: Sasha Khapyorsky

Thanks! Applied (to master only right now).
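
The ordering discussed in the IPOIB NAPI thread above (poll CQ, notify CQ, poll again, versus notify CQ, poll CQ) can be sketched with a small user-space mock. This is only an illustration under stated assumptions: `mock_cq`, `mock_complete`, `mock_poll`, `mock_notify`, `drain` and the two handler functions are hypothetical names, not the kernel verbs API, and the mock assumes a one-shot notification that fires only for completions queued after the CQ has been armed.

```c
/*
 * User-space mock of a completion queue (CQ). Every name in this file
 * is hypothetical and exists only for illustration.
 */
#include <assert.h>
#include <stdbool.h>

struct mock_cq {
	int pending;	/* completions queued and not yet polled */
	bool armed;	/* a notification has been requested */
	int events;	/* completion events ("interrupts") raised so far */
};

/* Hardware side: queue one completion; an armed CQ raises one event. */
static void mock_complete(struct mock_cq *cq)
{
	cq->pending++;
	if (cq->armed) {
		cq->armed = false;	/* assumed one-shot notification */
		cq->events++;
	}
}

/* Take one completion off the CQ; returns 1 if one was taken, else 0. */
static int mock_poll(struct mock_cq *cq)
{
	assert(cq->pending >= 0);
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/* Arm the CQ so that the next queued completion raises an event. */
static void mock_notify(struct mock_cq *cq)
{
	cq->armed = true;
}

/* Poll until empty while *arrivals further completions land. */
static int drain(struct mock_cq *cq, int *arrivals)
{
	int handled = 0;

	while (mock_poll(cq)) {
		handled++;
		if (*arrivals > 0) {
			(*arrivals)--;
			mock_complete(cq);
		}
	}
	return handled;
}

/* "notify CQ, poll CQ": the CQ is armed for the whole poll loop. */
static int handler_notify_then_poll(struct mock_cq *cq, int arrivals)
{
	mock_notify(cq);
	return drain(cq, &arrivals);
}

/* "poll CQ, notify CQ, poll again": arm only after the CQ looks empty. */
static int handler_poll_notify_poll(struct mock_cq *cq, int arrivals, int late)
{
	int none = 0;
	int handled = drain(cq, &arrivals);

	/* Race window: these completions see an unarmed CQ, so no event. */
	while (late-- > 0)
		mock_complete(cq);

	mock_notify(cq);
	/* Second pass picks up anything that landed in the window. */
	handled += drain(cq, &none);
	return handled;
}
```

Starting from two pending completions with three more arriving during the poll loop, `handler_notify_then_poll` handles five completions and raises one event in this mock, while `handler_poll_notify_poll` with one additional late completion handles six and raises none. The second poll pass is what keeps the late completion from sitting on the CQ unnoticed.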
-- Hal

From krause at cup.hp.com Mon Feb 26 15:34:39 2007
From: krause at cup.hp.com (Michael Krause)
Date: Mon, 26 Feb 2007 15:34:39 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <000201c755f1$727618d0$8698070a@amr.corp.intel.com>
References: <6.2.0.14.2.20070220103929.02953a20@esmail.cup.hp.com> <000201c755f1$727618d0$8698070a@amr.corp.intel.com>
Message-ID: <6.2.0.14.2.20070226152634.026a85a8@esmail.cup.hp.com>

At 11:49 AM 2/21/2007, Sean Hefty wrote:
>I sent a message on this topic to the IBTA several days ago, but I am still
>awaiting details (likely early next week).

Unclear if that will occur. I just responded to some e-mail in the IBTA on
the router subject as well. Given that discussion, I suspect it will be some
time coming to fully answer the router dilemma.

> >It should not be carried in the CM REQ. The SLID / DLID of the router
> >ports should be derived through local subnet SA / SM query. When a CM REQ
> >traverses one or more subnets there will be potentially many SLID / DLID
> >involved in the communication. Each router should be populating its
> >routing tables in order to build the new LRH attached to the GRH / CM REQ
> >that it is forwarding to the next hop.
>
>I'm referring to configuration of the QP, not the operation of the routers.
>
>To establish a connection, the passive side QP needs to transition from Init
>to RTR. As part of that transition, the modify QP verb needs as input the
>Destination LID of its local router. It sounds like you expect the passive
>side to perform an SA query to obtain its own local routing information,
>which would essentially invalidate the data carried in the primary and
>alternate path fields in the CM REQ.

The source always queries to obtain a subnet-local router Port. A sink can
simply reflect back the LRH with source / destination LID reversed assuming
it had such information or it can query to find the optimal / preferred
subnet-local router Port.
> >From reading 12.7.11, 13.5.1, and 17.4, I do not believe that such a > requirement >was expected to be placed on the passive side of a connection. The initial >response I received agreed with this. > > >I'd need to go back but the architecture is predicated that the SM and SA > >are strictly local and for security purposes their communication should > >remain local. Higher level management entities built to communicate with > >SM and SA are responsible for cross subnet communications without exposing > >the SA or SM to direct interaction. P_Key and Q_Key management across > >subnets is an example of such communication across subnets that would not > >be exposed to the SA and SM. > >My initial thoughts are that this sounds like a good idea. It's not >eliminating >the need for interacting with a remote SA, so much as it abstracts it to >another >entity. > >My hope is that we can reach an agreement on the CM REQ. Depending on >that, it >still needs to determine if the existing SA attributes are sufficient to allow >forming inter-subnet connections, and if they are, can such attributes be >obtained. A lot of discussion will be required within the IBTA to nail anything down. As I noted above, I just provided answers to a number of questions posed as well as opened up perhaps a few more. I am not aware of a TTM to complete this work but clearly some amount of standardization is required and it will take a bit to define the scope so that the specification does not become so large that it will take significant amount of time to develop and more importantly, significant resources and time to validate that the routing protocol is solid. Routing protocols are not as simple as some may think - they vary as a function of the functional robustness and scalability provided. For now, I'll assume this discussion is on hold until the IBTA gets its act together. 
Mike From nimrodg at mellanox.com Mon Feb 26 17:02:09 2007 From: nimrodg at mellanox.com (Nimrod Gindi) Date: Mon, 26 Feb 2007 17:02:09 -0800 Subject: [openib-general] OFED release testing Task force meeting minutes Message-ID: <1E3DCD1C63492545881FACB6063A57C1D4C8D8@mtiexch01.mti.com> Meeting took place on Wednesday - Feb.
21st, 2007 8:30AM (PST) Agenda: 1. Review combined report summary (as sent from Nimrod G.- Mellanox) and vote for approval 2. Next steps 3. Open discussion Attending companies: Qlogic, Mellanox, NetEffect, Voltaire, SystemFabricWorks Discussion Items and Action Items: 1. Reviewed the new report structure 2. Spread sheet was voted and agreed upon with 2 minor changes to make it rev 2 a. AI 1: Nimrod G. - move all items from sheet 3 to sheet 2 in the drop down format and remove sheet 3 b. AI 2: Nimrod G. - Add RHEL4 up4 and up3 to the supported section per latest decisions of ewg. 3. Next agreed steps: a. Start using the spread sheet towards later Alpha build of OFED 1.2 to assist with visibility into testing done by members b. Adding tests from member companies to shared OFED repository. i. AI 3: Amit K. - send out a pointer to tests which are already posted by Mellanox in OFED. c. Start considering ULP owners under the following understanding of responsibilities: i. ULP owner will be in charge of approving entering tests of the ULP to enter the list/repository ii. ULP owner to flag the task force in case in which the ULP under his responsibility is falling behind on testing in the community. Follow-up meeting will be scheduled for 7th-March 2007 8:30am PDT=11:30am EDT=6:30pm Israel. Nimrod Gindi Mellanox Technologies Ltd. mail : nimrodg at mellanox.com Cell : +1-408-750-4801 Office: +1-347-342-0011 Fax : +1-212-987-0275 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: OFED testing report format rev2.xls Type: application/vnd.ms-excel Size: 48640 bytes Desc: OFED testing report format rev2.xls URL: From rdreier at cisco.com Mon Feb 26 20:32:40 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 26 Feb 2007 20:32:40 -0800 Subject: [openib-general] failure to create an FMR mapping 1K pages on memfree In-Reply-To: <15ddcffd0702261105s377ad165h7bfe258f69ede152@mail.gmail.com> (Or Gerlitz's message of "Mon, 26 Feb 2007 21:05:48 +0200") References: <15ddcffd0702261104x6df977b6g9e4ca0071c8489ad@mail.gmail.com> <15ddcffd0702261105s377ad165h7bfe258f69ede152@mail.gmail.com> Message-ID: > I have got a report on failure to create FMR mapping 1K pages (that is > 4MB) on memfree. > > I don't have the exact details (ie if Arbel/Sinai / what FW / etc) > nor which exact check fails in > mthca_fmr_alloc, but what's clear is that the latter function returns > -ENOMEM when attr.max_pages is 1024 and it works fine when > attr.max_pages is 256. > > Is this failure clear to you? if yes, does a HW or FW limit is being > hit or its a driver design issue? Is it really returning -ENOMEM? It seems much more likely that you are hitting the code /* For Arbel, all MTTs must fit in the same page. */ if (mthca_is_memfree(dev) && mr->attr.max_pages * sizeof *mr->mem.arbel.mtts > PAGE_SIZE) return -EINVAL; I guess you could call this limit a driver design issue. - R. From mst at mellanox.co.il Mon Feb 26 22:02:45 2007 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 27 Feb 2007 08:02:45 +0200 Subject: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45E313D2.70909@voltaire.com> References: <45E313D2.70909@voltaire.com> Message-ID: <20070227060245.GI12919@mellanox.co.il> > When using the bonding driver, neighbours are created by the net stack on behalf > of the bonding (master) device. On the tx flow the bonding code gets an skb such > that skb->dev points to the master device, it changes this skb to point on the > slave device and calls the slave hard_start_xmit function. > > > Combing these two flows, there is a hole if some code at ipoib > (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev > is an ipoib device so for example netdev_priv(n->dev) would be of type struct > ipoib_dev_priv. > > To fix it, this patch adds a dev field to struct ipoib_neigh which is used > instead of the struct neighbour dev one. It seems that in this design, if multiple ipoib interfaces are present, we might get an skb such that skb->dev will be different from the new dev field in struct ipoib_neigh. It seems that the result will be that the packet will be sent on a wrong interface. Right? > In addition, if an IPoIB device is removed before bonding is unloaded it may > cause bond0 neighbours (neighbours that point to bond0) to exist after the IPoIB > device no longer exist. This is why a neighbour cleanup is required during device > cleanup. This cleanup scans the arp cache and the ndisc cache to find there > neighbours of bond0 which refer also to the relevant ibX. Also, when ib_ipoib module is > unloaded, the neighbour destructor must be set to NULL because the neighbour function is in > ib_ipoib. > For this neigh table cleanup, it is required to export the symbol nd_tbl just like the symbol arp_tbl is. I wonder about this: is it really true that any allocated neighbour is always in either arp_tbl or nd_tbl? 
For example, could some code have called neigh_hold and retained a neighbour that is not in either one of these tables? > During my tests I found that when running > > 1. modprobe -r ib_mthca (to delete IPoIB interfaces) > 2. ping somewhere on the subnet of bond0 > > I get this stack dump (which ends with kernel death) > [] skb_under_panic+0x5c/0x60 > [] :ib_ipoib:ipoib_hard_header+0xa6/0xc0 > [] arp_create+0x120/0x226 > [] arp_send+0x25/0x3b > [] arp_solicit+0x186/0x195 > [] neigh_timer_handler+0x2b5/0x309 > [] neigh_timer_handler+0x0/0x309 > [] run_timer_softirq+0x130/0x19e > [] __do_softirq+0x55/0xc3 > [] call_softirq+0x1c/0x28 > [] do_softirq+0x2c/0x7d > [] smp_apic_timer_interrupt+0x57/0x6a > [] mwait_idle+0x0/0x45 > [] apic_timer_interrupt+0x66/0x70 > [] mwait_idle+0x42/0x45 > [] cpu_idle+0x8b/0xae > [] start_secondary+0x47f/0x48f > > The only way I found to avoid this (for now) is to check skb headroom in > ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB > operation and it seems to solve my problem. However, I would be happy to hear what > others think of this last issue. As I said, this seems to indicate a problem in the bonding code. But what will happen after you error out in ipoib_hard_header? Is the packet dropped? What might break as a result? > I would really appreciate comments. 
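[Editorial sketch] The headroom check discussed in this thread (refuse to push a header when the buffer has too little space in front of the payload) can be illustrated in user space. This is only a sketch of the idea, not the IPoIB code: struct sketch_buf, sketch_headroom() and sketch_push_checked() are hypothetical stand-ins for the kernel's sk_buff, skb_headroom() and skb_push(), and what the caller should do on failure (drop, reallocate, report an error) is exactly the question left open above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an sk_buff: only the two pointers we need. */
struct sketch_buf {
	uint8_t *head;	/* start of the allocated storage */
	uint8_t *data;	/* start of the current payload */
};

/* Bytes available in front of the payload. */
static size_t sketch_headroom(const struct sketch_buf *b)
{
	return (size_t)(b->data - b->head);
}

/*
 * Prepend len bytes of header space. Returns a pointer to the new
 * header, or NULL (leaving the buffer untouched) if it would underflow.
 */
static uint8_t *sketch_push_checked(struct sketch_buf *b, size_t len)
{
	if (sketch_headroom(b) < len)
		return NULL;
	b->data -= len;
	return b->data;
}
```

With 4 bytes of headroom, one 4-byte push succeeds and any further push returns NULL instead of moving data in front of head, which is the condition that skb_under_panic reports in the stack dump quoted above.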
> > thanks > > -MoniS ------------------------------------------------------------------------------ diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 07deee8..31bc6d8 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -216,6 +216,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_head list; }; @@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 705eb1d..0e3953e 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -48,6 +48,8 @@ #include #include #include +#include +#include #define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff) @@ -70,6 +72,7 @@ module_param_named(debug_level, ipoib_de MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0"); #endif +static int ipoib_at_exit = 0; struct ipoib_path_iter { struct net_device *dev; struct ipoib_path path; @@ -490,7 +493,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb->dst->neighbour); + neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (!neigh) { ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -735,6 +738,9 @@ static int ipoib_hard_header(struct sk_b { struct ipoib_header *header; + if (skb_headroom(skb) < sizeof *header) { + return -1; + } header = (struct ipoib_header *) skb_push(skb, sizeof *header); header->proto = htons(type); @@ -746,8 +752,11 @@ static int 
ipoib_hard_header(struct sk_b * figure out where to send the packet later. */ if ((!skb->dst || !skb->dst->neighbour) && daddr) { - struct ipoib_pseudoheader *phdr = - (struct ipoib_pseudoheader *) skb_push(skb, sizeof *phdr); + struct ipoib_pseudoheader *phdr = NULL; + if (skb_headroom(skb) < sizeof *phdr) { + return -1; + } + phdr = (struct ipoib_pseudoheader *) skb_push(skb, sizeof *phdr); memcpy(phdr->hwaddr, daddr, INFINIBAND_ALEN); } @@ -769,32 +778,69 @@ static void ipoib_set_mcast_list(struct static void ipoib_neigh_destructor(struct neighbour *n) { struct ipoib_neigh *neigh; - struct ipoib_dev_priv *priv = netdev_priv(n->dev); + struct ipoib_dev_priv *priv; unsigned long flags; struct ipoib_ah *ah = NULL; - ipoib_dbg(priv, - "neigh_destructor for %06x " IPOIB_GID_FMT "\n", - IPOIB_QPN(n->ha), - IPOIB_GID_RAW_ARG(n->ha + 4)); - - spin_lock_irqsave(&priv->lock, flags); neigh = *to_ipoib_neigh(n); if (neigh) { + priv = netdev_priv(neigh->dev); + ipoib_dbg(priv, + "neigh_destructor for %06x " IPOIB_GID_FMT "\n", + IPOIB_QPN(n->ha), + IPOIB_GID_RAW_ARG(n->ha + 4)); + + spin_lock_irqsave(&priv->lock, flags); if (neigh->ah) ah = neigh->ah; list_del(&neigh->list); ipoib_neigh_free(n->dev, neigh); + spin_unlock_irqrestore(&priv->lock, flags); } - - spin_unlock_irqrestore(&priv->lock, flags); - if (ah) ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +static void ipoib_neigh_tbl_cleanup_master(struct neigh_table *tbl, + struct net_device* master, + struct net_device* slave) +{ + int i; + struct ipoib_neigh *neigh; + + write_lock_bh(&tbl->lock); + for (i = 0; i <= tbl->hash_mask; i++) { + struct neighbour *n, **np; + + np = &tbl->hash_buckets[i]; + while ((n = *np) != NULL) { + write_lock(&n->lock); + if (n->dev == master) { + neigh = *to_ipoib_neigh(n); + if (neigh && (neigh->dev == slave)){ + if (ipoib_at_exit) + n->parms->neigh_destructor = NULL; + ipoib_neigh_destructor(n); + } + } + write_unlock(&n->lock); + np = &n->next; 
+ } + } + write_unlock_bh(&tbl->lock); +} + +static void ipoib_neigh_cleanup_by_master(struct net_device* master,struct net_device* slave){ + netif_stop_queue(slave); + if (master) { + ipoib_neigh_tbl_cleanup_master(&arp_tbl,master, slave); + ipoib_neigh_tbl_cleanup_master(&nd_tbl,master, slave); + } +} + +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) { struct ipoib_neigh *neigh; @@ -803,6 +849,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh->neighbour = neighbour; + neigh->dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(&neigh->queue); @@ -874,6 +921,7 @@ void ipoib_dev_cleanup(struct net_device /* Delete any child interfaces first */ list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) { + ipoib_neigh_cleanup_by_master(cpriv->dev->master, cpriv->dev); unregister_netdev(cpriv->dev); ipoib_dev_cleanup(cpriv->dev); free_netdev(cpriv->dev); @@ -1159,6 +1207,7 @@ static void ipoib_remove_one(struct ib_d ib_unregister_event_handler(&priv->event_handler); flush_scheduled_work(); + ipoib_neigh_cleanup_by_master(priv->dev->master, priv->dev); unregister_netdev(priv->dev); ipoib_dev_cleanup(priv->dev); free_netdev(priv->dev); @@ -1217,6 +1266,8 @@ err_fs: static void __exit ipoib_cleanup_module(void) { + ipoib_at_exit = 1; + ib_unregister_client(&ipoib_client); ib_sa_unregister_client(&ipoib_sa_client); ipoib_unregister_debugfs(); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index b04b72c..a41a949 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -774,7 +774,7 @@ out: if (skb->dst && skb->dst->neighbour && !*to_ipoib_neigh(skb->dst->neighbour)) { - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (neigh) { kref_get(&mcast->ah->ref); diff --git 
a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 6a9f616..557be98 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -153,6 +153,7 @@ struct neigh_table nd_tbl = { .gc_thresh2 = 512, .gc_thresh3 = 1024, }; +EXPORT_SYMBOL(nd_tbl); /* ND options */ struct ndisc_options { -- MST From diego.guella at sircomtech.com Mon Feb 26 23:10:50 2007 From: diego.guella at sircomtech.com (Diego Guella) Date: Tue, 27 Feb 2007 08:10:50 +0100 Subject: [openib-general] Fwd: Address List Change Now Scheduled for Wednesday, 2/28/2007 References: <3D84A59A1AD3584DA02AEAD240E8863F03BC471E@ES22SNLNT.srn.sandia.gov> Message-ID: <009301c75a3e$7165aef0$05c8a8c0@DIEGO> Should I do something to get subscribed to the new mailing list or I will be automatically subscribed? The only change is that I have to write messages to general at lists.openfabrics.org, correct? ----- Original Message ----- From: "Jeff Squyres" To: "OpenFabrics General" Sent: Monday, February 26, 2007 6:05 PM Subject: [openib-general] Fwd: Address List Change Now Scheduled for Wednesday, 2/28/2007 > FYI. In case you missed it the Nth time: THIS LIST IS CHANGING ON > WEDNESDAY 2/28/2007 (2 days from now). Really. For sure this time. > Trust me. Honest. > > Please update your addressbooks! > > > > Begin forwarded message: > >> From: "Lee, Michael Paichi" >> Date: February 22, 2007 11:44:25 AM EST >> To: "Jeff Squyres" , "Michael S. Tsirkin" >> >> Cc: "OpenFabrics General" >> Subject: Address List Change Now Scheduled for Wednesday, 2/28/2007 >> >> The list will now be migrated on Wednesday, 2/28/2007. 
>> >> List address: general at lists.openfabrics.org >> Updated change-date: Wednesday, 2/28/2007 >> >> Michael > > > -- > Jeff Squyres > Server Virtualization Business Unit > Cisco Systems > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From philippe_bernadat at hp.com Tue Feb 27 00:33:11 2007 From: philippe_bernadat at hp.com (Bernadat, Philippe) Date: Tue, 27 Feb 2007 09:33:11 +0100 Subject: [openib-general] failure to create an FMR mapping 1K pages on memfree In-Reply-To: References: <15ddcffd0702261104x6df977b6g9e4ca0071c8489ad@mail.gmail.com><15ddcffd0702261105s377ad165h7bfe258f69ede152@mail.gmail.com> Message-ID: <3F3894AC7A13B04E83CEBC95CFD3047E05B06D13@idaexc03.emea.cpqcorp.net> Roland is right, I checked were mthca_fmr_alloc() was failing. Mtts is one page of pointers, so max is 512. Does work with 512. I checked, mthca_alloc_fmr returns EINVAL, then ib_create_fmr_pool returns ENOMEM. So this isn't a hardware limitation since the Voltaire Stack managed to handle 1024 pages on the same board. Is there a way to fix OFED ? Philippe > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Tuesday, February 27, 2007 5:33 AM > To: Or Gerlitz > Cc: Bernadat, Philippe; openib > Subject: Re: failure to create an FMR mapping 1K pages on memfree > > > I have got a report on failure to create FMR mapping 1K > pages (that is > > 4MB) on memfree. > > > > I don't have the exact details (ie if Arbel/Sinai / what FW / etc) > > nor which exact check fails in > > mthca_fmr_alloc, but what's clear is that the latter > function returns > > -ENOMEM when attr.max_pages is 1024 and it works fine when > > attr.max_pages is 256. > > > > Is this failure clear to you? if yes, does a HW or FW > limit is being > > hit or its a driver design issue? 
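[Editorial sketch] The 512-page limit mentioned in this thread follows from simple arithmetic if one assumes 4 KiB pages and 8-byte MTT entries (both are assumptions here; the messages only say that the MTTs are "one page of pointers"). The helper names below are hypothetical and only mirror the shape of the driver check quoted in the thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages, 8-byte MTT entries. */
#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_MTT_ENTRY_SIZE	sizeof(uint64_t)

/* Largest max_pages whose MTT entries still fit in a single page. */
static size_t sketch_max_fmr_pages(void)
{
	return SKETCH_PAGE_SIZE / SKETCH_MTT_ENTRY_SIZE;
}

/* Same comparison as the quoted check: 0 if it fits, -EINVAL if not. */
static int sketch_check_fmr_pages(size_t max_pages)
{
	if (max_pages * SKETCH_MTT_ENTRY_SIZE > SKETCH_PAGE_SIZE)
		return -EINVAL;
	return 0;
}
```

Under these assumptions 256 and 512 pages pass, while 1024 pages would need 8192 bytes of MTT entries and is rejected, which matches the behaviour reported in the thread.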
> > Is it really returning -ENOMEM? It seems much more likely that you > are hitting the code > > /* For Arbel, all MTTs must fit in the same page. */ > if (mthca_is_memfree(dev) && > mr->attr.max_pages * sizeof *mr->mem.arbel.mtts > PAGE_SIZE) > return -EINVAL; > > I guess you could call this limit a driver design issue. > > - R. > From vlad at lists.openfabrics.org Tue Feb 27 02:29:38 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Tue, 27 Feb 2007 02:29:38 -0800 (PST) Subject: [openib-general] ofa_1_2_kernel 20070227-0200 daily build status Message-ID: <20070227102938.74A69E60803@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ppc64 with 
linux-2.6.17 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ia64 with linux-2.6.16.21-0.8-default Failed: Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once /home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) 
/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From jsquyres 
at cisco.com Tue Feb 27 03:08:31 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 27 Feb 2007 06:08:31 -0500 Subject: [openib-general] Fwd: Address List Change Now Scheduled for Wednesday, 2/28/2007 In-Reply-To: <009301c75a3e$7165aef0$05c8a8c0@DIEGO> References: <3D84A59A1AD3584DA02AEAD240E8863F03BC471E@ES22SNLNT.srn.sandia.gov> <009301c75a3e$7165aef0$05c8a8c0@DIEGO> Message-ID: On Feb 27, 2007, at 2:10 AM, Diego Guella wrote: > Should I do something to get subscribed to the new mailing list or > I will be automatically subscribed? There is nothing that you need to do; the list is simply being migrated from one server to another and changing names in the process. > The only change is that I have to write messages to > general at lists.openfabrics.org, correct? Correct. There will be aliases in place to redirect messages from the old name to the new name, too. So the warning is more about updating e-mail client filters, etc. > > > > ----- Original Message ----- From: "Jeff Squyres" > To: "OpenFabrics General" > Sent: Monday, February 26, 2007 6:05 PM > Subject: [openib-general] Fwd: Address List Change Now Scheduled > for Wednesday, 2/28/2007 > > >> FYI. In case you missed it the Nth time: THIS LIST IS CHANGING ON >> WEDNESDAY 2/28/2007 (2 days from now). Really. For sure this time. >> Trust me. Honest. >> >> Please update your addressbooks! >> >> >> >> Begin forwarded message: >> >>> From: "Lee, Michael Paichi" >>> Date: February 22, 2007 11:44:25 AM EST >>> To: "Jeff Squyres" , "Michael S. Tsirkin" >>> >>> Cc: "OpenFabrics General" >>> Subject: Address List Change Now Scheduled for Wednesday, 2/28/2007 >>> >>> The list will now be migrated on Wednesday, 2/28/2007. 
>>> >>> List address: general at lists.openfabrics.org >>> Updated change-date: Wednesday, 2/28/2007 >>> >>> Michael >> >> >> -- >> Jeff Squyres >> Server Virtualization Business Unit >> Cisco Systems >> >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/ >> openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From cppbala at yahoo.com Tue Feb 27 03:30:48 2007 From: cppbala at yahoo.com (Bala) Date: Tue, 27 Feb 2007 03:30:48 -0800 (PST) Subject: [openib-general] ib0 shows MAC address as 00-00-00.... is it normal?? Message-ID: <87194.51250.qm@web35102.mail.mud.yahoo.com> Hi All, We have build and installed OFED-1.1 on RHEL-4 machine, using ipoib we set the IPs for the interface and able to ping each other, but my ifconfig shows ib0 MAC address as shown below "00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00" -------------- ib0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 RX packets:271465 errors:0 dropped:0 overruns:0 frame:0 TX packets:1444336 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:15664386 (14.9 MiB) TX bytes:2718736764 (2.5 GiB) ------------------- pls let me know is it normal, is there any way to get the real hw/mac address. regards, Bala. ____________________________________________________________________________________ Be a PS3 game guru. Get your game face on with the latest PS3 news and previews at Yahoo! Games. 
http://videogames.yahoo.com/platform?platform=120121 From cppbala at yahoo.com Tue Feb 27 03:35:03 2007 From: cppbala at yahoo.com (Bala) Date: Tue, 27 Feb 2007 03:35:03 -0800 (PST) Subject: [openib-general] mpi over IB Message-ID: <656981.21379.qm@web35115.mail.mud.yahoo.com> Hi All, We have build and installed OFED-1.1 on RHEL-4 machines, while compiling selected mpi support, pls through some light on how to use mpi over IB interface, using what modules etc. or do we need to install separate mpi software to use. thanks in advance, -bala- ____________________________________________________________________________________ 8:00? 8:25? 8:40? Find a flick in no time with the Yahoo! Search movie showtime shortcut. http://tools.search.yahoo.com/shortcuts/#news From jsquyres at cisco.com Tue Feb 27 03:43:36 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 27 Feb 2007 06:43:36 -0500 Subject: [openib-general] mpi over IB In-Reply-To: <656981.21379.qm@web35115.mail.mud.yahoo.com> References: <656981.21379.qm@web35115.mail.mud.yahoo.com> Message-ID: During the installation process, the OFED installer should have asked you if you wanted to install Open MPI and/or MVAPICH. Both of these MPI implementations are capable of communicating natively over the IB interface. Running MPI applications with Open MPI should natively choose the IB interface at run time if your IB network is up and running properly (e.g., try running ibv_devinfo to ensure that ports are listed in the PORT_ACTIVE state, etc.). I assume that the same is true with MVAPICH as well. On Feb 27, 2007, at 6:35 AM, Bala wrote: > Hi All, > We have build and installed OFED-1.1 > on RHEL-4 machines, while compiling selected > mpi support, pls through some light on how > to use mpi over IB interface, using what > modules etc. or do we need to install separate > mpi software to use. 
> > thanks in advance, > -bala- > > > > ______________________________________________________________________ > ______________ > 8:00? 8:25? 8:40? Find a flick in no time > with the Yahoo! Search movie showtime shortcut. > http://tools.search.yahoo.com/shortcuts/#news > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From halr at voltaire.com Tue Feb 27 03:38:08 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Feb 2007 06:38:08 -0500 Subject: [openib-general] ib0 shows MAC address as 00-00-00.... is it normal?? In-Reply-To: <87194.51250.qm@web35102.mail.mud.yahoo.com> References: <87194.51250.qm@web35102.mail.mud.yahoo.com> Message-ID: <1172576284.4102.346987.camel@hal.voltaire.com> On Tue, 2007-02-27 at 06:30, Bala wrote: > Hi All, > We have build and installed OFED-1.1 on > RHEL-4 machine, using ipoib we set the IPs > for the interface and able to ping each other, > but my ifconfig shows ib0 MAC address as > shown below > "00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00" > > -------------- > ib0 Link encap:UNSPEC HWaddr > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 > inet addr:192.168.0.1 Bcast:192.168.0.255 > Mask:255.255.255.0 > UP BROADCAST RUNNING MULTICAST MTU:2044 > Metric:1 > RX packets:271465 errors:0 dropped:0 > overruns:0 frame:0 > TX packets:1444336 errors:0 dropped:0 > overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:15664386 (14.9 MiB) TX > bytes:2718736764 (2.5 GiB) > ------------------- > > pls let me know is it normal, Depends on the (truncated) guid for the HCA port. > is there any way > to get the real hw/mac address. ip addr show ib0 -- Hal > regards, > Bala. 
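[Editorial sketch] For readers wondering what the full IPoIB hardware address contains: going by the IPOIB_QPN macro and the "n->ha + 4" GID offset in the bonding patch posted earlier in this archive, the address appears to be 20 bytes, a big-endian 32-bit word whose low 24 bits are the QPN, followed by a 16-byte GID. The sketch below splits such an address; the struct and function names are hypothetical, and the 20-byte length is an assumption taken from that layout.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: 4 bytes (flags + 24-bit QPN), then a 16-byte GID. */
#define SKETCH_IPOIB_ALEN	20
#define SKETCH_GID_LEN		16

struct sketch_ipoib_addr {
	uint32_t qpn;			/* low 24 bits of the first word */
	uint8_t gid[SKETCH_GID_LEN];	/* port GID, raw bytes */
};

/* Split a raw hardware address into QPN and GID. */
static void sketch_parse_ipoib_addr(const uint8_t ha[SKETCH_IPOIB_ALEN],
				    struct sketch_ipoib_addr *out)
{
	/* Assemble the first four bytes as a big-endian 32-bit word. */
	uint32_t word = ((uint32_t)ha[0] << 24) | ((uint32_t)ha[1] << 16) |
			((uint32_t)ha[2] << 8) | (uint32_t)ha[3];

	out->qpn = word & 0xffffff;
	memcpy(out->gid, ha + 4, SKETCH_GID_LEN);
}
```

The bytes to feed into such a parser would be the link-layer address printed by "ip addr show ib0", as suggested in the reply above.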
From monis at voltaire.com Tue Feb 27 03:54:59 2007 From: monis at voltaire.com (Moni Shoua) Date: Tue, 27 Feb 2007 13:54:59 +0200 Subject: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070227060245.GI12919@mellanox.co.il> References: <45E313D2.70909@voltaire.com> <20070227060245.GI12919@mellanox.co.il> Message-ID: <45E41C13.8090300@voltaire.com> Thanks for the comments >> To fix it, this patch adds a dev field to struct ipoib_neigh which is used >> instead of the struct neighbour dev one. > > It seems that in this design, if multiple ipoib interfaces are present, we might > get an skb such that skb->dev will be different from the new dev field in struct > ipoib_neigh. > > It seems that the result will be that the packet will be sent on a wrong interface. > Right? > I don't see how. The field dev in ipoib_neigh doesn't take part in interface selection. As I see it, skb travels this path: 1. Passed to bond_dev->hard_start_xmit 2. bond_dev->hard_start_xmit chooses the current active interface, changes skb->dev and enqueues it back for xmitting. >> In addition, if an IPoIB device is removed before bonding is unloaded it may >> cause bond0 neighbours (neighbours that point to bond0) to exist after the IPoIB >> device no longer exist. This is why a neighbour cleanup is required during device >> cleanup. This cleanup scans the arp cache and the ndisc cache to find there >> neighbours of bond0 which refer also to the relevant ibX.
Also, when ib_ipoib module is >> unloaded, the neighbour destructor must be set to NULL because the neighbour function is in >> ib_ipoib. >> For this neigh table cleanup, it is required to export the symbol nd_tbl just like the symbol arp_tbl is. > > I wonder about this: is it really true that any allocated neighbour is always in > either arp_tbl or nd_tbl? For example, could some code have called neigh_hold > and retained a neighbour that is not in either one of these tables? > I got the assumption about neighbours living in one of these 2 tables from observation and code reading. I preferred that that on keeping track of all ipoib_neighs and putting them in a list. However, I could do that instead of neigh_table scanning. Do you think it's better? For the example... I didn't understand it. Could you please explain? >> During my tests I found that when running >> >> 1. modprobe -r ib_mthca (to delete IPoIB interfaces) >> 2. ping somewhere on the subnet of bond0 >> >> I get this stack dump (which ends with kernel death) >> [] skb_under_panic+0x5c/0x60 >> [] :ib_ipoib:ipoib_hard_header+0xa6/0xc0 >> [] arp_create+0x120/0x226 >> [] arp_send+0x25/0x3b >> [] arp_solicit+0x186/0x195 >> [] neigh_timer_handler+0x2b5/0x309 >> [] neigh_timer_handler+0x0/0x309 >> [] run_timer_softirq+0x130/0x19e >> [] __do_softirq+0x55/0xc3 >> [] call_softirq+0x1c/0x28 >> [] do_softirq+0x2c/0x7d >> [] smp_apic_timer_interrupt+0x57/0x6a >> [] mwait_idle+0x0/0x45 >> [] apic_timer_interrupt+0x66/0x70 >> [] mwait_idle+0x42/0x45 >> [] cpu_idle+0x8b/0xae >> [] start_secondary+0x47f/0x48f >> >> The only way I found to avoid this (for now) is to check skb headroom in >> ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB >> operation and it seems to solve my problem. However, I would be happy to hear what >> others think of this last issue. > > As I said, this seems to indicate a problem in the bonding code. > But what will happen after you error out in ipoib_hard_header? 
> Is the packet dropped? What might break as a result? > I will check the hard_header_len issue in the bonding code more carefully. From first look it seems that bonding does borrow the hard_header_len. Also, my checks show that it is safe to return with error from hard_header(). For example, in neigh_connected_output: err = dev->hard_header(skb, dev, ntohs(skb->protocol), neigh->ha, NULL, skb->len); read_unlock_bh(&neigh->lock); if (err >= 0) err = neigh->ops->queue_xmit(skb); else { err = -EINVAL; kfree_skb(skb); >> I would really appreciate comments. >> >> thanks >> >> -MoniS > From monil at voltaire.com Tue Feb 27 05:02:50 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 27 Feb 2007 15:02:50 +0200 Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter. Message-ID: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> Hello, I did a short code review of the ipoib code concentrating on partitioning support and I mentioned that the asynchronous events handler in the ipoib code does not take the port number reported in the event record into consideration. The effect of that is that all of the ib# devices related to that specific HCA are flushed when it seems to me that only the relevant port one should be. Is that done on purpose, or am I missing something ? Thanks, Moni p.s. I'm working on a patch that should solve another issue caused by PKEY reordering & ipoib behavior and the above issue further complicates things for me. From mst at mellanox.co.il Tue Feb 27 05:51:46 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Feb 2007 15:51:46 +0200 Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter. 
In-Reply-To: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> Message-ID: <20070227135131.GA4437@mellanox.co.il> > Quoting Moni Levy : > Subject: [RFC] IB/ipoib: Asynchronous events delivered without port parameter. > > Hello, > I did a short code review of the ipoib code concentrating on > partitioning support and I mentioned that the asynchronous events > handler in the ipoib code does not take the port number reported in > the event record into consideration. The effect of that is that all of > the ib# devices related to that specific HCA are flushed when it seems > to me that only the relevant port one should be. Is that done on > purpose, or am I missing something ? > > Thanks, > Moni > > p.s. I'm working on a patch that should solve another issue caused by > PKEY reordering & ipoib behavior and the above issue further > complicates things for me. If true, why is this a problem? -- MST From swise at opengridcomputing.com Tue Feb 27 06:23:30 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 08:23:30 -0600 Subject: [openib-general] HOWTO check ofa_kernel build from your git tree In-Reply-To: <1172502465.21382.44.camel@vladsk-laptop> References: <1172502465.21382.44.camel@vladsk-laptop> Message-ID: <1172586210.11870.16.camel@stevo-desktop> Where are all the kernel src trees on ssh.openfabrics.org? I would like to build against specific trees that are failing with cxgb3... Also: what RH distro ships: linux-2.6.9-22.ELsmp and linux-2.6.9-34.ELsmp Thanks, Steve.
On Mon, 2007-02-26 at 17:07 +0200, Vladimir Sokolovsky wrote: > On ssh.openfabrics.org: > Run > env git_url=/home/mst/scm/ofed_1_2_devel.git git_branch=ofed_1_2 \ > CHECK_LOCAL=yes \ > CHECK_KERNEL_ORG=yes \ > CHECK_CROSS=yes /home/vlad/scripts/build_ofa_kernel.sh > From monil at voltaire.com Tue Feb 27 06:29:56 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 27 Feb 2007 16:29:56 +0200 Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering Message-ID: <45E44064.4020407@voltaire.com> This issue was found during partitioning & SM fail over testing. The fix was tested over the weekend with pkey reshuffling, removal and addition every few seconds concurrent with OFED restart. The patch applies on Roland's git tree. Changes from v1: * added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike * fixed a bug in device extraction from the work struct * removed some warnings in case they are caused due to missing PKEY as this seems like a valid flow now. SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP. 
Signed-off-by: Moni Levy --- ipoib.h | 4 +++- ipoib_ib.c | 51 +++++++++++++++++++++++++++++++++++++++++---------- ipoib_main.c | 5 +++-- ipoib_multicast.c | 11 ++++++----- ipoib_verbs.c | 8 +++++++- 5 files changed, 60 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 2594db2..d08ecca 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -205,6 +205,7 @@ struct ipoib_dev_priv { struct delayed_work pkey_task; struct delayed_work mcast_task; struct work_struct flush_task; + struct work_struct flush_restart_qp_task; struct work_struct restart_task; struct delayed_work ah_reap_task; @@ -334,12 +335,13 @@ struct ipoib_dev_priv *ipoib_intf_alloc( int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_ib_dev_flush(struct work_struct *work); +void ipoib_ib_dev_flush_restart_qp(struct work_struct *work); void ipoib_ib_dev_cleanup(struct net_device *dev); int ipoib_ib_dev_open(struct net_device *dev); int ipoib_ib_dev_up(struct net_device *dev); int ipoib_ib_dev_down(struct net_device *dev, int flush); -int ipoib_ib_dev_stop(struct net_device *dev); +int ipoib_ib_dev_stop(struct net_device *dev, int flush); int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_dev_cleanup(struct net_device *dev); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index f2aa923..b0287c1 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device ret = ipoib_init_qp(dev); if (ret) { - ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); + if (ret != -ENOENT) + ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); return -1; } ret = ipoib_ib_post_receives(dev); if (ret) { ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); - ipoib_ib_dev_stop(dev); + 
ipoib_ib_dev_stop(dev, 1); return -1; } ret = ipoib_cm_dev_open(dev); if (ret) { ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -1; } @@ -508,7 +509,7 @@ static int recvs_pending(struct net_devi return pending; } -int ipoib_ib_dev_stop(struct net_device *dev) +int ipoib_ib_dev_stop(struct net_device *dev, int flush) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; @@ -581,7 +582,8 @@ timeout: /* Wait for all AHs to be reaped */ set_bit(IPOIB_STOP_REAPER, &priv->flags); cancel_delayed_work(&priv->ah_reap_task); - flush_workqueue(ipoib_workqueue); + if (flush) + flush_workqueue(ipoib_workqueue); begin = jiffies; @@ -622,13 +624,17 @@ int ipoib_ib_dev_init(struct net_device return 0; } -void ipoib_ib_dev_flush(struct work_struct *work) +static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int restart_qp) { - struct ipoib_dev_priv *cpriv, *priv = - container_of(work, struct ipoib_dev_priv, flush_task); + struct ipoib_dev_priv *cpriv; struct net_device *dev = priv->dev; - if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) { + /* + * ipoib_ib_dev_stop() below may not find the PKey and leave the + * IPOIB_FLAG_INITIALIZED flag off so flush in that case with restart_qp + * flag on is Ok. 
+ */ + if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) && !restart_qp) { ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n"); return; } @@ -641,6 +647,13 @@ void ipoib_ib_dev_flush(struct work_stru ipoib_dbg(priv, "flushing\n"); ipoib_ib_dev_down(dev, 0); + + if (restart_qp) { + ipoib_dbg(priv, "restarting the device QP\n"); + if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) + ipoib_ib_dev_stop(dev, 0); + ipoib_ib_dev_open(dev); + } /* * The device could have been brought down between the start and when @@ -655,11 +668,29 @@ void ipoib_ib_dev_flush(struct work_stru /* Flush any child interfaces too */ list_for_each_entry(cpriv, &priv->child_intfs, list) - ipoib_ib_dev_flush(&cpriv->flush_task); + __ipoib_ib_dev_flush(cpriv, restart_qp); mutex_unlock(&priv->vlan_mutex); } +void ipoib_ib_dev_flush(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, flush_task); + /* We only restart the QP in case of PKEY change event */ + ipoib_dbg(priv, "Flushing %s\n", priv->dev->name); + __ipoib_ib_dev_flush(priv, 0); +} + +void ipoib_ib_dev_flush_restart_qp(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, flush_restart_qp_task); + /* We only restart the QP in case of PKEY change event */ + ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name); + __ipoib_ib_dev_flush(priv, 1); +} + void ipoib_ib_dev_cleanup(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 18d27fd..2eab846 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -107,7 +107,7 @@ int ipoib_open(struct net_device *dev) return -EINVAL; if (ipoib_ib_dev_up(dev)) { - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -EINVAL; } @@ -152,7 +152,7 @@ static int ipoib_stop(struct net_device 
flush_workqueue(ipoib_workqueue); ipoib_ib_dev_down(dev, 1); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { struct ipoib_dev_priv *cpriv; @@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic INIT_DELAYED_WORK(&priv->pkey_task, ipoib_pkey_poll); INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush); + INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index b303ce6..27d6fd4 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), &mcast->mcmember.mgid); if (ret < 0) { - ipoib_warn(priv, "couldn't attach QP to multicast group " - IPOIB_GID_FMT "\n", - IPOIB_GID_ARG(mcast->mcmember.mgid)); + if (ret != -ENXIO) /* No PKEY found */ + ipoib_warn(priv, "couldn't attach QP to multicast group " + IPOIB_GID_FMT "\n", + IPOIB_GID_ARG(mcast->mcmember.mgid)); clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags); return ret; @@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s status = ipoib_mcast_join_finish(mcast, &multicast->rec); if (status) { - if (mcast->logcount++ < 20) + if (mcast->logcount++ < 20 && status != -ENXIO) ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " IPOIB_GID_FMT ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), status); @@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), status); - } else { + } else if (status != -ENXIO) { ipoib_warn(priv, "multicast join failed for " IPOIB_GID_FMT ", status %d\n", 
IPOIB_GID_ARG(mcast->mcmember.mgid), diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 3cb551b..d0384ea 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) { clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); ret = -ENXIO; + ipoib_dbg(priv, "PKEY %X not found\n", priv->pkey); goto out; } + ipoib_dbg(priv, "PKEY %X found at index %d\n", priv->pkey, pkey_index); set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); /* set correct QKey for QP */ @@ -105,9 +107,11 @@ int ipoib_init_qp(struct net_device *dev */ ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index); if (ret) { + ipoib_dbg(priv, "PKEY %X not found.\n", priv->pkey); clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); return ret; } + ipoib_dbg(priv, "PKEY %X found at index %d.\n", priv->pkey, pkey_index); set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); qp_attr.qp_state = IB_QPS_INIT; @@ -260,12 +264,14 @@ void ipoib_event(struct ib_event_handler container_of(handler, struct ipoib_dev_priv, event_handler); if (record->event == IB_EVENT_PORT_ERR || - record->event == IB_EVENT_PKEY_CHANGE || record->event == IB_EVENT_PORT_ACTIVE || record->event == IB_EVENT_LID_CHANGE || record->event == IB_EVENT_SM_CHANGE || record->event == IB_EVENT_CLIENT_REREGISTER) { ipoib_dbg(priv, "Port state change event\n"); queue_work(ipoib_workqueue, &priv->flush_task); + } else if (record->event == IB_EVENT_PKEY_CHANGE) { + ipoib_dbg(priv, "PKEY change event\n"); + queue_work(ipoib_workqueue, &priv->flush_restart_qp_task); } } From mst at mellanox.co.il Tue Feb 27 06:51:14 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 27 Feb 2007 16:51:14 +0200 Subject: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45E41C13.8090300@voltaire.com> References: <45E313D2.70909@voltaire.com> <20070227060245.GI12919@mellanox.co.il> <45E41C13.8090300@voltaire.com> Message-ID: <20070227145114.GC4437@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB > > > Thanks for the comments > > >> To fix it, this patch adds a dev field to struct ipoib_neigh which is used > >> instead of the struct neighbour dev one. > > > > It seems that in this design, if multiple ipoib interfaces are present, we might > > get an skb such that skb->dev will be different from the new dev field in struct > > ipoib_neigh. > > > > It seems that the result will be that the packet will be sent on a wrong interface. > > Right? > > > I don't see how. The field dev in ipoib_neigh doesn't take part in interface selection. > As I see it, skb travels this path: > 1. Passed to bond_dev->hard_start_xmit > 2. bond_dev->hard_start_xmit chooses the current active interface, changes skb->dev and enqueues it back for xmittig. ipoib_neigh ah field includes struct ib_ah *. This selects important parameters which depend on both packet source and destination interfaces. I think the right thing might be to compare ipoib_neigh dev and skb->dev, and destroy ipoib_neigh if these do not match. > >> In addition, if an IPoIB device is removed before bonding is unloaded it may > >> cause bond0 neighbours (neighbours that point to bond0) to exist after the IPoIB > >> device no longer exist. This is why a neighbour cleanup is required during device > >> cleanup. This cleanup scans the arp cache and the ndisc cache to find there > >> neighbours of bond0 which refer also to the relevant ibX. Also, when ib_ipoib module is > >> unloaded, the neighbour destructor must be set to NULL because the neighbour function is in > >> ib_ipoib. 
> >> For this neigh table cleanup, it is required to export the symbol nd_tbl just like the symbol arp_tbl is. > > > > I wonder about this: is it really true that any allocated neighbour is always in > > either arp_tbl or nd_tbl? For example, could some code have called neigh_hold > > and retained a neighbour that is not in either one of these tables? > > > I got the assumption about neighbours living in one of these 2 tables from > observation and code reading. I preferred that that on keeping track of all > ipoib_neighs and putting them in a list. However, I could do that instead of > neigh_table scanning. Do you think it's better? If some neighbours are not on any tables, it seems using our own lists (e.g. lists we have in ipoib_path) is the only option, no? > For the example... I didn't > understand it. Could you please explain? grep for neigh_hold. neighbour is only destroyed when ref count goes to 0. If some code does neigh_hold, it seems neighbour could be removed from table but destructor not yet called. > >> During my tests I found that when running > >> > >> 1. modprobe -r ib_mthca (to delete IPoIB interfaces) > >> 2. ping somewhere on the subnet of bond0 > >> > >> I get this stack dump (which ends with kernel death) > >> [] skb_under_panic+0x5c/0x60 > >> [] :ib_ipoib:ipoib_hard_header+0xa6/0xc0 > >> [] arp_create+0x120/0x226 > >> [] arp_send+0x25/0x3b > >> [] arp_solicit+0x186/0x195 > >> [] neigh_timer_handler+0x2b5/0x309 > >> [] neigh_timer_handler+0x0/0x309 > >> [] run_timer_softirq+0x130/0x19e > >> [] __do_softirq+0x55/0xc3 > >> [] call_softirq+0x1c/0x28 > >> [] do_softirq+0x2c/0x7d > >> [] smp_apic_timer_interrupt+0x57/0x6a > >> [] mwait_idle+0x0/0x45 > >> [] apic_timer_interrupt+0x66/0x70 > >> [] mwait_idle+0x42/0x45 > >> [] cpu_idle+0x8b/0xae > >> [] start_secondary+0x47f/0x48f > >> > >> The only way I found to avoid this (for now) is to check skb headroom in > >> ipoib_hard_header. 
I guess that this safety check doesn't harm regular IPoIB > >> operation and it seems to solve my problem. However, I would be happy to hear what > >> others think of this last issue. > > > > As I said, this seems to indicate a problem in the bonding code. > > But what will happen after you error out in ipoib_hard_header? > > Is the packet dropped? What might break as a result? > > > I will check the hard_header_len issue in the bonding code more carefully. > From first look it seems that bonding does borrow the hard_header_len. So where does a shorter message come from? > Also, > my checks show that it is safe to return with error from > hard_header(). For example, in neigh_connected_output: > > err = dev->hard_header(skb, dev, ntohs(skb->protocol), > neigh->ha, NULL, skb->len); > read_unlock_bh(&neigh->lock); > if (err >= 0) > err = neigh->ops->queue_xmit(skb); > else { > err = -EINVAL; > kfree_skb(skb); > > >> I would really appreciate comments. > >> > >> thanks > >> > >> -MoniS > > -- MST From mst at mellanox.co.il Tue Feb 27 07:12:12 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Feb 2007 17:12:12 +0200 Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: <45E44064.4020407@voltaire.com> References: <45E44064.4020407@voltaire.com> Message-ID: <20070227151212.GD4437@mellanox.co.il> I just gave this a cursory glance. A suggestion: would it not be much simpler to modify the QP from RTS to RTS on pkey change? 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > index f2aa923..b0287c1 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > @@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device > > ret = ipoib_init_qp(dev); > if (ret) { > - ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); > + if (ret != -ENOENT) > + ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); > return -1; > } What's the reason for this? > @@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic > INIT_DELAYED_WORK(&priv->pkey_task, ipoib_pkey_poll); > INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); > INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush); > + INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp); > INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); > INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); > } Shorter name? > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > index b303ce6..27d6fd4 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc > ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), > &mcast->mcmember.mgid); > if (ret < 0) { > - ipoib_warn(priv, "couldn't attach QP to multicast group " > - IPOIB_GID_FMT "\n", > - IPOIB_GID_ARG(mcast->mcmember.mgid)); > + if (ret != -ENXIO) /* No PKEY found */ > + ipoib_warn(priv, "couldn't attach QP to multicast group " > + IPOIB_GID_FMT "\n", > + IPOIB_GID_ARG(mcast->mcmember.mgid)); > > clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags); > return ret; > @@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s > status = ipoib_mcast_join_finish(mcast, &multicast->rec); > > if (status) { > - if (mcast->logcount++ < 20) > + if (mcast->logcount++ < 20 && status != -ENXIO) > 
ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " > IPOIB_GID_FMT ", status %d\n", > IPOIB_GID_ARG(mcast->mcmember.mgid), status); > @@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int > ", status %d\n", > IPOIB_GID_ARG(mcast->mcmember.mgid), > status); > - } else { > + } else if (status != -ENXIO) { > ipoib_warn(priv, "multicast join failed for " > IPOIB_GID_FMT ", status %d\n", > IPOIB_GID_ARG(mcast->mcmember.mgid), > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > index 3cb551b..d0384ea 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > @@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device > if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) { > clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > ret = -ENXIO; > + ipoib_dbg(priv, "PKEY %X not found\n", priv->pkey); > goto out; > } > + ipoib_dbg(priv, "PKEY %X found at index %d\n", priv->pkey, pkey_index); > set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > > /* set correct QKey for QP */ Make it PKey or pkey: no text in uppercase in log messages please. > @@ -105,9 +107,11 @@ int ipoib_init_qp(struct net_device *dev > */ > ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index); > if (ret) { > + ipoib_dbg(priv, "PKEY %X not found.\n", priv->pkey); > clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > return ret; > } > + ipoib_dbg(priv, "PKEY %X found at index %d.\n", priv->pkey, pkey_index); > set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > > qp_attr.qp_state = IB_QPS_INIT; going a bit overboard on the number of debug messages here. 
> @@ -260,12 +264,14 @@ void ipoib_event(struct ib_event_handler > container_of(handler, struct ipoib_dev_priv, event_handler); > > if (record->event == IB_EVENT_PORT_ERR || > - record->event == IB_EVENT_PKEY_CHANGE || > record->event == IB_EVENT_PORT_ACTIVE || > record->event == IB_EVENT_LID_CHANGE || > record->event == IB_EVENT_SM_CHANGE || > record->event == IB_EVENT_CLIENT_REREGISTER) { > ipoib_dbg(priv, "Port state change event\n"); > queue_work(ipoib_workqueue, &priv->flush_task); > + } else if (record->event == IB_EVENT_PKEY_CHANGE) { > + ipoib_dbg(priv, "PKEY change event\n"); > + queue_work(ipoib_workqueue, &priv->flush_restart_qp_task); > } > } -- MST From rdreier at cisco.com Tue Feb 27 07:30:44 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 27 Feb 2007 07:30:44 -0800 Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: <20070227151212.GD4437@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 27 Feb 2007 17:12:12 +0200") References: <45E44064.4020407@voltaire.com> <20070227151212.GD4437@mellanox.co.il> Message-ID: > I just gave this a cursory glance. I haven't really read it except to think "why is this so complicated"? > A suggestion: would it not be much simpler to modify the QP from RTS to RTS on pkey > change? Changing the P_Key index is not allowed for RTS->RTS. You would have to modify the QP RTS->SQD, wait for the SQ to drain, then modify the P_Key index with SQD->SQD, and finally go SQD->RTS. - R. From mst at mellanox.co.il Tue Feb 27 07:36:10 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Feb 2007 17:36:10 +0200 Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: References: <20070227151212.GD4437@mellanox.co.il> Message-ID: <20070227153610.GI4437@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering > > > I just gave this a cursory glance. 
> > I haven't really read it except to think "why is this so complicated"? > > > A suggestion: would it not be much simpler to modify the QP from RTS to RTS on pkey > > change? > > Changing the P_Key index is not allowed for RTS->RTS. You would have > to modify the QP RTS->SQD, wait for the SQ to drain, then modify the > P_Key index with SQD->SQD, and finally go SQD->RTS. True, I misread the spec. -- MST From rdreier at cisco.com Tue Feb 27 07:38:08 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 27 Feb 2007 07:38:08 -0800 Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter. In-Reply-To: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> (Moni Levy's message of "Tue, 27 Feb 2007 15:02:50 +0200") References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> Message-ID: > I did a short code review of the ipoib code concentrating on > partitioning support and I mentioned that the asynchronous events > handler in the ipoib code does not take the port number reported in > the event record into consideration. The effect of that is that all of > the ib# devices related to that specific HCA are flushed when it seems > to me that only the relevant port one should be. Is that done on > purpose, or am I missing something ? I don't think there's any particular reason the code is that way except for the oversight never being corrected. But it looks trivial to fix, like the patch below. Does that look right to you? > p.s. I'm working on a patch that should solve another issue caused by > PKEY reordering & ipoib behavior and the above issue further > complicates things for me. Why not fix the issue first then? 
commit a27cbe878203076247c1b5287f5ab59ed143b560 Author: Roland Dreier Date: Tue Feb 27 07:37:49 2007 -0800 IPoIB: Only handle async events for one port An asynchronous event carries the port number that the event occurred on, so there's no reason for an IPoIB interface to process an event associated with a different local HCA port. Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 3cb551b..7f3ec20 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler, struct ipoib_dev_priv *priv = container_of(handler, struct ipoib_dev_priv, event_handler); - if (record->event == IB_EVENT_PORT_ERR || - record->event == IB_EVENT_PKEY_CHANGE || - record->event == IB_EVENT_PORT_ACTIVE || - record->event == IB_EVENT_LID_CHANGE || - record->event == IB_EVENT_SM_CHANGE || - record->event == IB_EVENT_CLIENT_REREGISTER) { + if ((record->event == IB_EVENT_PORT_ERR || + record->event == IB_EVENT_PKEY_CHANGE || + record->event == IB_EVENT_PORT_ACTIVE || + record->event == IB_EVENT_LID_CHANGE || + record->event == IB_EVENT_SM_CHANGE || + record->event == IB_EVENT_CLIENT_REREGISTER) && + record->element.port_num == priv->port) { ipoib_dbg(priv, "Port state change event\n"); queue_work(ipoib_workqueue, &priv->flush_task); } From monil at voltaire.com Tue Feb 27 07:44:29 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 27 Feb 2007 17:44:29 +0200 Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: References: <45E44064.4020407@voltaire.com> <20070227151212.GD4437@mellanox.co.il> Message-ID: <6a122cc00702270744u43f55c37r15311ef8ba80f4f9@mail.gmail.com> On 2/27/07, Roland Dreier wrote: > > I just gave this a cursory glance. > > I haven't really read it except to think "why is this so complicated"? 
Do you refer to that complication of the patch or of the issue?
> > > A suggestion: would it not be much simpler to modify the QP from RTS to RTS on pkey > > change? > > Changing the P_Key index is not allowed for RTS->RTS. You would have > to modify the QP RTS->SQD, wait for the SQ to drain, then modify the > P_Key index with SQD->SQD, and finally go SQD->RTS.
Do you think that using that way to solve it will be a significant simplification ? We'll still have to reuse that handling for missed completion that is currently implemented in ipoib_ib_dev_stop and still have additional work element.
-- Moni
> > - R. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > >
From mst at mellanox.co.il Tue Feb 27 07:44:19 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Feb 2007 17:44:19 +0200 Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter. In-Reply-To: References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> Message-ID: <20070227154419.GJ4437@mellanox.co.il>
> Quoting Roland Dreier : > Subject: Re: [RFC] IB/ipoib: Asynchronous events delivered without port parameter. > > > I did a short code review of the ipoib code concentrating on > > partitioning support and I mentioned that the asynchronous events > > handler in the ipoib code does not take the port number reported in > > the event record into consideration. The effect of that is that all of > > the ib# devices related to that specific HCA are flushed when it seems > > to me that only the relevant port one should be. Is that done on > > purpose, or am I missing something ? > > I don't think there's any particular reason the code is that way > except for the oversight never being corrected. But it looks trivial > to fix, like the patch below.
Does that look right to you? > > > p.s. I'm working on a patch that should solve another issue caused by > > PKEY reordering & ipoib behavior and the above issue further > > complicates things for me. > > Why not fix the issue first then? > > commit a27cbe878203076247c1b5287f5ab59ed143b560 > Author: Roland Dreier > Date: Tue Feb 27 07:37:49 2007 -0800 > > IPoIB: Only handle async events for one port > > An asynchronous event carries the port number that the event occurred > on, so there's no reason for an IPoIB interface to process an event > associated with a different local HCA port. > > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > index 3cb551b..7f3ec20 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler, > struct ipoib_dev_priv *priv = > container_of(handler, struct ipoib_dev_priv, event_handler); > > - if (record->event == IB_EVENT_PORT_ERR || > - record->event == IB_EVENT_PKEY_CHANGE || > - record->event == IB_EVENT_PORT_ACTIVE || > - record->event == IB_EVENT_LID_CHANGE || > - record->event == IB_EVENT_SM_CHANGE || > - record->event == IB_EVENT_CLIENT_REREGISTER) { > + if ((record->event == IB_EVENT_PORT_ERR || > + record->event == IB_EVENT_PKEY_CHANGE || > + record->event == IB_EVENT_PORT_ACTIVE || > + record->event == IB_EVENT_LID_CHANGE || > + record->event == IB_EVENT_SM_CHANGE || > + record->event == IB_EVENT_CLIENT_REREGISTER) && > + record->element.port_num == priv->port) { > ipoib_dbg(priv, "Port state change event\n"); > queue_work(ipoib_workqueue, &priv->flush_task); > } Looks good. -- MST From monil at voltaire.com Tue Feb 27 07:47:59 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 27 Feb 2007 17:47:59 +0200 Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter. 
In-Reply-To: References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> Message-ID: <6a122cc00702270747l26c1adavd57de9ba2d9a472b@mail.gmail.com> On 2/27/07, Roland Dreier wrote: > > I did a short code review of the ipoib code concentrating on > > partitioning support and I mentioned that the asynchronous events > > handler in the ipoib code does not take the port number reported in > > the event record into consideration. The effect of that is that all of > > the ib# devices related to that specific HCA are flushed when it seems > > to me that only the relevant port one should be. Is that done on > > purpose, or am I missing something ? > > I don't think there's any particular reason the code is that way > except for the oversight never being corrected. But it looks trivial > to fix, like the patch below. Does that look right to you? > > > p.s. I'm working on a patch that should solve another issue caused by > > PKEY reordering & ipoib behavior and the above issue further > > complicates things for me. > > Why not fix the issue first then? > > commit a27cbe878203076247c1b5287f5ab59ed143b560 > Author: Roland Dreier > Date: Tue Feb 27 07:37:49 2007 -0800 > > IPoIB: Only handle async events for one port > > An asynchronous event carries the port number that the event occurred > on, so there's no reason for an IPoIB interface to process an event > associated with a different local HCA port. 
> > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > index 3cb551b..7f3ec20 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler, > struct ipoib_dev_priv *priv = > container_of(handler, struct ipoib_dev_priv, event_handler); > > - if (record->event == IB_EVENT_PORT_ERR || > - record->event == IB_EVENT_PKEY_CHANGE || > - record->event == IB_EVENT_PORT_ACTIVE || > - record->event == IB_EVENT_LID_CHANGE || > - record->event == IB_EVENT_SM_CHANGE || > - record->event == IB_EVENT_CLIENT_REREGISTER) { > + if ((record->event == IB_EVENT_PORT_ERR || > + record->event == IB_EVENT_PKEY_CHANGE || > + record->event == IB_EVENT_PORT_ACTIVE || > + record->event == IB_EVENT_LID_CHANGE || > + record->event == IB_EVENT_SM_CHANGE || > + record->event == IB_EVENT_CLIENT_REREGISTER) && > + record->element.port_num == priv->port) { > ipoib_dbg(priv, "Port state change event\n"); > queue_work(ipoib_workqueue, &priv->flush_task); > } > That's exactly what I intended to post. --Moni From rdreier at cisco.com Tue Feb 27 07:47:10 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 27 Feb 2007 07:47:10 -0800 Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: <6a122cc00702270744u43f55c37r15311ef8ba80f4f9@mail.gmail.com> (Moni Levy's message of "Tue, 27 Feb 2007 17:44:29 +0200") References: <45E44064.4020407@voltaire.com> <20070227151212.GD4437@mellanox.co.il> <6a122cc00702270744u43f55c37r15311ef8ba80f4f9@mail.gmail.com> Message-ID: > > I haven't really read it except to think "why is this so complicated"? > > Do you refer to that complication of the patch of the issue ? the patch. > > Changing the P_Key index is not allowed for RTS->RTS. 
You would have > > to modify the QP RTS->SQD, wait for the SQ to drain, then modify the > > P_Key index with SQD->SQD, and finally go SQD->RTS. > > Do you think that using that way to solve it will be a significant > simplification ? We'll still have to reuse that handling for missed > completion that is currently implemented in ipoib_ib_dev_stop and > still have additional work element.
no, I don't think SQD is really useful in practice.
From vlad at dev.mellanox.co.il Tue Feb 27 07:48:03 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 27 Feb 2007 17:48:03 +0200 Subject: [openib-general] HOWTO check ofa_kernel build from your git tree In-Reply-To: <1172586210.11870.16.camel@stevo-desktop> References: <1172502465.21382.44.camel@vladsk-laptop> <1172586210.11870.16.camel@stevo-desktop> Message-ID: <1172591283.21382.84.camel@vladsk-laptop>
On Tue, 2007-02-27 at 08:23 -0600, Steve Wise wrote:
> Where are all the kernel src trees on ssh.openfabrics.org?
>
> I would like to build against specific trees that are failing with
> cxgb3...
>
/home/vlad/kernel.org//
> Also:
>
> what RH distro ships:
>
> linux-2.6.9-22.ELsmp
>
RHEL4.0U2
>
> and
>
> linux-2.6.9-34.ELsmp
>
RHEL4.0U3
>
> Thanks,
>
> Steve.
> > > > On Mon, 2007-02-26 at 17:07 +0200, Vladimir Sokolovsky wrote:
> > On ssh.openfabrics.org:
> > Run
> > env git_url=/home/mst/scm/ofed_1_2_devel.git git_branch=ofed_1_2 \
> > CHECK_LOCAL=yes \
> > CHECK_KERNEL_ORG=yes \
> > CHECK_CROSS=yes /home/vlad/scripts/build_ofa_kernel.sh
> >
From monil at voltaire.com Tue Feb 27 07:52:09 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 27 Feb 2007 17:52:09 +0200 Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: References: <45E44064.4020407@voltaire.com> <20070227151212.GD4437@mellanox.co.il> <6a122cc00702270744u43f55c37r15311ef8ba80f4f9@mail.gmail.com> Message-ID: <6a122cc00702270752i391a9e90ubf70569993f1f6d1@mail.gmail.com>
On 2/27/07, Roland Dreier wrote:
> > > I haven't really read it except to think "why is this so complicated"?
> >
> > Do you refer to that complication of the patch of the issue ?
>
> the patch.
Please advise and I'll change it.
>
> > > Changing the P_Key index is not allowed for RTS->RTS. You would have
> > > to modify the QP RTS->SQD, wait for the SQ to drain, then modify the
> > > P_Key index with SQD->SQD, and finally go SQD->RTS.
> >
> > Do you think that using that way to solve it will be a significant
> > simplification ? We'll still have to reuse that handling for missed
> > completion that is currently implemented in ipoib_ib_dev_stop and
> > still have additional work element.
>
> no, I don't think SQD is really useful in practice.
>
From swise at opengridcomputing.com Tue Feb 27 07:59:53 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 09:59:53 -0600 Subject: [openib-general] [PATCH 0/6] ofed_1_2: cxgb3 bug fixes Message-ID: <20070227155953.21615.96154.stgit@dell3.ogc.int>
Hey Vlad,

These fixes need to be pulled into ofed_1_2 for the Chelsio Ethernet driver.
You can pull them directly from my ofa git tree:

git://staging.openfabrics.org/~swise/ofed_1_2 cxgb3_fixes

Thanks,

Steve.
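For reference, the QP sequence quoted in the pkey reordering thread above (RTS->SQD, wait for the SQ to drain, SQD->SQD carrying the new P_Key index, then SQD->RTS) can be sketched as a small transition check. This is only a sketch under stated assumptions: qp_state, qp_step, step_allowed() and sequence_valid() are names made up for the example and are not the kernel's ib_verbs API. The only rule encoded is the one stated in the thread, restricted to the two states it mentions, and the SQ drain wait is not modelled.

```c
/*
 * Sketch only: qp_state, qp_step, step_allowed() and sequence_valid() are
 * hypothetical names for this example, not the kernel's ib_verbs API.
 * Only the two QP states mentioned in the thread are modelled.
 */
#include <stdbool.h>
#include <stddef.h>

enum qp_state { QPS_RTS, QPS_SQD };

/* One modify-QP call: a state transition, optionally carrying a new P_Key index. */
struct qp_step {
	enum qp_state from;
	enum qp_state to;
	bool sets_pkey_index;
};

/*
 * Rule taken from the thread: the P_Key index cannot change on RTS->RTS.
 * In this two-state model it may only change on SQD->SQD.
 */
bool step_allowed(const struct qp_step *s)
{
	if (s->sets_pkey_index)
		return s->from == QPS_SQD && s->to == QPS_SQD;
	return true;
}

/*
 * Walk a list of steps starting in RTS. The sequence is valid if every step
 * starts from the current state, obeys step_allowed(), and the QP ends in RTS.
 * Waiting for the SQ to drain after RTS->SQD is not modelled here.
 */
bool sequence_valid(const struct qp_step *steps, size_t n)
{
	enum qp_state cur = QPS_RTS;
	size_t i;

	for (i = 0; i < n; i++) {
		if (steps[i].from != cur || !step_allowed(&steps[i]))
			return false;
		cur = steps[i].to;
	}
	return cur == QPS_RTS;
}
```

In this model a single RTS->RTS step that sets the P_Key index is rejected, while the three-step sequence RTS->SQD, SQD->SQD (with the index), SQD->RTS is accepted.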
From swise at opengridcomputing.com Tue Feb 27 07:59:55 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 09:59:55 -0600 Subject: [openib-general] [PATCH 1/6] sysfs attributes are now managed per port, no longer per adapter. In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int> References: <20070227155953.21615.96154.stgit@dell3.ogc.int> Message-ID: <20070227155955.21615.20784.stgit@dell3.ogc.int> sysfs attributes are now managed per port, no longer per adapter. Signed-off-by: Divy Le Ray --- drivers/net/cxgb3/cxgb3_main.c | 21 ++++++++++++--------- 1 files changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c index dfa035a..638b0ab 100755 --- a/drivers/net/cxgb3/cxgb3_main.c +++ b/drivers/net/cxgb3/cxgb3_main.c @@ -435,26 +435,24 @@ static int setup_sge_qsets(struct adapte } static ssize_t attr_show(struct class_device *cd, char *buf, - ssize_t(*format) (struct adapter *, char *)) + ssize_t(*format) (struct net_device *, char *)) { ssize_t len; - struct adapter *adap = to_net_dev(cd)->priv; /* Synchronize with ioctls that may shut down the device */ rtnl_lock(); - len = (*format) (adap, buf); + len = (*format) (to_net_dev(cd), buf); rtnl_unlock(); return len; } static ssize_t attr_store(struct class_device *cd, const char *buf, size_t len, - ssize_t(*set) (struct adapter *, unsigned int), + ssize_t(*set) (struct net_device *, unsigned int), unsigned int min_val, unsigned int max_val) { char *endp; ssize_t ret; unsigned int val; - struct adapter *adap = to_net_dev(cd)->priv; if (!capable(CAP_NET_ADMIN)) return -EPERM; @@ -464,7 +462,7 @@ static ssize_t attr_store(struct class_d return -EINVAL; rtnl_lock(); - ret = (*set) (adap, val); + ret = (*set) (to_net_dev(cd), val); if (!ret) ret = len; rtnl_unlock(); @@ -472,8 +470,9 @@ static ssize_t attr_store(struct class_d } #define CXGB3_SHOW(name, val_expr) \ -static ssize_t format_##name(struct adapter *adap, char *buf) 
\ +static ssize_t format_##name(struct net_device *dev, char *buf) \ { \ + struct adapter *adap = dev->priv; \ return sprintf(buf, "%u\n", val_expr); \ } \ static ssize_t show_##name(struct class_device *cd, char *buf) \ @@ -481,8 +480,10 @@ static ssize_t show_##name(struct class_ return attr_show(cd, buf, format_##name); \ } -static ssize_t set_nfilters(struct adapter *adap, unsigned int val) +static ssize_t set_nfilters(struct net_device *dev, unsigned int val) { + struct adapter *adap = dev->priv; + if (adap->flags & FULL_INIT_DONE) return -EBUSY; if (val && adap->params.rev == 0) @@ -499,8 +500,10 @@ static ssize_t store_nfilters(struct cla return attr_store(cd, buf, len, set_nfilters, 0, ~0); } -static ssize_t set_nservers(struct adapter *adap, unsigned int val) +static ssize_t set_nservers(struct net_device *dev, unsigned int val) { + struct adapter *adap = dev->priv; + if (adap->flags & FULL_INIT_DONE) return -EBUSY; if (val > t3_mc5_size(&adap->mc5) - adap->params.mc5.nfilters) From swise at opengridcomputing.com Tue Feb 27 07:59:57 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 09:59:57 -0600 Subject: [openib-general] [PATCH 2/6] Clean up some private ioctls. In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int> References: <20070227155953.21615.96154.stgit@dell3.ogc.int> Message-ID: <20070227155957.21615.98689.stgit@dell3.ogc.int> Clean up some private ioctls. Signed-off-by: Divy Le Ray --- drivers/net/cxgb3/cxgb3_ioctl.h | 33 +++++++++------------------ drivers/net/cxgb3/cxgb3_main.c | 48 +++------------------------------------ 2 files changed, 15 insertions(+), 66 deletions(-) diff --git a/drivers/net/cxgb3/cxgb3_ioctl.h b/drivers/net/cxgb3/cxgb3_ioctl.h old mode 100755 new mode 100644 index a942818..0a82fcd --- a/drivers/net/cxgb3/cxgb3_ioctl.h +++ b/drivers/net/cxgb3/cxgb3_ioctl.h @@ -36,28 +36,17 @@ #define __CHIOCTL_H__ * Ioctl commands specific to this driver. 
*/ enum { - CHELSIO_SETREG = 1024, - CHELSIO_GETREG, - CHELSIO_SETTPI, - CHELSIO_GETTPI, - CHELSIO_GETMTUTAB, - CHELSIO_SETMTUTAB, - CHELSIO_GETMTU, - CHELSIO_SET_PM, - CHELSIO_GET_PM, - CHELSIO_GET_TCAM, - CHELSIO_SET_TCAM, - CHELSIO_GET_TCB, - CHELSIO_GET_MEM, - CHELSIO_LOAD_FW, - CHELSIO_GET_PROTO, - CHELSIO_SET_PROTO, - CHELSIO_SET_TRACE_FILTER, - CHELSIO_SET_QSET_PARAMS, - CHELSIO_GET_QSET_PARAMS, - CHELSIO_SET_QSET_NUM, - CHELSIO_GET_QSET_NUM, - CHELSIO_SET_PKTSCHED, + CHELSIO_GETMTUTAB = 1029, + CHELSIO_SETMTUTAB = 1030, + CHELSIO_SET_PM = 1032, + CHELSIO_GET_PM = 1033, + CHELSIO_GET_MEM = 1038, + CHELSIO_LOAD_FW = 1041, + CHELSIO_SET_TRACE_FILTER = 1044, + CHELSIO_SET_QSET_PARAMS = 1045, + CHELSIO_GET_QSET_PARAMS = 1046, + CHELSIO_SET_QSET_NUM = 1047, + CHELSIO_GET_QSET_NUM = 1048, }; struct ch_reg { diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c old mode 100755 new mode 100644 index 638b0ab..0e84c4e --- a/drivers/net/cxgb3/cxgb3_main.c +++ b/drivers/net/cxgb3/cxgb3_main.c @@ -1547,32 +1547,6 @@ static int cxgb_extension_ioctl(struct n return -EFAULT; switch (cmd) { - case CHELSIO_SETREG:{ - struct ch_reg edata; - - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - if (copy_from_user(&edata, useraddr, sizeof(edata))) - return -EFAULT; - if ((edata.addr & 3) != 0 - || edata.addr >= adapter->mmio_len) - return -EINVAL; - writel(edata.val, adapter->regs + edata.addr); - break; - } - case CHELSIO_GETREG:{ - struct ch_reg edata; - - if (copy_from_user(&edata, useraddr, sizeof(edata))) - return -EFAULT; - if ((edata.addr & 3) != 0 - || edata.addr >= adapter->mmio_len) - return -EINVAL; - edata.val = readl(adapter->regs + edata.addr); - if (copy_to_user(useraddr, &edata, sizeof(edata))) - return -EFAULT; - break; - } case CHELSIO_SET_QSET_PARAMS:{ int i; struct qset_params *q; @@ -1836,10 +1810,10 @@ static int cxgb_extension_ioctl(struct n return -EINVAL; /* - * Version scheme: - * bits 0..9: chip version - * bits 10..15: chip revision 
- */ + * Version scheme: + * bits 0..9: chip version + * bits 10..15: chip revision + */ t.version = 3 | (adapter->params.rev << 10); if (copy_to_user(useraddr, &t, sizeof(t))) return -EFAULT; @@ -1888,20 +1862,6 @@ static int cxgb_extension_ioctl(struct n t.trace_rx); break; } - case CHELSIO_SET_PKTSCHED:{ - struct ch_pktsched_params p; - - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - if (!adapter->open_device_map) - return -EAGAIN; /* uP and SGE must be running */ - if (copy_from_user(&p, useraddr, sizeof(p))) - return -EFAULT; - send_pktsched_cmd(adapter, p.sched, p.idx, p.min, p.max, - p.binding); - break; - - } default: return -EOPNOTSUPP; } From swise at opengridcomputing.com Tue Feb 27 07:59:59 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 09:59:59 -0600 Subject: [openib-general] [PATCH 3/6] Update FW version to 3.2 In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int> References: <20070227155953.21615.96154.stgit@dell3.ogc.int> Message-ID: <20070227155959.21615.25648.stgit@dell3.ogc.int> Update FW version to 3.2 Signed-off-by: Steve Wise --- drivers/net/cxgb3/t3_hw.c | 6 ++++-- drivers/net/cxgb3/version.h | 2 ++ 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c old mode 100755 new mode 100644 index 365a7f5..eaa7a2e --- a/drivers/net/cxgb3/t3_hw.c +++ b/drivers/net/cxgb3/t3_hw.c @@ -884,11 +884,13 @@ int t3_check_fw_version(struct adapter * major = G_FW_VERSION_MAJOR(vers); minor = G_FW_VERSION_MINOR(vers); - if (type == FW_VERSION_T3 && major == 3 && minor == 1) + if (type == FW_VERSION_T3 && major == FW_VERSION_MAJOR && + minor == FW_VERSION_MINOR) return 0; CH_ERR(adapter, "found wrong FW version(%u.%u), " - "driver needs version 3.1\n", major, minor); + "driver needs version %u.%u\n", major, minor, + FW_VERSION_MAJOR, FW_VERSION_MINOR); return -EINVAL; } diff --git a/drivers/net/cxgb3/version.h b/drivers/net/cxgb3/version.h old mode 100755 new 
mode 100644 index 2b67dd5..782a6cf --- a/drivers/net/cxgb3/version.h +++ b/drivers/net/cxgb3/version.h @@ -36,4 +36,6 @@ #define DRV_DESC "Chelsio T3 Network Dri #define DRV_NAME "cxgb3" /* Driver version */ #define DRV_VERSION "1.0" +#define FW_VERSION_MAJOR 3 +#define FW_VERSION_MINOR 2 #endif /* __CHELSIO_VERSION_H */ From swise at opengridcomputing.com Tue Feb 27 08:00:01 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 10:00:01 -0600 Subject: [openib-general] [PATCH 4/6] Offload packets may be DMAed long after their SGE Tx descriptors are done In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int> References: <20070227155953.21615.96154.stgit@dell3.ogc.int> Message-ID: <20070227160001.21615.66513.stgit@dell3.ogc.int> Offload packets may be DMAed long after their SGE Tx descriptors are done so they must remain mapped until they are freed rather than until their descriptors are freed. Unmap such packets through an skb destructor. Signed-off-by: Divy Le Ray --- drivers/net/cxgb3/sge.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 61 insertions(+), 2 deletions(-) diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c old mode 100755 new mode 100644 index 3f2cf8a..822a598 --- a/drivers/net/cxgb3/sge.c +++ b/drivers/net/cxgb3/sge.c @@ -105,6 +105,15 @@ struct unmap_info { /* packet unmapping }; /* + * Holds unmapping information for Tx packets that need deferred unmapping. + * This structure lives at skb->head and must be allocated by callers. + */ +struct deferred_unmap_info { + struct pci_dev *pdev; + dma_addr_t addr[MAX_SKB_FRAGS + 1]; +}; + +/* * Maps a number of flits to the number of Tx descriptors that can hold them. 
* The formula is * @@ -252,10 +261,13 @@ static void free_tx_desc(struct adapter struct pci_dev *pdev = adapter->pdev; unsigned int cidx = q->cidx; + const int need_unmap = need_skb_unmap() && + q->cntxt_id >= FW_TUNNEL_SGEEC_START; + d = &q->sdesc[cidx]; while (n--) { if (d->skb) { /* an SGL is present */ - if (need_skb_unmap()) + if (need_unmap) unmap_skb(d->skb, q, cidx, pdev); if (d->skb->priority == cidx) kfree_skb(d->skb); @@ -1227,6 +1239,50 @@ int t3_mgmt_tx(struct adapter *adap, str } /** + * deferred_unmap_destructor - unmap a packet when it is freed + * @skb: the packet + * + * This is the packet destructor used for Tx packets that need to remain + * mapped until they are freed rather than until their Tx descriptors are + * freed. + */ +static void deferred_unmap_destructor(struct sk_buff *skb) +{ + int i; + const dma_addr_t *p; + const struct skb_shared_info *si; + const struct deferred_unmap_info *dui; + const struct unmap_info *ui = (struct unmap_info *)skb->cb; + + dui = (struct deferred_unmap_info *)skb->head; + p = dui->addr; + + if (ui->len) + pci_unmap_single(dui->pdev, *p++, ui->len, PCI_DMA_TODEVICE); + + si = skb_shinfo(skb); + for (i = 0; i < si->nr_frags; i++) + pci_unmap_page(dui->pdev, *p++, si->frags[i].size, + PCI_DMA_TODEVICE); +} + +static void setup_deferred_unmapping(struct sk_buff *skb, struct pci_dev *pdev, + const struct sg_ent *sgl, int sgl_flits) +{ + dma_addr_t *p; + struct deferred_unmap_info *dui; + + dui = (struct deferred_unmap_info *)skb->head; + dui->pdev = pdev; + for (p = dui->addr; sgl_flits >= 3; sgl++, sgl_flits -= 3) { + *p++ = be64_to_cpu(sgl->addr[0]); + *p++ = be64_to_cpu(sgl->addr[1]); + } + if (sgl_flits) + *p = be64_to_cpu(sgl->addr[0]); +} + +/** * write_ofld_wr - write an offload work request * @adap: the adapter * @skb: the packet to send @@ -1262,8 +1318,11 @@ static void write_ofld_wr(struct adapter sgp = ndesc == 1 ? 
(struct sg_ent *)&d->flit[flits] : sgl; sgl_flits = make_sgl(skb, sgp, skb->h.raw, skb->tail - skb->h.raw, adap->pdev); - if (need_skb_unmap()) + if (need_skb_unmap()) { + setup_deferred_unmapping(skb, adap->pdev, sgp, sgl_flits); + skb->destructor = deferred_unmap_destructor; ((struct unmap_info *)skb->cb)->len = skb->tail - skb->h.raw; + } write_wr_hdr_sgl(ndesc, skb, d, pidx, q, sgl, flits, sgl_flits, gen, from->wr_hi, from->wr_lo); From swise at opengridcomputing.com Tue Feb 27 08:00:04 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 10:00:04 -0600 Subject: [openib-general] [PATCH 5/6] Improve the traffic recovery after the HW ran out of response queue entries. In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int> References: <20070227155953.21615.96154.stgit@dell3.ogc.int> Message-ID: <20070227160003.21615.34378.stgit@dell3.ogc.int> Improve the traffic recovery after the HW ran out of response queue entries. Signed-off-by: Divy Le Ray --- drivers/net/cxgb3/adapter.h | 2 ++ drivers/net/cxgb3/sge.c | 15 ++++++++++++++- 2 files changed, 16 insertions(+), 1 deletions(-) diff --git a/drivers/net/cxgb3/adapter.h b/drivers/net/cxgb3/adapter.h old mode 100755 new mode 100644 index 5c97a64..01b99b9 --- a/drivers/net/cxgb3/adapter.h +++ b/drivers/net/cxgb3/adapter.h @@ -121,6 +121,8 @@ struct sge_rspq { /* state for an SGE r unsigned long empty; /* # of times queue ran out of credits */ unsigned long nomem; /* # of responses deferred due to no mem */ unsigned long unhandled_irqs; /* # of spurious intrs */ + unsigned long starved; + unsigned long restarted; }; struct tx_desc; diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c index 822a598..4ff0ab6 100644 --- a/drivers/net/cxgb3/sge.c +++ b/drivers/net/cxgb3/sge.c @@ -2376,13 +2376,26 @@ static void sge_timer_cb(unsigned long d spin_unlock(&qs->txq[TXQ_OFLD].lock); } lock = (adap->flags & USING_MSIX) ? 
&qs->rspq.lock : - &adap->sge.qs[0].rspq.lock; + &adap->sge.qs[0].rspq.lock; if (spin_trylock_irq(lock)) { if (!napi_is_scheduled(qs->netdev)) { + u32 status = t3_read_reg(adap, A_SG_RSPQ_FL_STATUS); + if (qs->fl[0].credits < qs->fl[0].size) __refill_fl(adap, &qs->fl[0]); if (qs->fl[1].credits < qs->fl[1].size) __refill_fl(adap, &qs->fl[1]); + + if (status & (1 << qs->rspq.cntxt_id)) { + qs->rspq.starved++; + if (qs->rspq.credits) { + refill_rspq(adap, &qs->rspq, 1); + qs->rspq.credits--; + qs->rspq.restarted++; + t3_write_reg(adap, A_SG_RSPQ_FL_STATUS, + 1 << qs->rspq.cntxt_id); + } + } } spin_unlock_irq(lock); } From swise at opengridcomputing.com Tue Feb 27 08:00:06 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 10:00:06 -0600 Subject: [openib-general] [PATCH 6/6] Populate Rx free list with pages. In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int> References: <20070227155953.21615.96154.stgit@dell3.ogc.int> Message-ID: <20070227160006.21615.53181.stgit@dell3.ogc.int> Populate Rx free list with pages. 
Signed-off-by: Divy Le Ray --- drivers/net/cxgb3/adapter.h | 9 + drivers/net/cxgb3/sge.c | 318 +++++++++++++++++++++++++++++++------------ 2 files changed, 235 insertions(+), 92 deletions(-) diff --git a/drivers/net/cxgb3/adapter.h b/drivers/net/cxgb3/adapter.h index 01b99b9..80c3d8f 100644 --- a/drivers/net/cxgb3/adapter.h +++ b/drivers/net/cxgb3/adapter.h @@ -74,6 +74,11 @@ enum { /* adapter flags */ struct rx_desc; struct rx_sw_desc; +struct sge_fl_page { + struct skb_frag_struct frag; + unsigned char *va; +}; + struct sge_fl { /* SGE per free-buffer list state */ unsigned int buf_size; /* size of each Rx buffer */ unsigned int credits; /* # of available Rx buffers */ @@ -81,11 +86,13 @@ struct sge_fl { /* SGE per free-buffer unsigned int cidx; /* consumer index */ unsigned int pidx; /* producer index */ unsigned int gen; /* free list generation */ + unsigned int cntxt_id; /* SGE context id for the free list */ + struct sge_fl_page page; struct rx_desc *desc; /* address of HW Rx descriptor ring */ struct rx_sw_desc *sdesc; /* address of SW Rx descriptor ring */ dma_addr_t phys_addr; /* physical address of HW ring start */ - unsigned int cntxt_id; /* SGE context id for the free list */ unsigned long empty; /* # of times queue ran out of buffers */ + unsigned long alloc_failed; /* # of times buffer allocation failed */ }; /* diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c index 4ff0ab6..c237834 100644 --- a/drivers/net/cxgb3/sge.c +++ b/drivers/net/cxgb3/sge.c @@ -45,9 +45,25 @@ #include "firmware_exports.h" #define USE_GTS 0 #define SGE_RX_SM_BUF_SIZE 1536 + +/* + * If USE_RX_PAGE is defined, the small freelist populated with (partial) + * pages instead of skbs. Pages are carved up into RX_PAGE_SIZE chunks (must + * be a multiple of the host page size). 
+ */ +#define USE_RX_PAGE +#define RX_PAGE_SIZE 2048 + +/* + * skb freelist packets are copied into a new skb (and the freelist one is + * reused) if their len is <= + */ #define SGE_RX_COPY_THRES 256 -# define SGE_RX_DROP_THRES 16 +/* + * Minimum number of freelist entries before we start dropping TUNNEL frames. + */ +#define SGE_RX_DROP_THRES 16 /* * Period of the Tx buffer reclaim timer. This timer does not need to run @@ -85,7 +101,10 @@ struct tx_sw_desc { /* SW state per Tx }; struct rx_sw_desc { /* SW state per Rx descriptor */ - struct sk_buff *skb; + union { + struct sk_buff *skb; + struct sge_fl_page page; + } t; DECLARE_PCI_UNMAP_ADDR(dma_addr); }; @@ -332,16 +351,27 @@ static void free_rx_bufs(struct pci_dev pci_unmap_single(pdev, pci_unmap_addr(d, dma_addr), q->buf_size, PCI_DMA_FROMDEVICE); - kfree_skb(d->skb); - d->skb = NULL; + + if (q->buf_size != RX_PAGE_SIZE) { + kfree_skb(d->t.skb); + d->t.skb = NULL; + } else { + if (d->t.page.frag.page) + put_page(d->t.page.frag.page); + d->t.page.frag.page = NULL; + } if (++cidx == q->size) cidx = 0; } + + if (q->page.frag.page) + put_page(q->page.frag.page); + q->page.frag.page = NULL; } /** * add_one_rx_buf - add a packet buffer to a free-buffer list - * @skb: the buffer to add + * @va: va of the buffer to add * @len: the buffer length * @d: the HW Rx descriptor to write * @sd: the SW Rx descriptor to write @@ -351,14 +381,13 @@ static void free_rx_bufs(struct pci_dev * Add a buffer of the given length to the supplied HW and SW Rx * descriptors. 
*/ -static inline void add_one_rx_buf(struct sk_buff *skb, unsigned int len, +static inline void add_one_rx_buf(unsigned char *va, unsigned int len, struct rx_desc *d, struct rx_sw_desc *sd, unsigned int gen, struct pci_dev *pdev) { dma_addr_t mapping; - sd->skb = skb; - mapping = pci_map_single(pdev, skb->data, len, PCI_DMA_FROMDEVICE); + mapping = pci_map_single(pdev, va, len, PCI_DMA_FROMDEVICE); pci_unmap_addr_set(sd, dma_addr, mapping); d->addr_lo = cpu_to_be32(mapping); @@ -383,14 +412,47 @@ static void refill_fl(struct adapter *ad { struct rx_sw_desc *sd = &q->sdesc[q->pidx]; struct rx_desc *d = &q->desc[q->pidx]; + struct sge_fl_page *p = &q->page; while (n--) { - struct sk_buff *skb = alloc_skb(q->buf_size, gfp); + unsigned char *va; - if (!skb) - break; + if (unlikely(q->buf_size != RX_PAGE_SIZE)) { + struct sk_buff *skb = alloc_skb(q->buf_size, gfp); + + if (!skb) { + q->alloc_failed++; + break; + } + va = skb->data; + sd->t.skb = skb; + } else { + if (!p->frag.page) { + p->frag.page = alloc_pages(gfp, 0); + if (unlikely(!p->frag.page)) { + q->alloc_failed++; + break; + } else { + p->frag.size = RX_PAGE_SIZE; + p->frag.page_offset = 0; + p->va = page_address(p->frag.page); + } + } + + memcpy(&sd->t, p, sizeof(*p)); + va = p->va; + + p->frag.page_offset += RX_PAGE_SIZE; + BUG_ON(p->frag.page_offset > PAGE_SIZE); + p->va += RX_PAGE_SIZE; + if (p->frag.page_offset == PAGE_SIZE) + p->frag.page = NULL; + else + get_page(p->frag.page); + } + + add_one_rx_buf(va, q->buf_size, d, sd, q->gen, adap->pdev); - add_one_rx_buf(skb, q->buf_size, d, sd, q->gen, adap->pdev); d++; sd++; if (++q->pidx == q->size) { @@ -425,7 +487,7 @@ static void recycle_rx_buf(struct adapte struct rx_desc *from = &q->desc[idx]; struct rx_desc *to = &q->desc[q->pidx]; - q->sdesc[q->pidx] = q->sdesc[idx]; + memcpy(&q->sdesc[q->pidx], &q->sdesc[idx], sizeof(struct rx_sw_desc)); to->addr_lo = from->addr_lo; /* already big endian */ to->addr_hi = from->addr_hi; /* likewise */ wmb(); @@ -458,7 
+520,7 @@ static void recycle_rx_buf(struct adapte * of the SW ring. */ static void *alloc_ring(struct pci_dev *pdev, size_t nelem, size_t elem_size, - size_t sw_size, dma_addr_t *phys, void *metadata) + size_t sw_size, dma_addr_t * phys, void *metadata) { size_t len = nelem * elem_size; void *s = NULL; @@ -588,61 +650,6 @@ static inline unsigned int flits_to_desc } /** - * get_packet - return the next ingress packet buffer from a free list - * @adap: the adapter that received the packet - * @fl: the SGE free list holding the packet - * @len: the packet length including any SGE padding - * @drop_thres: # of remaining buffers before we start dropping packets - * - * Get the next packet from a free list and complete setup of the - * sk_buff. If the packet is small we make a copy and recycle the - * original buffer, otherwise we use the original buffer itself. If a - * positive drop threshold is supplied packets are dropped and their - * buffers recycled if (a) the number of remaining buffers is under the - * threshold and the packet is too big to copy, or (b) the packet should - * be copied but there is no memory for the copy. 
- */ -static struct sk_buff *get_packet(struct adapter *adap, struct sge_fl *fl, - unsigned int len, unsigned int drop_thres) -{ - struct sk_buff *skb = NULL; - struct rx_sw_desc *sd = &fl->sdesc[fl->cidx]; - - prefetch(sd->skb->data); - - if (len <= SGE_RX_COPY_THRES) { - skb = alloc_skb(len, GFP_ATOMIC); - if (likely(skb != NULL)) { - __skb_put(skb, len); - pci_dma_sync_single_for_cpu(adap->pdev, - pci_unmap_addr(sd, - dma_addr), - len, PCI_DMA_FROMDEVICE); - memcpy(skb->data, sd->skb->data, len); - pci_dma_sync_single_for_device(adap->pdev, - pci_unmap_addr(sd, - dma_addr), - len, PCI_DMA_FROMDEVICE); - } else if (!drop_thres) - goto use_orig_buf; - recycle: - recycle_rx_buf(adap, fl, fl->cidx); - return skb; - } - - if (unlikely(fl->credits < drop_thres)) - goto recycle; - - use_orig_buf: - pci_unmap_single(adap->pdev, pci_unmap_addr(sd, dma_addr), - fl->buf_size, PCI_DMA_FROMDEVICE); - skb = sd->skb; - skb_put(skb, len); - __refill_fl(adap, fl); - return skb; -} - -/** * get_imm_packet - return the next ingress packet buffer from a response * @resp: the response descriptor containing the packet data * @@ -1676,7 +1683,6 @@ static void rx_eth(struct adapter *adap, struct cpl_rx_pkt *p = (struct cpl_rx_pkt *)(skb->data + pad); struct port_info *pi; - rq->eth_pkts++; skb_pull(skb, sizeof(*p) + pad); skb->dev = adap->port[p->iff]; skb->dev->last_rx = jiffies; @@ -1704,6 +1710,85 @@ static void rx_eth(struct adapter *adap, netif_rx(skb); } +#define SKB_DATA_SIZE 128 + +static void skb_data_init(struct sk_buff *skb, struct sge_fl_page *p, + unsigned int len) +{ + skb->len = len; + if (len <= SKB_DATA_SIZE) { + memcpy(skb->data, p->va, len); + skb->tail += len; + put_page(p->frag.page); + } else { + memcpy(skb->data, p->va, SKB_DATA_SIZE); + skb_shinfo(skb)->frags[0].page = p->frag.page; + skb_shinfo(skb)->frags[0].page_offset = + p->frag.page_offset + SKB_DATA_SIZE; + skb_shinfo(skb)->frags[0].size = len - SKB_DATA_SIZE; + skb_shinfo(skb)->nr_frags = 1; + 
skb->data_len = len - SKB_DATA_SIZE; + skb->tail += SKB_DATA_SIZE; + skb->truesize += skb->data_len; + } +} + +/** +* get_packet - return the next ingress packet buffer from a free list +* @adap: the adapter that received the packet +* @fl: the SGE free list holding the packet +* @len: the packet length including any SGE padding +* @drop_thres: # of remaining buffers before we start dropping packets +* +* Get the next packet from a free list and complete setup of the +* sk_buff. If the packet is small we make a copy and recycle the +* original buffer, otherwise we use the original buffer itself. If a +* positive drop threshold is supplied packets are dropped and their +* buffers recycled if (a) the number of remaining buffers is under the +* threshold and the packet is too big to copy, or (b) the packet should +* be copied but there is no memory for the copy. +*/ +static struct sk_buff *get_packet(struct adapter *adap, struct sge_fl *fl, + unsigned int len, unsigned int drop_thres) +{ + struct sk_buff *skb = NULL; + struct rx_sw_desc *sd = &fl->sdesc[fl->cidx]; + + prefetch(sd->t.skb->data); + + if (len <= SGE_RX_COPY_THRES) { + skb = alloc_skb(len, GFP_ATOMIC); + if (likely(skb != NULL)) { + struct rx_desc *d = &fl->desc[fl->cidx]; + dma_addr_t mapping = + (dma_addr_t)((u64) be32_to_cpu(d->addr_hi) << 32 | + be32_to_cpu(d->addr_lo)); + + __skb_put(skb, len); + pci_dma_sync_single_for_cpu(adap->pdev, mapping, len, + PCI_DMA_FROMDEVICE); + memcpy(skb->data, sd->t.skb->data, len); + pci_dma_sync_single_for_device(adap->pdev, mapping, len, + PCI_DMA_FROMDEVICE); + } else if (!drop_thres) + goto use_orig_buf; +recycle: + recycle_rx_buf(adap, fl, fl->cidx); + return skb; + } + + if (unlikely(fl->credits < drop_thres)) + goto recycle; + +use_orig_buf: + pci_unmap_single(adap->pdev, pci_unmap_addr(sd, dma_addr), + fl->buf_size, PCI_DMA_FROMDEVICE); + skb = sd->t.skb; + skb_put(skb, len); + __refill_fl(adap, fl); + return skb; +} + /** * handle_rsp_cntrl_info - handles 
control information in a response * @qs: the queue set corresponding to the response @@ -1826,7 +1911,7 @@ static int process_responses(struct adap q->next_holdoff = q->holdoff_tmr; while (likely(budget_left && is_new_response(r, q))) { - int eth, ethpad = 0; + int eth, ethpad = 2; struct sk_buff *skb = NULL; u32 len, flags = ntohl(r->flags); u32 rss_hi = *(const u32 *)r, rss_lo = r->rss_hdr.rss_hash_val; @@ -1853,18 +1938,56 @@ static int process_responses(struct adap break; } q->imm_data++; + ethpad = 0; } else if ((len = ntohl(r->len_cq)) != 0) { - struct sge_fl *fl; + struct sge_fl *fl = + (len & F_RSPD_FLQ) ? &qs->fl[1] : &qs->fl[0]; + + if (fl->buf_size == RX_PAGE_SIZE) { + struct rx_sw_desc *sd = &fl->sdesc[fl->cidx]; + struct sge_fl_page *p = &sd->t.page; + + prefetch(p->va); + prefetch(p->va + L1_CACHE_BYTES); + + __refill_fl(adap, fl); + + pci_unmap_single(adap->pdev, + pci_unmap_addr(sd, dma_addr), + fl->buf_size, + PCI_DMA_FROMDEVICE); + + if (eth) { + if (unlikely(fl->credits < + SGE_RX_DROP_THRES)) + goto eth_recycle; + + skb = alloc_skb(SKB_DATA_SIZE, + GFP_ATOMIC); + if (unlikely(!skb)) { +eth_recycle: + q->rx_drops++; + recycle_rx_buf(adap, fl, + fl->cidx); + goto eth_done; + } + } else { + skb = alloc_skb(SKB_DATA_SIZE, + GFP_ATOMIC); + if (unlikely(!skb)) + goto no_mem; + } + + skb_data_init(skb, p, G_RSPD_LEN(len)); +eth_done: + fl->credits--; + q->eth_pkts++; + } else { + fl->credits--; + skb = get_packet(adap, fl, G_RSPD_LEN(len), + eth ? SGE_RX_DROP_THRES : 0); + } - fl = (len & F_RSPD_FLQ) ? &qs->fl[1] : &qs->fl[0]; - fl->credits--; - skb = get_packet(adap, fl, G_RSPD_LEN(len), - eth ? 
SGE_RX_DROP_THRES : 0); - if (!skb) - q->rx_drops++; - else if (r->rss_hdr.opcode == CPL_TRACE_PKT) - __skb_pull(skb, 2); - ethpad = 2; if (++fl->cidx == fl->size) fl->cidx = 0; } else @@ -1888,18 +2011,23 @@ static int process_responses(struct adap q->credits = 0; } - if (likely(skb != NULL)) { + if (skb) { + /* Preserve the RSS info in csum & priority */ + skb->csum = rss_hi; + skb->priority = rss_lo; + if (eth) rx_eth(adap, q, skb, ethpad); else { - /* Preserve the RSS info in csum & priority */ - skb->csum = rss_hi; - skb->priority = rss_lo; - ngathered = rx_offload(&adap->tdev, q, skb, - offload_skbs, ngathered); + if (unlikely(r->rss_hdr.opcode == + CPL_TRACE_PKT)) + __skb_pull(skb, ethpad); + + ngathered = rx_offload(&adap->tdev, q, + skb, offload_skbs, + ngathered); } } - --budget_left; } @@ -2376,7 +2504,7 @@ static void sge_timer_cb(unsigned long d spin_unlock(&qs->txq[TXQ_OFLD].lock); } lock = (adap->flags & USING_MSIX) ? &qs->rspq.lock : - &adap->sge.qs[0].rspq.lock; + &adap->sge.qs[0].rspq.lock; if (spin_trylock_irq(lock)) { if (!napi_is_scheduled(qs->netdev)) { u32 status = t3_read_reg(adap, A_SG_RSPQ_FL_STATUS); @@ -2392,7 +2520,7 @@ static void sge_timer_cb(unsigned long d refill_rspq(adap, &qs->rspq, 1); qs->rspq.credits--; qs->rspq.restarted++; - t3_write_reg(adap, A_SG_RSPQ_FL_STATUS, + t3_write_reg(adap, A_SG_RSPQ_FL_STATUS, 1 << qs->rspq.cntxt_id); } } @@ -2504,13 +2632,21 @@ int t3_sge_alloc_qset(struct adapter *ad flits_to_desc(sgl_len(MAX_SKB_FRAGS + 1) + 3); if (ntxq == 1) { +#ifdef USE_RX_PAGE + q->fl[0].buf_size = RX_PAGE_SIZE; +#else q->fl[0].buf_size = SGE_RX_SM_BUF_SIZE + 2 + sizeof(struct cpl_rx_pkt); +#endif q->fl[1].buf_size = MAX_FRAME_SIZE + 2 + sizeof(struct cpl_rx_pkt); } else { +#ifdef USE_RX_PAGE + q->fl[0].buf_size = RX_PAGE_SIZE; +#else q->fl[0].buf_size = SGE_RX_SM_BUF_SIZE + sizeof(struct cpl_rx_data); +#endif q->fl[1].buf_size = (16 * 1024) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); } @@ -2704,7 +2840,7 @@ void 
__devinit t3_sge_prep(struct adapte q->polling = adap->params.rev > 0; q->coalesce_usecs = 5; q->rspq_size = 1024; - q->fl_size = 4096; + q->fl_size = 1024; q->jumbo_size = 512; q->txq_size[TXQ_ETH] = 1024; q->txq_size[TXQ_OFLD] = 1024; From vlad at mellanox.co.il Tue Feb 27 08:36:46 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 27 Feb 2007 18:36:46 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <45E31D0A.20400@ichips.intel.com> References: <000001c75787$50ff0440$ff0da8c0@amr.corp.intel.com> <1172394057.12388.3.camel@vladsk-laptop> <45E31D0A.20400@ichips.intel.com> Message-ID: <1172594206.21382.90.camel@vladsk-laptop> On Mon, 2007-02-26 at 09:46 -0800, Sean Hefty wrote: > Vladimir Sokolovsky wrote: > > On Fri, 2007-02-23 at 12:15 -0800, Sean Hefty wrote: > > > I would like these fixes in OFED 1.2 as well. What git tree / branch > > do I > > > generate a patch against? > > > > > > - Sean > > > > git://git.openfabrics.org/~vlad/ofed_1_2/.git > > branch: ofed_1_2 > > Can you try pulling from: > > git://git.openfabrics.org/~shefty/ofed_1_2.git ofed_1_2 > > - Sean Sean, Please send patches that will be added to kernel_patches/fixes. Please update your git tree from git://git.openfabrics.org/~vlad/ofed_1_2/.git ofed_1_2 -- Vladimir Sokolovsky Mellanox Technologies Ltd. From monil at voltaire.com Tue Feb 27 08:43:21 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 27 Feb 2007 18:43:21 +0200 Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter. 
In-Reply-To: <6a122cc00702270747l26c1adavd57de9ba2d9a472b@mail.gmail.com> References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> <6a122cc00702270747l26c1adavd57de9ba2d9a472b@mail.gmail.com> Message-ID: <6a122cc00702270843h34e407bek9f8757e9fb309fc6@mail.gmail.com> On 2/27/07, Moni Levy wrote: > On 2/27/07, Roland Dreier wrote: > > > I did a short code review of the ipoib code concentrating on > > > partitioning support and I mentioned that the asynchronous events > > > handler in the ipoib code does not take the port number reported in > > > the event record into consideration. The effect of that is that all of > > > the ib# devices related to that specific HCA are flushed when it seems > > > to me that only the relevant port one should be. Is that done on > > > purpose, or am I missing something ? > > > > I don't think there's any particular reason the code is that way > > except for the oversight never being corrected. But it looks trivial > > to fix, like the patch below. Does that look right to you? > > > > > p.s. I'm working on a patch that should solve another issue caused by > > > PKEY reordering & ipoib behavior and the above issue further > > > complicates things for me. > > > > Why not fix the issue first then? > > > > commit a27cbe878203076247c1b5287f5ab59ed143b560 > > Author: Roland Dreier > > Date: Tue Feb 27 07:37:49 2007 -0800 > > > > IPoIB: Only handle async events for one port > > > > An asynchronous event carries the port number that the event occurred > > on, so there's no reason for an IPoIB interface to process an event > > associated with a different local HCA port. 
> > > > Signed-off-by: Roland Dreier > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > > index 3cb551b..7f3ec20 100644 > > --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > > @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler, > > struct ipoib_dev_priv *priv = > > container_of(handler, struct ipoib_dev_priv, event_handler); > > > > - if (record->event == IB_EVENT_PORT_ERR || > > - record->event == IB_EVENT_PKEY_CHANGE || > > - record->event == IB_EVENT_PORT_ACTIVE || > > - record->event == IB_EVENT_LID_CHANGE || > > - record->event == IB_EVENT_SM_CHANGE || > > - record->event == IB_EVENT_CLIENT_REREGISTER) { > > + if ((record->event == IB_EVENT_PORT_ERR || > > + record->event == IB_EVENT_PKEY_CHANGE || > > + record->event == IB_EVENT_PORT_ACTIVE || > > + record->event == IB_EVENT_LID_CHANGE || > > + record->event == IB_EVENT_SM_CHANGE || > > + record->event == IB_EVENT_CLIENT_REREGISTER) && > > + record->element.port_num == priv->port) { > > ipoib_dbg(priv, "Port state change event\n"); > > queue_work(ipoib_workqueue, &priv->flush_task); > > } > > > > That's exactly what I intended to post. On a second thought based on the fact that on a two port HCA we'll have a 50% miss on the events being delivered, I would move the new condition to be evaluated first. I apologize if this is too much of micro optimization. What do you think ? --Moni > > --Moni > From sean.hefty at intel.com Tue Feb 27 08:45:41 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 27 Feb 2007 08:45:41 -0800 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <1172594206.21382.90.camel@vladsk-laptop> Message-ID: <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> >Please send patches that will be added to kernel_patches/fixes. 
> >Please update your git tree from >git://git.openfabrics.org/~vlad/ofed_1_2/.git ofed_1_2 You want me to create a patch that adds a file that contains the actual patches? Why not apply the patches directly? From rdreier at cisco.com Tue Feb 27 08:51:19 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 27 Feb 2007 08:51:19 -0800 Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter. In-Reply-To: <6a122cc00702270843h34e407bek9f8757e9fb309fc6@mail.gmail.com> (Moni Levy's message of "Tue, 27 Feb 2007 18:43:21 +0200") References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> <6a122cc00702270747l26c1adavd57de9ba2d9a472b@mail.gmail.com> <6a122cc00702270843h34e407bek9f8757e9fb309fc6@mail.gmail.com> Message-ID: > On a second thought based on the fact that on a two port HCA we'll > have a 50% miss on the events being delivered, I would move the new > condition to be evaluated first. I apologize if this is too much of > micro optimization. What do you think ? That wouldn't really be correct since element.port_num isn't valid unless we already know it's a port-related event. And it's not worth worrying about this since it's not remotely a hot path. - R. From vlad at mellanox.co.il Tue Feb 27 08:53:51 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 27 Feb 2007 18:53:51 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> References: <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> Message-ID: <1172595231.21382.96.camel@vladsk-laptop> On Tue, 2007-02-27 at 08:45 -0800, Sean Hefty wrote: > >Please send patches that will be added to kernel_patches/fixes. > > > >Please update your git tree from > >git://git.openfabrics.org/~vlad/ofed_1_2/.git ofed_1_2 > > You want me to create a patch that adds a file that contains the actual patches? Yes, actual patches should be created under kernel_patches/fixes. 
Please update your git tree because the following patch fails: From 2e7e33936de5f92656c0565ce88f97e796367dae Mon Sep 17 00:00:00 2001 From: Sean Hefty Date: Fri, 23 Feb 2007 12:35:43 -0800 Subject: [PATCH] rdma_cm: request reversible paths only The rdma_cm requires that path records be reversible. Set the reversible bit when issuing an path record query. Signed-off-by: Sean Hefty diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 9e0ab04..171cce9 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1396,11 +1396,13 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms, ib_addr_get_dgid(addr, &path_rec.dgid); path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr)); path_rec.numb_path = 1; + path_rec.reversible = 1; id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device, id_priv->id.port_num, &path_rec, IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | - IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH, + IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_REVERSIBLE, timeout_ms, GFP_KERNEL, cma_query_handler, work, &id_priv->query); > > Why not apply the patches directly? > To be consistent with 2.6.20 kernel. -- Vladimir Sokolovsky Mellanox Technologies Ltd. From mst at mellanox.co.il Tue Feb 27 08:55:32 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Feb 2007 18:55:32 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> References: <1172594206.21382.90.camel@vladsk-laptop> <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> Message-ID: <20070227165532.GB10245@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [PATCH] for OFED 1.2 > > >Please send patches that will be added to kernel_patches/fixes. 
> > > >Please update your git tree from > >git://git.openfabrics.org/~vlad/ofed_1_2/.git ofed_1_2 > > You want me to create a patch that adds a file that contains the actual patches? > > Why not apply the patches directly? That's the ofed structure, this was discussed multiple times already. The point is to keep all changes to upstream components separate, to make updating to upstream kernel trivial in the future. Worked quite well for OFED 1.1 -> 1.2 transition. -- MST From monil at voltaire.com Tue Feb 27 08:57:31 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 27 Feb 2007 18:57:31 +0200 Subject: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey In-Reply-To: <000201c759e3$24828410$55d8180a@amr.corp.intel.com> References: <1172507101.4102.277140.camel@hal.voltaire.com> <000201c759e3$24828410$55d8180a@amr.corp.intel.com> Message-ID: <6a122cc00702270857o41d36732sef607282f013a4b4@mail.gmail.com> Sean, On 2/26/07, Sean Hefty wrote: > I think the following patch would make ipoib spec compliant. > ib_find_cached_pkey is called by ib_cm, rdma_cm, ib_srp, and ib_ipoib. > I'm not certain what this change would do to SRP, but the ib_cm and > rdma_cm look okay, given that non-reversible paths aren't supported > yet anyway. Sorry for jumping into that thread, but although this patch will make things more spec compliant, it will break functionality we depend one. I suggest that we first find an alternate way to enable usage of partial partition membership before disabling that functionality at all. --Moni > -- > > ib_find_cached_pkey masks off the upper-bit of the PKey when searching > for a match. The upper bit indicates partial or full membership. Ignoring > the upper bit can result in a full membership PKey matching with a partial > membership PKey. For ipoib, this can result in joining a multicast group > that disallows communication between all members. 
> > Signed-off-by: Sean Hefty > --- > drivers/infiniband/core/cache.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c > index 558c9a0..6f366c3 100644 > --- a/drivers/infiniband/core/cache.c > +++ b/drivers/infiniband/core/cache.c > @@ -179,7 +179,7 @@ int ib_find_cached_pkey(struct ib_device *device, > *index = -1; > > for (i = 0; i < cache->table_len; ++i) > - if ((cache->table[i] & 0x7fff) == (pkey & 0x7fff)) { > + if (cache->table[i] == pkey) { > *index = i; > ret = 0; > break; > -- > 1.4.4.3 > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From monil at voltaire.com Tue Feb 27 09:00:01 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 27 Feb 2007 19:00:01 +0200 Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter. In-Reply-To: References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> <6a122cc00702270747l26c1adavd57de9ba2d9a472b@mail.gmail.com> <6a122cc00702270843h34e407bek9f8757e9fb309fc6@mail.gmail.com> Message-ID: <6a122cc00702270900q43b6e3fo7008aeaf64236d38@mail.gmail.com> On 2/27/07, Roland Dreier wrote: > > On a second thought based on the fact that on a two port HCA we'll > > have a 50% miss on the events being delivered, I would move the new > > condition to be evaluated first. I apologize if this is too much of > > micro optimization. What do you think ? > > That wouldn't really be correct since element.port_num isn't valid > unless we already know it's a port-related event. You're perfectly right, sorry. > > And it's not worth worrying about this since it's not remotely a hot path. Ok. --Moni > > - R. 
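The ordering point settled above (check the event type first, and only then look at the port number) can be sketched in plain C. This is a minimal illustration, not the ipoib code: `ev_type`, `ev_record`, `is_port_event` and `should_flush` are hypothetical stand-ins for the kernel's event record and handler, and the only assumption taken from the thread is that the element is a union whose `port_num` member is meaningful for port-related events only.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's event types; names are illustrative only. */
enum ev_type {
	EV_PORT_ERR,
	EV_PKEY_CHANGE,
	EV_PORT_ACTIVE,
	EV_LID_CHANGE,
	EV_SM_CHANGE,
	EV_CLIENT_REREGISTER,
	EV_QP_FATAL		/* not a port event: element carries a QP pointer */
};

struct ev_record {
	enum ev_type event;
	union {
		unsigned char port_num;	/* meaningful only for port events */
		void *qp;		/* meaningful only for QP events */
	} element;
};

/* True for the event types the handler in the patch reacts to. */
static bool is_port_event(enum ev_type e)
{
	switch (e) {
	case EV_PORT_ERR:
	case EV_PKEY_CHANGE:
	case EV_PORT_ACTIVE:
	case EV_LID_CHANGE:
	case EV_SM_CHANGE:
	case EV_CLIENT_REREGISTER:
		return true;
	default:
		return false;
	}
}

/*
 * Decide whether an interface bound to my_port should flush.
 * The && short-circuits, so port_num is read only after the
 * event type has told us that union member is the valid one.
 */
static bool should_flush(const struct ev_record *rec, unsigned char my_port)
{
	return is_port_event(rec->event) && rec->element.port_num == my_port;
}
```

Swapping the two operands of `&&` would compare `port_num` for a QP event, where the union holds a pointer instead, which is why the port test stays last.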
> From sean.hefty at intel.com Tue Feb 27 09:00:41 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 27 Feb 2007 09:00:41 -0800 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <1172595231.21382.96.camel@vladsk-laptop> Message-ID: <000101c75a90$d04f31f0$c6d8180a@amr.corp.intel.com> >Yes, actual patches should be created under kernel_patches/fixes. > >Please update your git tree because the following patch fails: Can you explain how the patch fails? I don't see how putting the patch into a file helps. >> Why not apply the patches directly? >> >To be consistent with 2.6.20 kernel. You can check out stock 2.6.20 using a tag. Why maintain the ofed code in git if you don't use it to track patches? - Sean From mshefty at ichips.intel.com Tue Feb 27 09:06:53 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 27 Feb 2007 09:06:53 -0800 Subject: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey In-Reply-To: <6a122cc00702270857o41d36732sef607282f013a4b4@mail.gmail.com> References: <1172507101.4102.277140.camel@hal.voltaire.com> <000201c759e3$24828410$55d8180a@amr.corp.intel.com> <6a122cc00702270857o41d36732sef607282f013a4b4@mail.gmail.com> Message-ID: <45E4652D.2070704@ichips.intel.com> > Sorry for jumping into that thread, but although this patch will make > things more spec compliant, it will break functionality we depend one. > I suggest that we first find an alternate way to enable usage of > partial partition membership before disabling that functionality at > all. Can you clarify the functionality you depend on? Are you reliant on ipoib being able to join a multicast group from partial partition membership? If so, do all SA's and switches support this? - Sean From mst at mellanox.co.il Tue Feb 27 09:18:14 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 27 Feb 2007 19:18:14 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <000101c75a90$d04f31f0$c6d8180a@amr.corp.intel.com> References: <1172595231.21382.96.camel@vladsk-laptop> <000101c75a90$d04f31f0$c6d8180a@amr.corp.intel.com> Message-ID: <20070227171814.GD10245@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [PATCH] for OFED 1.2 > > >Yes, actual patches should be created under kernel_patches/fixes. > > > >Please update your git tree because the following patch fails: > > Can you explain how the patch fails? I don't see how putting the patch into a > file helps. Try applying it? > >> Why not apply the patches directly? > >> > >To be consistent with 2.6.20 kernel. > > You can check out stock 2.6.20 using a tag. Why maintain the ofed code in git > if you don't use it to track patches? Basically so that conflicts in future merges from upstream are easy to resolve. If you like, let's reopen this for 1.3. We are after freeze in OFED 1.2. -- MST From swise at opengridcomputing.com Tue Feb 27 09:20:31 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 11:20:31 -0600 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <20070227165532.GB10245@mellanox.co.il> References: <1172594206.21382.90.camel@vladsk-laptop> <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> <20070227165532.GB10245@mellanox.co.il> Message-ID: <1172596831.11870.51.camel@stevo-desktop> On Tue, 2007-02-27 at 18:55 +0200, Michael S. Tsirkin wrote: > > Quoting Sean Hefty : > > Subject: Re: [PATCH] for OFED 1.2 > > > > >Please send patches that will be added to kernel_patches/fixes. > > > > > >Please update your git tree from > > >git://git.openfabrics.org/~vlad/ofed_1_2/.git ofed_1_2 > > > > You want me to create a patch that adds a file that contains the actual patches? > > > > Why not apply the patches directly? > > That's the ofed structure, this was discussed multiple times already. 
> The point is to keep all changes to upstream components separate, > to make updating to upstream kernel trivial in the future. > > Worked quite well for OFED 1.1 -> 1.2 transition. > Having these patches as files is painful for every developer because they cannot create a patch against ofed_1_2/drivers/infiniband/* nor the kernel.org upstream tree. They need to apply all the current patches and then create a patch on top of that. Or hope the patch applies fuzzily. I think with stacked git or just git and rebasing at key times, you could keep an ofed_1_2 tree that folks can easily apply patches to... Its too late to change this for 1.2, but you might want to reconsider the design for 1.3. my 2 cents... From monil at voltaire.com Tue Feb 27 09:25:27 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 27 Feb 2007 19:25:27 +0200 Subject: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey In-Reply-To: <45E4652D.2070704@ichips.intel.com> References: <1172507101.4102.277140.camel@hal.voltaire.com> <000201c759e3$24828410$55d8180a@amr.corp.intel.com> <6a122cc00702270857o41d36732sef607282f013a4b4@mail.gmail.com> <45E4652D.2070704@ichips.intel.com> Message-ID: <6a122cc00702270925j47a79e8ey82708c4ef8038480@mail.gmail.com> On 2/27/07, Sean Hefty wrote: > > Sorry for jumping into that thread, but although this patch will make > > things more spec compliant, it will break functionality we depend one. > > I suggest that we first find an alternate way to enable usage of > > partial partition membership before disabling that functionality at > > all. > > Can you clarify the functionality you depend on? Are you reliant on ipoib being > able to join a multicast group from partial partition membership? Exactly. > If so, do all SA's and switches support this? I can't commit on all the SA's and switches. 
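For readers following the P_Key discussion, here is a minimal sketch of the two lookups being compared: the masked match that the current cache code performs and the exact match the RFC patch proposes. The function names and the sample table are hypothetical; the only facts assumed are the ones stated in the thread (the upper bit of a P_Key encodes membership) plus the usual convention that a set bit means full membership.

```c
#include <stdint.h>

#define PKEY_BASE_MASK 0x7fffu	/* low 15 bits: partition number */

/* Masked lookup: ignores the membership bit, as the existing code does. */
static int find_pkey_masked(const uint16_t *table, int len, uint16_t pkey)
{
	for (int i = 0; i < len; ++i)
		if ((table[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK))
			return i;
	return -1;
}

/* Exact lookup: membership bit must match too, as in the RFC patch. */
static int find_pkey_exact(const uint16_t *table, int len, uint16_t pkey)
{
	for (int i = 0; i < len; ++i)
		if (table[i] == pkey)
			return i;
	return -1;
}
```

With a table holding only the limited-membership entry 0x7fff, a search for the full-membership key 0xffff succeeds under the masked lookup and fails under the exact one. That is the behaviour change that would stop a partial member from being treated as a match.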
-- Moni > > - Sean From halr at voltaire.com Tue Feb 27 09:19:45 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Feb 2007 12:19:45 -0500 Subject: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey In-Reply-To: <45E4652D.2070704@ichips.intel.com> References: <1172507101.4102.277140.camel@hal.voltaire.com> <000201c759e3$24828410$55d8180a@amr.corp.intel.com> <6a122cc00702270857o41d36732sef607282f013a4b4@mail.gmail.com> <45E4652D.2070704@ichips.intel.com> Message-ID: <1172596773.4102.367435.camel@hal.voltaire.com> On Tue, 2007-02-27 at 12:06, Sean Hefty wrote: > > Sorry for jumping into that thread, but although this patch will make > > things more spec compliant, it will break functionality we depend one. > > I suggest that we first find an alternate way to enable usage of > > partial partition membership before disabling that functionality at > > all. > > Can you clarify the functionality you depend on? Are you reliant on ipoib being > able to join a multicast group from partial partition membership? If so, do all > SA's and switches support this? I'm not sure who can speak for all SAs nor necessarily would the vendor SAs indicate this. From a quick code inspection of OpenSM, it appears to not enforce the compliance properly. Switches do whatever they are told to do by the SM. -- Hal > - Sean From mst at mellanox.co.il Tue Feb 27 09:31:22 2007 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 27 Feb 2007 19:31:22 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <1172596831.11870.51.camel@stevo-desktop> References: <1172594206.21382.90.camel@vladsk-laptop> <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> <20070227165532.GB10245@mellanox.co.il> <1172596831.11870.51.camel@stevo-desktop> Message-ID: <20070227173122.GE10245@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [openib-general] [PATCH] for OFED 1.2 > > On Tue, 2007-02-27 at 18:55 +0200, Michael S. Tsirkin wrote: > > > Quoting Sean Hefty : > > > Subject: Re: [PATCH] for OFED 1.2 > > > > > > >Please send patches that will be added to kernel_patches/fixes. > > > > > > > >Please update your git tree from > > > >git://git.openfabrics.org/~vlad/ofed_1_2/.git ofed_1_2 > > > > > > You want me to create a patch that adds a file that contains the actual patches? > > > > > > Why not apply the patches directly? > > > > That's the ofed structure, this was discussed multiple times already. > > The point is to keep all changes to upstream components separate, > > to make updating to upstream kernel trivial in the future. > > > > Worked quite well for OFED 1.1 -> 1.2 transition. > > > > Having these patches as files is painful for every developer because > they cannot create a patch against ofed_1_2/drivers/infiniband/* nor the > kernel.org upstream tree. Did you try using quilt which makes managing patch stacks quite easy? If you have quilt installed, OFED scripts actually use it to apply patches, so things are easy. > They need to apply all the current patches > and then create a patch on top of that. Or hope the patch applies > fuzzily. One point I can't stress enough: whatever way you create a patch, developers are expected to build and test it in OFED environment before posting. > I think with stacked git or just git and rebasing at key times, you > could keep an ofed_1_2 tree that folks can easily apply patches to... 
> > Its too late to change this for 1.2, but you might want to reconsider > the design for 1.3. Well, I experimented with git rebase and it is unfortunately still fragile at this point. I agree using stacked git might be a good idea, I just did not have the chance to experiment with it enough. I had an impression that publishing stg managed branch creates problems for whoever attempts to track it, but I might be wrong. -- MST From sean.hefty at intel.com Tue Feb 27 09:30:02 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 27 Feb 2007 09:30:02 -0800 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <1172596831.11870.51.camel@stevo-desktop> Message-ID: <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com> >I think with stacked git or just git and rebasing at key times, you >could keep an ofed_1_2 tree that folks can easily apply patches to... > >Its too late to change this for 1.2, but you might want to reconsider >the design for 1.3. Can't we just create a new branch (ofed_1_2_patched) with these patches already applied and in the correct order? Maybe I'm just not understanding the work flow here... - Sean From jsquyres at cisco.com Tue Feb 27 09:39:18 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 27 Feb 2007 12:39:18 -0500 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <20070227173122.GE10245@mellanox.co.il> References: <1172594206.21382.90.camel@vladsk-laptop> <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> <20070227165532.GB10245@mellanox.co.il> <1172596831.11870.51.camel@stevo-desktop> <20070227173122.GE10245@mellanox.co.il> Message-ID: <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com> It would be great if all of this knowledge is posted to the wiki to avoid repeating this conversation in the future (or one of countless variations of this conversation). For example, I admit to not paying close attention to many of the threads on this list, but this was the first time I'd heard of "quilt".
Specifically: if there are tools and methods that are helpful for OFA/ OFED development, they should be detailed on the wiki. The wiki is where all permanent knowledge should be posted. This is just my $0.000001... On Feb 27, 2007, at 12:31 PM, Michael S. Tsirkin wrote: >> Quoting Steve Wise : >> Subject: Re: [openib-general] [PATCH] for OFED 1.2 >> >> On Tue, 2007-02-27 at 18:55 +0200, Michael S. Tsirkin wrote: >>>> Quoting Sean Hefty : >>>> Subject: Re: [PATCH] for OFED 1.2 >>>> >>>>> Please send patches that will be added to kernel_patches/fixes. >>>>> >>>>> Please update your git tree from >>>>> git://git.openfabrics.org/~vlad/ofed_1_2/.git ofed_1_2 >>>> >>>> You want me to create a patch that adds a file that contains the >>>> actual patches? >>>> >>>> Why not apply the patches directly? >>> >>> That's the ofed structure, this was discussed multiple times >>> already. >>> The point is to keep all changes to upstream components separate, >>> to make updating to upstream kernel trivial in the future. >>> >>> Worked quite well for OFED 1.1 -> 1.2 transition. >>> >> >> Having these patches as files is painful for every developer because >> they cannot create a patch against ofed_1_2/drivers/infiniband/* >> nor the >> kernel.org upstream tree. > > Did you try using quilt which makes managing patch stacks quite easy? > If you have quilt installed, OFED scripts actually use it > to apply patches, so things are easy. > >> They need to apply all the current patches >> and then create a patch on top of that. Or hope the patch applies >> fuzzily. > > One point I can't stress enough: whatever way you create a patch, > developers are expected to build and test it in OFED environment > before posting. > >> I think with stacked git or just git and rebasing at key times, you >> could keep an ofed_1_2 tree that folks can easily apply patches to... >> >> Its too late to change this for 1.2, but you might want to reconsider >> the design for 1.3. 
> > Well, I experimented with git rebase and it is unfortunately still > fragile at this point. > > I agree using stacked git might be a good idea, I just did not > have the chance to experiment with it enough. I had an impression > that publishing stg managed branch creates problems for whoever > attempts to track it, but I might be wrong. > > > -- > MST -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Tue Feb 27 09:44:26 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Feb 2007 19:44:26 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com> References: <1172596831.11870.51.camel@stevo-desktop> <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com> Message-ID: <20070227174426.GF10245@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [PATCH] for OFED 1.2 > > >I think with stacked git or just git and rebasing at key times, you > >could keep an ofed_1_2 tree that folks can easily apply patches to... > > > >Its too late to change this for 1.2, but you might want to reconsider > >the design for 1.3. > > Can't we just create a new branch (ofed_1_2_patched) with these patches already > applied and in the correct order? Then what do we do when we want to update to new upstream? Throw this branch away? As it is, I just pull then build and remove patches that conflict. By the way, there are backport patches, etc - it is still incorrect to say that you would be able to generate a patch out of git and know it's a good one without test-build. > Maybe I'm just not understanding the work flow here... Sean, please install quilt and try using it for working with the system.
Adding a new patch is usually done this way:

quilt new
quilt add
edit
quilt refresh

cp patches/ kernel_patches/fixes/
git add kernel_patches/fixes/
git commit kernel_patches/fixes/

-- 
MST

From mst at mellanox.co.il Tue Feb 27 09:45:53 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 19:45:53 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com>
References: <1172594206.21382.90.camel@vladsk-laptop>
 <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
 <20070227165532.GB10245@mellanox.co.il>
 <1172596831.11870.51.camel@stevo-desktop>
 <20070227173122.GE10245@mellanox.co.il>
 <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com>
Message-ID: <20070227174553.GG10245@mellanox.co.il>

Lots of stuff *is* in the wiki already - did you look at the pages Vlad created?
Things can always be improved, you can add stuff too.

Quoting Jeff Squyres :
Subject: Re: [PATCH] for OFED 1.2

It would be great if all of this knowledge is posted to the wiki to
avoid repeating this conversation in the future (or one of countless
variations of this conversation). For example, I admit to not paying
close attention to many of the threads on this list, but this was the
first time I'd heard of "quilt".

Specifically: if there are tools and methods that are helpful for OFA/
OFED development, they should be detailed on the wiki. The wiki is
where all permanent knowledge should be posted.

This is just my $0.000001...

On Feb 27, 2007, at 12:31 PM, Michael S. Tsirkin wrote:

>> Quoting Steve Wise :
>> Subject: Re: [openib-general] [PATCH] for OFED 1.2
>>
>> On Tue, 2007-02-27 at 18:55 +0200, Michael S. Tsirkin wrote:
>>>> Quoting Sean Hefty :
>>>> Subject: Re: [PATCH] for OFED 1.2
>>>>
>>>>> Please send patches that will be added to kernel_patches/fixes.
>>>>> >>>>> Please update your git tree from >>>>> git://git.openfabrics.org/~vlad/ofed_1_2/.git ofed_1_2 >>>> >>>> You want me to create a patch that adds a file that contains the >>>> actual patches? >>>> >>>> Why not apply the patches directly? >>> >>> That's the ofed structure, this was discussed multiple times >>> already. >>> The point is to keep all changes to upstream components separate, >>> to make updating to upstream kernel trivial in the future. >>> >>> Worked quite well for OFED 1.1 -> 1.2 transition. >>> >> >> Having these patches as files is painful for every developer because >> they cannot create a patch against ofed_1_2/drivers/infiniband/* >> nor the >> kernel.org upstream tree. > > Did you try using quilt which makes managing patch stacks quite easy? > If you have quilt installed, OFED scripts actually use it > to apply patches, so things are easy. > >> They need to apply all the current patches >> and then create a patch on top of that. Or hope the patch applies >> fuzzily. > > One point I can't stress enough: whatever way you create a patch, > developers are expected to build and test it in OFED environment > before posting. > >> I think with stacked git or just git and rebasing at key times, you >> could keep an ofed_1_2 tree that folks can easily apply patches to... >> >> Its too late to change this for 1.2, but you might want to reconsider >> the design for 1.3. > > Well, I experimented with git rebase and it is unfortunately still > fragile at this point. > > I agree using stacked git might be a good idea, I just did not > have the chance to experiment with it enough. I had an impression > that publishing stg managed branch creates problems for whoever > attempts to track it, but I might be wrong. 
> > > -- > MST > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- MST From mst at mellanox.co.il Tue Feb 27 09:47:18 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Feb 2007 19:47:18 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com> References: <1172594206.21382.90.camel@vladsk-laptop> <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> <20070227165532.GB10245@mellanox.co.il> <1172596831.11870.51.camel@stevo-desktop> <20070227173122.GE10245@mellanox.co.il> <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com> Message-ID: <20070227174718.GH10245@mellanox.co.il> > This is just my $0.000001... Thanks for the suggestions, but what does $0.000001 buy one in US today? -- MST From swise at opengridcomputing.com Tue Feb 27 09:55:52 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 11:55:52 -0600 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <20070227174426.GF10245@mellanox.co.il> References: <1172596831.11870.51.camel@stevo-desktop> <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com> <20070227174426.GF10245@mellanox.co.il> Message-ID: <1172598952.11870.74.camel@stevo-desktop> On Tue, 2007-02-27 at 19:44 +0200, Michael S. Tsirkin wrote: > > Quoting Sean Hefty : > > Subject: Re: [PATCH] for OFED 1.2 > > > > >I think with stacked git or just git and rebasing at key times, you > > >could keep an ofed_1_2 tree that folks can easily apply patches to... 
> > >
> > >Its too late to change this for 1.2, but you might want to reconsider
> > >the design for 1.3.
> >
> > Can't we just create a new branch (ofed_1_2_patched) with these patches already
> > applied and in the correct order?
>
> Then what we do when we want to update to new upstream? Throw this branch away?
> As it is, I just pull then build and remove patches that conflict.
>
> By the way, there are backport patches, etc - it is still incorrect
> to say that you would be able to generate a patch out of git
> and know it's a good one without test-build.
>
> > Maybe I'm just not understanding the work flow here...
>
> Sean, please install quilt and try using it for working with the system.
> Adding new patch is usually done in this way
> quilt new
> quilt add
> edit
> quilt refresh
>
> cp patches/ kernel_patches/fixes/
> git add kernel_patches/fixes/
> git commit kernel_patches/fixes/

NOTE: The key to the above process is the assumption that the developer
maintains _all_ of the existing patches from kernel_patches/ on top of
the ofed_1_2 tree using quilt or stg. Otherwise quilt/stg isn't buying
you anything.

And this doesn't take into account backports.

Regardless, you need to build, install and test any ofed patch on an
ofed system, so you're gonna have extra work:

1) create ofed-specific patch
   build/test it on ofed
   post it to openib-general/ewg

2) create kernel.org patch
   build/test it on kernel.org
   post it to openib-general/lkml/netdev

My .27 cents...
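
For readers following the thread, the quilt and git steps quoted above can be sketched as a small shell function. This is only a sketch under stated assumptions, not the OFED scripts themselves: the patch name and file path are hypothetical placeholders (the archive dropped the original arguments), the commit message is invented for illustration, and the function assumes it runs from the top of an ofed_1_2 tree where the existing patches are already applied with quilt. DRY_RUN defaults to 1, so the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the quilt + git steps quoted in the thread above.
# Assumption: run from the top of an ofed_1_2 tree managed with quilt.
# The patch name and file path used below are hypothetical placeholders.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: add_ofed_fix <patch-name> <file-to-change>
add_ofed_fix() {
    patch=$1
    file=$2
    run quilt new "$patch"          # start a new patch on top of the stack
    run quilt add "$file"           # register the file before changing it
    run ${EDITOR:-vi} "$file"       # the "edit" step from the thread
    run quilt refresh               # write the change into patches/$patch
    run cp "patches/$patch" kernel_patches/fixes/
    run git add "kernel_patches/fixes/$patch"
    run git commit -m "Add $patch" "kernel_patches/fixes/$patch"
}

# Hypothetical example: a fix touching cma.c
add_ofed_fix my-fix.patch drivers/infiniband/core/cma.c
```

Printing by default keeps the sketch safe to try in any directory; setting DRY_RUN=0 would run the real commands, and as noted in the message above the result still has to be built and tested on an OFED system before posting.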
From jsquyres at cisco.com Tue Feb 27 10:11:18 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 27 Feb 2007 13:11:18 -0500 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <20070227174553.GG10245@mellanox.co.il> References: <1172594206.21382.90.camel@vladsk-laptop> <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> <20070227165532.GB10245@mellanox.co.il> <1172596831.11870.51.camel@stevo-desktop> <20070227173122.GE10245@mellanox.co.il> <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com> <20070227174553.GG10245@mellanox.co.il> Message-ID: On Feb 27, 2007, at 12:45 PM, Michael S. Tsirkin wrote: > Lot's of stuff *is* in wiki already - did you look at pages Vlad > created? A search for "quilt" on the wiki turns up nothing (I checked before I posted :-) ). And yes, I have [thoroughly] read the pages Vlad created. But the very fact that this conversation is occurring is because either the information is not on the wiki or what is on the wiki is not clear. Otherwise, I suspect that you simply would have pointed Steve to the wiki and said "Please read the fine manual at http://....". Don't get me wrong; what has already been posted is great. I'm just saying: keep it coming! The wiki should be a living document that changes as our procedures and collective wisdom changes. It saves us *all* time over the long run. A one-time dump of information is not nearly as useful as an ever-updated document. > Things can always be improved, you can add stuff too. https://wiki.openfabrics.org/tiki-lastchanges.php?days=31 shows that only Tziporet and myself have changed the OFED portion of the wiki over the past month. So -- *you* can add stuff to the wiki, too. :-) > This is just my $0.000001... It buys very little, if anything. In fact, a whole $0.02 also buys very little, if anything. So take my comments for what they're worth. 
-- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Tue Feb 27 10:14:08 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Feb 2007 20:14:08 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <1172598952.11870.74.camel@stevo-desktop> References: <1172596831.11870.51.camel@stevo-desktop> <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com> <20070227174426.GF10245@mellanox.co.il> <1172598952.11870.74.camel@stevo-desktop> Message-ID: <20070227181353.GI10245@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [PATCH] for OFED 1.2 > > On Tue, 2007-02-27 at 19:44 +0200, Michael S. Tsirkin wrote: > > > Quoting Sean Hefty : > > > Subject: Re: [PATCH] for OFED 1.2 > > > > > > >I think with stacked git or just git and rebasing at key times, you > > > >could keep an ofed_1_2 tree that folks can easily apply patches to... > > > > > > > >Its too late to change this for 1.2, but you might want to reconsider > > > >the design for 1.3. > > > > > > Can't we just create a new branch (ofed_1_2_patched) with these patches already > > > applied and in the correct order? > > > > Then what we do when we want to update to new upstream? Throw this branch away? > > As it is, I just pull then build and remove patches that conflict. > > > > By the way, there are backport patches, etc - it is still incorrect > > to say that you would be able to generate a patch out of git > > and know it's a good one without test-build. > > > > > Maybe I'm just not understanding the work flow here... > > > > Sean, please install quilt and try using it for working with the system. 
> > Adding new patch is usually done in this way > > quilt new > > quilt add > > edit > > quilt refresh > > > > cp patches/ kernel_patches/fixes/ > > git add kernel_patches/fixes/ > > git commit kernel_patches/fixes/ > > NOTE: The key to the above process is the assumption that the developer > maintains _all_ of the existing patches from kernel_patches/ on top of > the ofed_1_2 tree using quilt or stg. Otherwise quilt/stg isn't buying > you anything. OFED will do this automatically. > And this doesn't take into account backports. The process works with backport patches too: you just have to do this > quilt pop -a > > > > quilt new > > > quilt add > > > edit > > > quilt refresh > > quilt push -a -- MST From mst at mellanox.co.il Tue Feb 27 10:15:28 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Feb 2007 20:15:28 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: References: <1172594206.21382.90.camel@vladsk-laptop> <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> <20070227165532.GB10245@mellanox.co.il> <1172596831.11870.51.camel@stevo-desktop> <20070227173122.GE10245@mellanox.co.il> <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com> <20070227174553.GG10245@mellanox.co.il> Message-ID: <20070227181528.GJ10245@mellanox.co.il> > > This is just my $0.000001... > > It buys very little, if anything. In fact, a whole $0.02 also buys > very little, if anything. So take my comments for what they're worth. Oh, good, I thought deflation is getting out of hand ... -- MST From mst at mellanox.co.il Tue Feb 27 10:16:53 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 27 Feb 2007 20:16:53 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: References: <1172594206.21382.90.camel@vladsk-laptop> <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com> <20070227165532.GB10245@mellanox.co.il> <1172596831.11870.51.camel@stevo-desktop> <20070227173122.GE10245@mellanox.co.il> <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com> <20070227174553.GG10245@mellanox.co.il> Message-ID: <20070227181653.GK10245@mellanox.co.il> > > Lot's of stuff *is* in wiki already - did you look at pages Vlad > > created? > > A search for "quilt" on the wiki turns up nothing (I checked before I > posted :-) ). > > And yes, I have [thoroughly] read the pages Vlad created. But the > very fact that this conversation is occurring is because either the > information is not on the wiki or what is on the wiki is not clear. > Otherwise, I suspect that you simply would have pointed Steve to the > wiki and said "Please read the fine manual at http://....". You are right in that, I don't disclaim it. Thanks for the suggestion, I'll try to find the time to add this to wiki. -- MST From swise at opengridcomputing.com Tue Feb 27 10:43:18 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 12:43:18 -0600 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <20070227181353.GI10245@mellanox.co.il> References: <1172596831.11870.51.camel@stevo-desktop> <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com> <20070227174426.GF10245@mellanox.co.il> <1172598952.11870.74.camel@stevo-desktop> <20070227181353.GI10245@mellanox.co.il> Message-ID: <1172601798.11870.103.camel@stevo-desktop> > > > > > > Sean, please install quilt and try using it for working with the system. 
> > > Adding new patch is usually done in this way > > > quilt new > > > quilt add > > > edit > > > quilt refresh > > > > > > cp patches/ kernel_patches/fixes/ > > > git add kernel_patches/fixes/ > > > git commit kernel_patches/fixes/ > > > > NOTE: The key to the above process is the assumption that the developer > > maintains _all_ of the existing patches from kernel_patches/ on top of > > the ofed_1_2 tree using quilt or stg. Otherwise quilt/stg isn't buying > > you anything. > > OFED will do this automatically. > uh, can you explain this? Given I have a freshly cloned ofed_1_2 git tree, and I want to change cma.c (a good one cuz there are patches). What do I do? There's no quilt stack at all at this point. Right? > > And this doesn't take into account backports. > > The process works with backport patches too: you just have to do this > > > quilt pop -a > > > > > > quilt new > > > > quilt add > > > > edit > > > > quilt refresh > > > > quilt push -a But you cannot keep a stack for more than one backport pushed, right? So you still need to be slapping the stacks of patches around for each backport. Or maybe I'm confused? From sean.hefty at intel.com Tue Feb 27 10:49:08 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 27 Feb 2007 10:49:08 -0800 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <1172601798.11870.103.camel@stevo-desktop> Message-ID: <000301c75a9f$f6d28480$c6d8180a@amr.corp.intel.com> >But you cannot keep a stack for more than one backport pushed, right? >So you still need to be slapping the stacks of patches around for each >backport. Why not have separate branches for each kernels too? From mst at mellanox.co.il Tue Feb 27 10:51:07 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 27 Feb 2007 20:51:07 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <1172601798.11870.103.camel@stevo-desktop> References: <1172596831.11870.51.camel@stevo-desktop> <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com> <20070227174426.GF10245@mellanox.co.il> <1172598952.11870.74.camel@stevo-desktop> <20070227181353.GI10245@mellanox.co.il> <1172601798.11870.103.camel@stevo-desktop> Message-ID: <20070227185107.GL10245@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: [PATCH] for OFED 1.2 > > > > > > > > > Sean, please install quilt and try using it for working with the system. > > > > Adding new patch is usually done in this way > > > > quilt new > > > > quilt add > > > > edit > > > > quilt refresh > > > > > > > > cp patches/ kernel_patches/fixes/ > > > > git add kernel_patches/fixes/ > > > > git commit kernel_patches/fixes/ > > > > > > NOTE: The key to the above process is the assumption that the developer > > > maintains _all_ of the existing patches from kernel_patches/ on top of > > > the ofed_1_2 tree using quilt or stg. Otherwise quilt/stg isn't buying > > > you anything. > > > > OFED will do this automatically. > > > > uh, can you explain this? Given I have a freshly cloned ofed_1_2 git > tree, and I want to change cma.c (a good one cuz there are patches). > What do I do? There's no quilt stack at all at this point. Right? Try running the configure script. After this, quilt applied will show what patches are applied. > > > And this doesn't take into account backports. > > > > The process works with backport patches too: you just have to do this > > > > > quilt pop -a > > > > > > > > quilt new > > > > > quilt add > > > > > edit > > > > > quilt refresh > > > > > > quilt push -a > > > But you cannot keep a stack for more than one backport pushed, right? > So you still need to be slapping the stacks of patches around for each > backport. > > Or maybe I'm confused? Yes. 
Fortunately it's not too hard: you can do quilt pop -a and re-run configure for another kernel. Of course for testing the patch, it is easier to commit the change in your tree and then to use openfabrics cross-build functionality that will clone this tree and build for multiple arches/kernels. -- MST From mst at mellanox.co.il Tue Feb 27 10:53:26 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Feb 2007 20:53:26 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <000301c75a9f$f6d28480$c6d8180a@amr.corp.intel.com> References: <1172601798.11870.103.camel@stevo-desktop> <000301c75a9f$f6d28480$c6d8180a@amr.corp.intel.com> Message-ID: <20070227185326.GM10245@mellanox.co.il> > Quoting Sean Hefty : > Subject: RE: [PATCH] for OFED 1.2 > > >But you cannot keep a stack for more than one backport pushed, right? > >So you still need to be slapping the stacks of patches around for each > >backport. > > Why not have separate branches for each kernels too? I think it'll be much more work to maintain all these branches. And again, there will be conflicts, and it's too easy to get confused when resolving a conflict. With patches we have scripts to automate this. -- MST From troy at scl.ameslab.gov Tue Feb 27 11:03:16 2007 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Tue, 27 Feb 2007 13:03:16 -0600 Subject: [openib-general] remove www.openfabrics.org SVN links.. Message-ID: <20070227190316.GA12092@minbar-g5.scl.ameslab.gov> Can someone please update the main www.openfabrics.org web page to remove all references to subversion, and link to a wiki page on how to get the latest source? Thanks. 
From mshefty at ichips.intel.com Tue Feb 27 11:10:17 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 27 Feb 2007 11:10:17 -0800 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <20070227185326.GM10245@mellanox.co.il> References: <1172601798.11870.103.camel@stevo-desktop> <000301c75a9f$f6d28480$c6d8180a@amr.corp.intel.com> <20070227185326.GM10245@mellanox.co.il> Message-ID: <45E48219.7030904@ichips.intel.com> > I think it'll be much more work to maintain all these branches. > And again, there will be conflicts, and it's too easy to get confused when > resolving a conflict. Storing patches in a directory seems confusing to me. They must be applied in a specific order for everything to work, and that knowledge is not captured. Conflicts need to be resolved anyway. If someone wants to use scripts to make their life easier, that's fine, but they shouldn't be a necessity to checking out code and creating patches using git. For OFED they are. From sashak at voltaire.com Tue Feb 27 12:11:39 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 27 Feb 2007 22:11:39 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <20070227174426.GF10245@mellanox.co.il> References: <1172596831.11870.51.camel@stevo-desktop> <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com> <20070227174426.GF10245@mellanox.co.il> Message-ID: <20070227201139.GB13938@sashak.voltaire.com> On 19:44 Tue 27 Feb , Michael S. Tsirkin wrote: > > Quoting Sean Hefty : > > Subject: Re: [PATCH] for OFED 1.2 > > > > >I think with stacked git or just git and rebasing at key times, you > > >could keep an ofed_1_2 tree that folks can easily apply patches to... > > > > > >Its too late to change this for 1.2, but you might want to reconsider > > >the design for 1.3. > > > > Can't we just create a new branch (ofed_1_2_patched) with these patches already > > applied and in the correct order? > > Then what we do when we want to update to new upstream? Throw this branch away? 
> As it is, I just pull then build and remove patches that conflict. You can save this branch as - (or better) and to rebase to the new upstream. > By the way, there are backport patches, etc - it is still incorrect > to say that you would be able to generate a patch out of git > and know it's a good one without test-build. In similar way you can track backport patch sets as branches. > > Maybe I'm just not understanding the work flow here... > > Sean, please install quilt and try using it for working with the system. > Adding new patch is usually done in this way > quilt new > quilt add > edit > quilt refresh > > cp patches/ kernel_patches/fixes/ > git add kernel_patches/fixes/ > git commit kernel_patches/fixes/ This looks strange for me to track patches against patches... Sasha From rowland at cse.ohio-state.edu Tue Feb 27 11:57:03 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Tue, 27 Feb 2007 14:57:03 -0500 Subject: [openib-general] ofed_1_2_scripts for bug 372 Message-ID: <45E48D0F.8070403@cse.ohio-state.edu> Hi Vladimir. I've attached a small patch to the ofed_1_2_scripts build.sh file for the mvapich2() function. This fixes bug 372 where the F90 compiler was not being set properly for the GNU compiler case and other possible compilers in the path were being found. This patch is against the latest ofed_1_2_scripts git. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bug-372.patch URL: From mst at mellanox.co.il Tue Feb 27 12:23:31 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 27 Feb 2007 22:23:31 +0200 Subject: [openib-general] [PATCH] for OFED 1.2 In-Reply-To: <20070227201139.GB13938@sashak.voltaire.com> References: <1172596831.11870.51.camel@stevo-desktop> <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com> <20070227174426.GF10245@mellanox.co.il> <20070227201139.GB13938@sashak.voltaire.com> Message-ID: <20070227202331.GP10245@mellanox.co.il> > Quoting Sasha Khapyorsky : > Subject: Re: [PATCH] for OFED 1.2 > > On 19:44 Tue 27 Feb , Michael S. Tsirkin wrote: > > > Quoting Sean Hefty : > > > Subject: Re: [PATCH] for OFED 1.2 > > > > > > >I think with stacked git or just git and rebasing at key times, you > > > >could keep an ofed_1_2 tree that folks can easily apply patches to... > > > > > > > >Its too late to change this for 1.2, but you might want to reconsider > > > >the design for 1.3. > > > > > > Can't we just create a new branch (ofed_1_2_patched) with these patches already > > > applied and in the correct order? > > > > Then what we do when we want to update to new upstream? Throw this branch away? > > As it is, I just pull then build and remove patches that conflict. > > You can save this branch as - (or better) > and to rebase to the new upstream. rebase does not seem to be too robust when run on such a large repository as the linux kernel. Maybe stacked git will work. > > By the way, there are backport patches, etc - it is still incorrect > > to say that you would be able to generate a patch out of git > > and know it's a good one without test-build. > > In similar way you can track backport patch sets as branches. At the moment it seems like a lot of work. Again, maybe stg makes it easy, I know it's hard with plain git. And I think lots of people (including me) will be confused if we have a ton of branches. > > > Maybe I'm just not understanding the work flow here... > > > > Sean, please install quilt and try using it for working with the system. 
> > Adding new patch is usually done in this way
> > quilt new
> > quilt add
> > edit
> > quilt refresh
> >
> > cp patches/ kernel_patches/fixes/
> > git add kernel_patches/fixes/
> > git commit kernel_patches/fixes/
>
> This looks strange for me to track patches against patches...

One gets used to it :)
Seriously, we have these patches, and we want to version them together
with the source they are intended to apply to.

-- 
MST

From or.gerlitz at gmail.com Tue Feb 27 12:26:32 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Tue, 27 Feb 2007 22:26:32 +0200
Subject: [openib-general] failure to create an FMR mapping 1K pages on memfree
In-Reply-To: 
References: <15ddcffd0702261104x6df977b6g9e4ca0071c8489ad@mail.gmail.com>
 <15ddcffd0702261105s377ad165h7bfe258f69ede152@mail.gmail.com>
Message-ID: <15ddcffd0702271226m6c54fa66x3328129f7a7e608@mail.gmail.com>

On 2/27/07, Roland Dreier wrote:

> Is it really returning -ENOMEM? It seems much more likely that you
> are hitting the code
>
>         /* For Arbel, all MTTs must fit in the same page. */
>         if (mthca_is_memfree(dev) &&
>             mr->attr.max_pages * sizeof *mr->mem.arbel.mtts > PAGE_SIZE)
>                 return -EINVAL;
>
> I guess you could call this limit a driver design issue.

Indeed, sorry for the inaccurate description: mthca_fmr_alloc returns
-EINVAL and the fmr pool code returns -ENOMEM. Thanks for the
clarification.

Or.

From mst at mellanox.co.il Tue Feb 27 13:29:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 23:29:24 +0200
Subject: [openib-general] Fwd: [ANNOUNCE] GIT 1.5.0.2
Message-ID: <20070227212924.GB24555@mellanox.co.il>

FYI.

----- Forwarded message from Junio C Hamano -----

Subject: [ANNOUNCE] GIT 1.5.0.2
Date: Tue, 27 Feb 2007 10:58:22 +0200
In-Reply-To: <7vwt2ec32p.fsf at assigned-by-dhcp.cox.net> (Junio C.
 Hamano's message of "Sun, 18 Feb 2007 18:07:42 -0800")
References: <7vwt2ec32p.fsf at assigned-by-dhcp.cox.net>
From: Junio C Hamano

The latest maintenance release GIT 1.5.0.2 is available at the
usual places:

  http://www.kernel.org/pub/software/scm/git/

  git-1.5.0.2.tar.{gz,bz2}              (tarball)
  git-htmldocs-1.5.0.2.tar.{gz,bz2}     (preformatted docs)
  git-manpages-1.5.0.2.tar.{gz,bz2}     (preformatted docs)
  RPMS/$arch/git-*-1.5.0.2-1.$arch.rpm  (RPM)

GIT v1.5.0.2 Release Notes
==========================

Fixes since v1.5.0.1
--------------------

* Bugfixes

  - Automated merge conflict handling when changes to symbolic
    links conflicted were completely broken. The merge-resolve
    strategy created a regular file with conflict markers in it
    in place of the symbolic link. The default strategy,
    merge-recursive was even more broken. It removed the path
    that was pointed at by the symbolic link. Both of these
    problems have been fixed.

  - 'git diff maint master next' did not correctly give combined
    diff across three trees.

  - 'git fast-import' portability fix for Solaris.

  - 'git show-ref --verify' without arguments did not error out
    but segfaulted.

  - 'git diff :tracked-file `pwd`/an-untracked-file' gave an extra
    slashes after a/ and b/.

  - 'git format-patch' produced too long filenames if the commit
    message had too long line at the beginning.

  - Running 'make all' and then without changing anything
    running 'make install' still rebuilt some files. This
    was inconvenient when building as yourself and then
    installing as root (especially problematic when the source
    directory is on NFS and root is mapped to nobody).

  - 'git-rerere' failed to deal with two unconflicted paths that
    sorted next to each other.

  - 'git-rerere' attempted to open(2) a symlink and failed if
    there was a conflict. Since a conflicting change to a
    symlink would not benefit from rerere anyway, the command
    now ignores conflicting changes to symlinks.

  - 'git-repack' did not like to pass more than 64 arguments
    internally to underlying 'rev-list' logic, which made it
    impossible to repack after accumulating many (small) packs
    in the repository.

  - 'git-diff' to review the combined diff during a conflicted
    merge were not reading the working tree version correctly
    when changes to a symbolic link conflicted. It should have
    read the data using readlink(2) but read from the regular
    file the symbolic link pointed at.

  - 'git-remote' did not like period in a remote's name.

* Documentation updates

  - added and clarified core.bare, core.legacyheaders configurations.

  - updated "git-clone --depth" documentation.

* Assorted git-gui fixes.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

----- End forwarded message -----

-- 
MST

From rdreier at cisco.com Tue Feb 27 13:40:36 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 27 Feb 2007 13:40:36 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: (Roland Dreier's message of "Mon, 26 Feb 2007 14:27:42 -0800")
References: 
Message-ID: 

> > On our cell blade + PCI-e Mellanox.
>
> I don't see anything in arch/powerpc that looks like
> dma_alloc_coherent() will do anything other than allocate some memory
> and map it with DMA_BIDIRECTIONAL. So how does this altix fix help in
> your situation? Am I misreading the Cell IOMMU code?

Shirley, can you clarify why doing dma_alloc_coherent() in the kernel
helps on your Cell blade? It really seems that dma_alloc_coherent()
just allocates some memory and then does dma_map(DMA_BIDIRECTIONAL),
which would be exactly the same as allocating the CQ buffer in
userspace and using ib_umem_get() to map it into the kernel.

I'm looking at a possibly cleaner solution to the Altix issue, so I
would like to make sure it fixes whatever the bug on Cell is as well.
So any details you can provide about the problem you see on Cell would
help a lot.

Thanks...

From hozer at hozed.org Tue Feb 27 13:47:43 2007
From: hozer at hozed.org (Troy Benjegerdes)
Date: Tue, 27 Feb 2007 15:47:43 -0600
Subject: [openib-general] Port error rate detection
In-Reply-To: <45DA0E50.7010002@ornl.gov>
References: <45DA0E50.7010002@ornl.gov>
Message-ID: <20070227214739.GZ21482@narn.hozed.org>

On Mon, Feb 19, 2007 at 03:53:36PM -0500, Steven Carter wrote:
> I have a Nagios module that alerts on connectivity, port errors,
> speed/width problems. I would like to give it the ability to change the
> severity of the alert depending on whether errors are just present or if
> they are increasing faster than a specified rate. The intent is to
> equip the module to keep the state of the last query and possibly
> history, but I wanted to make sure that I was not re-inventing the wheel
> first. Is there an attribute or utility that I am overlooking that will
> help me do this?

One other thing you might want to take a look at is the Fountain/Goanna
node monitoring setup... It's not really anything like the proposed
performance manager, but it might get you what you need. (And we'd like
some feedback on what it should do differently ;)

http://www.scl.ameslab.gov/Projects/Monitor/

From xma at us.ibm.com Tue Feb 27 14:14:51 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 27 Feb 2007 14:14:51 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: 
Message-ID: 

Roland Dreier wrote on 02/27/2007 01:40:36 PM:

> Shirley, can you clarify why doing dma_alloc_coherent() in the kernel
> helps on your Cell blade? It really seems that dma_alloc_coherent()
> just allocates some memory and then does dma_map(DMA_BIDIRECTIONAL),
> which would be exactly the same as allocating the CQ buffer in
> userspace and using ib_umem_get() to map it into the kernel.
> > I'm looking at a possibly cleaner solution to the Altix issue, so I > would like to make sure it fixes whatever the bug on Cell is as well. > So any details you can provide about the problem you see on Cell would > help a lot. > > Thanks... Thanks, Roland. After reviewing the whole thread, I see that the failure on Cell is different from the Altix issue, so this fix might not help Cell. The problem I have might be related to multiple DMAs mapping to the same CQ; the sync might be getting lost somewhere else. Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... URL: From xma at us.ibm.com Tue Feb 27 14:28:54 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 27 Feb 2007 15:28:54 -0700 Subject: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish Message-ID: Hello Roland, Sorry to bother you again. Could you please review the patch below to see whether it can go upstream soon? IPoIB hosts can't ping each other if the broadcast join succeeds but any other IB multicast join fails (for example, the IB multicast group join for the default IPv6 link-local solicited address) when the interface is brought up. This does impact IPoIB usability in large clusters where MCG LIDs are limited. Thanks Shirley Ma ----- Forwarded by Shirley Ma/Beaverton/IBM on 02/27/07 06:23 AM ----- From: Shirley Ma/Beaverton/IBM@IBMUS Sent by: openib-general-bounces at openib.org Date: 02/05/07 06:50 AM To: "Roland Dreier" cc: openib-general at openib.org Subject: [openib-general] [PATCH] enable IPoIB only if broadcast join finish Hi, Roland, Please review this patch. According to the IPoIB RFC 4391, section 5, once the IPoIB broadcast group has been joined, the interface should be ready for data transfer. In the current IPoIB implementation, the interface is UP and RUNNING only when all default multicast joins have succeeded. We hit a problem where the broadcast join finished successfully but the all-hosts multicast join failed.
Here is the patch, if possible please give your input asap, we have an urgent customer issue need to be resolved: diff -urpN ipoib/ipoib_multicast.c ipoib-multicast/ipoib_multicast.c --- ipoib/ipoib_multicast.c 2006-11-29 13:57:37.000000000 -0800 +++ ipoib-multicast/ipoib_multicast.c 2007-02-04 22:34:16.000000000 -0800 @@ -402,6 +402,11 @@ static void ipoib_mcast_join_complete(in queue_work(ipoib_workqueue, &priv->mcast_task); mutex_unlock(&mcast_mutex); complete(&mcast->done); + /* + * broadcast join finished, enable carrier + */ + if (mcast == priv->broadcast) + netif_carrier_on(dev); return; } @@ -599,7 +604,6 @@ void ipoib_mcast_join_task(void *dev_ptr ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); clear_bit(IPOIB_MCAST_RUN, &priv->flags); - netif_carrier_on(dev); } int ipoib_mcast_start_thread(struct net_device *dev) (See attached file: ipoib-multicast.patch) Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638(See attached file: ipoib-multicast.patch) _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pic08451.gif Type: image/gif Size: 1255 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ipoib-multicast.patch Type: application/octet-stream Size: 777 bytes Desc: not available URL: From rdreier at cisco.com Tue Feb 27 14:35:34 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 27 Feb 2007 14:35:34 -0800 Subject: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: (Shirley Ma's message of "Tue, 27 Feb 2007 15:28:54 -0700") References: Message-ID: I don't think this applies any more since Sean's multicast stuff was merged. I didn't realize you wanted to get this merged upstream -- anyway, can you please regenerate the patch against the latest kernel? Thanks From xma at us.ibm.com Tue Feb 27 14:38:55 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 27 Feb 2007 14:38:55 -0800 Subject: [openib-general] IPOIB NAPI In-Reply-To: Message-ID: Roland Dreier wrote on 02/26/2007 02:36:26 PM: > No way, it's way too late at this point to change the kernel-user ABI, > let alone change all ULPs. > > - R. Hello Roland, So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can generate the patch for all ULPs to use this for review. Do you need me to do that? Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Tue Feb 27 14:41:44 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 27 Feb 2007 14:41:44 -0800 Subject: [openib-general] IPOIB NAPI In-Reply-To: (Shirley Ma's message of "Tue, 27 Feb 2007 14:38:55 -0800") References: Message-ID: > So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can > generate the patch for all ULPs to use this for review. Do you need me to > do that? No, it's not in OFED 1.2 or the upstream kernel. And no one has implemented it for userspace (and I'm somewhat reluctant to break the ABI at this point without some performance numbers to motivate making this API change). Have the NAPI performance problems with ehca been resolved? 
We could probably merge IPoIB NAPI for 2.6.22 then, which would pull in the kernel changes at least. - R. From swise at opengridcomputing.com Tue Feb 27 14:43:51 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 16:43:51 -0600 Subject: [openib-general] cannot instal ofed-1.2 kernel rpm on 2.6.20.1 Message-ID: <1172616231.11870.142.camel@stevo-desktop> I built the ofed 1.2 rpms from the OFED-1.2-20070227-0602 build and the kernel rpm fails to install on a 2.6.20.1 kernel: vic13:/usr/local/src/OFED-1.2-20070227-0602/RPMS/sles-release-10-15.2 # rpm -U kernel-ib-1.2-2.6.20.1.x86_64.rpm error: Failed dependencies: ksym(schedule) = 1000e51 is needed by kernel-ib-1.2-2.6.20.1.x86_64 ksym(__up_wakeup) = 1042cbb5 is needed by kernel-ib-1.2-2.6.20.1.x86_64 ksym(pci_request_region) = 10cc2981 is needed by kernel-ib-1.2-2.6.20.1.x86_64 ksym(skb_dequeue) = 10fc721b is needed by kernel-ib-1.2-2.6.20.1.x86_64 ksym(mod_timer) = 14777d07 is needed by kernel-ib-1.2-2.6.20.1.x86_64 ksym(remap_pfn_range) = 155834a8 is needed by kernel-ib-1.2-2.6.20.1.x86_64 ksym(unregister_netevent_notifier) = 1598dc9d is needed by kernel-ib-1.2-2.6.20.1.x86_64 ksym(bad_dma_address) = 1675606f is needed by kernel-ib-1.2-2.6.20.1.x86_64 ksym(dev_get_by_name) = 16ab1a6b is needed by kernel-ib-1.2-2.6.20.1.x86_64 ... Anybody seen this? From xma at us.ibm.com Tue Feb 27 14:46:25 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 27 Feb 2007 14:46:25 -0800 Subject: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: Message-ID: Roland Dreier wrote on 02/27/2007 02:35:34 PM: > I don't think this applies any more since Sean's multicast stuff was > merged. I didn't realize you wanted to get this merged upstream -- > anyway, can you please regenerate the patch against the latest kernel? > > Thanks Sure. I will generate a new patch. Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Tue Feb 27 14:48:15 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Feb 2007 17:48:15 -0500 Subject: [openib-general] [PATCH] osm: trivial data type change to remove compilation warning In-Reply-To: <45E2C266.5000503@dev.mellanox.co.il> References: <45E2C266.5000503@dev.mellanox.co.il> Message-ID: <1172616493.31770.10684.camel@hal.voltaire.com> On Mon, 2007-02-26 at 06:20, Yevgeny Kliteynik wrote: > Hi Hal > > Trivial data type change to remove compilation warning. > Please apply to the trunk and to the 1.2 branch. > > Thanks. > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied (to both master and ofed_1_2). -- Hal From xma at us.ibm.com Tue Feb 27 14:54:27 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 27 Feb 2007 14:54:27 -0800 Subject: [openib-general] IPOIB NAPI In-Reply-To: Message-ID: Roland Dreier wrote on 02/27/2007 02:41:44 PM: > > So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can > > generate the patch for all ULPs to use this for review. Do you need me to > > do that? > > No, it's not in OFED 1.2 or the upstream kernel. And no one has > implemented it for userspace (and I'm somewhat reluctant to break the > ABI at this point without some performance numbers to motivate making > this API change). > > Have the NAPI performance problems with ehca been resolved? We could > probably merge IPoIB NAPI for 2.6.22 then, which would pull in the > kernel changes at least. > > - R. We have addressed the NAPI performance issues with the ehca driver, and I believe the patches have gone upstream. However, the test results show that it's better to delay polling again until the next NAPI interval, something like this:

poll-cq
notify-cq, if missed_event && netif_rx_reschedule()
    return 1

vs.

poll-cq,
notify-cq, if missed_event && netif_rx_reschedule()
    poll again
return 0

It seems ehca delivers packets much faster than other HCAs, so the poll-again variant would stay in the loop many, many times.
Since the above change doesn't impact other HCAs, I would recommend it. I have seen the same approach in other Ethernet drivers. Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... URL: From swise at opengridcomputing.com Tue Feb 27 15:05:37 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 27 Feb 2007 17:05:37 -0600 Subject: [openib-general] cannot instal ofed-1.2 kernel rpm on 2.6.20.1 In-Reply-To: <1172616231.11870.142.camel@stevo-desktop> References: <1172616231.11870.142.camel@stevo-desktop> Message-ID: <1172617537.11870.143.camel@stevo-desktop> I opened bug 399 to track this. I also opened bug 398 because I got an error installing opensm with this same OFED-1.2 build. Steve. On Tue, 2007-02-27 at 16:43 -0600, Steve Wise wrote: > I built the ofed 1.2 rpms from the OFED-1.2-20070227-0602 build and the > kernel rpm fails to install on a 2.6.20.1 kernel: > > vic13:/usr/local/src/OFED-1.2-20070227-0602/RPMS/sles-release-10-15.2 # rpm -U kernel-ib-1.2-2.6.20.1.x86_64.rpm > error: Failed dependencies: > ksym(schedule) = 1000e51 is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(__up_wakeup) = 1042cbb5 is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(pci_request_region) = 10cc2981 is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(skb_dequeue) = 10fc721b is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(mod_timer) = 14777d07 is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(remap_pfn_range) = 155834a8 is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(unregister_netevent_notifier) = 1598dc9d is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(bad_dma_address) = 1675606f is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(dev_get_by_name) = 16ab1a6b is needed by kernel-ib-1.2-2.6.20.1.x86_64 > > ... > > > > Anybody seen this?
> > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From xma at us.ibm.com Tue Feb 27 15:59:23 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 27 Feb 2007 16:59:23 -0700 Subject: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: Message-ID: Hello Roland, Here is the new patch against 2.6.20-rc1 kernel. Please review it. diff -urpN ipoib/ipoib_multicast.c ipoib-link/ipoib_multicast.c --- ipoib/ipoib_multicast.c 2007-02-27 07:21:50.000000000 -0800 +++ ipoib-link/ipoib_multicast.c 2007-02-27 07:52:10.000000000 -0800 @@ -407,6 +407,11 @@ static int ipoib_mcast_join_complete(int queue_delayed_work(ipoib_workqueue, &priv->mcast_task, 0); mutex_unlock(&mcast_mutex); + /* + * broadcast join finished, enable carrier + */ + if (unlikely(mcast == priv->broadcast)) + netif_carrier_on(dev); return 0; } @@ -596,7 +601,6 @@ void ipoib_mcast_join_task(struct work_s ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); clear_bit(IPOIB_MCAST_RUN, &priv->flags); - netif_carrier_on(dev); } int ipoib_mcast_start_thread(struct net_device *dev) (See attached file: ipoib-link.patch) Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ipoib-link.patch Type: application/octet-stream Size: 772 bytes Desc: not available URL: From bugzilla-daemon at lists.openfabrics.org Tue Feb 27 21:00:29 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 27 Feb 2007 21:00:29 -0800 (PST) Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop In-Reply-To: Message-ID: <20070228050029.2EF4CE602D9@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=263 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED ------- Comment #14 from sweitzen at cisco.com 2007-02-27 21:00 ------- With OFED 1.2 alpha1, I was able to failover/failback an IB port every 10 seconds for 8 hours on RHEL4 x86_64 LionMini SDR and DDR. Will keep testing on other platforms. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at mellanox.co.il Tue Feb 27 21:05:09 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Feb 2007 07:05:09 +0200 Subject: [openib-general] IPOIB NAPI In-Reply-To: References: Message-ID: <20070228050509.GB26317@mellanox.co.il> > Quoting Shirley Ma : > Subject: Re: [openib-general] IPOIB NAPI > > Roland Dreier wrote on 02/27/2007 02:41:44 PM: > > > > So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can > > > generate the patch for all ULPs to use this for review. Do you need me to > > > do that? > > > > No, it's not in OFED 1.2 or the upstream kernel. And no one has > > implemented it for userspace (and I'm somewhat reluctant to break the > > ABI at this point without some performance numbers to motivate making > > this API change). > > > > Have the NAPI performance problems with ehca been resolved? 
We could > > probably merge IPoIB NAPI for 2.6.22 then, which would pull in the > > kernel changes at least. > > > > - R. > We have addressed the NAPI performance issues with ehca driver. I believe the patches have been upper stream. However the test results show that it's better to delay poll again to next NAPI interval, something like this: > > poll-cq > notify-cq, if missed_event && netif_rx_reschedule() > return 1 > > vs. > poll-cq, > notify-cq, if missed_event && netif_rx_reschedule() > poll again > return 0 > > It seems ehca delivering packet much faster than other HCAs. So poll again would stay in the loop for many many times. So the above changes doesn't impact other HCAs, I would recommand it. I saw same implementations on other ethernet drivers. I'm confused. Which one is faster? -- MST From bugzilla-daemon at lists.openfabrics.org Tue Feb 27 21:15:07 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 27 Feb 2007 21:15:07 -0800 (PST) Subject: [openib-general] [Bug 400] New: OFED 1.2 alpha1 IPoIB HA failover gets QP warnings Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=400 Summary: OFED 1.2 alpha1 IPoIB HA failover gets QP warnings Product: OpenFabrics Linux Version: 1.2alpha1 Platform: X86-64 OS/Version: RHEL 4 Status: NEW Severity: normal Priority: P3 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: sweitzen at cisco.com OFED 1.2 alpha1 on RHEL4 U4 x86_64, LionMini DDR HCA. I have IPoIB HA configured, running traffic via netperf, and bringing up/down a different host IB port every 10 seconds. This is working for several hours, but I see warnings in dmesg, more on server side. 
Client dmesg: ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib_mthca 0000:04:00.0: QP 000405 not found in MGM ib1: ib_detach_mcast failed (result = -22) ib1: ipoib_mcast_detach failed (result = -22) ib_mthca 0000:04:00.0: QP 000404 not found in MGM ib0: ib_detach_mcast failed (result = -22) ib0: ipoib_mcast_detach failed (result = -22) [root at svbu-qa-dl145-1 log]# Server dmesg: ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib_mthca 0000:04:00.0: QP 000405 not found in MGM ib1: ib_detach_mcast failed (result = -22) ib1: ipoib_mcast_detach failed (result = -22) ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib_mthca 0000:04:00.0: QP 000405 not found in MGM ib1: ib_detach_mcast failed (result = -22) ib1: ipoib_mcast_detach failed (result = -22) ib_mthca 0000:04:00.0: QP 000405 not found in MGM ib1: ib_detach_mcast failed (result = -22) ib1: ipoib_mcast_detach failed (result = -22) ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib_mthca 0000:04:00.0: QP 000405 not found in MGM ib1: ib_detach_mcast failed (result = -22) ib1: ipoib_mcast_detach failed (result = -22) ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed 
to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib_mthca 0000:04:00.0: QP 000405 not found in MGM ib1: ib_detach_mcast failed (result = -22) ib1: ipoib_mcast_detach failed (result = -22) ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib_mthca 0000:04:00.0: QP 000405 not found in MGM ib1: ib_detach_mcast failed (result = -22) ib1: ipoib_mcast_detach failed (result = -22) ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib_mthca 0000:04:00.0: QP 000405 not found in MGM ib1: ib_detach_mcast failed (result = -22) ib1: ipoib_mcast_detach failed (result = -22) ib_mthca 0000:04:00.0: QP 000405 not found in MGM ib1: ib_detach_mcast failed (result = -22) ib1: ipoib_mcast_detach failed (result = -22) ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet [root at svbu-qa-dl145-2 log]# 
-- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Tue Feb 27 21:18:07 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 27 Feb 2007 21:18:07 -0800 (PST) Subject: [openib-general] [Bug 400] OFED 1.2 alpha1 IPoIB HA failover gets QP warnings In-Reply-To: Message-ID: <20070228051807.9DD61E603C6@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=400 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |rolandd at cisco.com ------- Comment #1 from sweitzen at cisco.com 2007-02-27 21:18 ------- Roland, can you take a look at this, please? -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. You are the assignee for the bug, or are watching the assignee. From xma at us.ibm.com Tue Feb 27 22:06:35 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 27 Feb 2007 22:06:35 -0800 Subject: [OFA General] Re: [openib-general] IPOIB NAPI In-Reply-To: <20070228050509.GB26317@mellanox.co.il> Message-ID: >I'm confused. Which one is faster? Sorry for the confusion, Michael. The one with return 1 has better throughput. Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bugzilla-daemon at lists.openfabrics.org Tue Feb 27 22:18:53 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 27 Feb 2007 22:18:53 -0800 (PST) Subject: [OFA General] [Bug 371] IPoIB HA not working properly with OFED1.2-alpha In-Reply-To: Message-ID: <20070228061853.784E3E60812@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=371 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sweitzen at cisco.com -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Tue Feb 27 23:08:55 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 27 Feb 2007 23:08:55 -0800 (PST) Subject: [OFA General] [Bug 371] IPoIB HA not working properly with OFED1.2-alpha In-Reply-To: Message-ID: <20070228070856.0A44EE60810@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=371 mst at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |vlad at mellanox.co.il ------- Comment #2 from mst at mellanox.co.il 2007-02-27 23:08 ------- Assigned to Vlad. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. You are the assignee for the bug, or are watching the assignee. 
From mplee at sandia.gov Tue Feb 27 23:17:34 2007 From: mplee at sandia.gov (Lee, Michael Paichi) Date: Wed, 28 Feb 2007 00:17:34 -0700 Subject: [OFA General] List Address Change Completed Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov> This list has been migrated to the new server, lists.openfabrics.org. Please update any address book or filter settings to reflect the new mailing list address. Future messages and replies should be sent to this address: general at lists.openfabrics.org The new web address for this list is: http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general If you have any questions, please contact me at mplee at sandia.gov Regards, Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Tue Feb 27 23:17:06 2007 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 28 Feb 2007 09:17:06 +0200 Subject: [OFA General] Re: IPOIB NAPI In-Reply-To: References: Message-ID: <20070228071706.GA22246@mellanox.co.il> > Quoting Shirley Ma : > Subject: Re: IPOIB NAPI > > oland Dreier wrote on 02/27/2007 02:41:44 PM: > > > > So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I > can > > > generate the patch for all ULPs to use this for review. Do you need me to > > > do that? > > > > No, it's not in OFED 1.2 or the upstream kernel. And no one has > > implemented it for userspace (and I'm somewhat reluctant to break the > > ABI at this point without some performance numbers to motivate making > > this API change). > > > > Have the NAPI performance problems with ehca been resolved? We could > > probably merge IPoIB NAPI for 2.6.22 then, which would pull in the > > kernel changes at least. > > > > - R. > We have addressed the NAPI performance issues with ehca driver. I believe the > patches have been upper stream. However the test results show that it's better > to delay poll again to next NAPI interval, something like this: > > poll-cq > notify-cq, if missed_event && netif_rx_reschedule() > return 1 > > vs. > poll-cq, > notify-cq, if missed_event && netif_rx_reschedule() > poll again > return 0 > > It seems ehca delivering packet much faster than other HCAs. So poll again > would stay in the loop for many many times. So the above changes doesn't impact > other HCAs, I would recommand it. I saw same implementations on other ethernet > drivers. I have not benchmarked this, but actually the "return 1" version makes sense to me too: since a new completion was observed after notify-cq, we likely currently have HCA writing new completions into the CQ at a high rate, so it makes sense to delay polling by a few cycles, and reduce the number of interrupts in this way. Right? -- MST From mst at mellanox.co.il Tue Feb 27 23:23:41 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 28 Feb 2007 09:23:41 +0200 Subject: [OFA General] List Address Change Completed In-Reply-To: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov> References: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov> Message-ID: <20070228072341.GB22246@mellanox.co.il> > Quoting Lee, Michael Paichi : > Subject: [OFA General] List Address Change Completed > > This list has been migrated to the new server, lists.openfabrics.org. Please update any address book or filter settings to reflect the new mailing list address. Future messages and replies should be sent to this address: > > general at lists.openfabrics.org > > The new web address for this list is: > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > If you have any questions, please contact me at mplee at sandia.gov Can the subject prefix be made all lower-case, with dash, please? OFA General -> ofa-general? Upper case words look like shouting to me, and e.g. exchange rules are limited in coping with spaces. -- MST
From vlad at mellanox.co.il Tue Feb 27 23:28:25 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 28 Feb 2007 09:28:25 +0200 Subject: [OFA General] Re: [PATCH 0/6] ofed_1_2: cxgb3 bug fixes In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int> References: <20070227155953.21615.96154.stgit@dell3.ogc.int> Message-ID: <1172647705.21382.101.camel@vladsk-laptop> On Tue, 2007-02-27 at 09:59 -0600, Steve Wise wrote: > Hey Vlad, > > These fixes need to be pulled into ofed_1_2 for the Chelsio Ethernet > driver. > > You can pull them directly from my ofa git tree: > > git://staging.openfabrics.org/~swise/ofed_1_2 cxgb3_fixes > > Thanks, > > Steve. Applied. -- Vladimir Sokolovsky Mellanox Technologies Ltd. From mplee at sandia.gov Tue Feb 27 23:32:10 2007 From: mplee at sandia.gov (Lee, Michael Paichi) Date: Wed, 28 Feb 2007 00:32:10 -0700 Subject: [ofa-general] RE: [OFA General] List Address Change Completed References: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov> <20070228072341.GB22246@mellanox.co.il> Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F0366949B@ES22SNLNT.srn.sandia.gov> Done -----Original Message----- From: Michael S.
Tsirkin [mailto:mst at mellanox.co.il] Sent: Tue 2/27/2007 11:23 PM To: Lee, Michael Paichi Cc: general at lists.openfabrics.org; openib-general at openib.org Subject: Re: [OFA General] List Address Change Completed > Quoting Lee, Michael Paichi : > Subject: [OFA General] List Address Change Completed > > This list has been migrated to the new server, lists.openfabrics.org. Please update any address book or filter settings to reflect the new mailing list address. Future messages and replies should be sent to this address: > > general at lists.openfabrics.org > > The new web address for this list is: > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > If you have any questions, please contact me at mplee at sandia.gov Can the subject prefix be made all lower-case, with dash, please? OFA General -> ofa-general? Upper case words look like shouting to me, and e.g. exchange rules are limited in coping with spaces. -- MST -------------- next part -------------- An HTML attachment was scrubbed... URL:
From kliteyn at dev.mellanox.co.il Wed Feb 28 01:07:31 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 28 Feb 2007 11:07:31 +0200 Subject: [ofa-general] [PATCH] osm: Trivial changes for compilation on windows Message-ID: <45E54653.6010300@dev.mellanox.co.il> Hi Hal. This patch has trivial data types changes and redefining a macro. BTW, Sasha, do we still need this macro (NOISE_L in osm_ucast_updn.c)? Signed-off-by: Yevgeny Kliteynik --- osm/include/opensm/osm_switch.h | 4 ++-- osm/opensm/osm_ucast_mgr.c | 2 +- osm/opensm/osm_ucast_updn.c | 6 +++++- 3 files changed, 8 insertions(+), 4 deletions(-) diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h index 1b3c35d..36c531c 100644 --- a/osm/include/opensm/osm_switch.h +++ b/osm/include/opensm/osm_switch.h @@ -105,8 +105,8 @@ typedef struct _osm_switch osm_node_t *p_node; ib_switch_info_t switch_info; uint16_t max_lid_ho; - unsigned num_ports; - unsigned num_hops; + uint8_t num_ports; + uint16_t num_hops; uint8_t **hops; osm_port_profile_t *p_prof; osm_fwd_tbl_t fwd_tbl; diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index 473baa6..2ab1c3b 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -411,7 +411,7 @@ __osm_ucast_mgr_setup_switch( IN cl_map_item_t* const p_map_item, IN void* cxt ) { - uint16_t lids = cl_ptr_vector_get_size(&((osm_subn_t *)cxt)->port_lid_tbl); + uint16_t 
lids = (uint16_t)cl_ptr_vector_get_size(&((osm_subn_t *)cxt)->port_lid_tbl); osm_switch_prepare_path_rebuild((osm_switch_t *)p_map_item, lids ? lids - 1 : 0); diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c index 93f54f4..679346e 100644 --- a/osm/opensm/osm_ucast_updn.c +++ b/osm/opensm/osm_ucast_updn.c @@ -97,7 +97,11 @@ struct updn_node { unsigned visited; }; +#ifndef WIN32 #define NOISE_L(log, fmt, arg...) +#else +#define NOISE_L +#endif /* ///////////////////////////////// */ /* Statics */ @@ -294,7 +298,7 @@ __updn_bfs_by_node( "move from 0x%016" PRIx64 " rank: %u " "to 0x%016" PRIx64" rank: %u\n", cl_ntoh64(current_guid), u->rank, - cl_ntoh64(remote_guid), rem->rank ); + cl_ntoh64(remote_guid), rem_u->rank ); /* Check if this is a legal step : the only illegal step is going from DOWN to UP */ if ((current_dir == DOWN) && (next_dir == UP)) -- 1.4.4.1.GIT From tziporet at mellanox.co.il Wed Feb 28 02:01:32 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 28 Feb 2007 12:01:32 +0200 Subject: [ofa-general] IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! Message-ID: <45E552FC.4040305@mellanox.co.il> Hi Roland, When running stress tests over IPoIB CM a kernel bug occurred (with kernel 2.6.20): Feb 27 17:47:52 sw169 kernel: BUG: soft lockup detected on CPU#0! 
Feb 27 17:47:52 sw169 kernel: Feb 27 17:47:52 sw169 kernel: Call Trace: Feb 27 17:47:52 sw169 kernel: [] softlockup_tick+0xd2/0xe4 Feb 27 17:47:52 sw169 kernel: [] update_process_times+0x42/0x68 Feb 27 17:47:52 sw169 kernel: [] smp_local_timer_interrupt+0x31/0x52 Feb 27 17:47:52 sw169 kernel: [] smp_apic_timer_interrupt+0x4f/0x66 Feb 27 17:47:52 sw169 kernel: [] apic_timer_interrupt+0x66/0x70 Feb 27 17:47:52 sw169 kernel: [] _spin_lock_irqsave+0x15/0x24 Feb 27 17:47:52 sw169 kernel: [] :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139 Feb 27 17:47:52 sw169 kernel: [] neigh_destroy+0xc2/0x10e Feb 27 17:47:52 sw169 kernel: [] dst_destroy+0x5f/0xd6 Feb 27 17:47:52 sw169 kernel: [] dst_run_gc+0x6c/0x12a Feb 27 17:47:52 sw169 kernel: [] dst_run_gc+0x0/0x12a Feb 27 17:47:52 sw169 kernel: [] run_timer_softirq+0x14f/0x1a0 Feb 27 17:47:52 sw169 kernel: [] __do_softirq+0x50/0xbb Feb 27 17:47:52 sw169 kernel: [] call_softirq+0x1c/0x28 Feb 27 17:47:52 sw169 kernel: [] do_softirq+0x2e/0x97 Feb 27 17:47:52 sw169 kernel: [] smp_apic_timer_interrupt+0x54/0x66 Feb 27 17:47:52 sw169 kernel: [] mwait_idle+0x0/0x42 Feb 27 17:47:52 sw169 kernel: [] apic_timer_interrupt+0x66/0x70 Feb 27 17:47:52 sw169 kernel: [] mwait_idle+0x3f/0x42 Feb 27 17:47:52 sw169 kernel: [] cpu_idle+0x8b/0xae Feb 27 17:47:52 sw169 kernel: [] start_kernel+0x212/0x214 Feb 27 17:47:52 sw169 kernel: [] _sinittext+0x175/0x179 To reproduce: Need 2 machines back2back (A and B), and opensm installed on machine B. On A machine run: ping B (its ib0 address) On machine B: Copy scripts from http://www.openfabrics.org/~tziporet/ipoib_scripts/ to a local directory and edit them to include the correct ib0 IP address of machine A. 
Run: runscripts.sh Tziporet From rf at q-leap.de Wed Feb 28 02:20:16 2007 From: rf at q-leap.de (Roland Fehrenbacher) Date: Wed, 28 Feb 2007 11:20:16 +0100 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 Message-ID: <17893.22368.748298.755523@gargle.gargle.HOWL> Hi, I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, and saw some unpleasant performance drops when using OFED 1.1 (kernel 2.6.20.1 with included IB drivers). The main drop is in throughput as measured by the OSU MPI bandwidth benchmark. However, the latency for large packet sizes is also worse (see results below). I tried with and without "options ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a significant performance difference of approx. 10%). The IB card is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an Opteron with nForce4 2200 Professional chipset. Does anybody have an explanation or even better a solution to this issue? Thanks, Roland -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: osu_bench.result URL: From mst at mellanox.co.il Wed Feb 28 02:31:31 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Feb 2007 12:31:31 +0200 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <17893.22368.748298.755523@gargle.gargle.HOWL> References: <17893.22368.748298.755523@gargle.gargle.HOWL> Message-ID: <20070228103131.GC28054@mellanox.co.il> > Quoting Roland Fehrenbacher : > Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 > > Content-Description: message body text > Hi, > > I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, and saw > some unpleasant performance drops when using OFED 1.1 (kernel 2.6.20.1 > with included IB drivers). The main drop is in throughput as measured > by the OSU MPI bandwidth benchmark. However, the latency for large > packet sizes is also worse (see results below). 
I tried with and > without "options ib_mthca msi_x=1" (using IBGD, disabling msi_x makes > a significant performance difference of approx. 10%). The IB card is a > Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an Opteron > with nForce4 2200 Professional chipset. > > Does anybody have an explanation or even better a solution to this > issue? Could be a BIOS bug. Try setting tune_pci=1. If this helps, contact your BIOS vendor: here's an explanation about what this parameter does: http://www.mail-archive.com/openib-general at openib.org/msg25305.html -- MST From rf at q-leap.de Wed Feb 28 03:00:02 2007 From: rf at q-leap.de (Roland Fehrenbacher) Date: Wed, 28 Feb 2007 12:00:02 +0100 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <20070228103131.GC28054@mellanox.co.il> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <20070228103131.GC28054@mellanox.co.il> Message-ID: <17893.24754.773054.426451@gargle.gargle.HOWL> >>>>> "MST" == Michael S Tsirkin writes: >> Quoting Roland Fehrenbacher : Subject: >> [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 >> >> Hi, >> >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, >> and saw some unpleasant performance drops when using OFED 1.1 >> (kernel 2.6.20.1 with included IB drivers). The main drop is in >> throughput as measured by the OSU MPI bandwidth >> benchmark. However, the latency for large packet sizes is also >> worse (see results below). I tried with and without "options >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a >> significant performance difference of approx. 10%). The IB card >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an >> Opteron with nForce4 2200 Professional chipset. >> >> Does anybody have an explanation or even better a solution to >> this issue? MST> Could be a BIOS bug. Try setting tune_pci=1. 
If this helps, MST> contact your BIOS vendor: here's an explanation about what MST> this parameter does: MST> http://www.mail-archive.com/openib-general at openib.org/msg25305.html I tried this with no effect. Just to make sure the settings are in effect, is there a way I can check this after booting? Roland From vlad at lists.openfabrics.org Wed Feb 28 02:59:18 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Wed, 28 Feb 2007 02:59:18 -0800 (PST) Subject: [ofa-general] ofa_1_2_kernel 20070228-0200 daily build status Message-ID: <20070228105918.D9287E603C6@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.17 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on ia64 with 
linux-2.6.14 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: Build failed on powerpc with linux-2.6.19 Log: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’ /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.19' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.19 Log: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: 
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.19' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ppc64 with linux-2.6.19 Log: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’ /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check] Error 2 make[1]: Leaving directory 
`/home/vlad/kernel.org/ppc64/linux-2.6.19' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.5-7.244-smp Log: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit': /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:468: error: implicit declaration of function 'proto_unregister' /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init': /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:517: error: implicit declaration of function 'proto_register' make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.16.21-0.8-smp Log: In file included from /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:33: include/linux/parser.h:34: error: expected declaration specifiers or ‘...’ before ‘u64’ include/linux/parser.h:35: error: expected declaration specifiers or ‘...’ before ‘s64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-smp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.16.21-0.8-smp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on ia64 with linux-2.6.16.21-0.8-default Log: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’ make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: 
‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** 
[/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From mst at mellanox.co.il Wed Feb 28 03:50:47 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Feb 2007 13:50:47 +0200 Subject: [ofa-general] Re: Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <17893.24754.773054.426451@gargle.gargle.HOWL> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <20070228103131.GC28054@mellanox.co.il> <17893.24754.773054.426451@gargle.gargle.HOWL> Message-ID: <20070228115047.GE28054@mellanox.co.il> > Quoting Roland Fehrenbacher : > Subject: Re: Performance penalty of OFED 1.1 versus IBGD 1.8.2 > > >>>>> "MST" == Michael S Tsirkin writes: > > >> Quoting Roland Fehrenbacher : Subject: > >> [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 > >> > >> Hi, > >> > >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, > >> and saw some unpleasant performance drops when using OFED 1.1 > >> (kernel 2.6.20.1 with included IB drivers). The main drop is in > >> throughput as measured by the OSU MPI bandwidth > >> benchmark. However, the latency for large packet sizes is also > >> worse (see results below). I tried with and without "options > >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a > >> significant performance difference of approx. 10%). The IB card > >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an > >> Opteron with nForce4 2200 Professional chipset. > >> > >> Does anybody have an explanation or even better a solution to > >> this issue? > > MST> Could be a BIOS bug. Try setting tune_pci=1. 
If this helps, > MST> contact your BIOS vendor: here's an explanation about what > MST> this parameter does: > > MST> http://www.mail-archive.com/openib-general at openib.org/msg25305.html > > I tried this with no effect. Just to make sure the settings are in > effect, is there a way I can check this after booting? cat /sys/module/ib_mthca/parameters/tune_pci -- MST From mst at mellanox.co.il Wed Feb 28 03:58:46 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Feb 2007 13:58:46 +0200 Subject: [ofa-general] [PATCH] vnic: include linux/vmalloc.h explicitly Message-ID: <20070228115846.GF28054@mellanox.co.il> Some VNIC files use vmalloc. These should include linux/vmalloc.h Signed-off-by: Michael S. Tsirkin --- This has been applied to OFED git - it fixes build on 2.6.19, and I think it's a good idea generally. diff --git a/drivers/infiniband/ulp/vnic/vnic_control.c b/drivers/infiniband/ulp/vnic/vnic_control.c index 2c55540..a199380 100644 --- a/drivers/infiniband/ulp/vnic/vnic_control.c +++ b/drivers/infiniband/ulp/vnic/vnic_control.c @@ -32,6 +32,7 @@ #include #include +#include <linux/vmalloc.h> #include "vnic_util.h" #include "vnic_main.h" diff --git a/drivers/infiniband/ulp/vnic/vnic_data.c b/drivers/infiniband/ulp/vnic/vnic_data.c index c1d056a..33fa914 100644 --- a/drivers/infiniband/ulp/vnic/vnic_data.c +++ b/drivers/infiniband/ulp/vnic/vnic_data.c @@ -33,6 +33,7 @@ #include #include #include +#include <linux/vmalloc.h> #include "vnic_util.h" #include "vnic_viport.h" -- MST From rf at q-leap.de Wed Feb 28 04:23:29 2007 From: rf at q-leap.de (Roland Fehrenbacher) Date: Wed, 28 Feb 2007 13:23:29 +0100 Subject: [ofa-general] Re: Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <20070228115047.GE28054@mellanox.co.il> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <20070228103131.GC28054@mellanox.co.il> <17893.24754.773054.426451@gargle.gargle.HOWL> <20070228115047.GE28054@mellanox.co.il> Message-ID: <17893.29761.695854.496211@gargle.gargle.HOWL> >>>>> 
"MST" == Michael S Tsirkin writes: >> Quoting Roland Fehrenbacher : Subject: Re: >> Performance penalty of OFED 1.1 versus IBGD 1.8.2 >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, >> and saw some unpleasant performance drops when using OFED 1.1 >> (kernel 2.6.20.1 with included IB drivers). The main drop is in >> throughput as measured by the OSU MPI bandwidth >> benchmark. However, the latency for large packet sizes is also >> worse (see results below). I tried with and without "options >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a >> significant performance difference of approx. 10%). The IB card >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an >> Opteron with nForce4 2200 Professional chipset. >> >> Does anybody have an explanation or even better a solution >> to this issue? MST> Could be a BIOS bug. Try setting tune_pci=1. If this helps, MST> contact your BIOS vendor: here's an explanation about what MST> this parameter does: MST> http://www.mail-archive.com/openib-general at openib.org/msg25305.html >> I tried this with no effect. Just to make sure the settings are >> in effect, is there a way I can check this after booting? MST> cat /sys/module/ib_mthca/parameters/tune_pci Ok, the settings are active, but have zero effect. Anything else I could check? Roland From mst at mellanox.co.il Wed Feb 28 04:25:35 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 28 Feb 2007 14:25:35 +0200 Subject: [ofa-general] Re: Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <17893.29761.695854.496211@gargle.gargle.HOWL> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <20070228103131.GC28054@mellanox.co.il> <17893.24754.773054.426451@gargle.gargle.HOWL> <20070228115047.GE28054@mellanox.co.il> <17893.29761.695854.496211@gargle.gargle.HOWL> Message-ID: <20070228122535.GA3576@mellanox.co.il> > Quoting Roland Fehrenbacher : > Subject: Re: Performance penalty of OFED 1.1 versus IBGD 1.8.2 > > >>>>> "MST" == Michael S Tsirkin writes: > > >> Quoting Roland Fehrenbacher : Subject: Re: > >> Performance penalty of OFED 1.1 versus IBGD 1.8.2 > > >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, > >> and saw some unpleasant performance drops when using OFED 1.1 > >> (kernel 2.6.20.1 with included IB drivers). The main drop is in > >> throughput as measured by the OSU MPI bandwidth > >> benchmark. However, the latency for large packet sizes is also > >> worse (see results below). I tried with and without "options > >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a > >> significant performance difference of approx. 10%). The IB card > >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an > >> Opteron with nForce4 2200 Professional chipset. > >> > >> Does anybody have an explanation or even better a solution > >> to this issue? > > MST> Could be a BIOS bug. Try setting tune_pci=1. If this helps, > MST> contact your BIOS vendor: here's an explanation about what > MST> this parameter does: > > MST> http://www.mail-archive.com/openib-general at openib.org/msg25305.html > > >> I tried this with no effect. Just to make sure the settings are > >> in effect, is there a way I can check this after booting? > > MST> cat /sys/module/ib_mthca/parameters/tune_pci > > Ok, the settings are active, but have zero effect. Anything else I > could check? No idea. Could be an MPI issue? 
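On checking a module parameter after boot: loaded modules normally expose their readable parameters under /sys/module/<module>/parameters/<name> (note the singular "module"). A minimal C sketch of reading one such file, under that assumption, could look like this. read_param_file and read_module_param are hypothetical helper names, not part of any OFED tool, and the value format depends on how the driver declares the parameter.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read a one-line sysfs-style parameter file into buf,
 * stripping the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_param_file(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hypothetical helper: build "/sys/module/<module>/parameters/<param>"
 * and read it. Returns -1 if the path does not fit or the read fails. */
static int read_module_param(const char *module, const char *param,
                             char *buf, size_t len)
{
    char path[256];
    int n = snprintf(path, sizeof(path), "/sys/module/%s/parameters/%s",
                     module, param);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return read_param_file(path, buf, len);
}
```

For example, read_module_param("ib_mthca", "tune_pci", buf, sizeof(buf)) would return 0 and fill buf only if the module is loaded and exports that parameter; a return of -1 means the file was not there or could not be read.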
-- MST From halr at voltaire.com Wed Feb 28 04:25:57 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Feb 2007 07:25:57 -0500 Subject: [ofa-general] Re: [PATCH] opensm: remove osm_matrix.* files In-Reply-To: <20070225221943.GG11957@sashak.voltaire.com> References: <20070225214845.GF11957@sashak.voltaire.com> <20070225221943.GG11957@sashak.voltaire.com> Message-ID: <1172665541.31770.60792.camel@hal.voltaire.com> On Sun, 2007-02-25 at 17:19, Sasha Khapyorsky wrote: > Following previously submitted min hops reimplementation this removes > unused osm_matrix.* files. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). -- Hal From hnguyen at linux.vnet.ibm.com Wed Feb 28 04:50:03 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Wed, 28 Feb 2007 13:50:03 +0100 Subject: [ofa-general] IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! Message-ID: <200702281350.03788.hnguyen@linux.vnet.ibm.com> Hi, I also have seen this when high traffic happens bidirectionally between two nodes and 4 links (ppc64, ehca on 2.6.20) through ipoib. Here is a snippet of backtraces: BUG: soft lockup detected on CPU#23! 
Call Trace: [C00000000F5DB470] [C00000000000FC8C] .show_stack+0x5c/0x1cc (unreliable) [C00000000F5DB520] [C00000000008731C] .softlockup_tick+0x114/0x14c [C00000000F5DB5E0] [C000000000063210] .run_local_timers+0x1c/0x30 [C00000000F5DB660] [C000000000024244] .timer_interrupt+0xec/0x504 [C00000000F5DB750] [C000000000003570] decrementer_common+0xf0/0x100 --- Exception: 901 at .tcp_v4_rcv+0x964/0xd04 LR = .tcp_v4_rcv+0x938/0xd04 [C00000000F5DBB30] [C00000000035A328] .ip_local_deliver+0x1ac/0x400 [C00000000F5DBBC0] [C000000000359B04] .ip_rcv+0x378/0x690 [C00000000F5DBC70] [C00000000032D5EC] .netif_receive_skb+0x550/0x574 [C00000000F5DBD20] [C00000000032D718] .process_backlog+0x108/0x250 [C00000000F5DBE00] [C00000000032B434] .net_rx_action+0x198/0x2f4 [C00000000F5DBED0] [C00000000005CB58] .__do_softirq+0xd8/0x1a0 [C00000000F5DBF90] [C00000000002761C] .call_do_softirq+0x14/0x24 [C0000003B4E23BA0] [C00000000000CE68] .do_softirq+0xb4/0xc0 [C0000003B4E23C30] [C00000000032DC78] .netif_rx_ni+0x58/0x78 [C0000003B4E23CB0] [D00000000013F638] .ipoib_ib_completion+0x2a4/0x6dc [ib_ipoib] [C0000003B4E23DB0] [D00000000069EB94] .comp_task+0x340/0x424 [ib_ehca] [C0000003B4E23ED0] [C00000000007338C] .kthread+0x170/0x1c0 [C0000003B4E23F90] [C0000000000277D8] .kernel_thread+0x4c/0x68 Above trace occurred on all 32 cpus multiple times. Reason is that the kernel timer tick did not get the cpu after 10 secs (see kernel/softlockup.c), since ipoib_ib_completion() seemed to be polling cq in high rate. 
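To illustrate the loop shape described above, here is a user-space sketch (not the driver code): a queue is drained in fixed-size batches, and the CPU is yielded after each batch so that a long drain does not monopolize it. fake_cq, fake_poll_cq, drain_cq and NUM_WC are made-up stand-ins for illustration only, and sched_yield() is just a rough user-space analogue of a kernel reschedule point.

```c
#include <sched.h>
#include <stddef.h>

#define NUM_WC 4  /* hypothetical batch size, standing in for IPOIB_NUM_WC */

/* Hypothetical stand-in for a completion queue: only a count of pending entries. */
struct fake_cq {
    size_t pending;
};

/* Take up to max entries from the queue; returns how many were taken. */
static size_t fake_poll_cq(struct fake_cq *cq, size_t max)
{
    size_t n = cq->pending < max ? cq->pending : max;
    cq->pending -= n;
    return n;
}

/* Drain the queue in batches, yielding the CPU after every batch.
 * Returns the total number of entries handled and stores the number
 * of yield points reached in *yields. */
static size_t drain_cq(struct fake_cq *cq, size_t *yields)
{
    size_t total = 0, n;

    *yields = 0;
    do {
        n = fake_poll_cq(cq, NUM_WC);
        total += n;        /* a real handler would process each entry here */
        sched_yield();     /* give other runnable work a turn */
        ++*yields;
    } while (n == NUM_WC); /* a full batch suggests more work may be queued */
    return total;
}
```

With 10 pending entries and a batch size of 4, the loop runs three times (4, 4, 2) and yields three times; as long as the queue keeps refilling faster than it is drained, the loop never exits, which is why a yield point inside it matters.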
The following patch would help: diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index f2aa923..97ea26f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -301,6 +301,7 @@ void ipoib_ib_completion(struct ib_cq *c n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc); for (i = 0; i < n; ++i) ipoib_ib_handle_wc(dev, priv->ibwc + i); + cond_resched(); } while (n == IPOIB_NUM_WC); } However I still saw that BUG trace occurred on 3-4 cpus after several hrs. I should also mention that the systems are still functional. Regards Nam From wombat2 at us.ibm.com Wed Feb 28 04:52:10 2007 From: wombat2 at us.ibm.com (Bernard King-Smith) Date: Wed, 28 Feb 2007 07:52:10 -0500 Subject: [ofa-general] Re: [OFA General] List Address Change Completed In-Reply-To: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov> Message-ID: Michael, It looks like the migration of the mailing list deleted all subscriber settings for using digest mode. Before the migration I used to get postings in digest form, now I get individual postings. Can you restore the subscriber settings for those who used to have digest mode to getting digest again? Regards. Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner general-bounces at lists.openfabrics.org wrote on 02/28/2007 02:17:34 AM: > This list has been migrated to the new server, lists.openfabrics. > org. Please update any address book or filter settings to reflect > the new mailing list address. 
Future messages and replies should be > sent to this address: > > general at lists.openfabrics.org > > The new web address for this list is: > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > If you have any questions, please contact me at mplee at sandia.gov > > Regards, > Michael _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL:
From bugzilla-daemon at lists.openfabrics.org Wed Feb 28 05:00:00 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 28 Feb 2007 05:00:00 -0800 (PST) Subject: [ofa-general] [Bug 390] perftools don't work on alpha1 In-Reply-To: Message-ID: <20070228130000.8F2A1E60837@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=390 mst at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #2 from mst at mellanox.co.il 2007-02-28 04:59 ------- As far as I know, most perftools do not support CMA.
Here is what I get:

# ib_write_lat --cma
ib_write_lat: unrecognized option `--cma'
Usage:
  ib_write_lat start a server and wait for connection
  ib_write_lat connect to server at

Options:
  -p, --port= listen on/connect to port (default 18515)
  -c, --connection= connection type RC/UC (default RC)
  -m, --mtu= mtu size (default 1024)
  -d, --ib-dev= use IB device (default first device found)
  -i, --ib-port= use port of IB device (default 1)
  -s, --size= size of message to exchange (default 1)
  -a, --all Run sizes from 2 till 2^23
  -t, --tx-depth= size of tx queue (default 50)
  -n, --iters= number of exchanges (at least 2, default 1000)
  -C, --report-cycles report times in cpu cycle units (default microseconds)
  -H, --report-histogram print out all results (default print summary only)
  -U, --report-unsorted (implies -H) print out unsorted results (default sorted)
  -V, --version display version number

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mst at mellanox.co.il Wed Feb 28 05:11:38 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 15:11:38 +0200
Subject: [ofa-general] ofed 1.2: backport changes
Message-ID: <20070228131138.GA4715@mellanox.co.il>

Hi!
To fix bug 247, I have moved backport implementation for struct class sysfs functions from individual backport patches to kernel_addons.

I then removed this code from individual core and ipath backport patches for RHEL4 and SLES9 kernels, to avoid conflict.
With 8e99564fab97570e82212000cfd78ada7bcf45fe, core and ipath passes build.

However, since I do not own ipath hardware, please do DOA testing on RHEL4 and SLES9 kernels.
Thanks, -- MST From pasha at dev.mellanox.co.il Wed Feb 28 05:25:49 2007 From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha)) Date: Wed, 28 Feb 2007 15:25:49 +0200 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <17893.22368.748298.755523@gargle.gargle.HOWL> References: <17893.22368.748298.755523@gargle.gargle.HOWL> Message-ID: <45E582DD.8010206@dev.mellanox.co.il> Hi Roland, > I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, and saw > some unpleasant performance drops when using OFED 1.1 (kernel 2.6.20.1 > with included IB drivers). The main drop is in throughput as measured > by the OSU MPI bandwidth benchmark. However, the latency for large > packet sizes is also worse (see results below). I tried with and > without "options ib_mthca msi_x=1" (using IBGD, disabling msi_x makes > a siginficant performance difference of approx. 10%). The IB card is a > Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an Opteron > with nForce4 2200 Professional chipset. > > Does anybody have an explanation or even better a solution to this > issue? Please try to add follow mvapich parameter : VIADEV_DEFAULT_MTU=MTU2048 Regards, Pasha. 
> > Thanks, > > Roland > > > > ------------------------------------------------------------------------ > > IBGD > -------- > > # OSU MPI Bandwidth Test (Version 2.1) > # Size Bandwidth (MB/s) > 1 0.830306 > 2 1.642710 > 4 3.307494 > 8 6.546477 > 16 13.161954 > 32 26.395154 > 64 52.913060 > 128 101.890547 > 256 172.227478 > 512 383.296292 > 1024 611.172247 > 2048 830.147571 > 4096 1068.057366 > 8192 1221.262520 > 16384 1271.771983 > 32768 1369.702828 > 65536 1426.124683 > 131072 1453.781151 > 262144 1457.297992 > 524288 1464.625860 > 1048576 1468.953875 > 2097152 1470.614903 > 4194304 1471.607758 > > # OSU MPI Latency Test (Version 2.1) > # Size Latency (us) > 0 3.03 > 1 3.03 > 2 3.04 > 4 3.03 > 8 3.03 > 16 3.04 > 32 3.11 > 64 3.23 > 128 3.49 > 256 3.83 > 512 4.88 > 1024 6.31 > 2048 8.60 > 4096 11.02 > 8192 15.78 > 16384 28.85 > 32768 39.82 > 65536 60.30 > 131072 106.65 > 262144 196.47 > 524288 374.62 > 1048576 730.79 > 2097152 1442.32 > 4194304 2864.80 > > OFED 1.1 > --------- > > # OSU MPI Bandwidth Test (Version 2.2) > # Size Bandwidth (MB/s) > 1 0.698614 > 2 1.463192 > 4 2.941852 > 8 5.859464 > 16 11.697510 > 32 23.339031 > 64 46.403081 > 128 92.013928 > 256 182.918388 > 512 315.076923 > 1024 500.083937 > 2048 765.294564 > 4096 1003.652513 > 8192 1147.640312 > 16384 1115.803139 > 32768 1221.120298 > 65536 1282.328447 > 131072 1315.715608 > 262144 1331.456393 > 524288 1340.691793 > 1048576 1345.650404 > 2097152 1349.279211 > 4194304 1350.489883 > > # OSU MPI Latency Test (Version 2.2) > # Size Latency (us) > 0 2.99 > 1 3.03 > 2 3.06 > 4 3.03 > 8 3.03 > 16 3.04 > 32 3.12 > 64 3.27 > 128 3.96 > 256 4.29 > 512 4.99 > 1024 6.53 > 2048 9.08 > 4096 11.92 > 8192 17.39 > 16384 31.05 > 32768 43.47 > 65536 67.17 > 131072 115.30 > 262144 212.33 > 524288 405.20 > 1048576 790.45 > 2097152 1558.88 > 4194304 3095.17 > > > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general 
at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mst at mellanox.co.il Wed Feb 28 05:24:40 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Feb 2007 15:24:40 +0200 Subject: [ofa-general] Re: ofed 1.2: backport changes In-Reply-To: <20070228131138.GA4715@mellanox.co.il> References: <20070228131138.GA4715@mellanox.co.il> Message-ID: <20070228132440.GC4715@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: ofed 1.2: backport changes > > Hi! > To fix bug 247, Should have been: bug 347. > I have moved backport implementation for > struct class sysfs functions from individual backport patches > to kernel_addons. > > I then removed this code from individual core and ipath backport patches > for RHEL4 and SLES9 kernels, to avoid conflict. > With 8e99564fab97570e82212000cfd78ada7bcf45fe, core and ipath passes build. > > However, since I do not own ipath hardware, please do DOA testing > on RHEL4 and SLES9 kernels. -- MST From halr at voltaire.com Wed Feb 28 05:31:42 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Feb 2007 08:31:42 -0500 Subject: [ofa-general] Re: [PATCH] osm: Trivial changes for compilation on windows In-Reply-To: <45E54653.6010300@dev.mellanox.co.il> References: <45E54653.6010300@dev.mellanox.co.il> Message-ID: <1172669491.31770.64611.camel@hal.voltaire.com> On Wed, 2007-02-28 at 04:07, Yevgeny Kliteynik wrote: > Hi Hal. > > This patch has trivial data types changes and redefining a macro. > > > BTW, Sasha, do we still need this macro (NOISE_L in osm_ucast_updn.c)? > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied (to both master and ofed_1_2). 
-- Hal From rf at q-leap.de Wed Feb 28 05:44:41 2007 From: rf at q-leap.de (Roland Fehrenbacher) Date: Wed, 28 Feb 2007 14:44:41 +0100 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <45E582DD.8010206@dev.mellanox.co.il> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <45E582DD.8010206@dev.mellanox.co.il> Message-ID: <17893.34633.644064.978253@gargle.gargle.HOWL> >>>>> "Pavel" == Pavel Shamis <(Pasha)" > writes: Pavel> Hi Roland, >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, >> and saw some unpleasant performance drops when using OFED 1.1 >> (kernel 2.6.20.1 with included IB drivers). The main drop is in >> throughput as measured by the OSU MPI bandwidth >> benchmark. However, the latency for large packet sizes is also >> worse (see results below). I tried with and without "options >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a >> siginficant performance difference of approx. 10%). The IB card >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an >> Opteron with nForce4 2200 Professional chipset. >> >> Does anybody have an explanation or even better a solution to >> this issue? Pavel> Please try to add follow mvapich parameter : Pavel> VIADEV_DEFAULT_MTU=MTU2048 Thanks for the suggestion. Unfortunately, it didn't improve the simple bandwidth results. Bi-directional bandwidth increased by 3% though. Any more ideas? 
Roland > ------------------------------------------------------------------------ > > IBGD > -------- > > # OSU MPI Bandwidth Test (Version 2.1) > # Size Bandwidth (MB/s) > 1 0.830306 > 2 1.642710 > 4 3.307494 > 8 6.546477 > 16 13.161954 > 32 26.395154 > 64 52.913060 > 128 101.890547 > 256 172.227478 > 512 383.296292 > 1024 611.172247 > 2048 830.147571 > 4096 1068.057366 > 8192 1221.262520 > 16384 1271.771983 > 32768 1369.702828 > 65536 1426.124683 > 131072 1453.781151 > 262144 1457.297992 > 524288 1464.625860 > 1048576 1468.953875 > 2097152 1470.614903 > 4194304 1471.607758 > > # OSU MPI Latency Test (Version 2.1) > # Size Latency (us) > 0 3.03 > 1 3.03 > 2 3.04 > 4 3.03 > 8 3.03 > 16 3.04 > 32 3.11 > 64 3.23 > 128 3.49 > 256 3.83 > 512 4.88 > 1024 6.31 > 2048 8.60 > 4096 11.02 > 8192 15.78 > 16384 28.85 > 32768 39.82 > 65536 60.30 > 131072 106.65 > 262144 196.47 > 524288 374.62 > 1048576 730.79 > 2097152 1442.32 > 4194304 2864.80 > > OFED 1.1 > --------- > > # OSU MPI Bandwidth Test (Version 2.2) > # Size Bandwidth (MB/s) > 1 0.698614 > 2 1.463192 > 4 2.941852 > 8 5.859464 > 16 11.697510 > 32 23.339031 > 64 46.403081 > 128 92.013928 > 256 182.918388 > 512 315.076923 > 1024 500.083937 > 2048 765.294564 > 4096 1003.652513 > 8192 1147.640312 > 16384 1115.803139 > 32768 1221.120298 > 65536 1282.328447 > 131072 1315.715608 > 262144 1331.456393 > 524288 1340.691793 > 1048576 1345.650404 > 2097152 1349.279211 > 4194304 1350.489883 > > # OSU MPI Latency Test (Version 2.2) > # Size Latency (us) > 0 2.99 > 1 3.03 > 2 3.06 > 4 3.03 > 8 3.03 > 16 3.04 > 32 3.12 > 64 3.27 > 128 3.96 > 256 4.29 > 512 4.99 > 1024 6.53 > 2048 9.08 > 4096 11.92 > 8192 17.39 > 16384 31.05 > 32768 43.47 > 65536 67.17 > 131072 115.30 > 262144 212.33 > 524288 405.20 > 1048576 790.45 > 2097152 1558.88 > 4194304 3095.17 > > > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at 
lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Wed Feb 28 05:42:00 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Feb 2007 08:42:00 -0500 Subject: [ofa-general] [PATCH][MINOR] OpenSM/osm_sa.c: Add osm_log error message when osm_sa_mad_ctrl_bind fails Message-ID: <1172670114.31770.65268.camel@hal.voltaire.com> OpenSM/osm_sa.c: Add osm_log error message when osm_sa_mad_ctrl_bind fails Signed-off-by: Hal Rosenstock diff --git a/osm/opensm/osm_sa.c b/osm/opensm/osm_sa.c index 42a38aa..d74d875 100644 --- a/osm/opensm/osm_sa.c +++ b/osm/opensm/osm_sa.c @@ -505,6 +505,16 @@ osm_sa_bind( status = osm_sa_mad_ctrl_bind( &p_sa->mad_ctrl, port_guid ); + if( status != IB_SUCCESS ) + { + osm_log( p_sa->p_log, OSM_LOG_ERROR, + "osm_sa_bind: ERR 4C03: " + "SA MAD Controller bind failed (%s)\n", + ib_get_err_str( status ) ); + goto Exit; + } + + Exit: OSM_LOG_EXIT( p_sa->p_log ); return( status ); } From vlad at lists.openfabrics.org Wed Feb 28 05:57:25 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Wed, 28 Feb 2007 05:57:25 -0800 (PST) Subject: [ofa-general] ofa_1_2_kernel 20070228-0525 daily build status Message-ID: <20070228135725.31E24E60842@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.20 
Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-42.ELsmp Failed: Build failed on x86_64 with linux-2.6.5-7.244-smp Log: /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit': /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:468: error: implicit declaration of function 'proto_unregister' /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init': /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:517: error: implicit declaration of function 'proto_register' make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1 
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: 'ADVERTISE_PAUSE_CAP' undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: 'ADVERTISE_PAUSE_ASYM' undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'add_adapter': 
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: 'adapter_list_lock' undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'remove_adapter': /home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: 'adapter_list_lock' undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From pasha at dev.mellanox.co.il Wed Feb 28 06:03:36 2007 From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha)) Date: Wed, 28 Feb 2007 16:03:36 +0200 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <17893.34633.644064.978253@gargle.gargle.HOWL> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <45E582DD.8010206@dev.mellanox.co.il> <17893.34633.644064.978253@gargle.gargle.HOWL> Message-ID: <45E58BB8.4020902@dev.mellanox.co.il> > Pavel> Hi Roland, > >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, > >> and saw some unpleasant performance drops when using OFED 1.1 > >> (kernel 2.6.20.1 with included IB drivers). The main drop is in > >> throughput as measured by the OSU MPI bandwidth > >> benchmark. However, the latency for large packet sizes is also > >> worse (see results below). 
I tried with and without "options > >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a > >> siginficant performance difference of approx. 10%). The IB card > >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an > >> Opteron with nForce4 2200 Professional chipset. > >> > >> Does anybody have an explanation or even better a solution to > >> this issue? > > Pavel> Please try to add follow mvapich parameter : > Pavel> VIADEV_DEFAULT_MTU=MTU2048 > > Thanks for the suggestion. Unfortunately, it didn't improve the simple > bandwidth results. Bi-directional bandwidth increased by 3% > though. Any more ideas? 3% is good start :-) Please also try to add this one: VIADEV_MAX_RDMA_SIZE=4194304 -Pasha > > Roland > >> ------------------------------------------------------------------------ >> >> IBGD >> -------- >> >> # OSU MPI Bandwidth Test (Version 2.1) >> # Size Bandwidth (MB/s) >> 1 0.830306 >> 2 1.642710 >> 4 3.307494 >> 8 6.546477 >> 16 13.161954 >> 32 26.395154 >> 64 52.913060 >> 128 101.890547 >> 256 172.227478 >> 512 383.296292 >> 1024 611.172247 >> 2048 830.147571 >> 4096 1068.057366 >> 8192 1221.262520 >> 16384 1271.771983 >> 32768 1369.702828 >> 65536 1426.124683 >> 131072 1453.781151 >> 262144 1457.297992 >> 524288 1464.625860 >> 1048576 1468.953875 >> 2097152 1470.614903 >> 4194304 1471.607758 >> >> # OSU MPI Latency Test (Version 2.1) >> # Size Latency (us) >> 0 3.03 >> 1 3.03 >> 2 3.04 >> 4 3.03 >> 8 3.03 >> 16 3.04 >> 32 3.11 >> 64 3.23 >> 128 3.49 >> 256 3.83 >> 512 4.88 >> 1024 6.31 >> 2048 8.60 >> 4096 11.02 >> 8192 15.78 >> 16384 28.85 >> 32768 39.82 >> 65536 60.30 >> 131072 106.65 >> 262144 196.47 >> 524288 374.62 >> 1048576 730.79 >> 2097152 1442.32 >> 4194304 2864.80 >> >> OFED 1.1 >> --------- >> >> # OSU MPI Bandwidth Test (Version 2.2) >> # Size Bandwidth (MB/s) >> 1 0.698614 >> 2 1.463192 >> 4 2.941852 >> 8 5.859464 >> 16 11.697510 >> 32 23.339031 >> 64 46.403081 >> 128 92.013928 >> 256 182.918388 >> 512 315.076923 >> 1024 
500.083937 >> 2048 765.294564 >> 4096 1003.652513 >> 8192 1147.640312 >> 16384 1115.803139 >> 32768 1221.120298 >> 65536 1282.328447 >> 131072 1315.715608 >> 262144 1331.456393 >> 524288 1340.691793 >> 1048576 1345.650404 >> 2097152 1349.279211 >> 4194304 1350.489883 >> >> # OSU MPI Latency Test (Version 2.2) >> # Size Latency (us) >> 0 2.99 >> 1 3.03 >> 2 3.06 >> 4 3.03 >> 8 3.03 >> 16 3.04 >> 32 3.12 >> 64 3.27 >> 128 3.96 >> 256 4.29 >> 512 4.99 >> 1024 6.53 >> 2048 9.08 >> 4096 11.92 >> 8192 17.39 >> 16384 31.05 >> 32768 43.47 >> 65536 67.17 >> 131072 115.30 >> 262144 212.33 >> 524288 405.20 >> 1048576 790.45 >> 2097152 1558.88 >> 4194304 3095.17 >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From tziporet at mellanox.co.il Wed Feb 28 06:10:02 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 28 Feb 2007 16:10:02 +0200 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary Message-ID: <45E58D3A.8060906@mellanox.co.il> The meeting summary is also available on the Wiki: https://wiki.openfabrics.org/tiki-index.php?page=Teleconf+02-26-2007 This is the OFED 1.2 Feb-26 meeting summary on alpha status: Abbreviated minutes / summary: * We will not build any alpha2 package. Anyone can use the full packages that Vlad provides. * The cut date for Beta changes is end of this week (Saturday Mar-3) * Next milestone is the Beta release - on March-7 * Each maintainer should fix the bugs assigned to him in bugzilla * Documents will stay in the same way as in OFED 1.1 (one directory with all docs) * Improved RPM usage by the install will not be part of OFED 1.2 Action Items: 1. Daily build of full OFED package - Vlad 2. Fix bugs sent by Scott for the beta - all 3. 
Send list of bugs that must be fixed for the beta - Tziporet
4. Schedule a developers session on OFA developers at Sonoma - Tziporet
5. Fix ipath driver compilation issues - Bryan
6. Support MPI selection by MVAPICH2 - Shaun

Detailed Minutes:
* RPM and install: The RPMs are built today in a non-standard way.
  o We are not going to make any change for OFED 1.2 since it would delay the release significantly.
  o The RPM usage will be enhanced for the next (1.3) release and we will decide on the correct way in Sonoma.
* MPI selection:
  o Implemented by Open MPI and MVAPICH
  o Needs support from MVAPICH2
  o Jeff will publish usage to the full list once the MVAPICH package supports it.
  o Scott said Cisco will test it

Tziporet
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pasha at dev.mellanox.co.il Wed Feb 28 06:12:40 2007
From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha))
Date: Wed, 28 Feb 2007 16:12:40 +0200
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <45E58BB8.4020902@dev.mellanox.co.il>
References: <17893.22368.748298.755523@gargle.gargle.HOWL> <45E582DD.8010206@dev.mellanox.co.il> <17893.34633.644064.978253@gargle.gargle.HOWL> <45E58BB8.4020902@dev.mellanox.co.il>
Message-ID: <45E58DD8.90306@dev.mellanox.co.il>

Also please run:
mpirun_rsh -v
I want to check which version of mvapich you have.

Pavel Shamis (Pasha) wrote:
>> Pavel> Hi Roland,
>> >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
>> >> and saw some unpleasant performance drops when using OFED 1.1
>> >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
>> >> throughput as measured by the OSU MPI bandwidth
>> >> benchmark. However, the latency for large packet sizes is also
>> >> worse (see results below). I tried with and without "options
>> >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a
>> >> significant performance difference of approx. 10%).
The IB card >> >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an >> >> Opteron with nForce4 2200 Professional chipset. >> >> >> Does anybody have an explanation or even better a >> solution to >> >> this issue? >> >> Pavel> Please try to add follow mvapich parameter : >> Pavel> VIADEV_DEFAULT_MTU=MTU2048 >> >> Thanks for the suggestion. Unfortunately, it didn't improve the simple >> bandwidth results. Bi-directional bandwidth increased by 3% >> though. Any more ideas? > 3% is good start :-) > Please also try to add this one: > VIADEV_MAX_RDMA_SIZE=4194304 > > -Pasha > >> >> Roland >> >>> ------------------------------------------------------------------------ >>> >>> IBGD >>> -------- >>> >>> # OSU MPI Bandwidth Test (Version 2.1) >>> # Size Bandwidth (MB/s) >>> 1 0.830306 >>> 2 1.642710 >>> 4 3.307494 >>> 8 6.546477 >>> 16 13.161954 >>> 32 26.395154 >>> 64 52.913060 >>> 128 101.890547 >>> 256 172.227478 >>> 512 383.296292 >>> 1024 611.172247 >>> 2048 830.147571 >>> 4096 1068.057366 >>> 8192 1221.262520 >>> 16384 1271.771983 >>> 32768 1369.702828 >>> 65536 1426.124683 >>> 131072 1453.781151 >>> 262144 1457.297992 >>> 524288 1464.625860 >>> 1048576 1468.953875 >>> 2097152 1470.614903 >>> 4194304 1471.607758 >>> >>> # OSU MPI Latency Test (Version 2.1) >>> # Size Latency (us) >>> 0 3.03 >>> 1 3.03 >>> 2 3.04 >>> 4 3.03 >>> 8 3.03 >>> 16 3.04 >>> 32 3.11 >>> 64 3.23 >>> 128 3.49 >>> 256 3.83 >>> 512 4.88 >>> 1024 6.31 >>> 2048 8.60 >>> 4096 11.02 >>> 8192 15.78 >>> 16384 28.85 >>> 32768 39.82 >>> 65536 60.30 >>> 131072 106.65 >>> 262144 196.47 >>> 524288 374.62 >>> 1048576 730.79 >>> 2097152 1442.32 >>> 4194304 2864.80 >>> >>> OFED 1.1 >>> --------- >>> >>> # OSU MPI Bandwidth Test (Version 2.2) >>> # Size Bandwidth (MB/s) >>> 1 0.698614 >>> 2 1.463192 >>> 4 2.941852 >>> 8 5.859464 >>> 16 11.697510 >>> 32 23.339031 >>> 64 46.403081 >>> 128 92.013928 >>> 256 182.918388 >>> 512 315.076923 >>> 1024 500.083937 >>> 2048 765.294564 >>> 4096 1003.652513 >>> 
8192 1147.640312 >>> 16384 1115.803139 >>> 32768 1221.120298 >>> 65536 1282.328447 >>> 131072 1315.715608 >>> 262144 1331.456393 >>> 524288 1340.691793 >>> 1048576 1345.650404 >>> 2097152 1349.279211 >>> 4194304 1350.489883 >>> >>> # OSU MPI Latency Test (Version 2.2) >>> # Size Latency (us) >>> 0 2.99 >>> 1 3.03 >>> 2 3.06 >>> 4 3.03 >>> 8 3.03 >>> 16 3.04 >>> 32 3.12 >>> 64 3.27 >>> 128 3.96 >>> 256 4.29 >>> 512 4.99 >>> 1024 6.53 >>> 2048 9.08 >>> 4096 11.92 >>> 8192 17.39 >>> 16384 31.05 >>> 32768 43.47 >>> 65536 67.17 >>> 131072 115.30 >>> 262144 212.33 >>> 524288 405.20 >>> 1048576 790.45 >>> 2097152 1558.88 >>> 4194304 3095.17 >>> >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> general mailing list >>> general at lists.openfabrics.org >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>> >>> To unsubscribe, please visit >>> http://openib.org/mailman/listinfo/openib-general > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From bugzilla-daemon at lists.openfabrics.org Wed Feb 28 06:22:31 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 28 Feb 2007 06:22:31 -0800 (PST) Subject: [ofa-general] [Bug 390] perftools don't work on alpha1 In-Reply-To: Message-ID: <20070228142231.EEAF2E60823@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=390 swise at opengridcomputing.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|WONTFIX | ------- Comment #3 from swise at opengridcomputing.com 2007-02-28 06:22 ------- ib_rdma_bw, not ib_write_bw. ib_rdma_bw and ib_rdma_lat both support the --cma flag. 
ib_rdma_lat works, ib_rdma_bw doesn't. You don't want this fixed? -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dlezcano at fr.ibm.com Wed Feb 28 06:35:41 2007 From: dlezcano at fr.ibm.com (Daniel Lezcano) Date: Wed, 28 Feb 2007 15:35:41 +0100 Subject: [ofa-general] Re: [PATCH RFC 18/31] net: Implment network device movement between namespaces In-Reply-To: <11697516372179-git-send-email-ebiederm@xmission.com> References: <11697516372179-git-send-email-ebiederm@xmission.com> Message-ID: <45E5933D.4070304@fr.ibm.com> Eric W. Biederman wrote: > From: Eric W. Biederman - unquoted > > This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate > a network device is local to a single network namespace and > should never be moved. Useful for pseudo devices that we > need an instance in each network namespace (like the loopback > device) and for any device we find that cannot handle multiple > network namespaces so we may trap them in the initial network > namespace. > > This patch introduces the function dev_change_net_namespace > a function used to move a network device from one network > namespace to another. To the network device nothing > special appears to happen, to the components of the network > stack it appears as if the network device was unregistered > in the network namespace it is in, and a new device > was registered in the network namespace the device > was moved to. > > This patch sets up a namespace device destructor that > upon the exit of a network namespace moves all of the > movable network devices to the initial network namespace > so they are not lost. 
> If you: * create etun0/etun1 * create a namespace * move etun1 to this namespace * rename the etun1 to eth0 * kill the namespace the former network device etun1 will be lost if you have in your parent namespace an interface eth0 because it will conflict. Perhaps, the first name should be restored before moving the device back to the initial network namespace ? -- Daniel ps : nice patchset From dlezcano at fr.ibm.com Wed Feb 28 06:42:08 2007 From: dlezcano at fr.ibm.com (Daniel Lezcano) Date: Wed, 28 Feb 2007 15:42:08 +0100 Subject: [ofa-general] Re: [PATCH RFC 22/31] net: Add network namespace clone support. In-Reply-To: <11697516373288-git-send-email-ebiederm@xmission.com> References: <11697516373288-git-send-email-ebiederm@xmission.com> Message-ID: <45E594C0.6090009@fr.ibm.com> Eric W. Biederman wrote: > From: Eric W. Biederman - unquoted > > This patch allows you to create a new network namespace > using sys_clone(...). > > Signed-off-by: Eric W. Biederman > --- > include/linux/sched.h | 1 + > kernel/nsproxy.c | 11 +++++++++++ > net/core/net_namespace.c | 38 ++++++++++++++++++++++++++++++++++++++ > 3 files changed, 50 insertions(+), 0 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 4463735..9e0f91a 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -26,6 +26,7 @@ > #define CLONE_STOPPED 0x02000000 /* Start in stopped state */ > #define CLONE_NEWUTS 0x04000000 /* New utsname group? 
*/ > #define CLONE_NEWIPC 0x08000000 /* New ipcs */ > +#define CLONE_NEWNET 0x20000000 /* New network namespace */ > > /* > * Scheduling policies > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c > index 4f3c95a..7861c4c 100644 > --- a/kernel/nsproxy.c > +++ b/kernel/nsproxy.c > @@ -20,6 +20,7 @@ > #include > #include > #include > +#include > > struct nsproxy init_nsproxy = INIT_NSPROXY(init_nsproxy); > EXPORT_SYMBOL_GPL(init_nsproxy); > @@ -70,6 +71,7 @@ struct nsproxy *dup_namespaces(struct nsproxy *orig) > get_ipc_ns(ns->ipc_ns); > if (ns->pid_ns) > get_pid_ns(ns->pid_ns); > + get_net(ns->net_ns); > } > > return ns; > @@ -117,10 +119,18 @@ int copy_namespaces(int flags, struct task_struct *tsk) > if (err) > goto out_pid; > > + err = copy_net(flags, tsk); > + if (err) > + goto out_net; > + > out: > put_nsproxy(old_ns); > return err; > > +out_net: > + if (new_ns->pid_ns) > + put_pid_ns(new_ns->pid_ns); > + > out_pid: > if (new_ns->ipc_ns) > put_ipc_ns(new_ns->ipc_ns); > @@ -146,5 +156,6 @@ void free_nsproxy(struct nsproxy *ns) > put_ipc_ns(ns->ipc_ns); > if (ns->pid_ns) > put_pid_ns(ns->pid_ns); > + put_net(ns->net_ns); > kfree(ns); > } > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c > index 93e3879..cc56105 100644 > --- a/net/core/net_namespace.c > +++ b/net/core/net_namespace.c > @@ -175,6 +175,44 @@ out_undo: > goto out; > } > > +int copy_net(int flags, struct task_struct *tsk) > +{ > + net_t old_net = tsk->nsproxy->net_ns; > + net_t new_net; > + int err; > + > + get_net(old_net); > + > + if (!(flags & CLONE_NEWNET)) > + return 0; > + > + err = -EPERM; > + if (!capable(CAP_SYS_ADMIN)) > + goto out; > + > + err = -ENOMEM; > + new_net = net_alloc(); > + if (null_net(new_net)) > + goto out; > + > + mutex_lock(&net_mutex); > + err = setup_net(new_net); > + if (err) > + goto out_unlock; > Should we "net_free" in case of error ? 
> + > + net_lock(); > + net_list_append(new_net); > + net_unlock(); > + > + tsk->nsproxy->net_ns = new_net; > + > +out_unlock: > + mutex_unlock(&net_mutex); > +out: > + put_net(old_net); > + return err; > +} > + > void pernet_modcopy(void *pnetdst, const void *src, unsigned long size) > { > net_t net; > From ebiederm at xmission.com Wed Feb 28 07:05:13 2007 From: ebiederm at xmission.com (ebiederm at xmission.com) Date: Wed, 28 Feb 2007 08:05:13 -0700 Subject: [ofa-general] Re: [PATCH RFC 22/31] net: Add network namespace clone support. In-Reply-To: <45E594C0.6090009@fr.ibm.com> (Daniel Lezcano's message of "Wed, 28 Feb 2007 15:42:08 +0100") References: <11697516373288-git-send-email-ebiederm@xmission.com> <45E594C0.6090009@fr.ibm.com> Message-ID: Daniel Lezcano writes: >> + >> + mutex_lock(&net_mutex); >> + err = setup_net(new_net); >> + if (err) >> + goto out_unlock; >> > Should we "net_free" in case of error ? Oops. Yes we should. Thanks. >> + net_lock(); >> + net_list_append(new_net); >> + net_unlock(); >> + >> + tsk->nsproxy->net_ns = new_net; >> + >> +out_unlock: >> + mutex_unlock(&net_mutex); net_free(new_net); >> +out: >> + put_net(old_net); >> + return err; >> +} >> + >> Eric From ebiederm at xmission.com Wed Feb 28 07:12:16 2007 From: ebiederm at xmission.com (ebiederm at xmission.com) Date: Wed, 28 Feb 2007 08:12:16 -0700 Subject: [ofa-general] Re: [PATCH RFC 18/31] net: Implment network device movement between namespaces In-Reply-To: <45E5933D.4070304@fr.ibm.com> (Daniel Lezcano's message of "Wed, 28 Feb 2007 15:35:41 +0100") References: <11697516372179-git-send-email-ebiederm@xmission.com> <45E5933D.4070304@fr.ibm.com> Message-ID: Daniel Lezcano writes: > Eric W. Biederman wrote: >> From: Eric W. Biederman - unquoted >> >> This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate >> a network device is local to a single network namespace and >> should never be moved. 
Useful for pseudo devices that we
>> need an instance in each network namespace (like the loopback
>> device) and for any device we find that cannot handle multiple
>> network namespaces so we may trap them in the initial network
>> namespace.
>>
>> This patch introduces the function dev_change_net_namespace
>> a function used to move a network device from one network
>> namespace to another. To the network device nothing
>> special appears to happen, to the components of the network
>> stack it appears as if the network device was unregistered
>> in the network namespace it is in, and a new device
>> was registered in the network namespace the device
>> was moved to.
>>
>> This patch sets up a namespace device destructor that
>> upon the exit of a network namespace moves all of the
>> movable network devices to the initial network namespace
>> so they are not lost.
>>
> If you:
> * create etun0/etun1
> * create a namespace
> * move etun1 to this namespace
> * rename the etun1 to eth0
> * kill the namespace
>
> the former network device etun1 will be lost if you have in your parent
> namespace an interface eth0 because it will conflict.
> Perhaps, the first name should be restored before moving the device back to the
> initial network namespace ?

Restoration of a previous name is no guarantee of anything. Someone may have renamed some other interface to etun1 in the original network namespace.

However, if you look closely at the code, you will discover that if it can't keep the same name it will rename the device as it switches namespaces. In particular, it will become devN, where N is replaced by some unused number. That is what the pat parameter to dev_change_net_namespace is about.

I'm not exactly thrilled about the generic name, but the code should work, and I don't know if there is a name that makes better sense.

> -- Daniel
>
> ps : nice patchset

Thanks.
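The fallback described above (keep the name if it is free, otherwise become devN for some unused N) can be sketched in user-space C. This is only an illustration under stated assumptions: `pick_name`, `name_in_use` and `NAME_MAX_LEN` are hypothetical names, the `prefix` argument merely stands in for the pat parameter, and none of this is the kernel implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 16 /* assumed limit for an interface name in this sketch */

/* Return 1 if `name` already appears in taken[0..n_taken), else 0. */
static int name_in_use(const char *name, const char *const *taken,
                       size_t n_taken)
{
    for (size_t i = 0; i < n_taken; i++)
        if (strcmp(name, taken[i]) == 0)
            return 1;
    return 0;
}

/*
 * Hypothetical helper, not a kernel function: pick the name a device gets
 * when it arrives in a namespace whose names are listed in `taken`.
 * The current name is kept when it is free; otherwise the first unused
 * "<prefix>N" is chosen. Returns 0 on success, -1 if nothing fits.
 */
int pick_name(const char *current, const char *prefix,
              const char *const *taken, size_t n_taken,
              char out[NAME_MAX_LEN])
{
    assert(current && prefix && out);

    if (strlen(current) < NAME_MAX_LEN &&
        !name_in_use(current, taken, n_taken)) {
        strcpy(out, current);
        return 0;
    }

    /* n_taken names are in use, so one of prefix0..prefix<n_taken> is free. */
    for (size_t n = 0; n <= n_taken; n++) {
        int len = snprintf(out, NAME_MAX_LEN, "%s%zu", prefix, n);
        if (len < 0 || len >= NAME_MAX_LEN)
            return -1; /* candidate would not fit in a name buffer */
        if (!name_in_use(out, taken, n_taken))
            return 0;
    }
    return -1;
}
```

In Daniel's scenario the returning device is called eth0 and the initial namespace already has an eth0, so with the prefix "dev" the sketch yields the first free devN rather than losing the device.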
Eric From Roland.Fehrenbacher at transtec.de Wed Feb 28 07:21:00 2007 From: Roland.Fehrenbacher at transtec.de (Roland Fehrenbacher) Date: Wed, 28 Feb 2007 16:21:00 +0100 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <45E58BB8.4020902@dev.mellanox.co.il> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <45E582DD.8010206@dev.mellanox.co.il> <17893.34633.644064.978253@gargle.gargle.HOWL> <45E58BB8.4020902@dev.mellanox.co.il> Message-ID: <17893.40412.365196.423575@gargle.gargle.HOWL> >>>>> "Pavel" == Pavel Shamis <(Pasha)" > writes: Pavel> Hi Roland, >> >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, >> >> and saw some unpleasant performance drops when using OFED >> 1.1 >> (kernel 2.6.20.1 with included IB drivers). The main >> drop is in >> throughput as measured by the OSU MPI bandwidth >> >> benchmark. However, the latency for large packet sizes is >> also >> worse (see results below). I tried with and without >> "options >> ib_mthca msi_x=1" (using IBGD, disabling msi_x >> makes a >> siginficant performance difference of >> approx. 10%). The IB card >> is a Mellanox MHGS18-XT (PCIe/DDR >> Firmware 1.2.0) running on an >> Opteron with nForce4 2200 >> Professional chipset. >> >> >> >> Does anybody have an explanation or even better a solution >> to >> this issue? >> Pavel> Please try to add follow mvapich parameter : Pavel> VIADEV_DEFAULT_MTU=MTU2048 >> Thanks for the suggestion. Unfortunately, it didn't improve the >> simple bandwidth results. Bi-directional bandwidth increased by >> 3% though. Any more ideas? Pavel> 3% is good start :-) Please also try to add this one: Pavel> VIADEV_MAX_RDMA_SIZE=4194304 This brought another 2% in bi-directional bandwidth, but still nothing in uni-directional bandwidth. 
mvapich version is 0.9.8 Roland From mplee at sandia.gov Wed Feb 28 07:25:38 2007 From: mplee at sandia.gov (Lee, Michael Paichi) Date: Wed, 28 Feb 2007 08:25:38 -0700 Subject: [ofa-general] RE: [OFA General] List Address Change Completed References: Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F0366949D@ES22SNLNT.srn.sandia.gov> Done. If I missed anyone, please send me an email. Michael -----Original Message----- From: Bernard King-Smith [mailto:wombat2 at us.ibm.com] Sent: Wed 2/28/2007 4:52 AM To: Lee, Michael Paichi Cc: general at lists.openfabrics.org; openib-general at openib.org Subject: Re: [OFA General] List Address Change Completed Michael, It looks like the migration of the mailing list deleted all subscriber settings for using digest mode. Before the migration I used to get postings in digest form, now I get individual postings. Can you restore the subscriber settings for those who used to have digest mode to getting digest again? Regards. Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner general-bounces at lists.openfabrics.org wrote on 02/28/2007 02:17:34 AM: > This list has been migrated to the new server, lists.openfabrics. > org. Please update any address book or filter settings to reflect > the new mailing list address. 
Future messages and replies should be > sent to this address: > > general at lists.openfabrics.org > > The new web address for this list is: > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > If you have any questions, please contact me at mplee at sandia.gov > > Regards, > Michael _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL:
From monis at voltaire.com Wed Feb 28 07:38:13 2007 From: monis at voltaire.com (Moni Shoua) Date: Wed, 28 Feb 2007 17:38:13 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070227145114.GC4437@mellanox.co.il> References: <45E313D2.70909@voltaire.com> <20070227060245.GI12919@mellanox.co.il> <45E41C13.8090300@voltaire.com> <20070227145114.GC4437@mellanox.co.il> Message-ID: <45E5A1E5.4000201@voltaire.com> Hi, I took some comments from this discussion and I'll refer to them when I write a new version for this patch. I'll post it soon. thanks -MoniS Michael S. Tsirkin wrote: >> I got the assumption about neighbours living in one of these 2 tables from >> observation and code reading. I preferred that that on keeping track of all >> ipoib_neighs and putting them in a list. However, I could do that instead of >> neigh_table scanning. Do you think it's better? > > If some neighbours are not on any tables, it seems using our own lists > (e.g. lists we have in ipoib_path) is the only option, no? OK, I see what you mean.
I'll use my own list to keep track about ipoib_neighs. >>>> The only way I found to avoid this (for now) is to check skb headroom in >>>> ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB >>>> operation and it seems to solve my problem. However, I would be happy to hear what >>>> others think of this last issue. >>> As I said, this seems to indicate a problem in the bonding code. >>> But what will happen after you error out in ipoib_hard_header? >>> Is the packet dropped? What might break as a result? Michael, your tip about hard_header_len helped. i found what was wrong in the bonding code. Now the skb_under_panic() issue is gone. I will remove the part of checking for headroom from the patch. Thanks > From pasha at dev.mellanox.co.il Wed Feb 28 07:44:27 2007 From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha)) Date: Wed, 28 Feb 2007 17:44:27 +0200 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <17893.40412.365196.423575@gargle.gargle.HOWL> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <45E582DD.8010206@dev.mellanox.co.il> <17893.34633.644064.978253@gargle.gargle.HOWL> <45E58BB8.4020902@dev.mellanox.co.il> <17893.40412.365196.423575@gargle.gargle.HOWL> Message-ID: <45E5A35B.8000200@dev.mellanox.co.il> Roland Fehrenbacher wrote: >>>>>> "Pavel" == Pavel Shamis <(Pasha)" > writes: > > Pavel> Hi Roland, > >> >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, > >> >> and saw some unpleasant performance drops when using OFED > >> 1.1 >> (kernel 2.6.20.1 with included IB drivers). The main > >> drop is in >> throughput as measured by the OSU MPI bandwidth > >> >> benchmark. However, the latency for large packet sizes is > >> also >> worse (see results below). I tried with and without > >> "options >> ib_mthca msi_x=1" (using IBGD, disabling msi_x > >> makes a >> siginficant performance difference of > >> approx. 10%). 
The IB card >> is a Mellanox MHGS18-XT (PCIe/DDR > >> Firmware 1.2.0) running on an >> Opteron with nForce4 2200 > >> Professional chipset. > >> >> > >> >> Does anybody have an explanation or even better a solution > >> to >> this issue? > >> > > Pavel> Please try to add follow mvapich parameter : > Pavel> VIADEV_DEFAULT_MTU=MTU2048 > >> Thanks for the suggestion. Unfortunately, it didn't improve the > >> simple bandwidth results. Bi-directional bandwidth increased by > >> 3% though. Any more ideas? > > Pavel> 3% is good start :-) Please also try to add this one: > Pavel> VIADEV_MAX_RDMA_SIZE=4194304 > > This brought another 2% in bi-directional bandwidth, but still nothing > in uni-directional bandwidth. > > mvapich version is 0.9.8

0.9.8 was not distributed (and tested) with OFED 1.1 :-(
Please try to use package distributed with OFED 1.1 version.

Pasha.

From vlad at mellanox.co.il Wed Feb 28 08:18:46 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 28 Feb 2007 18:18:46 +0200
Subject: [ofa-general] [PATCH] Add dapltest headers to Makefile.am
Message-ID: <1172679526.21382.114.camel@vladsk-laptop>

Hi Arlin,

The following patch fixes dapltest compilation after 'make dist':

Add dapltest headers to EXTRA_DIST

Signed-off-by: Vladimir Sokolovsky

diff --git a/Makefile.am b/Makefile.am
index e2bf4dc..98bcf70 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -231,7 +231,35 @@ EXTRA_DIST = dat/common/dat_dictionary.h \
 	doc/dat.conf \
 	dapl/udapl/libdaplcma.map \
 	libdat.spec.in \
-	$(man_MANS)
+	$(man_MANS) \
+	test/dapltest/include/dapl_bpool.h \
+	test/dapltest/include/dapl_client_info.h \
+	test/dapltest/include/dapl_common.h \
+	test/dapltest/include/dapl_execute.h \
+	test/dapltest/include/dapl_fft_cmd.h \
+	test/dapltest/include/dapl_fft_util.h \
+	test/dapltest/include/dapl_getopt.h \
+	test/dapltest/include/dapl_global.h \
+	test/dapltest/include/dapl_limit_cmd.h \
+	test/dapltest/include/dapl_mdep.h \
+	test/dapltest/include/dapl_memlist.h \
+	test/dapltest/include/dapl_params.h \
+	test/dapltest/include/dapl_performance_cmd.h \
+	test/dapltest/include/dapl_performance_stats.h \
+	test/dapltest/include/dapl_performance_test.h \
+	test/dapltest/include/dapl_proto.h \
+	test/dapltest/include/dapl_quit_cmd.h \
+	test/dapltest/include/dapl_server_cmd.h \
+	test/dapltest/include/dapl_server_info.h \
+	test/dapltest/include/dapl_tdep.h \
+	test/dapltest/include/dapl_tdep_print.h \
+	test/dapltest/include/dapl_test_data.h \
+	test/dapltest/include/dapl_transaction_cmd.h \
+	test/dapltest/include/dapl_transaction_stats.h \
+	test/dapltest/include/dapl_transaction_test.h \
+	test/dapltest/include/dapl_version.h \
+	test/dapltest/mdep/linux/dapl_mdep_user.h
+

 dist-hook: libdat.spec
 	cp libdat.spec $(distdir)

From xma at us.ibm.com Wed Feb 28 08:21:15 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 28 Feb 2007 08:21:15 -0800
Subject: [ofa-general] Re: IPOIB NAPI
In-Reply-To: <20070228071706.GA22246@mellanox.co.il>
Message-ID:

Michael,

>I have not benchmarked this, but actually the "return 1" version makes sense to
>me too: since a new completion was observed after notify-cq, we likely currently
>have HCA writing new completions into the CQ at a high rate, so it makes sense
>to delay polling by a few cycles, and reduce the number of interrupts in this
>way.
>Right?

Agree.

Another question: have you benchmarked IPoIB NAPI against a missed-event-only mode, that is, changing the ipoib completion handling from notify-cq, poll-cq to poll-cq, notify-cq and, if any event was missed, polling again? I am going to try this to see the performance difference.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From surs at cse.ohio-state.edu Wed Feb 28 08:40:40 2007 From: surs at cse.ohio-state.edu (Sayantan Sur) Date: Wed, 28 Feb 2007 11:40:40 -0500 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <45E5A35B.8000200@dev.mellanox.co.il> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <45E582DD.8010206@dev.mellanox.co.il> <17893.34633.644064.978253@gargle.gargle.HOWL> <45E58BB8.4020902@dev.mellanox.co.il> <17893.40412.365196.423575@gargle.gargle.HOWL> <45E5A35B.8000200@dev.mellanox.co.il> Message-ID: <20070228164038.GA28118@cse.ohio-state.edu> Hi Roland, * On Feb,2 Pavel Shamis (Pasha) wrote : > Roland Fehrenbacher wrote: > >>>>>>"Pavel" == Pavel Shamis <(Pasha)" > writes: > > > > Pavel> Hi Roland, > > >> >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, > > >> >> and saw some unpleasant performance drops when using OFED > > >> 1.1 >> (kernel 2.6.20.1 with included IB drivers). The main > > >> drop is in >> throughput as measured by the OSU MPI bandwidth > > >> >> benchmark. However, the latency for large packet sizes is > > >> also >> worse (see results below). I tried with and without > > >> "options >> ib_mthca msi_x=1" (using IBGD, disabling msi_x > > >> makes a >> siginficant performance difference of > > >> approx. 10%). The IB card >> is a Mellanox MHGS18-XT (PCIe/DDR > > >> Firmware 1.2.0) running on an >> Opteron with nForce4 2200 > > >> Professional chipset. > > >> >> > > >> >> Does anybody have an explanation or even better a solution > > >> to >> this issue? > > >> > > > > Pavel> Please try to add follow mvapich parameter : > > Pavel> VIADEV_DEFAULT_MTU=MTU2048 > > >> Thanks for the suggestion. Unfortunately, it didn't improve the > > >> simple bandwidth results. Bi-directional bandwidth increased by > > >> 3% though. Any more ideas? 
> > > > Pavel> 3% is good start :-) Please also try to add this one: > > Pavel> VIADEV_MAX_RDMA_SIZE=4194304 > > > >This brought another 2% in bi-directional bandwidth, but still nothing > >in uni-directional bandwidth. > > > >mvapich version is 0.9.8 > 0.9.8 was not distributed (and tested) with OFED 1.1 :-( > Please try to use package distributed with OFED 1.1 version. MVAPICH-0.9.8 was tested by the MVAPICH team on OFED 1.1. It is being used at several production clusters with OFED 1.1. I ran the bandwidth test on our Opteron nodes, AMD Processor 254 (2.8 GHz), with Mellanox dual-port DDR cards. I can see a peak bandwidth of 1402 MillionBytes/sec as reported by OSU Bandwidth test. On the same machines, I ran ib_rdma_bw (in the perftest module of OFED-1.1), which reports lower Gen2 level performance numbers. The peak bw reported by ib_rdma_bw is 1307.09 MegaBytes/sec (=1338.09*1.048 = 1402 MillionBytes/sec). So, the lower level numbers match up to what is reported by MPI. I'm wondering how your lower-level ib_rdma_bw numbers look like? Are they matching up with what OSU BW test reports? If they are, then it is likely some other issue than MPI. We also have a MVAPICH-0.9.9 beta version out. You could give that a try too, if you want. We will be making the full release soon. Thanks, Sayantan. -- http://www.cse.ohio-state.edu/~surs From rdreier at cisco.com Wed Feb 28 08:42:41 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 28 Feb 2007 08:42:41 -0800 Subject: [ofa-general] IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! In-Reply-To: <200702281350.03788.hnguyen@linux.vnet.ibm.com> (Hoang-Nam Nguyen's message of "Wed, 28 Feb 2007 13:50:03 +0100") References: <200702281350.03788.hnguyen@linux.vnet.ibm.com> Message-ID: I guess the solution is to merge IPoIB NAPI to avoid overloading the system with interrupts. I'll fix up a few last things with my NAPI patch and we can try to get it in shape to merge for 2.6.22. 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > index f2aa923..97ea26f 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > @@ -301,6 +301,7 @@ void ipoib_ib_completion(struct ib_cq *c > n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc); > for (i = 0; i < n; ++i) > ipoib_ib_handle_wc(dev, priv->ibwc + i); > + cond_resched(); obviously this is wrong because ipoib_ib_completion() is not necessarily called in process context (in fact the ehca scaling hack is probably the only driver that does call it when it's safe to reschedule). > } while (n == IPOIB_NUM_WC); > } > > However I still saw that BUG trace occurred on 3-4 cpus after several hrs. Right, because this patch is not really doing anything to reduce the interrupt load. From surs at cse.ohio-state.edu Wed Feb 28 08:46:46 2007 From: surs at cse.ohio-state.edu (Sayantan Sur) Date: Wed, 28 Feb 2007 11:46:46 -0500 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <20070228164038.GA28118@cse.ohio-state.edu> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <45E582DD.8010206@dev.mellanox.co.il> <17893.34633.644064.978253@gargle.gargle.HOWL> <45E58BB8.4020902@dev.mellanox.co.il> <17893.40412.365196.423575@gargle.gargle.HOWL> <45E5A35B.8000200@dev.mellanox.co.il> <20070228164038.GA28118@cse.ohio-state.edu> Message-ID: <20070228164645.GA22595@cse.ohio-state.edu> Hi, * On Feb,3 Sayantan Sur wrote : > Hi Roland, > > * On Feb,2 Pavel Shamis (Pasha) wrote : > > Roland Fehrenbacher wrote: > > >>>>>>"Pavel" == Pavel Shamis <(Pasha)" > writes: > > > > > > Pavel> Hi Roland, > > > >> >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, > > > >> >> and saw some unpleasant performance drops when using OFED > > > >> 1.1 >> (kernel 2.6.20.1 with included IB drivers). The main > > > >> drop is in >> throughput as measured by the OSU MPI bandwidth > > > >> >> benchmark. 
However, the latency for large packet sizes is > > > >> also >> worse (see results below). I tried with and without > > > >> "options >> ib_mthca msi_x=1" (using IBGD, disabling msi_x > > > >> makes a >> siginficant performance difference of > > > >> approx. 10%). The IB card >> is a Mellanox MHGS18-XT (PCIe/DDR > > > >> Firmware 1.2.0) running on an >> Opteron with nForce4 2200 > > > >> Professional chipset. > > > >> >> > > > >> >> Does anybody have an explanation or even better a solution > > > >> to >> this issue? > > > >> > > > > > > Pavel> Please try to add follow mvapich parameter : > > > Pavel> VIADEV_DEFAULT_MTU=MTU2048 > > > >> Thanks for the suggestion. Unfortunately, it didn't improve the > > > >> simple bandwidth results. Bi-directional bandwidth increased by > > > >> 3% though. Any more ideas? > > > > > > Pavel> 3% is good start :-) Please also try to add this one: > > > Pavel> VIADEV_MAX_RDMA_SIZE=4194304 > > > > > >This brought another 2% in bi-directional bandwidth, but still nothing > > >in uni-directional bandwidth. > > > > > >mvapich version is 0.9.8 > > 0.9.8 was not distributed (and tested) with OFED 1.1 :-( > > Please try to use package distributed with OFED 1.1 version. > > MVAPICH-0.9.8 was tested by the MVAPICH team on OFED 1.1. It is being > used at several production clusters with OFED 1.1. > > I ran the bandwidth test on our Opteron nodes, AMD Processor 254 (2.8 > GHz), with Mellanox dual-port DDR cards. I can see a peak bandwidth of > 1402 MillionBytes/sec as reported by OSU Bandwidth test. On the same > machines, I ran ib_rdma_bw (in the perftest module of OFED-1.1), which > reports lower Gen2 level performance numbers. The peak bw reported by > ib_rdma_bw is 1307.09 MegaBytes/sec (=1338.09*1.048 = 1402 > MillionBytes/sec). So, the lower level numbers match up to what is > reported by MPI. The above was done with OFED-1.1. Using IBGD-1.8.2 on the same machines and saw 1402 MillionBytes/sec peak bandwidth. 
This is the same as reported by OFED-1.1. > I'm wondering how your lower-level ib_rdma_bw numbers look like? Are > they matching up with what OSU BW test reports? If they are, then it is > likely some other issue than MPI. > > We also have a MVAPICH-0.9.9 beta version out. You could give that a try > too, if you want. We will be making the full release soon. In addition, you can check the following URL w.r.t. performance numbers. http://nowlab.cse.ohio-state.edu/projects/mpi-iba/performance/mvapich/opteron/MVAPICH-0.9.8-opteron-gen2-DDR.html Thanks, Sayantan. > > Thanks, > Sayantan. > > -- > http://www.cse.ohio-state.edu/~surs > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- http://www.cse.ohio-state.edu/~surs From hnguyen at linux.vnet.ibm.com Wed Feb 28 09:01:02 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Wed, 28 Feb 2007 18:01:02 +0100 Subject: [ofa-general] [PATCH 2.6.21-rc2] ehca: fix mismatched sync between completion handler and destroy cq Message-ID: <200702281801.02747.hnguyen@linux.vnet.ibm.com> This patch fixes two issues reported by Roland and Christoph H.: - Mismatched sync/locking between completion handler and destroy cq We introduced a counter nr_events per cq to track number of irq events seen. This counter is incremented when an event queue entry is seen and decremented after completion handler has been called regardless if scaling code is active or not. Note that nr_callbacks tracks number of events assigned to a cpu and both counters can potentially diverge. The sync between running completion handler and destroy cq is done by using the global spin lock ehca_cq_idr_lock. 
- Replace yield by wait_event on the counter above to become zero Signed-off-by: Hoang-Nam Nguyen --- ehca_classes.h | 6 ++++- ehca_cq.c | 16 +++++++++++++-- ehca_irq.c | 59 +++++++++++++++++++++++++++++++++++++-------------------- ehca_main.c | 4 +-- 4 files changed, 60 insertions(+), 25 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 40404c9..85fe741 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -52,6 +52,8 @@ struct ehca_mw; struct ehca_pd; struct ehca_av; +#include + #include #include @@ -153,7 +155,9 @@ struct ehca_cq { spinlock_t cb_lock; struct hlist_head qp_hashtab[QP_HASHTAB_LEN]; struct list_head entry; - u32 nr_callbacks; + u32 nr_callbacks; /* #events assigned to cpu by scaling code */ + u32 nr_events; /* #events seen */ + wait_queue_head_t wait_completion; spinlock_t task_lock; u32 ownpid; /* mmap counter for resources mapped into user space */ diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index 6ebfa27..e2cdc1a 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -146,6 +146,7 @@ struct ib_cq *ehca_create_cq(struct ib_d spin_lock_init(&my_cq->spinlock); spin_lock_init(&my_cq->cb_lock); spin_lock_init(&my_cq->task_lock); + init_waitqueue_head(&my_cq->wait_completion); my_cq->ownpid = current->tgid; cq = &my_cq->ib_cq; @@ -302,6 +303,16 @@ create_cq_exit1: return cq; } +static int get_cq_nr_events(struct ehca_cq *my_cq) +{ + int ret; + unsigned long flags; + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + ret = my_cq->nr_events; + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + return ret; +} + int ehca_destroy_cq(struct ib_cq *cq) { u64 h_ret; @@ -329,10 +340,11 @@ int ehca_destroy_cq(struct ib_cq *cq) } spin_lock_irqsave(&ehca_cq_idr_lock, flags); - while (my_cq->nr_callbacks) { + while (my_cq->nr_events) { 
spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - yield(); + wait_event(my_cq->wait_completion, !get_cq_nr_events(my_cq)); spin_lock_irqsave(&ehca_cq_idr_lock, flags); + /* recheck nr_events to assure no cqe has just arrived */ } idr_remove(&ehca_cq_idr, my_cq->token); diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index 3ec53c6..7d8b795 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -404,10 +403,11 @@ static inline void process_eqe(struct eh u32 token; unsigned long flags; struct ehca_cq *cq; + eqe_value = eqe->entry; ehca_dbg(&shca->ib_device, "eqe_value=%lx", eqe_value); if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) { - ehca_dbg(&shca->ib_device, "... completion event"); + ehca_dbg(&shca->ib_device, "Got completion event"); token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value); spin_lock_irqsave(&ehca_cq_idr_lock, flags); cq = idr_find(&ehca_cq_idr, token); @@ -419,16 +419,20 @@ static inline void process_eqe(struct eh return; } reset_eq_pending(cq); - if (ehca_scaling_code) { + cq->nr_events++; + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + if (ehca_scaling_code) queue_comp_task(cq); - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - } else { - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + else { comp_event_callback(cq); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq->nr_events--; + if (!cq->nr_events) + wake_up(&cq->wait_completion); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); } } else { - ehca_dbg(&shca->ib_device, - "Got non completion event"); + ehca_dbg(&shca->ib_device, "Got non completion event"); parse_identifier(shca, eqe_value); } } @@ -478,6 +482,7 @@ void ehca_process_eq(struct ehca_shca *s "token=%x", token); continue; } + eqe_cache[eqe_cnt].cq->nr_events++; spin_unlock(&ehca_cq_idr_lock); } else eqe_cache[eqe_cnt].cq = NULL; @@ -504,12 +509,18 @@ void ehca_process_eq(struct ehca_shca *s /* call completion handler for 
cached eqes */ for (i = 0; i < eqe_cnt; i++) if (eq->eqe_cache[i].cq) { - if (ehca_scaling_code) { - spin_lock(&ehca_cq_idr_lock); + if (ehca_scaling_code) queue_comp_task(eq->eqe_cache[i].cq); - spin_unlock(&ehca_cq_idr_lock); - } else - comp_event_callback(eq->eqe_cache[i].cq); + else { + struct ehca_cq *cq = eq->eqe_cache[i].cq; + comp_event_callback(cq); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq->nr_events--; + if (!cq->nr_events) + wake_up(&cq->wait_completion); + spin_unlock_irqrestore(&ehca_cq_idr_lock, + flags); + } } else { ehca_dbg(&shca->ib_device, "Got non completion event"); parse_identifier(shca, eq->eqe_cache[i].eqe->entry); @@ -523,7 +534,6 @@ void ehca_process_eq(struct ehca_shca *s if (!eqe) break; process_eqe(shca, eqe); - eqe_cnt++; } while (1); unlock_irq_spinlock: @@ -567,8 +577,7 @@ static void __queue_comp_task(struct ehc list_add_tail(&__cq->entry, &cct->cq_list); cct->cq_jobs++; wake_up(&cct->wait_queue); - } - else + } else __cq->nr_callbacks++; spin_unlock(&__cq->task_lock); @@ -577,18 +586,21 @@ static void __queue_comp_task(struct ehc static void queue_comp_task(struct ehca_cq *__cq) { - int cpu; int cpu_id; struct ehca_cpu_comp_task *cct; + int cq_jobs; + unsigned long flags; - cpu = get_cpu(); cpu_id = find_next_online_cpu(pool); BUG_ON(!cpu_online(cpu_id)); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); BUG_ON(!cct); - if (cct->cq_jobs > 0) { + spin_lock_irqsave(&cct->task_lock, flags); + cq_jobs = cct->cq_jobs; + spin_unlock_irqrestore(&cct->task_lock, flags); + if (cq_jobs > 0) { cpu_id = find_next_online_cpu(pool); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); BUG_ON(!cct); @@ -608,11 +620,17 @@ static void run_comp_task(struct ehca_cp cq = list_entry(cct->cq_list.next, struct ehca_cq, entry); spin_unlock_irqrestore(&cct->task_lock, flags); comp_event_callback(cq); - spin_lock_irqsave(&cct->task_lock, flags); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq->nr_events--; + if (!cq->nr_events) + 
wake_up(&cq->wait_completion); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + + spin_lock_irqsave(&cct->task_lock, flags); spin_lock(&cq->task_lock); cq->nr_callbacks--; - if (cq->nr_callbacks == 0) { + if (!cq->nr_callbacks) { list_del_init(cct->cq_list.next); cct->cq_jobs--; } diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index c183512..a5e564a 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -52,7 +52,7 @@ #include "hcp_if.h" MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver"); -MODULE_VERSION("SVNEHCA_0021"); +MODULE_VERSION("SVNEHCA_0022"); int ehca_open_aqp1 = 0; int ehca_debug_level = 0; @@ -810,7 +809,7 @@ int __init ehca_module_init(void) int ret; printk(KERN_INFO "eHCA Infiniband Device Driver " - "(Rel.: SVNEHCA_0021)\n"); + "(Rel.: SVNEHCA_0022)\n"); idr_init(&ehca_qp_idr); idr_init(&ehca_cq_idr); spin_lock_init(&ehca_qp_idr_lock); From rf at q-leap.de Wed Feb 28 09:14:37 2007 From: rf at q-leap.de (Roland Fehrenbacher) Date: Wed, 28 Feb 2007 18:14:37 +0100 Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2 In-Reply-To: <20070228164038.GA28118@cse.ohio-state.edu> References: <17893.22368.748298.755523@gargle.gargle.HOWL> <45E582DD.8010206@dev.mellanox.co.il> <17893.34633.644064.978253@gargle.gargle.HOWL> <45E58BB8.4020902@dev.mellanox.co.il> <17893.40412.365196.423575@gargle.gargle.HOWL> <45E5A35B.8000200@dev.mellanox.co.il> <20070228164038.GA28118@cse.ohio-state.edu> Message-ID: <17893.47229.704258.287392@gargle.gargle.HOWL> >>>>> "Sayantan" == Sayantan Sur writes: Roland> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED Roland> 1.1, and saw some unpleasant performance drops when using Roland> OFED 1.1 (kernel 2.6.20.1 with included IB drivers). The Roland> main drop is in throughput as measured by the OSU MPI Roland> bandwidth benchmark. 
However, the latency for large packet Roland> sizes is also worse (see results below). I tried with and Roland> without "options ib_mthca msi_x=1" (using IBGD, disabling Roland> msi_x makes a significant performance difference of Roland> approx. 10%). The IB card is a Mellanox MHGS18-XT Roland> (PCIe/DDR Firmware 1.2.0) running on an Opteron with Roland> nForce4 2200 Professional chipset. Roland> Roland> Does anybody have an explanation or even better a solution Roland> to this issue? Pavel> Please try adding the following mvapich parameter: Pavel> VIADEV_DEFAULT_MTU=MTU2048 Roland> Thanks for the suggestion. Unfortunately, it didn't Roland> improve the simple bandwidth results. Bi-directional Roland> bandwidth increased by 3% though. Any more ideas? Pavel> 3% is a good start :-) Please also try to add this one: Pavel> VIADEV_MAX_RDMA_SIZE=4194304 Roland> This brought another 2% in bi-directional bandwidth, but Roland> still nothing in uni-directional bandwidth. Roland> mvapich version is 0.9.8 Pavel> 0.9.8 was not distributed (and tested) with OFED 1.1 :-( Pavel> Please try to use the package distributed with the OFED 1.1 Pavel> version. Sayantan> MVAPICH-0.9.8 was tested by the MVAPICH team on OFED Sayantan> 1.1. It is being used at several production clusters Sayantan> with OFED 1.1. Sayantan> I ran the bandwidth test on our Opteron nodes, AMD Sayantan> Processor 254 (2.8 GHz), with Mellanox dual-port DDR Sayantan> cards. I can see a peak bandwidth of 1402 Sayantan> MillionBytes/sec as reported by OSU Bandwidth test. On Sayantan> the same machines, I ran ib_rdma_bw (in the perftest Sayantan> module of OFED-1.1), which reports lower Gen2 level Sayantan> performance numbers. The peak bw reported by ib_rdma_bw Sayantan> is 1307.09 MegaBytes/sec (=1338.09*1.048 = 1402 Sayantan> MillionBytes/sec). So, the lower level numbers match up Sayantan> to what is reported by MPI. Sayantan> I'm wondering what your lower-level ib_rdma_bw numbers Sayantan> look like?
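The MegaBytes-to-MillionBytes conversion used in the comparison above can be written out as a small helper. This is only a sketch: it assumes the perftest figure counts a megabyte as 2^20 bytes and the OSU figure counts it as 10^6 bytes, which is what the 1.048 factor in the mail implies. The function name is hypothetical and is not part of perftest or the OSU benchmarks.

```c
/*
 * Sketch: convert a bandwidth given in 2^20-byte megabytes per second
 * into million bytes (10^6) per second.
 * Assumption: the two tools differ only in this unit definition.
 */
static double mib_to_million_bytes(double mib_per_sec)
{
    /* 1 MiB = 1048576 bytes, so the exact factor is 1.048576 */
    return mib_per_sec * 1048576.0 / 1000000.0;
}
```

With the exact factor, 1338.09 comes out at roughly 1403 rather than 1402, so the rounded 1.048 used in the thread is accurate to about one unit in the last digit.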
I get:

3802: Bandwidth peak (#0 to #989): 1288.55 MB/sec
3802: Bandwidth average: 1288.54 MB/sec
3802: Service Demand peak (#0 to #989): 1818 cycles/KB
3802: Service Demand Avg : 1818 cycles/KB

so, 1288.55MB/sec*1.048 = 1350 MillionBytes/sec, also matches up exactly with the MPI results (see results below)

Sayantan> Are they matching up with what OSU BW test reports? If Sayantan> they are, then it is likely some other issue than MPI.

Looks like it's not MPI then. What else could be wrong? Why is IBGD so much better in my case?

Sayantan> We also have a MVAPICH-0.9.9 beta version out. You could Sayantan> give that a try too, if you want. We will be making the Sayantan> full release soon.

Probably won't help in this case.

Roland

> ------------------------------------------------------------------------
>
> IBGD
> --------
>
> # OSU MPI Bandwidth Test (Version 2.1)
> # Size        Bandwidth (MB/s)
> 1             0.830306
> 2             1.642710
> 4             3.307494
> 8             6.546477
> 16            13.161954
> 32            26.395154
> 64            52.913060
> 128           101.890547
> 256           172.227478
> 512           383.296292
> 1024          611.172247
> 2048          830.147571
> 4096          1068.057366
> 8192          1221.262520
> 16384         1271.771983
> 32768         1369.702828
> 65536         1426.124683
> 131072        1453.781151
> 262144        1457.297992
> 524288        1464.625860
> 1048576       1468.953875
> 2097152       1470.614903
> 4194304       1471.607758
>
> # OSU MPI Latency Test (Version 2.1)
> # Size        Latency (us)
> 0             3.03
> 1             3.03
> 2             3.04
> 4             3.03
> 8             3.03
> 16            3.04
> 32            3.11
> 64            3.23
> 128           3.49
> 256           3.83
> 512           4.88
> 1024          6.31
> 2048          8.60
> 4096          11.02
> 8192          15.78
> 16384         28.85
> 32768         39.82
> 65536         60.30
> 131072        106.65
> 262144        196.47
> 524288        374.62
> 1048576       730.79
> 2097152       1442.32
> 4194304       2864.80
>
> OFED 1.1
> ---------
>
> # OSU MPI Bandwidth Test (Version 2.2)
> # Size        Bandwidth (MB/s)
> 1             0.698614
> 2             1.463192
> 4             2.941852
> 8             5.859464
> 16            11.697510
> 32            23.339031
> 64            46.403081
> 128           92.013928
> 256           182.918388
> 512           315.076923
> 1024          500.083937
> 2048          765.294564
> 4096          1003.652513
> 8192          1147.640312
> 16384         1115.803139
> 32768         1221.120298
> 65536         1282.328447
> 131072        1315.715608
> 262144        1331.456393
> 524288        1340.691793
> 1048576       1345.650404
> 2097152       1349.279211
> 4194304       1350.489883
>
> # OSU MPI Latency Test (Version 2.2)
> # Size        Latency (us)
> 0             2.99
> 1             3.03
> 2             3.06
> 4             3.03
> 8             3.03
> 16            3.04
> 32            3.12
> 64            3.27
> 128           3.96
> 256           4.29
> 512           4.99
> 1024          6.53
> 2048          9.08
> 4096          11.92
> 8192          17.39
> 16384         31.05
> 32768         43.47
> 65536         67.17
> 131072        115.30
> 262144        212.33
> 524288        405.20
> 1048576       790.45
> 2097152       1558.88
> 4194304       3095.17
>
>
> ------------------------------------------------------------------------

From dledford at redhat.com Wed Feb 28 09:56:58 2007 From: dledford at redhat.com (Doug Ledford) Date: Wed, 28 Feb 2007 12:56:58 -0500 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <45E58D3A.8060906@mellanox.co.il> References: <45E58D3A.8060906@mellanox.co.il> Message-ID: <1172685419.4777.145.camel@fc6.xsintricity.com> On Wed, 2007-02-28 at 16:10 +0200, Tziporet Koren wrote: > * Improved RPM usage by the install will not be part of OFED > 1.2 Since I first brought this up, you have added new libraries, iWARP support, etc. These constitute new RPMs. And, because you guys have been doing things contrary to standards like the file hierarchy standard in the original RPMs, it's been carried forward to these new RPMs. This is a snowball, and the longer you put off fixing it, the harder it gets to change. And not just in your RPMs either. The longer you put off coming up with a reasonable standard for MPI library and executable file locations, the longer customers will hand roll their own site specific setups, and the harder it will be to get them to switch over to the standard once you *do* implement it. You may end up dooming Jeff to maintaining those custom file location hacks in the OpenMPI spec forever. Not to mention that interoperability is about more than one machine talking to another machine.
It's also about a customer's application building properly on different versions of the stack, without the customer needing to change all the include file locations and link parameters. It's also about a customer being able to rest assured that if they tried to install two conflicting copies of libibverbs, it would in fact cause RPM to throw conflict errors (which it doesn't now because your libibverbs is in /usr/local, where I'm not allowed to put ours, so since the files are in different locations, rpm will happily let the user install both your libibverbs and my libibverbs without a conflict, and a customer could waste large amounts of time trying to track down a bug in one library only to find out their application is linking against the other). > * The RPM usage will be enhanced for the next (1.3) > release and we will decide on the correct way in > Sonoma. There's not really much to decide. Either the stack is Linux File Hierarchy Standard compliant or it isn't. The only leeway for decisions allowed by the standard is on things like where in /etc to put the config files (since you guys are striving to be a generic RDMA stack, not just an IB stack, I would suggest that all RDMA related config files go into /etc/rdma, and for those applications that can reasonably be run absent RDMA technology, like OpenMPI, I would separate their config files off into either /etc or /etc/openmpi, ditto for the include directories, /usr/include/rdma for the generic non-IB specific stuff, and possibly /usr/include/rdma/infiniband for IB specific stuff, or you could put the IB stuff under /usr/include/infiniband, either way). The biggest variation from the spec that needs to be dealt with is the need for multiple MPI installations, which is problematic if you just use generic locations as it stands today, but with a few modifications to the MPI stack it could be worked around. 
-- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mst at mellanox.co.il Wed Feb 28 10:39:54 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Feb 2007 20:39:54 +0200 Subject: [ofa-general] Re: IPOIB NAPI In-Reply-To: References: <20070228071706.GA22246@mellanox.co.il> Message-ID: <20070228183922.GB10826@mellanox.co.il> > Quoting Shirley Ma : > Subject: Re: IPOIB NAPI > > Michael, > > >I have not benchmarked this, but actually the "return 1" version makes sense > to > >me too: since a new completion was observed after notify-cq, we likely > currently > >have HCA writing new completions into the CQ at a high rate, so it makes sense > >to delay polling by a few cycles, and reduce the number of interrupts in this > >way. > > >Right? > > Agree. Another question, have you benchmark IPoIB NAPI vs. missed event only > mode: just change ipoib completion from notify-cq, poll-cq to poll-cq, > notify-cq if any missed event, poll again? I am going to try this to see the > performance difference. At some point, I think I compared req notif + poll against poll + req notif + poll if missed (both without NAPI), and did not see any speed difference. NAPI was also benchmarked and it was a win as compared to non-NAPI, especially with multiple sockets tests. 
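The two completion-handling orders compared above ("req notif + poll" versus "poll + req notif + poll if missed") can be illustrated with a toy model. This is a sketch only: the struct and functions below are hypothetical and are not the verbs or IPoIB API. The `late` field stands in for completions that land in the window between draining the queue and arming it, and the arm call is assumed to report whether work is already queued.

```c
#include <stdbool.h>

/* Toy completion queue: a model for illustration, not a real CQ. */
struct toy_cq {
    int pending;  /* completions waiting to be polled */
    bool armed;   /* true once a notification was requested */
    int late;     /* completions that arrive just before arming */
};

/* Drain everything currently queued; returns the number handled. */
static int toy_poll(struct toy_cq *cq)
{
    int n = cq->pending;
    cq->pending = 0;
    return n;
}

/*
 * Arm the queue. Returns true if completions are already queued,
 * i.e. the caller may have missed an event and should poll again.
 */
static bool toy_req_notify(struct toy_cq *cq)
{
    cq->pending += cq->late;  /* the race window closes here */
    cq->late = 0;
    cq->armed = true;
    return cq->pending > 0;
}

/* "req notif + poll": arm first, then drain. */
static int handle_notify_first(struct toy_cq *cq)
{
    (void)toy_req_notify(cq);
    return toy_poll(cq);
}

/* "poll + req notif + poll if missed": drain, arm, re-drain on a miss. */
static int handle_poll_first(struct toy_cq *cq)
{
    int handled = toy_poll(cq);

    while (toy_req_notify(cq))
        handled += toy_poll(cq);
    return handled;
}

/* Naive variant without the re-poll: late completions stay stranded. */
static int handle_poll_then_arm(struct toy_cq *cq)
{
    int handled = toy_poll(cq);

    (void)toy_req_notify(cq);
    return handled;
}
```

In this model both orders discussed in the thread end up handling every completion. Only the naive variant leaves work queued with no further event to pick it up, which is why the poll-first order needs the "poll again if missed" step.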
-- MST From or.gerlitz at gmail.com Wed Feb 28 10:43:51 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Wed, 28 Feb 2007 20:43:51 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey In-Reply-To: <000201c759e3$24828410$55d8180a@amr.corp.intel.com> References: <1172507101.4102.277140.camel@hal.voltaire.com> <000201c759e3$24828410$55d8180a@amr.corp.intel.com> Message-ID: <15ddcffd0702281043h52ca49e7t110bc75e3ad2a832@mail.gmail.com> On 2/26/07, Sean Hefty wrote: > I think the following patch would make ipoib spec compliant. > ib_find_cached_pkey is called by ib_cm, rdma_cm, ib_srp, and ib_ipoib. > I'm not certain what this change would do to SRP, but the ib_cm and > rdma_cm look okay, given that non-reversible paths aren't supported > yet anyway. Sean, As Moni stated, we need this functionality and among other scenarios, the use case I have mentioned in this discussion was that of an I/O target being a full member of a partition where the initiators connected to it are partial members - since they need not and should not talk among themselves. The connection may be implemented over TCP/UDP on top of IPoIB (e.g. iscsi / nfs / some cluster file system) or over the RDMA CM and the VERBS (iSER / rNFS / native implementation of cluster file systems) or over the IB CM and the VERBS (srp). For all the above cases except for SRP, IPoIB is used as the ARP provider, which means that the nodes with the partial membership must join the "IPv4 broadcast" IB multicast group. This is working fine with the openib IPoIB and core implementation running against the Voltaire SA/SM and as Hal commented (Hal - can you verify it? see (*) below) also against the open SM/SA. My guess is that this is also working fine with TopSpin/Cisco SM/SA. (*) simply configure the SM to allocate 0xffff (index 0) and 0x8001 (index 1) to node A, then 0x7fff (index 0) and 0x0001 (index 1) to node B.
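The full/partial membership setup described above can be sketched in a few lines of C. This is a minimal illustration, assuming the usual P_Key layout (bit 15 is the full-membership bit, the low 15 bits are the partition base). All names below are hypothetical and are not the ib_cache API; the `mask_membership` flag only mirrors the masked versus unmasked search that the patch subject is about.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u  /* bit 15: full membership */
#define PKEY_BASE_MASK   0x7fffu  /* low 15 bits: partition base */

static bool pkey_is_full(uint16_t pkey)
{
    return (pkey & PKEY_FULL_MEMBER) != 0;
}

/* Two ports can talk if the bases match and at least one is a full member. */
static bool pkey_can_talk(uint16_t a, uint16_t b)
{
    if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
        return false;
    return pkey_is_full(a) || pkey_is_full(b);
}

/*
 * Look up pkey in a port's table and return its index, or -1 if absent.
 * With mask_membership set, the membership bit is ignored in the compare.
 */
static int find_pkey(const uint16_t *table, int len, uint16_t pkey,
                     bool mask_membership)
{
    uint16_t mask = mask_membership ? PKEY_BASE_MASK : 0xffffu;

    for (int i = 0; i < len; i++)
        if ((table[i] & mask) == (pkey & mask))
            return i;
    return -1;
}
```

With node B's table {0x7fff, 0x0001}, a masked search for 0x8001 finds index 1, while an exact search finds nothing. Two partial members (0x0001 and 0x0001) share a base but fail `pkey_can_talk`, which is the "initiators should not talk among themselves" property.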
Now, configure ib0 of both nodes to subnet X, create an 0x8001 ib0 child on both and configure ib0.8001 to subnet Y, make sure you have pings on both subnets - thanks! My suggestion is that we act to have the spec changed to match this real need and not that this code (my guess which is present there from day one, I guess Roland can tell) would be removed to match the spec. Or. From sashak at voltaire.com Wed Feb 28 12:04:26 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 28 Feb 2007 22:04:26 +0200 Subject: [ofa-general] Re: [PATCH] osm: Trivial changes for compilation on windows In-Reply-To: <45E54653.6010300@dev.mellanox.co.il> References: <45E54653.6010300@dev.mellanox.co.il> Message-ID: <20070228200426.GC30973@sashak.voltaire.com> On 11:07 Wed 28 Feb , Yevgeny Kliteynik wrote: > Hi Hal. > > This patch has trivial data types changes and redefining a macro. > > > BTW, Sasha, do we still need this macro (NOISE_L in osm_ucast_updn.c)? For me it is perfectly fine to remove this completely. Sasha From halr at voltaire.com Wed Feb 28 12:36:24 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Feb 2007 15:36:24 -0500 Subject: [ofa-general] Re: [PATCH] opensm: remove some unneeded osm_switch functions In-Reply-To: <20070225234705.GH11957@sashak.voltaire.com> References: <20070225214845.GF11957@sashak.voltaire.com> <20070225234705.GH11957@sashak.voltaire.com> Message-ID: <1172694979.31770.89937.camel@hal.voltaire.com> On Sun, 2007-02-25 at 18:47, Sasha Khapyorsky wrote: > Following introduced simplification this patch removes single field > access functions from osm_switch. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). -- Hal From mst at mellanox.co.il Wed Feb 28 13:02:35 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 28 Feb 2007 23:02:35 +0200 Subject: [ofa-general] [PATCH] IB/mthca: recv poll cq optimization Message-ID: <20070228210235.GC8564@mellanox.co.il> All good recv work requests generate HW completions in FIFO order, so we can use rq->tail rather than hardware data. In this way, we save a branch on data path for recv completions (branch is still there for send completions). Signed-off-by: Michael S. Tsirkin --- Roland, what do you think? This increases the overall code size but I think the extra code is on the error CQE handling path. BTW, since most kernel QPs seem not to use selective signaling, it might be worth it to optimize send completions in a similar way in case selective signaling is disabled on QP. diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index efd79ef..78f8069 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -542,38 +542,37 @@ static inline int mthca_poll_one(struct mthca_dev *dev, >> wq->wqe_shift); entry->wr_id = (*cur_qp)->wrid[wqe_index + (*cur_qp)->rq.max]; + if (wq->last_comp < wqe_index) + wq->tail += wqe_index - wq->last_comp; + else + wq->tail += wqe_index + wq->max - wq->last_comp; + + wq->last_comp = wqe_index; } else if ((*cur_qp)->ibqp.srq) { struct mthca_srq *srq = to_msrq((*cur_qp)->ibqp.srq); u32 wqe = be32_to_cpu(cqe->wqe); - wq = NULL; wqe_index = wqe >> srq->wqe_shift; entry->wr_id = srq->wrid[wqe_index]; mthca_free_srq_wqe(srq, wqe); } else { - s32 wqe; wq = &(*cur_qp)->rq; - wqe = be32_to_cpu(cqe->wqe); - wqe_index = wqe >> wq->wqe_shift; - /* - * WQE addr == base - 1 might be reported in receive completion - * with error instead of (rq size - 1) by Sinai FW 1.0.800 and - * Arbel FW 5.1.400. This bug should be fixed in later FW revs.
- */ - if (unlikely(wqe_index < 0)) - wqe_index = wq->max - 1; - entry->wr_id = (*cur_qp)->wrid[wqe_index]; + wq->last_comp = wq->tail++ & (wq->max - 1); + entry->wr_id = (*cur_qp)->wrid[wq->last_comp]; } - if (wq) { - if (wq->last_comp < wqe_index) - wq->tail += wqe_index - wq->last_comp; - else - wq->tail += wqe_index + wq->max - wq->last_comp; - - wq->last_comp = wqe_index; - } + if (unlikely(is_error)) { + if (!is_send && !(*cur_qp)->ibqp.srq) { + s32 wqe = be32_to_cpu(cqe->wqe); + wqe_index = wqe >> wq->wqe_shift; + /* + * WQE addr == base - 1 might be reported in receive completion + * with error instead of (rq size - 1) by Sinai FW 1.0.800 and + * Arbel FW 5.1.400. This bug should be fixed in later FW revs. + */ + if (unlikely(wqe_index < 0)) + wqe_index = wq->max - 1; + } - if (is_error) { handle_error_cqe(dev, cq, *cur_qp, wqe_index, is_send, (struct mthca_err_cqe *) cqe, entry, &free_cqe); -- MST From sashak at voltaire.com Wed Feb 28 13:21:03 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 28 Feb 2007 23:21:03 +0200 Subject: [ofa-general] [PATCH] opensm: switch pre-routing preparation status check Message-ID: <20070228212103.GE30973@sashak.voltaire.com> osm_switch_prepare_path_rebuild() will return status value now, it is needed in order to track switch pre-routing preparation properly. Also tiny p_sw->hops rework for potentially lockless p_sw->hops accessing. 
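The two changes described in this commit message (returning a status, and filling the new table before publishing the pointer) can be sketched in isolation. This is a simplified model, not the OpenSM code: `struct toy_switch` and `toy_prepare_hops` are hypothetical names, and `calloc` stands in for the malloc plus memset pair.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two switch fields involved. */
struct toy_switch {
    uint8_t **hops;     /* one row pointer per LID */
    unsigned num_hops;  /* number of entries in hops */
};

/*
 * Make sure sw->hops has room for LIDs 0..max_lids.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int toy_prepare_hops(struct toy_switch *sw, unsigned max_lids)
{
    unsigned want = max_lids + 1;
    uint8_t **hops, **old;

    if (sw->hops && want <= sw->num_hops)
        return 0;  /* already large enough */

    /* Build the new table completely before anyone can see it. */
    hops = calloc(want, sizeof(hops[0]));
    if (!hops)
        return -1;  /* sw->hops is left untouched on failure */
    if (sw->hops)
        memcpy(hops, sw->hops, sw->num_hops * sizeof(hops[0]));

    /* Publish the finished table, then release the old one. */
    old = sw->hops;
    sw->hops = hops;
    sw->num_hops = want;
    free(old);
    return 0;
}
```

Because `sw->hops` is only assigned once the new table is fully populated, a failed allocation no longer leaves a NULL pointer behind, and a reader never observes a half-initialized table. A genuinely lockless reader would presumably also need the free of the old table to be deferred; the sketch follows the same immediate-free order as the patch.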
Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_switch.h | 4 +- osm/opensm/osm_switch.c | 32 +++++++++++++++++------------ osm/opensm/osm_ucast_mgr.c | 42 ++++++++++++++++++++++++-------------- 3 files changed, 47 insertions(+), 31 deletions(-) diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h index c3ef865..4270904 100644 --- a/osm/include/opensm/osm_switch.h +++ b/osm/include/opensm/osm_switch.h @@ -1153,7 +1153,7 @@ osm_switch_path_count_get( * * SYNOPSIS */ -void +int osm_switch_prepare_path_rebuild( IN osm_switch_t* p_sw, IN uint16_t max_lids ); @@ -1166,7 +1166,7 @@ osm_switch_prepare_path_rebuild( * [in] Max number of lids in the subnet. * * RETURN VALUE -* None. +* Returns zero on success, or negative value if an error occurred. * * NOTES * diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c index f258dbc..3a98a63 100644 --- a/osm/opensm/osm_switch.c +++ b/osm/opensm/osm_switch.c @@ -489,37 +489,43 @@ osm_switch_clear_hops( /********************************************************************** **********************************************************************/ -void +int osm_switch_prepare_path_rebuild( IN osm_switch_t* p_sw, IN uint16_t max_lids ) { + uint8_t **hops; unsigned i; for ( i = 0; i < p_sw->num_ports; i++ ) osm_port_prof_construct( &p_sw->p_prof[i] ); if (!p_sw->hops) { - p_sw->hops = malloc((max_lids + 1)*sizeof(p_sw->hops[0])); - if (!p_sw->hops) - return; - memset(p_sw->hops, 0, (max_lids + 1)*sizeof(p_sw->hops[0])); + hops = malloc((max_lids + 1)*sizeof(hops[0])); + if (!hops) + return -1; + memset(hops, 0, (max_lids + 1)*sizeof(hops[0])); + p_sw->hops = hops; p_sw->num_hops = max_lids + 1; } else if (max_lids + 1 > p_sw->num_hops) { - uint8_t **old_hops = p_sw->hops; - - p_sw->hops = malloc((max_lids + 1)*sizeof(p_sw->hops[0])); - if (!p_sw->hops) - return; - memcpy(p_sw->hops, old_hops, p_sw->num_hops*sizeof(p_sw->hops[0])); - memset(p_sw->hops + p_sw->num_hops, 0, - (max_lids + 1 - 
p_sw->num_hops)*sizeof(p_sw->hops[0])); + uint8_t **old_hops; + + hops = malloc((max_lids + 1)*sizeof(hops[0])); + if (!hops) + return -1; + memcpy(hops, p_sw->hops, p_sw->num_hops*sizeof(hops[0])); + memset(hops + p_sw->num_hops, 0, + (max_lids + 1 - p_sw->num_hops)*sizeof(hops[0])); + old_hops = p_sw->hops; + p_sw->hops = hops; p_sw->num_hops = max_lids + 1; free(old_hops); } p_sw->max_lid_ho = max_lids; + + return 0; } /********************************************************************** diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index f02bae9..5b4ce45 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -404,20 +404,6 @@ static void __osm_ucast_mgr_dump_tables(osm_ucast_mgr_t *p_mgr) } /********************************************************************** - Starting a rebuild, so notify the switch so it can clear tables, etc... -**********************************************************************/ -static void -__osm_ucast_mgr_setup_switch( - IN cl_map_item_t* const p_map_item, - IN void* cxt ) -{ - uint16_t lids = (uint16_t)cl_ptr_vector_get_size(&((osm_subn_t *)cxt)->port_lid_tbl); - - osm_switch_prepare_path_rebuild((osm_switch_t *)p_map_item, - lids ? lids - 1 : 0); -} - -/********************************************************************** Add each switch's own LID(s) to its LID matrix. **********************************************************************/ static void @@ -1195,6 +1181,30 @@ osm_ucast_mgr_build_lid_matrices( /********************************************************************** **********************************************************************/ +static int +ucast_mgr_setup_all_switches(osm_subn_t *p_subn) +{ + osm_switch_t *p_sw; + uint16_t lids; + + lids = (uint16_t)cl_ptr_vector_get_size(&p_subn->port_lid_tbl); + lids = lids ? 
lids - 1 : 0; + + for (p_sw = (osm_switch_t*)cl_qmap_head(&p_subn->sw_guid_tbl); + p_sw != (osm_switch_t*)cl_qmap_end(&p_subn->sw_guid_tbl); + p_sw = (osm_switch_t*)cl_qmap_next(&p_sw->map_item)) + if (osm_switch_prepare_path_rebuild(p_sw, lids)) { + osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR, + "ucast_mgr_setup_all_switches: cannot setup switch 0x%016" PRIx64 + "\n", cl_ntoh64(osm_node_get_node_guid(p_sw->p_node))); + return -1; + } + + return 0; +} + +/********************************************************************** + **********************************************************************/ osm_signal_t osm_ucast_mgr_process( IN osm_ucast_mgr_t* const p_mgr ) @@ -1214,11 +1224,11 @@ osm_ucast_mgr_process( /* If there are no switches in the subnet, we are done. */ - if (cl_qmap_count( p_sw_guid_tbl ) == 0) + if (cl_qmap_count( p_sw_guid_tbl ) == 0 || + ucast_mgr_setup_all_switches(p_mgr->p_subn) < 0) goto Exit; p_mgr->any_change = FALSE; - cl_qmap_apply_func(p_sw_guid_tbl, __osm_ucast_mgr_setup_switch, p_mgr->p_subn); if (!p_routing_eng->build_lid_matrices || p_routing_eng->build_lid_matrices(p_routing_eng->context) != 0) -- 1.5.0.1.40.gb40d From sashak at voltaire.com Wed Feb 28 13:52:50 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 28 Feb 2007 23:52:50 +0200 Subject: [ofa-general] [PATCH TRIVIAL] opensm: remove NOISE_L macros from osm_ucast_updn.c In-Reply-To: <20070228200426.GC30973@sashak.voltaire.com> References: <45E54653.6010300@dev.mellanox.co.il> <20070228200426.GC30973@sashak.voltaire.com> Message-ID: <20070228215250.GG30973@sashak.voltaire.com> This removes NOISE_L macros completely. 
Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_updn.c | 38 -------------------------------------- 1 files changed, 0 insertions(+), 38 deletions(-) diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c index b8dd61c..72d943b 100644 --- a/osm/opensm/osm_ucast_updn.c +++ b/osm/opensm/osm_ucast_updn.c @@ -97,12 +97,6 @@ struct updn_node { unsigned visited; }; -#ifndef WIN32 -#define NOISE_L(log, fmt, arg...) -#else -#define NOISE_L -#endif - /* ///////////////////////////////// */ /* Statics */ /* ///////////////////////////////// */ @@ -258,19 +252,10 @@ __updn_bfs_by_node( { ib_net64_t remote_guid, current_guid; - NOISE_L( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node: " - "Starting a new iteration with %zu elements in current list\n", - cl_qlist_count(&list) ); - u = (struct updn_node *)cl_qlist_remove_head(&list); u->visited = 0; /* cleanup */ current_dir = u->dir; current_guid = osm_node_get_node_guid(u->sw->p_node); - NOISE_L( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node: " - "Visiting port GUID 0x%" PRIx64 "\n", - cl_ntoh64(current_guid) ); /* Go over all ports of the switch and find unvisited remote nodes */ for ( pn = 1; pn < u->sw->num_ports; pn++ ) { @@ -293,12 +278,6 @@ __updn_bfs_by_node( current_guid, remote_guid, u->is_root, rem_u->is_root); - NOISE_L( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node: " - "move from 0x%016" PRIx64 " rank: %u " - "to 0x%016" PRIx64" rank: %u\n", - cl_ntoh64(current_guid), u->rank, - cl_ntoh64(remote_guid), rem_u->rank ); /* Check if this is a legal step : the only illegal step is going from DOWN to UP */ if ((current_dir == DOWN) && (next_dir == UP)) @@ -317,15 +296,6 @@ __updn_bfs_by_node( remote_min_hop = osm_switch_get_hop_count(p_remote_sw, root_lid, pn_rem); if (current_min_hop + 1 < remote_min_hop) { - NOISE_L( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node (less): " - "Setting Min Hop Table of switch: 0x%" PRIx64 - "\n\t\tCurrent hop count is: %d, next hop count: %d" - "\n\tlid to set: 0x%x" - 
"\n\tport number: 0x%X" - "\n\thops number: %d\n", - cl_ntoh64(remote_guid), remote_min_hop,current_min_hop + 1, - root_lid, pn_rem, current_min_hop + 1 ); set_hop_return_value = osm_switch_set_hops(p_remote_sw, root_lid, pn_rem, current_min_hop + 1); if (set_hop_return_value) { @@ -340,11 +310,6 @@ __updn_bfs_by_node( /* Insert updn_switch item into the list */ rem_u->dir = next_dir; rem_u->visited = 1; - NOISE_L( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node: " - "Inserting new element to the next list: guid=0x%" PRIx64 " %s\n", - cl_ntoh64(rem_u->sw->p_node->node_info.port_guid), - (rem_u->dir == UP ? "UP" : "DOWN")); cl_qlist_insert_tail(&list, &rem_u->list); } } @@ -578,9 +543,6 @@ updn_subn_rank( { remote_u = p_remote_physp->p_node->sw->priv; port_guid = p_remote_physp->port_guid; - NOISE_L( p_log, OSM_LOG_DEBUG, - "updn_subn_rank: " - "Ranking port GUID 0x%" PRIx64 "\n", cl_ntoh64(port_guid) ); did_cause_update = __updn_update_rank(remote_u, rank); osm_log( p_log, OSM_LOG_DEBUG, -- 1.5.0.1.40.gb40d From halr at voltaire.com Wed Feb 28 14:01:31 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Feb 2007 17:01:31 -0500 Subject: [ofa-general] Re: [PATCH TRIVIAL] opensm: remove NOISE_L macros from osm_ucast_updn.c In-Reply-To: <20070228215250.GG30973@sashak.voltaire.com> References: <45E54653.6010300@dev.mellanox.co.il> <20070228200426.GC30973@sashak.voltaire.com> <20070228215250.GG30973@sashak.voltaire.com> Message-ID: <1172700046.31770.94902.camel@hal.voltaire.com> On Wed, 2007-02-28 at 16:52, Sasha Khapyorsky wrote: > This removes NOISE_L macros completely. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). 
-- Hal From arlin.r.davis at intel.com Wed Feb 28 14:32:08 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 28 Feb 2007 14:32:08 -0800 Subject: [ofa-general] RE: [Bug 396] OFED 1.2 alpha DAPL failures using Intel MPI 3.0.33, kernel patching issues In-Reply-To: Message-ID: <000001c75b88$48996200$9f97070a@amr.corp.intel.com> > >I'm seeing some problems using Intel MPI 3.0.33 with OFED 1.2 alpha, and >Intel confirmed they are seeing it, too. > >[3:192.168.1.202] unexpected DAPL event 4008 from 0:192.168.1.201^M This is the result of incorrect timeout values being used as a result of sean_cm_limit_mra_timeout_patch. Can someone tell me the purpose of this patch and how it became part of the OFED 1.2 build? arlin From sean.hefty at intel.com Wed Feb 28 14:37:44 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 28 Feb 2007 14:37:44 -0800 Subject: [ofa-general] RE: [Bug 396] OFED 1.2 alpha DAPL failures using IntelMPI 3.0.33, kernel patching issues In-Reply-To: <000001c75b88$48996200$9f97070a@amr.corp.intel.com> Message-ID: <000001c75b89$108c0f10$ff0da8c0@amr.corp.intel.com> >This is the result of incorrect timeout values being used as a result of >sean_cm_limit_mra_timeout_patch. Can someone tell me the purpose of this patch >and how it became >part of the OFED 1.2 build? This patch sets the timeout values incorrectly and needs to be removed from OFED. The purpose was to work-around a storage target firmware bug, which I believe now has a fix. - Sean From xma at us.ibm.com Wed Feb 28 14:54:53 2007 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 28 Feb 2007 14:54:53 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: Message-ID: Hello, Roland, Can we push this patch into OFED-1.2 as well? Thanks Shirley -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sean.hefty at intel.com Wed Feb 28 15:01:12 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 28 Feb 2007 15:01:12 -0800 Subject: [ofa-general] preparing releases for librdmacm and libibcm Message-ID: <000101c75b8c$57778c30$ff0da8c0@amr.corp.intel.com> I've updated the librdmacm and libibcm libraries in preparation of creating 1.0 releases. The build environments were updated on both, and additional API documentation was added to the librdmacm. Tar files for both have been placed at: http://www.openfabrics.org/~shefty Because of dependencies on libibverbs features, final 1.0 releases will wait until libibverbs 1.1 is available. - Sean From mst at mellanox.co.il Wed Feb 28 15:04:42 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Mar 2007 01:04:42 +0200 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: References: Message-ID: <20070228230442.GB25842@mellanox.co.il> >Quoting Shirley Ma : >Subject: Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish > >Hello, Roland, > >Can we push this patch into OFED-1.2 as well? > >Thanks >Shirley I'd like to see Roland's opinion first. -- MST From arlin.r.davis at intel.com Wed Feb 28 16:55:17 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 28 Feb 2007 16:55:17 -0800 Subject: [ofa-general] [PATCH] udapl dtest, add delay option before accept Message-ID: <000101c75b9c$47736880$9f97070a@amr.corp.intel.com> Added optional delay feature to dtest before accept which is used to validate rdma_cm timeout settings. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com diff --git a/test/dtest/dtest.c b/test/dtest/dtest.c index fe8a366..86b70cc 100644 --- a/test/dtest/dtest.c +++ b/test/dtest/dtest.c @@ -405,9 +405,6 @@ cleanup: else LOGPRINTF("%d unregister_rdma_memory done\n", getpid()); - - if (delay) sleep(delay); - /* Free protection domain */ LOGPRINTF("%d Freeing pz\n",getpid()); start = get_time(); @@ -699,6 +696,8 @@ connect_ep( char *hostname, int conn_id ) return( DAT_ABORT ); } + if (delay) sleep(delay); /* use to test rdma_cma timeout logic */ + /* accept connect request from client */ h_cr = event.event_data.cr_arrival_event_data.cr_handle; LOGPRINTF("%d Accepting connect request from client\n",getpid()); @@ -1799,7 +1798,7 @@ void print_usage() printf("c: use cno\n"); printf("v: verbose\n"); printf("p: polling\n"); - printf("d: delay before close\n"); + printf("d: delay before accept\n"); printf("b: buf length to allocate\n"); printf("B: burst count, rdma and msgs \n"); printf("h: hostname\n"); From mshefty at ichips.intel.com Wed Feb 28 17:06:41 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 28 Feb 2007 17:06:41 -0800 Subject: [ofa-general] Re: [openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path() In-Reply-To: <45E19730.7010008@dev.mellanox.co.il> References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com> <1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il> <45E19730.7010008@dev.mellanox.co.il> Message-ID: <45E62721.1030605@ichips.intel.com> > int ib_init_ah_from_path(struct ib_device *device, u8 port_num, > struct ib_sa_path_rec *rec, struct ib_ah_attr > *ah_attr) > { > int ret; > u16 gid_index; > > memset(ah_attr, 0, sizeof *ah_attr); > ah_attr->dlid = be16_to_cpu(rec->dlid); > ah_attr->sl = rec->sl; > ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f; I looked at this more, and to be technically correct here, what we can do is: in update_sm_ah: use port_attr.LMC to record a src_path_mask with ib_sa_port 
in ib_init_ah_from_path: use the src_path_mask from ib_sa_port to set src_path_bits However, I'm not completely convinced that masking off the upper bits of the SLID is necessary when setting the src_path_bits, which means that the mask used above could be removed. - Sean From rowland at cse.ohio-state.edu Wed Feb 28 18:09:43 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Wed, 28 Feb 2007 21:09:43 -0500 Subject: [ofa-general] ofed_1_2_scripts update patchs Message-ID: <45E635E7.5030401@cse.ohio-state.edu> I've uploaded a new MVAPICH2 SRPM: mvapich2-0.9.8-5.src.rpm. I will have to upload a new version again before the beta release, but I wanted to get these patches out and a new SRPM uploaded ASAP. I've attached the following patches done against the latest update for the ofed_1_2_scripts GIT repository: mvapich2.patch -------------- - fix for bug 386 - adds mpi-selector support to MVAPICH2 - changes one default setting for MVAPICH2 in the case the user has not specified build options mpi-selector.patch ------------------ - fixes an ordering problem in install.sh around line 130 (see below) The mpi-selector patch fixes an ordering issue with the mpi-selector package removal in install.sh around line 130. Since mpi-selector is a requirement for the MPI packages, it should be removed after the MPI packages instead of before. This patch changes the order to match what is done in the uninstall.sh script. I mentioned this to Jeff, so if he's doing maintenance on this part of the script - perhaps it would be better for him to review this patch... or whatever needs to happen. It is just something I noticed. I thought I'd pass along this patch too since I was trying the latest scripts with the OFED-1.2-20070226-1758 files and it solved the uninstall problem I was having. In the course of looking into this, I also noticed that the MVAPICH SRPM is not actually adding mpi-selector as a requirement like the MVAPICH2 and OMPI RPMs are doing. 
The %use_mpi_selector macro is there, but it is not defined as 1 until after it is used in the %if condition where Requires: is set. The mpi-selector program is used, but it's not set in Requires because of this ordering. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: mvapich2.patch URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: mpi-selector.patch URL: From jsquyres at cisco.com Wed Feb 28 20:05:35 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 28 Feb 2007 23:05:35 -0500 Subject: [ofa-general] Re: ofed_1_2_scripts update patchs In-Reply-To: <45E635E7.5030401@cse.ohio-state.edu> References: <45E635E7.5030401@cse.ohio-state.edu> Message-ID: On Feb 28, 2007, at 9:09 PM, Shaun Rowland wrote: > The mpi-selector patch fixes an ordering issue with the mpi-selector > package removal in install.sh around line 130. Since mpi-selector is a > requirement for the MPI packages, it should be removed after the MPI > packages instead of before. This patch changes the order to match what > is done in the uninstall.sh script. I mentioned this to Jeff, so if > he's > doing maintenance on this part of the script - perhaps it would be > better for him to review this patch... or whatever needs to happen. It Looks perfect to me. Thanks! 
-- Jeff Squyres Server Virtualization Business Unit Cisco Systems From or.gerlitz at gmail.com Wed Feb 28 21:42:57 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Thu, 1 Mar 2007 07:42:57 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey In-Reply-To: <15ddcffd0702281043h52ca49e7t110bc75e3ad2a832@mail.gmail.com> References: <1172507101.4102.277140.camel@hal.voltaire.com> <000201c759e3$24828410$55d8180a@amr.corp.intel.com> <15ddcffd0702281043h52ca49e7t110bc75e3ad2a832@mail.gmail.com> Message-ID: <15ddcffd0702282142i3213a922s106246dd18a42930@mail.gmail.com> resent - with a CC to general at lists.openfabrics.org so the message will get to the list... On 2/28/07, Or Gerlitz wrote: > On 2/26/07, Sean Hefty wrote: > > I think the following patch would make ipoib spec compliant. > > ib_find_cached_pkey is called by ib_cm, rdma_cm, ib_srp, and ib_ipoib. > > I'm not certain what this change would do to SRP, but the ib_cm and > > rdma_cm look okay, given that non-reversible paths aren't supported > > yet anyway. > > Sean, > > As Moni stated, we need this functionality and among other scenarios, > the use case I have mentioned over this discussion was of an I/O > target being a full member of a partition where the initiators > connected to it being partial members - since they need not and should > not talk among themselves. > > The connection may be implemented over TCP/UDP on top of IPoIB (eg > iscsi / nfs / some cluster file system) or over the RDMA CM and the > VERBS (iSER / rNFS / native implementation of cluster file systems) or > over the IB CM and the VERBS (srp). > > For all the above cases except for SRP, IPoIB is used as the ARP > provider and it means that the nodes with the partial membership must > join the "IPv4 broadcast" IB multicast group.
This is working fine > with the openib IPoIB and core implementation running against the > Voltaire SA/SM and as Hal commented (Hal - can you verify it? see (*) > below ) also against the open SM/SA. My guess is that this is also working > fine with TopSpin/Cisco SM/SA. > > (*) simply configure the SM to allocate 0xffff (index 0) and 0x8001 > (index 1) to node A, then 0x7fff (index 0) and 0x0001 (index 1) to > node B. Now, configure ib0 of both nodes to subnet X, create a 0x8001 > ib0 child on both and configure ib0.8001 to subnet Y, make sure you > have pings on both subnets - thanks! > > My suggestion is that we act to have the spec changed to match this > real need and not that this code (which, my guess is, has been present there from > day one; I guess Roland can tell) would be removed to match the spec. > > Or. > From dotanb at dev.mellanox.co.il Wed Feb 28 22:57:39 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 01 Mar 2007 08:57:39 +0200 Subject: [ofa-general] Re: [openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path() In-Reply-To: <45E62721.1030605@ichips.intel.com> References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com> <1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il> <45E19730.7010008@dev.mellanox.co.il> <45E62721.1030605@ichips.intel.com> Message-ID: <45E67963.1070801@dev.mellanox.co.il> Sean Hefty wrote: >> int ib_init_ah_from_path(struct ib_device *device, u8 port_num, >> struct ib_sa_path_rec *rec, struct >> ib_ah_attr *ah_attr) >> { >> int ret; >> u16 gid_index; >> >> memset(ah_attr, 0, sizeof *ah_attr); >> ah_attr->dlid = be16_to_cpu(rec->dlid); >> ah_attr->sl = rec->sl; >> ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f; > > I looked at this more, and to be technically correct here, what we can > do is: > > in update_sm_ah: > use port_attr.LMC to record a src_path_mask with ib_sa_port > > in ib_init_ah_from_path: > use the src_path_mask from ib_sa_port to set src_path_bits > > However, I'm not
completely convinced that masking off the upper bits > of the SLID is necessary when setting the src_path_bits, which means > that the mask used above could be removed. I think that this behavior is much better than the current behavior. Thanks, Dotan
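The proposal quoted above (derive a per-port mask from the LMC, then apply it to the SLID) can be sketched in plain userspace C. This is only an illustration, not the actual ib_sa code: the helper names src_path_mask_from_lmc() and src_path_bits_from_slid() are hypothetical, the SLID is assumed to be already in host byte order, and the LMC is assumed to be the usual 3-bit value (0..7) giving the number of low-order LID bits that vary per path. With LMC = 7 the mask comes out as 0x7f, the constant used in the quoted code.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): build the source path mask
 * from a port's LMC. An LMC of n means the low n bits of the LID select
 * the path, so the mask is (1 << n) - 1. Values above 7 are clamped,
 * on the assumption that LMC is a 3-bit field.
 */
static uint8_t src_path_mask_from_lmc(uint8_t lmc)
{
    if (lmc > 7)
        lmc = 7;
    return (uint8_t)((1u << lmc) - 1u);
}

/*
 * Hypothetical helper: extract the source path bits from a SLID that is
 * already in host byte order (the quoted kernel code converts with
 * be16_to_cpu() before masking).
 */
static uint8_t src_path_bits_from_slid(uint16_t slid, uint8_t lmc)
{
    return (uint8_t)(slid & src_path_mask_from_lmc(lmc));
}
```

For example, with slid = 0x0013 the helper returns 0x00 for LMC 0, 0x03 for LMC 2, and 0x13 for LMC 7. Recording the mask once per port, as suggested for update_sm_ah, would avoid recomputing it on every call.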