From zhushisongzhu at yahoo.com Fri Sep 1 03:26:39 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Fri, 1 Sep 2006 03:26:39 -0700 (PDT)
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060830045927.GB25478@mellanox.co.il>
Message-ID: <20060901102639.55709.qmail@web36915.mail.mud.yahoo.com>

OFED-1.1-rc3 has passed my tests. I have to adjust the Post buffer size to 0x4 and apply your patch myself. Can you fix this so that I do not have to do these steps manually?

zhu

From bunk at stusta.de Fri Sep 1 09:00:23 2006
From: bunk at stusta.de (Adrian Bunk)
Date: Fri, 1 Sep 2006 18:00:23 +0200
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901015818.42767813.akpm@osdl.org>
References: <20060901015818.42767813.akpm@osdl.org>
Message-ID: <20060901160023.GB18276@stusta.de>

On Fri, Sep 01, 2006 at 01:58:18AM -0700, Andrew Morton wrote:
>...
> Changes since 2.6.18-rc4-mm3:
>...
> +amso1100-build-fix.patch
>
> Fix git-infiniband.patch
>...

This causes the following compile error on i386:

<-- snip -->

...
CC drivers/infiniband/hw/amso1100/c2.o
/home/bunk/linux/kernel-2.6/linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.c: In function ‘c2_tx_ring_alloc’:
/home/bunk/linux/kernel-2.6/linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.c:133: error: implicit declaration of function ‘__raw_writeq’
make[4]: *** [drivers/infiniband/hw/amso1100/c2.o] Error 1

<-- snip -->

There seems to be some confusion regarding whether __raw_writeq() is considered a platform independent API.

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

From robert.j.woodruff at intel.com Fri Sep 1 09:12:42 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 1 Sep 2006 09:12:42 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
Message-ID:

Tziporet Wrote,
>Hi,
>OFED 1.1-RC3 is available on https://openib.org/svn/gen2/branches/1.1/ofed/releases/
>File: OFED-1.1-rc3.tgz
>Please report any issues in bugzilla http://openib.org/bugzilla/

Hi all,

I installed the RC3 package on my Xeon/Lindenhurst platforms, and with the pathscale card I have the following problem when trying to run Intel MPI and NetPipe. This is on RedHat EL4-U3 (2.6.9-34EL kernel).

Has anyone else been able to run DAPL/RDMA programs like Intel MPI over the Pathscale cards with OFED-1.1-RC3?

# List of Benchmarks to run:

# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_scatter
# Allgather
# Allgatherv
# Alltoall
# Bcast
# Barrier
[1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with error. status=0x8.
cookie=0x514ed0
rank 1 in job 2 rkl-13-ib0_32788 caused collective abort of all ranks
  exit status of rank 1: return code 255

I also tried a version of NetPipe that was modified to use DAPL and it works for messages < 2048 and then hangs

Now starting the main loop
  0:    1 bytes 1000 times -->    1.53 Mbps in 4.98 usec
  1:    2 bytes 1000 times -->    3.11 Mbps in 4.91 usec
  2:    3 bytes 1000 times -->    4.72 Mbps in 4.85 usec
  3:    4 bytes 1000 times -->    6.20 Mbps in 4.92 usec
  4:    6 bytes 1000 times -->    9.23 Mbps in 4.96 usec
  5:    8 bytes 1000 times -->   12.41 Mbps in 4.92 usec
  6:   12 bytes 1000 times -->   18.61 Mbps in 4.92 usec
  7:   13 bytes 1000 times -->   19.98 Mbps in 4.96 usec
  8:   16 bytes 1000 times -->   24.48 Mbps in 4.99 usec
  9:   19 bytes 1000 times -->   29.04 Mbps in 4.99 usec
 10:   21 bytes 1000 times -->   32.31 Mbps in 4.96 usec
 11:   24 bytes 1000 times -->   36.72 Mbps in 4.99 usec
 12:   27 bytes 1000 times -->   40.67 Mbps in 5.06 usec
 13:   29 bytes 1000 times -->   44.24 Mbps in 5.00 usec
 14:   32 bytes 1000 times -->   49.15 Mbps in 4.97 usec
 15:   35 bytes 1000 times -->   53.18 Mbps in 5.02 usec
 16:   45 bytes 1000 times -->   67.60 Mbps in 5.08 usec
 17:   48 bytes 1000 times -->   72.32 Mbps in 5.06 usec
 18:   51 bytes 1000 times -->   76.65 Mbps in 5.08 usec
 19:   61 bytes 1000 times -->   90.20 Mbps in 5.16 usec
 20:   64 bytes 1000 times -->   94.50 Mbps in 5.17 usec
 21:   67 bytes 1000 times -->   96.89 Mbps in 5.28 usec
 22:   93 bytes 1000 times -->  134.20 Mbps in 5.29 usec
 23:   96 bytes 1000 times -->  137.11 Mbps in 5.34 usec
 24:   99 bytes 1000 times -->  139.72 Mbps in 5.41 usec
 25:  125 bytes 1000 times -->  175.08 Mbps in 5.45 usec
 26:  128 bytes 1000 times -->  184.12 Mbps in 5.30 usec
 27:  131 bytes 1000 times -->  184.72 Mbps in 5.41 usec
 28:  189 bytes 1000 times -->  258.25 Mbps in 5.58 usec
 29:  192 bytes 1000 times -->  269.32 Mbps in 5.44 usec
 30:  195 bytes 1000 times -->  270.92 Mbps in 5.49 usec
 31:  253 bytes 1000 times -->  339.98 Mbps in 5.68 usec
 32:  256 bytes 1000 times -->  347.01 Mbps in 5.63 usec
 33:  259 bytes 1000 times -->  349.64 Mbps in 5.65 usec
 34:  381 bytes 1000 times -->  491.64 Mbps in 5.91 usec
 35:  384 bytes 1000 times -->  495.59 Mbps in 5.91 usec
 36:  387 bytes 1000 times -->  493.49 Mbps in 5.98 usec
 37:  509 bytes 1000 times -->  621.98 Mbps in 6.24 usec
 38:  512 bytes 1000 times -->  639.37 Mbps in 6.11 usec
 39:  515 bytes 1000 times -->  632.97 Mbps in 6.21 usec
 40:  765 bytes 1000 times -->  854.35 Mbps in 6.83 usec
 41:  768 bytes 1000 times -->  878.14 Mbps in 6.67 usec
 42:  771 bytes 1000 times -->  878.74 Mbps in 6.69 usec
 43: 1021 bytes 1000 times --> 1067.29 Mbps in 7.30 usec
 44: 1024 bytes 1000 times --> 1073.29 Mbps in 7.28 usec
 45: 1027 bytes 1000 times --> 1076.14 Mbps in 7.28 usec
 46: 1533 bytes 1000 times --> 1396.85 Mbps in 8.37 usec
 47: 1536 bytes 1000 times --> 1407.83 Mbps in 8.32 usec
 48: 1539 bytes 1000 times --> 1385.12 Mbps in 8.48 usec
 49: 2045 bytes 1000 times --> 1647.53 Mbps in 9.47 usec
 50: 2048 bytes 1000 times --> 1657.56 Mbps in 9.43 usec
 51: 2051 bytes 1000 times --> <- hangs here

From vuhuong at mellanox.com Fri Sep 1 09:25:01 2006
From: vuhuong at mellanox.com (Vu Pham)
Date: Fri, 01 Sep 2006 09:25:01 -0700
Subject: [openib-general] Srp question
In-Reply-To:
References:
Message-ID: <44F85EDD.4090102@mellanox.com>

>
> By default, we found (via stats from the DDN) that we were only seeing reads
> and writes in the 0-32Kbyte range. Comparing IBGold and OFED, we found that
> the srp_sg_tablesize defaulted to 256, but in OFED it defaulted to 12. So,
> changing this (via modprobe.conf) to 256 in OFED, we were able to see reads
> and writes in the 128Kbyte range (which is what ultimately got us to the
> performance above). I also noticed that there is a max_sects option you can
> pass to add_target (in the SRP /sys entries) which seemed to be the same
> idea as srp_sg_tablesize, but this didn't seem to affect anything.
>
> So, my question is, what is the right magic to get SRP up to speed?
I played around with these parameters: srp_sg_tablesize (via modprobe.conf or passing it directly), max_sect and max_cmd_per_lun.

srp_sg_tablesize={32, 64, and 128}
max_sect={512, 1024, and 2048}
max_cmd_per_lun={1, 2, 4, 8, 16, 32, and default 64} --> this really depends on the storage to have the right number

-vu

From caitlin.bestler at gmail.com Fri Sep 1 10:14:09 2006
From: caitlin.bestler at gmail.com (Caitlin Bestler)
Date: Fri, 1 Sep 2006 10:14:09 -0700
Subject: [openib-general] single rkey
In-Reply-To:
References:
Message-ID: <469958e00609011014y222e10eakd4714d35fed35891@mail.gmail.com>

On 8/31/06, yipee wrote:
> Hi,
>
> Is it possible for several memory registrations (using ibv_reg_mr) to have a
> single rkey?
> Can I add memory registrations to a previous rkey?
>
>

You need to create the Memory Region as large as you think it will need to be. But there are three things you can keep in mind:

1) You can create multiple Memory Regions. A scatter gather list can reference multiple memory regions.
2) You can create Memory Windows within Memory Regions to limit the scope exposed to the remote end.
3) The same pages can be registered to multiple Memory Regions. So you could create a *new* Memory Region that included the prior pages *and* the new pages, use that, and release the old Memory Region eventually when all use of it ended.

The goal of making Memory Region lookups by hardware efficient has encouraged most implementations to use data structures that are not friendly for dynamic host manipulation while the Memory Region is already in use. That's why the API is designed to set the contents of a Memory Region in a single operation rather than by piecemeal addition.
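
The third option above (register a new, larger region over the old and new pages, then retire the old one) might look roughly like the following. This is only a sketch, assuming libibverbs, an already allocated protection domain, and a buffer that is contiguous in virtual memory. The helper names grow_registration() and retire_registration() are hypothetical; only ibv_reg_mr() and ibv_dereg_mr() are real verbs calls. Note that the new region has its own rkey, so the peer has to be told about it.

```c
#include <stddef.h>
#include <infiniband/verbs.h>

/*
 * Hypothetical helper, not part of libibverbs: register a new Memory
 * Region covering the first new_len bytes of buf. If an older, smaller
 * region already covers the start of buf, the same pages simply end up
 * registered twice, which the verbs API allows.
 * Returns NULL on failure (ibv_reg_mr() sets errno).
 */
static struct ibv_mr *grow_registration(struct ibv_pd *pd, void *buf,
                                        size_t new_len)
{
    /* Access rights chosen for illustration only. */
    int access = IBV_ACCESS_LOCAL_WRITE |
                 IBV_ACCESS_REMOTE_READ |
                 IBV_ACCESS_REMOTE_WRITE;

    return ibv_reg_mr(pd, buf, new_len, access);
}

/*
 * Hypothetical helper: release the old region once nothing (local work
 * requests or the remote peer) still uses its lkey/rkey.
 * Returns 0 on success, as ibv_dereg_mr() does.
 */
static int retire_registration(struct ibv_mr *old_mr)
{
    return ibv_dereg_mr(old_mr);
}
```

A caller would advertise the new region's rkey (new_mr->rkey) to the remote side before calling retire_registration() on the old region.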
From akpm at osdl.org Fri Sep 1 10:13:40 2006
From: akpm at osdl.org (Andrew Morton)
Date: Fri, 1 Sep 2006 10:13:40 -0700
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901160023.GB18276@stusta.de>
References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de>
Message-ID: <20060901101340.962150cb.akpm@osdl.org>

On Fri, 1 Sep 2006 18:00:23 +0200
Adrian Bunk wrote:

> On Fri, Sep 01, 2006 at 01:58:18AM -0700, Andrew Morton wrote:
> >...
> > Changes since 2.6.18-rc4-mm3:
> >...
> > +amso1100-build-fix.patch
> >
> > Fix git-infiniband.patch
> >...
>
> This causes the following compile error on i386:
>
> <-- snip -->
>
> ...
> CC drivers/infiniband/hw/amso1100/c2.o
> /home/bunk/linux/kernel-2.6/linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.c: In function ‘c2_tx_ring_alloc’:
> /home/bunk/linux/kernel-2.6/linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.c:133: error: implicit declaration of function ‘__raw_writeq’
> make[4]: *** [drivers/infiniband/hw/amso1100/c2.o] Error 1
>

That would have been me cheerfully deleting stuff because it didn't build on powerpc.

>
> There seems to be some confusion regarding whether __raw_writeq() is
> considered a platform independent API.
>

It appears to be undocumented and uncommented hence it's not an API _at all_, is it?

What's __raw_writeq() supposed to do, anyway? On alpha it's writeq() without an mb(). On parisc it's writeq() only the data is byte-reversed. On sparc64() it's incomprehensible. On everything else it's writeq().

What a crock.
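
To make the "byte-reversed" distinction above concrete, here is a small userspace model. It is not kernel code, and every name in it is made up for illustration. It assumes that a writel()-style accessor always delivers little-endian bytes to the device regardless of host byte order, and it shows that swapping a value before such a store produces the same device bytes as a plain big-endian store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value (userspace stand-in for swab32()). */
static uint32_t swab32_model(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Model of a little-endian store: register byte 0 receives the low byte. */
static void store_le32(uint8_t reg[4], uint32_t v)
{
    assert(reg != 0);
    reg[0] = (uint8_t)(v);
    reg[1] = (uint8_t)(v >> 8);
    reg[2] = (uint8_t)(v >> 16);
    reg[3] = (uint8_t)(v >> 24);
}

/* Model of a big-endian store: register byte 0 receives the high byte. */
static void store_be32(uint8_t reg[4], uint32_t v)
{
    assert(reg != 0);
    reg[0] = (uint8_t)(v >> 24);
    reg[1] = (uint8_t)(v >> 16);
    reg[2] = (uint8_t)(v >> 8);
    reg[3] = (uint8_t)(v);
}

/*
 * Returns 1 when "swap, then little-endian store" leaves the same bytes
 * in the register as a direct big-endian store of the same value.
 */
static int swapped_le_matches_be(uint32_t v)
{
    uint8_t via_swap[4];
    uint8_t direct[4];

    store_le32(via_swap, swab32_model(v));
    store_be32(direct, v);
    return memcmp(via_swap, direct, sizeof(direct)) == 0;
}
```

On a big-endian host the direct store is just a copy of the value as it sits in memory, whereas the first path swaps the value once explicitly and once more inside the little-endian accessor.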
From rdreier at cisco.com Fri Sep 1 10:34:24 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 10:34:24 -0700
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901101340.962150cb.akpm@osdl.org> (Andrew Morton's message of "Fri, 1 Sep 2006 10:13:40 -0700")
References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org>
Message-ID:

    Andrew> What's __raw_writeq() supposed to do, anyway? On alpha
    Andrew> it's writeq() without an mb(). On parisc it's writeq()
    Andrew> only the data is byte-reversed. On sparc64() it's
    Andrew> incomprehensible. On everything else it's writeq().

My understanding is that __raw_writeq() is like writeq() except not strongly ordered and without the byte-swap on big-endian architectures. The __raw_writeX() variants are convenient to avoid having to write inefficient code like writel(swab32(foo), ...) when talking to a PCI device that wants big-endian data. Without the raw variant, you end up with a double swap on big-endian architectures.

sparc64 looks wrong, since __raw_writeq() seems identical to writeq(), which seems to imply it's going to swab what it stores.

 - R.

From rjwalsh at pathscale.com Fri Sep 1 11:20:38 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Fri, 01 Sep 2006 11:20:38 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
In-Reply-To:
References:
Message-ID: <44F879F6.8080601@pathscale.com>

> Hi all, I installed the RC3 package on my Xeon/Lindenhurst platforms
> and with the pathscale card I have the following problem
> when trying to run Intel MPI and NetPipe.

Actually, I've been trying to run Intel MPI myself, but haven't gotten very far yet.
My attempts die like this:

$ mpiexec -n 2 ./mpitest
I_MPI: [0] set_up_devices(): will use device: libmpi.rdma.so
I_MPI: [0] set_up_devices(): will use DAPL provider: OpenIB-cma
I_MPI: [0] set_up_devices(): will use device: libmpi.rdma.so
I_MPI: [0] set_up_devices(): will use DAPL provider: OpenIB-cma
[0][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
[1][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
rank 0 in job 2 ib-idev-05_51713 caused collective abort of all ranks
  exit status of rank 0: return code 254

dapltest seems to work just fine, so I'm a little confused. Do you have any insight on what the DAT_INVALID_PARAMETER(DAT_INVALID_ARG6) stuff is referring to?

Regards,
 Robert.

From akpm at osdl.org Fri Sep 1 11:23:12 2006
From: akpm at osdl.org (Andrew Morton)
Date: Fri, 1 Sep 2006 11:23:12 -0700
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To:
References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org>
Message-ID: <20060901112312.5ff0dd8d.akpm@osdl.org>

On Fri, 01 Sep 2006 10:34:24 -0700
Roland Dreier wrote:

> Andrew> What's __raw_writeq() supposed to do, anyway? On alpha
> Andrew> it's writeq() without an mb(). On parisc it's writeq()
> Andrew> only the data is byte-reversed. On sparc64() it's
> Andrew> incomprehensible. On everything else it's writeq().
>
> My understanding is that __raw_writeq() is like writeq() except not
> strongly ordered and without the byte-swap on big-endian
> architectures. The __raw_writeX() variants are convenient to avoid
> having to write inefficient code like writel(swab32(foo), ...) when
> talking to a PCI device that wants big-endian data. Without the raw
> variant, you end up with a double swap on big-endian architectures.
>
> sparc64 looks wrong, since __raw_writeq() seems identical to writeq(),
> which seems to imply it's going to swab what it stores.
>

OK.

Can we please stop hacking around this in drivers and

a) work out what it's supposed to do

b) document that (Documentation/DocBook/deviceiobook.tmpl or code comment or whatever)

c) tell arch maintainers?

From robert.j.woodruff at intel.com Fri Sep 1 11:28:03 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 1 Sep 2006 11:28:03 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
Message-ID:

Robert Walsh wrote,
> [0][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
> [1][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
> rank 0 in job 2 ib-idev-05_51713 caused collective abort of all ranks
> exit status of rank 0: return code 254

What version of Intel MPI are you running? This looks like an error that we saw with the 2.0 release. I am not sure whether it was a DAPL issue or an MPI issue; Arlin would remember for sure.

You should get the Intel MPI 2.0.1 refresh release or the 3.0 beta release to make sure that you have all of the latest MPI fixes.
woody

From rjwalsh at pathscale.com Fri Sep 1 11:32:38 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Fri, 01 Sep 2006 11:32:38 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
In-Reply-To:
References:
Message-ID: <44F87CC6.1060606@pathscale.com>

Woodruff, Robert J wrote:
> Robert Walsh wrote,
>> [0][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not
>> create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>> [1][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not
>> create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>> rank 0 in job 2 ib-idev-05_51713 caused collective abort of all ranks
>> exit status of rank 0: return code 254
>
> What version of Intel MPI are you running ? This looks like an error
> that we saw with the 2.0 release, not sure if this was a DAPL
> issue or an MPI issue, Arlin would remember for sure.
>
> You should get the Intel MPI 2.0.1 refresh release
> or the 3.0 beta release to make sure that you have all of the latest
> MPI fixes.

I'm running 2.0.1. The package number bit of the tar file was "12". I'm running the DAPL that came with OFED-1.1-RC3.

Can you send me a pointer to the 3.0 beta release?

Regards,
 Robert.

From Brian.Cain at ge.com Fri Sep 1 11:51:16 2006
From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare))
Date: Fri, 1 Sep 2006 14:51:16 -0400
Subject: [openib-general] PXE + infiniband?
Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com>

A while back (http://openib.org/pipermail/openib-general/2005-September/010801.html) there was mention of putting PXE stuff on an HCA. Has anyone done this with PXELINUX? It doesn't seem like it's as straightforward as just putting the stock PXELINUX image on your HCA. I'm assuming this image would have to recognize the HCA and bring up IPoIB in order to use the conventional TFTP transport?
--
-Brian

From Brian.Cain at ge.com Fri Sep 1 12:21:24 2006
From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare))
Date: Fri, 1 Sep 2006 15:21:24 -0400
Subject: [openib-general] PXE + infiniband?
In-Reply-To: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com>
Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7F1@CINMLVEM11.e2k.ad.ge.com>

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Cain,
> Brian (GE Healthcare)
> Sent: Friday, September 01, 2006 1:51 PM
> To: openib-general at openib.org
> Subject: [openib-general] PXE + infiniband?
>
> A while back
> (http://openib.org/pipermail/openib-general/2005-September/010801.html)
> there was mention of putting PXE stuff on an HCA. Has anyone
> done this
> with PXELINUX? It doesn't seem like it's as straightforward as just
> putting the stock PXELINUX image on your HCA. I'm assuming this image
> would have to recognize the HCA and bring up IPoIB in order to use the
> conventional TFTP transport?

Ok, nm -- I found that etherboot has a README.boot_over_ib which looks like it'll probably work well.

-Brian

From sean.hefty at intel.com Fri Sep 1 12:37:28 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 1 Sep 2006 12:37:28 -0700
Subject: [openib-general] [PATCH] cma: protect against adding device during destruction
In-Reply-To: <20060831193707.GA3859@mellanox.co.il>
Message-ID: <000001c6cdfe$0f4d0190$e598070a@amr.corp.intel.com>

>I'll test some, but the problem hasn't reappeared since.
>The patch looks right, I'd say push it for 2.6.18.

We need the following change, which applies on top of the previous patch, as well.

Add missing synchronization around acquiring an IB device.

Signed-off-by: Sean Hefty
---
Index: cma.c
===================================================================
--- cma.c	(revision 9217)
+++ cma.c	(revision 9218)
@@ -1031,7 +1031,9 @@ static int cma_req_handler(struct ib_cm_
 	}
 
 	atomic_inc(&conn_id->dev_remove);
+	mutex_lock(&lock);
 	ret = cma_acquire_ib_dev(conn_id);
+	mutex_unlock(&lock);
 	if (ret) {
 		ret = -ENODEV;
 		cma_release_remove(conn_id);

From robert.j.woodruff at intel.com Fri Sep 1 12:40:12 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 1 Sep 2006 12:40:12 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
Message-ID:

Tziporet wrote,
>Hi,
>OFED 1.1-RC3 is available on https://openib.org/svn/gen2/branches/1.1/ofed/releases/
>File: OFED-1.1-rc3.tgz
>Please report any issues in bugzilla http://openib.org/bugzilla/

I tried running OFED1.1-rc3 on my Itanium machines on RedHat EL4-U3 and got the following error.

[root at iclust-tiger1 woody]# /etc/init.d/openibd start
Loading HCA driver and Access Layer: [FAILED]
Please open an issue in the http://openib.org/bugzilla and attach /tmp/ib_debug_info.log

dmesg

ib_mthca 0000:0d:00.0: HCA FW version 3.3.2 is old (3.4.0 is current).
ib_mthca 0000:0d:00.0: If you have problems, try updating your HCA FW.
ib_uverbs: Unknown symbol hpage_shift <-----------------------I think this is the problem
divert: not allocating divert_blk for non-ethernet device ib0
divert: not allocating divert_blk for non-ethernet device ib1
ip_tables: (C) 2000-2002 Netfilter core team

From rdreier at cisco.com Fri Sep 1 12:53:47 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 12:53:47 -0700
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901112312.5ff0dd8d.akpm@osdl.org> (Andrew Morton's message of "Fri, 1 Sep 2006 11:23:12 -0700")
References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org>
Message-ID:

    Roland> My understanding is that __raw_writeq() is like writeq()
    Roland> except not strongly ordered and without the byte-swap on
    Roland> big-endian architectures. The __raw_writeX() variants are
    Roland> convenient to avoid having to write inefficient code like
    Roland> writel(swab32(foo), ...) when talking to a PCI device that
    Roland> wants big-endian data. Without the raw variant, you end
    Roland> up with a double swap on big-endian architectures.

Oh, I left one other thing out: writeq() and __raw_writeq() should be atomic in the sense that no other transactions should be able to get onto the IO bus in the middle -- so implementing writeq() as two writel()s in a row is not allowed.

    Andrew> OK. Can we please stop hacking around this in drivers and
    Andrew> a) work out what it's supposed to do
    Andrew> b) document that (Documentation/DocBook/deviceiobook.tmpl
    Andrew> or code comment or whatever)
    Andrew> c) tell arch maintainers?

Yes, I agree that's a good plan, especially the documentation part.
However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h is legitimate: the driver uses __raw_writeq() when it exists and uses two __raw_writel()s properly serialized with a device-specific lock to get exactly the atomicity it needs on 32-bit archs.

It's an open question what drivers that don't actually need atomicity but just want a convenient way to write 64 bits at a time should do.

 - R.

From akpm at osdl.org Fri Sep 1 13:04:44 2006
From: akpm at osdl.org (Andrew Morton)
Date: Fri, 1 Sep 2006 13:04:44 -0700
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To:
References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org>
Message-ID: <20060901130444.48f19457.akpm@osdl.org>

On Fri, 01 Sep 2006 12:53:47 -0700
Roland Dreier wrote:

> Roland> My understanding is that __raw_writeq() is like writeq()
> Roland> except not strongly ordered and without the byte-swap on
> Roland> big-endian architectures. The __raw_writeX() variants are
> Roland> convenient to avoid having to write inefficient code like
> Roland> writel(swab32(foo), ...) when talking to a PCI device that
> Roland> wants big-endian data. Without the raw variant, you end
> Roland> up with a double swap on big-endian architectures.
>
> Oh, I left one other thing out: writeq() and __raw_writeq() should be
> atomic in the sense that no other transactions should be able to get
> onto the IO bus in the middle -- so implementing writeq() as two
> writel()s in a row is not allowed
>
> Andrew> OK. Can we please stop hacking around this in drivers and
>
> Andrew> a) work out what it's supposed to do
>
> Andrew> b) document that (Documentation/DocBook/deviceiobook.tmpl
> Andrew> or code comment or whatever)
>
> Andrew> c) tell arch maintainers?
>
> Yes, I agree that's a good plan, especially the documentation part.
> However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h
> is legitimate: the driver uses __raw_writeq() when it exists and uses
> two __raw_writel()s properly serialized with a device-specific lock to
> get exactly the atomicity it needs on 32-bit archs.

No, driver-specific workarounds are not legitimate, sorry.

The driver should simply fail to compile on architectures which do not implement __raw_writeq().

We can speed up the process by sending helpful emails to architecture maintainers, but they'll notice either way.

Let's fix it once, and in the correct place.

> It's an open question what drivers that don't actually need atomicity
> but just want a convenient way to write 64 bits at a time should do.

Well yeah. We should sort out the design issues before implementing things ;)

From tom at opengridcomputing.com Fri Sep 1 13:20:59 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Fri, 01 Sep 2006 15:20:59 -0500
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901130444.48f19457.akpm@osdl.org>
References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <20060901130444.48f19457.akpm@osdl.org>
Message-ID: <1157142059.22301.74.camel@trinity.ogc.int>

So to make sure I understand all this...

The purpose of these services is to provide a platform independent API for reading and writing 16, 32 and 64b values to MMIO devices.

The rationale for needing these services is that there is currently no platform independent API for efficiently reading and writing these values to BE devices on MMIO PCI devices. Examples are the mthca and amso1100 devices.

Two classes of service are needed: atomic services that are interrupt safe, and services that either don't require atomicity or are called with a suitable lock already held.

Does the API look something like this?

void mmio_wr_be16(__be16 val, void __iomem *addr);
void mmio_wr_be32(__be32 val, void __iomem *addr);
void mmio_wr_be64(__be64 val, void __iomem *addr);

void mmio_atomic_wr_be16(__be16 val, void __iomem *addr);
void mmio_atomic_wr_be32(__be32 val, void __iomem *addr);
void mmio_atomic_wr_be64(__be64 val, void __iomem *addr);

__be16 mmio_rd_be16(void __iomem *addr);
__be32 mmio_rd_be32(void __iomem *addr);
__be64 mmio_rd_be64(void __iomem *addr);

__be16 mmio_atomic_rd_be16(void __iomem *addr);
__be32 mmio_atomic_rd_be32(void __iomem *addr);
__be64 mmio_atomic_rd_be64(void __iomem *addr);

On Fri, 2006-09-01 at 13:04 -0700, Andrew Morton wrote:
> On Fri, 01 Sep 2006 12:53:47 -0700
> Roland Dreier wrote:
>
> > Roland> My understanding is that __raw_writeq() is like writeq()
> > Roland> except not strongly ordered and without the byte-swap on
> > Roland> big-endian architectures. The __raw_writeX() variants are
> > Roland> convenient to avoid having to write inefficient code like
> > Roland> writel(swab32(foo), ...) when talking to a PCI device that
> > Roland> wants big-endian data. Without the raw variant, you end
> > Roland> up with a double swap on big-endian architectures.
> >
> > Oh, I left one other thing out: writeq() and __raw_writeq() should be
> > atomic in the sense that no other transactions should be able to get
> > onto the IO bus in the middle -- so implementing writeq() as two
> > writel()s in a row is not allowed
> >
> > Andrew> OK. Can we please stop hacking around this in drivers and
> >
> > Andrew> a) work out what it's supposed to do
> >
> > Andrew> b) document that (Documentation/DocBook/deviceiobook.tmpl
> > Andrew> or code comment or whatever)
> >
> > Andrew> c) tell arch maintainers?
> >
> > Yes, I agree that's a good plan, especially the documentation part.
> > However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h
> > is legitimate: the driver uses __raw_writeq() when it exists and uses
> > two __raw_writel()s properly serialized with a device-specific lock to
> > get exactly the atomicity it needs on 32-bit archs.
>
> No, driver-specific workarounds are not legitimate, sorry.
>
> The driver should simply fail to compile on architectures which do not
> implement __raw_writeq().
>
> We can speed up the process by sending helpful emails to architecture
> maintainers, but they'll notice either way.
>
> Let's fix it once, and in the correct place.
>
> > It's an open question what drivers that don't actually need atomicity
> > but just want a convenient way to write 64 bits at a time should do.
>
> Well yeah. We should sort out the design issues before implementing
> things ;)
>

From bos at pathscale.com Fri Sep 1 13:45:27 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 01 Sep 2006 13:45:27 -0700
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To:
References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org>
Message-ID: <1157143527.20958.8.camel@chalcedony.pathscale.com>

On Fri, 2006-09-01 at 12:53 -0700, Roland Dreier wrote:

> Yes, I agree that's a good plan, especially the documentation part.
> However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h
> is legitimate: the driver uses __raw_writeq() when it exists and uses
> two __raw_writel()s properly serialized with a device-specific lock to
> get exactly the atomicity it needs on 32-bit archs.

On the off chance that you might be arguing that mthca_write64 could be a candidate drop-in for writeq on 32-bit arches:

That approach might work on mthca hardware, but it's not safe in general.
The ipath driver requires a proper writeq(), for example, because the hardware will quite legitimately treat 32-bit writes to some registers as separate accesses, and screw things up royally. You get atomicity from the perspective of software with this approach, but you can do exciting and bad things to hardware.

References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <20060901130444.48f19457.akpm@osdl.org>
Message-ID: <20060901204343.GA4979@flint.arm.linux.org.uk>

On Fri, Sep 01, 2006 at 01:04:44PM -0700, Andrew Morton wrote:
> On Fri, 01 Sep 2006 12:53:47 -0700
> Roland Dreier wrote:
> > Yes, I agree that's a good plan, especially the documentation part.
> > However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h
> > is legitimate: the driver uses __raw_writeq() when it exists and uses
> > two __raw_writel()s properly serialized with a device-specific lock to
> > get exactly the atomicity it needs on 32-bit archs.
>
> No, driver-specific workarounds are not legitimate, sorry.
>
> The driver should simply fail to compile on architectures which do not
> implement __raw_writeq().

So, what you're basically saying is that on architectures which can _NOT_ implement an atomic __raw_writeq(), certain drivers simply will not be available?

> We can speed up the process by sending helpful emails to architecture
> maintainers, but they'll notice either way.

I think you're completely wrong in the context of the message you're replying to - it's talking about an _atomic_ 64-bit write.

Sure, if you want a _non-atomic_ 64-bit write then that's possible, but many 32-bit architectures can't do a 64-bit atomic IO write and that isn't something they can "fix".
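
As a rough userspace illustration of the fallback discussed in this thread (a 64-bit value written as two 32-bit halves serialized by a lock), consider the sketch below. It is not the mthca code: the fake_dev structure, the function names, the choice of a C11 atomic_flag as the lock, and the order in which the halves are written are all assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a device with one 64-bit register split in two. */
struct fake_dev {
    volatile uint32_t reg[2]; /* reg[0]: high half, reg[1]: low half */
    atomic_flag lock;         /* serializes access to both halves */
};

static void fake_dev_init(struct fake_dev *d)
{
    assert(d != 0);
    d->reg[0] = 0;
    d->reg[1] = 0;
    atomic_flag_clear(&d->lock);
}

/* Write a 64-bit value as two 32-bit stores while holding the lock. */
static void fake_write64_split(struct fake_dev *d, uint64_t val)
{
    assert(d != 0);
    while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
        ; /* spin until the lock is free */
    d->reg[0] = (uint32_t)(val >> 32);
    d->reg[1] = (uint32_t)(val & 0xffffffffu);
    atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Read both halves back under the same lock and recombine them. */
static uint64_t fake_read64(struct fake_dev *d)
{
    uint64_t val;

    assert(d != 0);
    while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
        ; /* spin until the lock is free */
    val = ((uint64_t)d->reg[0] << 32) | d->reg[1];
    atomic_flag_clear_explicit(&d->lock, memory_order_release);
    return val;
}
```

The lock only protects against other software that takes the same lock. Real hardware would still see two separate 32-bit accesses, which is the concern raised above for devices that need a single 64-bit bus transaction.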

--
Russell King
 Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
 maintainer of: 2.6 Serial core

From rdreier at cisco.com Fri Sep 1 13:51:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 13:51:32 -0700
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901130444.48f19457.akpm@osdl.org> (Andrew Morton's message of "Fri, 1 Sep 2006 13:04:44 -0700")
References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <20060901130444.48f19457.akpm@osdl.org>
Message-ID:

    Andrew> No, driver-specific workarounds are not legitimate, sorry.

    Andrew> The driver should simply fail to compile on architectures
    Andrew> which do not implement __raw_writeq().

But how should i386 (say) implement __raw_writeq()? As two __raw_writel()s protected by a spinlock (that serializes all IO transactions)? That seems rather ugly.

 - R.

From rdreier at cisco.com Fri Sep 1 13:54:04 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 13:54:04 -0700
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901204343.GA4979@flint.arm.linux.org.uk> (Russell King's message of "Fri, 1 Sep 2006 21:43:43 +0100")
References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <20060901130444.48f19457.akpm@osdl.org> <20060901204343.GA4979@flint.arm.linux.org.uk>
Message-ID:

    Russell> Sure, if you want a _non-atomic_ 64-bit write then that's
    Russell> possible, but many 32-bit architectures can't do a 64-bit
    Russell> atomic IO write and that isn't something they can "fix".

I agree completely.
And going one step further: if an architecture cannot implement a 64-bit write atomically, then the precise serialization that is required is device-specific knowledge that belongs in the device driver. (For example, in the mthca case, the only serialization required is that no writes go to the same page of MMIO space between the two 32-bit halves of the 64-bit write) - R. From rdreier at cisco.com Fri Sep 1 13:59:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 01 Sep 2006 13:59:26 -0700 Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error In-Reply-To: <1157143527.20958.8.camel@chalcedony.pathscale.com> (Bryan O'Sullivan's message of "Fri, 01 Sep 2006 13:45:27 -0700") References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <1157143527.20958.8.camel@chalcedony.pathscale.com> Message-ID: Roland> Yes, I agree that's a good plan, especially the Roland> documentation part. However I would argue that what's in Roland> drivers/infiniband/hw/mthca/mthca_doorbell.h is Roland> legitimate: the driver uses __raw_writeq() when it exists Roland> and uses two __raw_writel()s properly serialized with a Roland> device-specific lock to get exactly the atomicity it needs Roland> on 32-bit archs. Bryan> On the off chance that you might be arguing that Bryan> mthca_write64 could be a candidate drop-in for writeq on Bryan> 32-bit arches: No, quite the opposite. I'm arguing that the wrappers in mthca do legitimately belong in a device driver, since they encapsulate device-specific knowledge about what serialization suffices when an atomic __raw_writeq() is not available. Bryan> That approach might work on mthca hardware, but it's not Bryan> safe in general. 
The ipath driver requires a proper Bryan> writeq(), for example, because the hardware will quite Bryan> legitimately treat 32-bit writes to some registers as Bryan> separate accesses, and screw things up royally. Yes, that's an unfortunate feature of the ipath hardware that apparently makes it impossible to drive on a generic 32-bit architecture. So perhaps writeq()/__raw_writeq() need to be defined to generate a single bus cycle to the extent that makes sense. Which would mean that it's not possible to implement on all architectures. - R. From bos at pathscale.com Fri Sep 1 14:01:41 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 01 Sep 2006 14:01:41 -0700 Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error In-Reply-To: References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <20060901130444.48f19457.akpm@osdl.org> <20060901204343.GA4979@flint.arm.linux.org.uk> Message-ID: <1157144501.20958.12.camel@chalcedony.pathscale.com> On Fri, 2006-09-01 at 13:54 -0700, Roland Dreier wrote: > I agree completely. And going one step further: if an architecture > cannot implement a 64-bit write atomically, then the precise > serialization that is required is device-specific knowledge that > belongs in the device driver. Absolutely. References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <20060901130444.48f19457.akpm@osdl.org> <20060901204343.GA4979@flint.arm.linux.org.uk> Message-ID: <20060901135911.bc53d89a.akpm@osdl.org> On Fri, 1 Sep 2006 21:43:43 +0100 Russell King wrote: > On Fri, Sep 01, 2006 at 01:04:44PM -0700, Andrew Morton wrote: > > On Fri, 01 Sep 2006 12:53:47 -0700 > > Roland Dreier wrote: > > > Yes, I agree that's a good plan, especially the documentation part. 
> > > However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h > > > is legitimate: the driver uses __raw_writeq() when it exists and uses > > > two __raw_writel()s properly serialized with a device-specific lock to > > > get exactly the atomicity it needs on 32-bit archs. > > > > No, driver-specific workarounds are not legitimate, sorry. > > > > The driver should simply fail to compile on architectures which do not > > implement __raw_writeq(). > > So, what you're basically saying is that on architectures which can _NOT_ > implement an atomic __raw_writeq(), certain drivers simply will not be > available? If the driver *requires* an atomic __raw_writeq(), then yes. The driver cannot work correctly on that machine. If, however, there is some way in which we can make the hardware work on that machine (say, with other locking) then we got the __raw_writeq() interface design (whatever that is) wrong. IOW, the best way of tackling this is to work out what we're trying to do, design an interface, then implement it. Doing funny workarounds within individual drivers isn't the way to address this. In fact it's an indication that something is wrong. > > We can speed up the process by sending helpful emails to architecture > > maintainers, but they'll notice either way. > > I think you're completely wrong in the context of the message you're > replying to - it's talking about an _atomic_ 64-bit write. > > Sure, if you want a _non-atomic_ 64-bit write then that's possible, > but many 32-bit architectures can't do a 64-bit atomic IO write and > that isn't something they can "fix". If the hardware/driver absolutely requires that the 64-bit write be atomic on-the-bus then sure, the fix is to disable that driver on that architecture in Kconfig. 
If, however, the atomicity requirement is a software thing (we need to be atomic against other CPU reads and writes) then that can be solved with locking, and we can design APIs for this which can be implemented efficiently on all architectures. From bos at serpentine.com Fri Sep 1 14:03:57 2006 From: bos at serpentine.com (Bryan O'Sullivan) Date: Fri, 01 Sep 2006 14:03:57 -0700 Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error In-Reply-To: References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <1157143527.20958.8.camel@chalcedony.pathscale.com> Message-ID: <1157144637.20958.15.camel@chalcedony.pathscale.com> On Fri, 2006-09-01 at 13:59 -0700, Roland Dreier wrote: > No, quite the opposite. I'm arguing that the wrappers in mthca do > legitimately belong in a device driver, since they encapsulate > device-specific knowledge about what serialization suffices when an > atomic __raw_writeq() is not available. Yes, I figured that out from some later messages. I think we're violently in agreement, in that case. References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <20060901130444.48f19457.akpm@osdl.org> Message-ID: <20060901140313.51cf077b.akpm@osdl.org> On Fri, 01 Sep 2006 13:51:32 -0700 Roland Dreier wrote: > Andrew> No, driver-specific workarounds are not legitimate, sorry. > > Andrew> The driver should simply fail to compile on architectures > Andrew> which do not implement __raw_writeq(). > > But how should i386 (say) implement __raw_writeq()? As two > __raw_writel()s protected by a spinlock (that serializes all IO > transactions)? That seems rather ugly. 
> If it's a choice between "ugly" and "doesn't work on x86", we'll take "ugly" ;) From rdreier at cisco.com Fri Sep 1 14:05:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 01 Sep 2006 14:05:36 -0700 Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error In-Reply-To: <20060901135911.bc53d89a.akpm@osdl.org> (Andrew Morton's message of "Fri, 1 Sep 2006 13:59:11 -0700") References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <20060901130444.48f19457.akpm@osdl.org> <20060901204343.GA4979@flint.arm.linux.org.uk> <20060901135911.bc53d89a.akpm@osdl.org> Message-ID: Andrew> If the hardware/driver absolutely requires that the 64-bit Andrew> write be atomic on-the-bus then sure, the fix is to Andrew> disable that driver on that architecture in Kconfig. Andrew> If, however, the atomicity requirement is a software thing Andrew> (we need to be atomic against other CPU reads and writes) Andrew> then that can be solved with locking, and we can design Andrew> APIs for this which can be implemented efficiently on all Andrew> architectures. It seems that there are cases of both. ipath needs actual 64-bit bus transactions to work properly. mthca needs to make sure that if doorbell writes are split into two 32-bit halves, then no other writes go to the same MMIO page in between the halves. What do you think the API would look like? Something along the lines of mthca_doorbell.h, where we have macros for DECLARE_WRITEQ_LOCK() INIT_WRITEQ_LOCK() GET_WRITEQ_LOCK() which get stubbed out on architectures where writeq is already atomic, and then pass the lock into writeq()? But then you probably need some Kconfig symbol to say if writeq() is really atomic or just software atomic (for ipath et al to depend on). - R. 
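The lock-macro idea Roland describes above can be sketched in a few lines. This is a minimal userspace model, not kernel code and not the actual contents of mthca_doorbell.h: every name in it (DECLARE_WRITE64_LOCK, MODEL_ATOMIC_WRITE64, struct fake_dev, model_write64 and so on) is hypothetical, a pthread mutex stands in for the driver's spinlock, and a plain struct stands in for MMIO space. It only shows how the lock can compile away where a single-access 64-bit store exists; it says nothing about byte order or about which half a real device expects first.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model with hypothetical names. A pthread mutex stands in for
 * the driver's spinlock and a plain struct stands in for MMIO space.
 */
struct fake_reg {
	uint32_t lo;	/* models the 32-bit register at offset 0 */
	uint32_t hi;	/* models the 32-bit register at offset 4 */
};

#ifdef MODEL_ATOMIC_WRITE64
/* The architecture has a single-access 64-bit store: no lock is needed. */
#define DECLARE_WRITE64_LOCK(name)
#define INIT_WRITE64_LOCK(ptr)		do { } while (0)
#define LOCK_WRITE64(ptr)		do { } while (0)
#define UNLOCK_WRITE64(ptr)		do { } while (0)
#else
/* No atomic 64-bit store: the two halves are serialized by a lock. */
#define DECLARE_WRITE64_LOCK(name)	pthread_mutex_t name;
#define INIT_WRITE64_LOCK(ptr)		pthread_mutex_init((ptr), NULL)
#define LOCK_WRITE64(ptr)		pthread_mutex_lock(ptr)
#define UNLOCK_WRITE64(ptr)		pthread_mutex_unlock(ptr)
#endif

struct fake_dev {
	struct fake_reg reg;
	DECLARE_WRITE64_LOCK(doorbell_lock)
};

/*
 * Write a 64-bit value as two 32-bit halves. With MODEL_ATOMIC_WRITE64
 * defined, real kernel code would issue one __raw_writeq() instead; the
 * model keeps the two stores so both configurations behave the same.
 */
static void model_write64(struct fake_dev *dev, uint64_t val)
{
	LOCK_WRITE64(&dev->doorbell_lock);
	dev->reg.lo = (uint32_t)val;
	dev->reg.hi = (uint32_t)(val >> 32);
	UNLOCK_WRITE64(&dev->doorbell_lock);
}

/* Reassemble the value the fake device currently holds. */
static uint64_t model_read64(const struct fake_dev *dev)
{
	return ((uint64_t)dev->reg.hi << 32) | dev->reg.lo;
}
```

A lock of this kind only keeps other CPUs from interleaving their writes between the two halves. It cannot turn two 32-bit bus transactions into one, so it would not help hardware that, like ipath as described in this thread, needs a genuine 64-bit bus cycle.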
From robert.j.woodruff at intel.com Fri Sep 1 14:14:44 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Fri, 1 Sep 2006 14:14:44 -0700 Subject: [openib-general] OFED 1.1-rc3 is ready Message-ID: Woody wrote, >ib_mthca 0000:0d:00.0: HCA FW version 3.3.2 is old (3.4.0 is current). >ib_mthca 0000:0d:00.0: If you have problems, try updating your HCA FW. >ib_uverbs: Unknown symbol hpage_shift <-----------------------I think this is the problem Just a follow up note on this one. Looks like this is a new bug introduced at RC3, it did not fail at RC2. From akpm at osdl.org Fri Sep 1 14:26:06 2006 From: akpm at osdl.org (Andrew Morton) Date: Fri, 1 Sep 2006 14:26:06 -0700 Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error In-Reply-To: References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <20060901130444.48f19457.akpm@osdl.org> <20060901204343.GA4979@flint.arm.linux.org.uk> <20060901135911.bc53d89a.akpm@osdl.org> Message-ID: <20060901142606.4f5c1152.akpm@osdl.org> On Fri, 01 Sep 2006 14:05:36 -0700 Roland Dreier wrote: > Andrew> If the hardware/driver absolutely requires that the 64-bit > Andrew> write be atomic on-the-bus then sure, the fix is to > Andrew> disable that driver on that architecture in Kconfig. > > Andrew> If, however, the atomicity requirement is a software thing > Andrew> (we need to be atomic against other CPU reads and writes) > Andrew> then that can be solved with locking, and we can design > Andrew> APIs for this which can be implemented efficiently on all > Andrew> architectures. > > It seems that there are cases of both. ipath needs actual 64-bit bus > transactions to work properly. If we define __raw_writeq() to be 64-bit-atomic-on-the-bus then an appropriate solution for ipath would be to call __raw_writeq() directly. 
If the arch cannot implement __raw_writeq() then build error -> Kconfig fix.

> mthca needs to make sure that if
> doorbell writes are split into two 32-bit halves, then no other writes
> go to the same MMIO page in between the halves.
>
> What do you think the API would look like? Something along the lines
> of mthca_doorbell.h, where we have macros for
>
> DECLARE_WRITEQ_LOCK()
> INIT_WRITEQ_LOCK()
> GET_WRITEQ_LOCK()
>
> which get stubbed out on architectures where writeq is already atomic,
> and then pass the lock into writeq()?
>
> But then you probably need some Kconfig symbol to say if writeq() is
> really atomic or just software atomic (for ipath et al to depend on).
>

It depends on how many other devices have (or are expected to have)
mthca-like requirements. If the answer is "very few, maybe none" then
perhaps we don't need to go designing generic interfaces to support such
things.

As for interfaces, umm, something like

#ifdef CONFIG_ARCH_HAS_64BIT_ATOMIC_MMIO_WRITES
struct be64_port {
	void __iomem *addr;
};

static inline void atomic_be64_mmio_write(u64 v, struct be64_port *port)
{
	__raw_writeq(v, port->addr);
}

#define be64_port_init(port, addr) \
	port->addr = addr;

#define be64_port_init_external_locking(port, addr, lockp) \
	be64_port_init(port, addr)

#else

struct be64_port {
	void __iomem *addr;
	spinlock_t lock;
	spinlock_t *lockp;
};

static inline void atomic_be64_mmio_write(u64 v, struct be64_port *port)
{
	unsigned long flags;

	spin_lock_irqsave(port->lockp, flags);
	__raw_writel(...);
	__raw_writel(...);
	spin_unlock_irqrestore(port->lockp, flags);
}

#define be64_port_init(port, addr) \
	spin_lock_init(&port->lock); \
	port->lockp = &port->lock; \
	port->addr = addr;

#define be64_port_init_external_locking(port, addr, lockp) \
	port->lockp = lockp; \
	port->addr = addr;

#endif

perhaps?

btw, 32-bit mthca_write64() is downright scary from an endianness POV.
I guess it's right, but I wouldn't label it "obviously correct" ;) From rjwalsh at pathscale.com Fri Sep 1 14:33:50 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Fri, 01 Sep 2006 14:33:50 -0700 Subject: [openib-general] OFED 1.1-rc3 is ready In-Reply-To: <44F879F6.8080601@pathscale.com> References: <44F879F6.8080601@pathscale.com> Message-ID: <44F8A73E.8050502@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Robert Walsh wrote: >> Hi all, I installed the RC3 package on my Xeon/Lindenhurst platforms >> and with the pathscale card I have the following problem >> when trying to run Intel MPI and NetPipe. > > Actually, I've been trying to run Intel MPI myself, but haven't gotten > very far yet. My attempts die like this: > > $ mpiexec -n 2 ./mpitest > I_MPI: [0] set_up_devices(): will use device: libmpi.rdma.so > I_MPI: [0] set_up_devices(): will use DAPL provider: OpenIB-cma > I_MPI: [0] set_up_devices(): will use device: libmpi.rdma.so > I_MPI: [0] set_up_devices(): will use DAPL provider: OpenIB-cma > [0][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not > create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6) > [1][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not > create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6) > rank 0 in job 2 ib-idev-05_51713 caused collective abort of all ranks > exit status of rank 0: return code 254 > > dapltest seems to work just fine, so I'm a little confused. Do you have > any insight on what the DAT_INVALID_PARAMETER(DAT_INVALID_ARG6) stuff is > referring to? 
FWIW: I'm seeing a similar problem with the Intel MPI 3.0 beta release: $ mpiexec -n 2 ./a.out I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration will use rdma configuration [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6) Hello world: rank 0 of 2 running on ib-idev-05 rank 1 in job 2 ib-idev-05_42160 caused collective abort of all ranks exit status of rank 1: killed by signal 9 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRPinPfzvnpzTd9fxAQJnzQf+OYOYjZwUpEQ0OtMiKJW94nAEa2okXh7H LV/WcyH4p8q0dDmzPaEXh1dEwD+DkPWjTb0uh8r+b1Dt1f5jfC98ZXb/2sMqIW4d 93sSIoDWWPN2R2WuGnsvuQcNQBkk7h0HbCBi5vJELPQcXrQAjYPNtRXCPwjXqiGE qefmsFXlUa+avWXQ+WbXBR+ldaBePvYGwFk+G4SwibgMhzyFwsSCzSc4FGrRvg7u YLUIehmV2j0snxbgFK1jVCOQ+QPo8dEhR6OcwXEMbJwUqqslnwK16zUCo2IUTTdN IROQ+kyuecaXfnH0gA2sDIKzGZxkw5zRU1cWN5cq92HPxnhjsCoa/A== =mS+M -----END PGP SIGNATURE----- From sean.hefty at intel.com Fri Sep 1 15:33:55 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 1 Sep 2006 15:33:55 -0700 Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction In-Reply-To: <000001c6cdfe$0f4d0190$e598070a@amr.corp.intel.com> Message-ID: <000101c6ce16$b6057090$e598070a@amr.corp.intel.com> This closes a window where address resolution can attach an rdma_cm_id to a device during destruction of the rdma_cm_id. This can result in the rdma_cm_id remaining in the device list after its memory has been freed. Signed-off-by: Sean Hefty --- I generated this patch off the tip of the for-2.6.19 git branch, so it applies on top of the iWarp changes. Also, OF is looking at hosting git repositories. 
Once available, I will publish the patches there.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c54c55a..2964dab 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -279,7 +279,7 @@ static int cma_acquire_dev(struct rdma_i
 	default:
 		return -ENODEV;
 	}
-	mutex_lock(&lock);
+
 	list_for_each_entry(cma_dev, &dev_list, list) {
 		ret = ib_find_cached_gid(cma_dev->device, &gid,
 					 &id_priv->id.port_num, NULL);
@@ -288,7 +288,6 @@ static int cma_acquire_dev(struct rdma_i
 			break;
 		}
 	}
-	mutex_unlock(&lock);
 	return ret;
 }
 
@@ -712,7 +711,9 @@ void rdma_destroy_id(struct rdma_cm_id *
 	state = cma_exch(id_priv, CMA_DESTROYING);
 	cma_cancel_operation(id_priv, state);
 
+	mutex_lock(&lock);
 	if (id_priv->cma_dev) {
+		mutex_unlock(&lock);
 		switch (rdma_node_get_transport(id->device->node_type)) {
 		case RDMA_TRANSPORT_IB:
 			if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib))
@@ -727,8 +728,8 @@ void rdma_destroy_id(struct rdma_cm_id *
 		}
 		mutex_lock(&lock);
 		cma_detach_from_dev(id_priv);
-		mutex_unlock(&lock);
 	}
+	mutex_unlock(&lock);
 
 	cma_release_port(id_priv);
 	cma_deref_id(id_priv);
@@ -925,7 +926,9 @@ static int cma_req_handler(struct ib_cm_
 	}
 
 	atomic_inc(&conn_id->dev_remove);
+	mutex_lock(&lock);
 	ret = cma_acquire_dev(conn_id);
+	mutex_unlock(&lock);
 	if (ret) {
 		ret = -ENODEV;
 		cma_release_remove(conn_id);
@@ -1097,7 +1100,9 @@ static int iw_conn_req_handler(struct iw
 		goto out;
 	}
 
+	mutex_lock(&lock);
 	ret = cma_acquire_dev(conn_id);
+	mutex_unlock(&lock);
 	if (ret) {
 		cma_release_remove(conn_id);
 		rdma_destroy_id(new_cm_id);
@@ -1507,16 +1512,26 @@ static void addr_handler(int status, str
 	enum rdma_cm_event_type event;
 
 	atomic_inc(&id_priv->dev_remove);
-	if (!id_priv->cma_dev && !status)
+
+	/*
+	 * Grab mutex to block rdma_destroy_id() from removing the device while
+	 * we're trying to acquire it.
+	 */
+	mutex_lock(&lock);
+	if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED)) {
+		mutex_unlock(&lock);
+		goto out;
+	}
+
+	if (!status && !id_priv->cma_dev)
 		status = cma_acquire_dev(id_priv);
+	mutex_unlock(&lock);
 
 	if (status) {
-		if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_BOUND))
+		if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ADDR_BOUND))
 			goto out;
 		event = RDMA_CM_EVENT_ADDR_ERROR;
 	} else {
-		if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED))
-			goto out;
 		memcpy(&id_priv->id.route.addr.src_addr, src_addr,
 		       ip_addr_size(src_addr));
 		event = RDMA_CM_EVENT_ADDR_RESOLVED;
@@ -1740,8 +1755,11 @@ int rdma_bind_addr(struct rdma_cm_id *id
 
 	if (!cma_any_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
-		if (!ret)
+		if (!ret) {
+			mutex_lock(&lock);
 			ret = cma_acquire_dev(id_priv);
+			mutex_unlock(&lock);
+		}
 		if (ret)
 			goto err;
 	}

From rdreier at cisco.com Fri Sep 1 15:42:55 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 15:42:55 -0700
Subject: [openib-general] 2.6.18-rc5-mm1: drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901142606.4f5c1152.akpm@osdl.org> (Andrew Morton's message of "Fri, 1 Sep 2006 14:26:06 -0700")
References: <20060901015818.42767813.akpm@osdl.org> <20060901160023.GB18276@stusta.de> <20060901101340.962150cb.akpm@osdl.org> <20060901112312.5ff0dd8d.akpm@osdl.org> <20060901130444.48f19457.akpm@osdl.org> <20060901204343.GA4979@flint.arm.linux.org.uk> <20060901135911.bc53d89a.akpm@osdl.org> <20060901142606.4f5c1152.akpm@osdl.org>
Message-ID:

    Andrew> It depends on how many other devices have (or are expected
    Andrew> to have) mthca-like requirements. If the answer is "very
    Andrew> few, maybe none" then perhaps we don't need to go
    Andrew> designing generic interfaces to support such things.

I actually don't know of any others -- not that I'm an expert on the
range of devices that exist...
What's your feeling about drivers like amso1100, which don't
particularly care about atomicity, but just want to write a 64-bit
quantity conveniently? Should we require writeq()/__raw_writeq() for
all archs, and then define CONFIG_ARCH_HAS_64BIT_ATOMIC_MMIO_WRITES as
appropriate?

I see stuff like this in drivers/dma/ioatdma.c:

#if (BITS_PER_LONG == 64)
	ioatdma_chan_write64(ioat_chan, IOAT_CHAINADDR_OFFSET, desc->phys);
#else
	ioatdma_chan_write32(ioat_chan,
	                     IOAT_CHAINADDR_OFFSET_LOW,
	                     (u32) desc->phys);
	ioatdma_chan_write32(ioat_chan, IOAT_CHAINADDR_OFFSET_HIGH, 0);
#endif

and drivers/char/hpet.c:

#ifndef readq
static inline unsigned long long readq(void __iomem *addr)
{
	return readl(addr) | (((unsigned long long)readl(addr + 4)) << 32LL);
}
#endif

#ifndef writeq
static inline void writeq(unsigned long long v, void __iomem *addr)
{
	writel(v & 0xffffffff, addr);
	writel(v >> 32, addr + 4);
}
#endif

and so on...

 - R.

From robert.j.woodruff at intel.com Fri Sep 1 16:11:31 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 1 Sep 2006 16:11:31 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
Message-ID:

Robert wrote,
> $ mpiexec -n 2 ./a.out
> I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
> I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
> I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration
> will use rdma configuration
> [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma:
>could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
> Hello world: rank 0 of 2 running on ib-idev-05
> rank 1 in job 2 ib-idev-05_42160 caused collective abort of all ranks
> exit status of rank 1: killed by signal 9

Hmm, if you have a debug version of the DAPL library, can you enable
debug messages:

export DAPL_DBG_TYPE=0xffff

That may give us more information. I will also have Arlin take a look
at this when he returns on Tues.
woody From rjwalsh at pathscale.com Fri Sep 1 16:56:22 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Fri, 01 Sep 2006 16:56:22 -0700 Subject: [openib-general] OFED 1.1-rc3 is ready In-Reply-To: References: Message-ID: <44F8C8A6.8090300@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Hmm, if you have a debug version of the DAPL library, can you enable > debug messages, > export DAPL_DBG_TYPE=0xffff > > That may give us more information. I will also have Arlin take a look > at this when returns on Tues. I've been using the stuff from OFED, which I don't think is built with debugging turned on. I compile up a new version next week with debugging enabled. Regards, Robert. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRPjIpvzvnpzTd9fxAQIbsgf/R5Z1jqrhzOITZbILw2eW9rwpxEP0JQJE AWFjlnLXuj3aD/XjbYLQ13t8IXQSJ8KA6TGHcsRLVZYQqmQoVtyyfcMoZp++eKu9 koK0Ttac39ThHgjY7/EQc57WIVyIHoeDQqaS0Q8Y4P+ZwcVXuJT9TlDkCRQ/EtZW MljpJIa0XlOxyTXW0hiEMAaeMseumXbl/Sjfg5JDPz6m/d7URX6Q14Izt7PlJUly bBQUPPcukx1Vpg/3SNc/BGUSoqNa7NnMu48EVdfnG0sHBwCkXgwhFkN1bE4AcxGH Ndwksxzxz8zccu6D6dg7o/J7yOMLZo67iyAoC6c1mVjqhLeXMuZ0kw== =VbBn -----END PGP SIGNATURE----- From G.Rudd at isu.usyd.edu.au Fri Sep 1 22:19:49 2006 From: G.Rudd at isu.usyd.edu.au (Greg Rudd) Date: Sat, 02 Sep 2006 15:19:49 +1000 Subject: [openib-general] Have I got something very wrong here? 
Message-ID: <1157174389.29049.61.camel@localhost.localdomain>

Hi all,

Sorry for sounding like a total tool on this list, but after upgrading
one of my boxes to RHEL4 rel 4 and installing the
2.6.9-42.0.2.ELhugemem kernel, my previously working ib interfaces (ib0
and ib1), which used to be able to talk IP, can no longer talk. The
interfaces can still be brought up OK, and I am starting to get some
interesting messages in dmesg:

ib0: Send unicast ARP to 002b
ib0: Send unicast ARP to 002b
ib0: Send unicast ARP to 002b
ib0: Send unicast ARP to 002b
ib0: Send unicast ARP to 002b
ib1: stopping interface
ib1: downing ib_dev
ib1: Freeing ah e88b1b20
ib1: All sends and receives done.
ip_tables: (C) 2000-2002 Netfilter core team
ib1: bringing up interface
ib1: Created ah e88cf960
ib0: Send unicast ARP to 002b

On bringing up the interfaces, these messages appear in dmesg:

ib0: Start path record lookup for fe80:0000:0000:0000:0013:21ff:ff75:3939
ib0: PathRec LID 0x002a for GID fe80:0000:0000:0000:0013:21ff:ff75:3939
ib0: Created ah e7f26600
ib0: created address handle e84f51c0 for LID 0x002a, SL 0
ib0: Send unicast ARP to 002a
ib0: Start path record lookup for fe80:0000:0000:0000:0013:21ff:ff75:399d
ib0: PathRec LID 0x002b for GID fe80:0000:0000:0000:0013:21ff:ff75:399d
ib0: Created ah e88806c0
ib0: created address handle e88806a0 for LID 0x002b, SL 0

If I am correct, Red Hat has totally changed the way the infiniband
drivers work in RHEL4 rel 4.

What is interesting is that when I run /etc/init.d/openibd status I get
the following:

./openibd status
  HCA driver loaded
Configured devices:
ib0 ib1
Currently active devices:
ib0 ib1
The following modules are also loaded:
ib_cm
ib_sdp

I note that ib_ipoib does not appear in this list, but lsmod shows it
loaded into the kernel:

[root at hippo init.d]# lsmod |grep -i ib
ib_sdp                 35153  0
rdma_cm                26181  2 ib_sdp,rdma_ucm
ib_addr                11717  1 rdma_cm
ib_local_sa            15565  2 rdma_ucm,rdma_cm
findex                  8001  1 ib_local_sa
ib_mthca              132969  0
ib_ipoib               50129  0
ib_uverbs              40169  1 rdma_ucm
ib_umad                18929  0
ib_ucm                 20549  0
ib_sa                  17109  3 rdma_cm,ib_local_sa,ib_ipoib
ib_cm                  38444  2 rdma_cm,ib_ucm
ib_mad                 39385  5 ib_local_sa,ib_mthca,ib_umad,ib_sa,ib_cm
ib_core                49985  11 ib_sdp,rdma_cm,ib_local_sa,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad

These are the infiniband rpms I have installed at the moment:

kernel-ib-1.0-1
libmthca-1.0.2-1.i386
libsdp-0.9.0-1.i386
libibverbs-1.0.3-1.i386
libibverbs-utils-1.0.3-1.i386
libibcommon-1.0-1.i386
libibumad-1.0-1.i386
opensm-libs-1.2.0-1.i386
opensm-1.2.0-1.i386
libibcm-0.9.0-1.i386
libibmad-1.0-1.i386
openib-diags-1.0-1.i386
perftest-1.0-1.i386
tvflash-0.9.0-1.i386
srptools-0.0.4-1.i386
librdmacm-0.9.0-1.i386
mstflint-1.0-1.i386

To get ib0 and ib1 working as they did before under 2.6.9-34, am I
missing something very simple, such as an rpm or a kernel module that
has not been loaded? Or is there something else happening here?

Extra details

ib0       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:10.0.0.1  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:850 errors:0 dropped:0 overruns:0 frame:0
          TX packets:920 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:47600 (46.4 KiB)  TX bytes:55256 (53.9 KiB)

ib1       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:10.0.0.2  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Copy of /etc/modprobe.conf

alias eth0 tg3
alias eth1 tg3
alias bond0 bonding
options bonding mode=active-backup miimon=100
alias scsi_hostadapter cciss
alias eth2 e1000
alias eth3 e1000
alias usb-controller ohci-hcd
alias ib0 ib_ipoib
alias ib1 ib_ipoib
alias net-pf-27 ib_sdp
options ib_ipoib debug_level=2
options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
remove qla2xxx /sbin/modprobe -r --first-time --ignore-remove qla2xxx && { /sbin/modprobe -r --ignore-remove qla2xxx_conf; }
alias scsi_hostadapter1 qla2xxx_conf
alias scsi_hostadapter2 qla2xxx
alias scsi_hostadapter3 qla2300
alias scsi_hostadapter4 qla2400
alias scsi_hostadapter5 qla6312
options qla2xxx ql2xmaxqdepth=16 qlport_down_retry=30 ql2xloginretrycount=16 ql2xfailover=1 ql2xlbType=1 ql2xautorestore=0x80

ifcfg files in /etc/sysconfig/network-scripts

[root at hippo network-scripts]# more ifcfg-ib0
DEVICE=ib0
BOOTPROTO=static
BROADCAST=10.255.255.255
IPADDR=10.0.0.1
NETMASK=255.0.0.0
ONBOOT=yes

[root at hippo network-scripts]# more ifcfg-ib1
DEVICE=ib1
BOOTPROTO=static
BROADCAST=10.255.255.255
IPADDR=10.0.0.2
NETMASK=255.0.0.0
ONBOOT=yes

Thanks in advance

-greg

From ogerlitz at voltaire.com Sat Sep 2 23:57:20 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 03
Sep 2006 09:57:20 +0300
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction
In-Reply-To: <000101c6ce16$b6057090$e598070a@amr.corp.intel.com>
References: <000101c6ce16$b6057090$e598070a@amr.corp.intel.com>
Message-ID: <44FA7CD0.5000506@voltaire.com>

Sean Hefty wrote:
> This closes a window where address resolution can attach an rdma_cm_id
> to a device during destruction of the rdma_cm_id. This can result in
> the rdma_cm_id remaining in the device list after its memory has been
> freed.

Sean,

Does this patch protect against the case where an rdma_cm_id is being
destroyed while address resolution related to the **same** id attaches
it to a device? If yes, why would someone destroy this id at that
point? Is it legal to do so?

If not, does your patch protect against the case where one id is being
destroyed at the same time another id is being attached to the device?

thanks,

Or.

From tziporet at dev.mellanox.co.il Sun Sep 3 02:13:19 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Sun, 03 Sep 2006 12:13:19 +0300
Subject: [openib-general] Interrupt Threshold Rate equivalent in Infiniband NIC
In-Reply-To:
References:
Message-ID: <44FA9CAF.2060608@dev.mellanox.co.il>

Aaron Fabbri wrote:
>
> I agree there is no equivalent to a rate limiter. I do recall there is (or was)
> an interrupt timer that you can set when you burn the firmware on the Mellanox
> HCAs. IIRC, it could be used to limit the interrupt rate, but the way it is
> implemented it can add latency. If you don't care about latency you could try
> it out. Ask Mellanox for specifics.
>
> Aaron
>
>
>
>

It's not implemented for memfree devices, and it is also not fully
tested for devices with memory.
Tziporet From christian.guggenberger at rzg.mpg.de Sun Sep 3 10:53:46 2006 From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger) Date: Sun, 3 Sep 2006 19:53:46 +0200 Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3 In-Reply-To: <44F453FC.4070300@dev.mellanox.co.il> References: <44F453FC.4070300@dev.mellanox.co.il> Message-ID: <20060903175345.GA6931@daltons.rzg.mpg.de> Hi, On Tue, Aug 29, 2006 at 05:49:32PM +0300, Tziporet Koren wrote: > Hi All, > In testing today we found that on SLES9 SP3 memory locking as a regular > user fails. has any progress been made regarding this ? I'd like to ask if the SLES9 port is really mature yet, because I tried to go a step ahead and tried some trivial MPI code as root, but failed and got the involved node locked down hard. Testing was done on a single x86_64 SMP node (2 CPUs), with a Mellanox PCI-X HCA (23108, FW-3.5.0). Software Environment SLES9 SP3-latest, OFED-1.1-rc3 and mvapich2-0.9.5. Attached is a simple MPI code that causes the hard lock. Also attached are some Kernel BUGs gathered via serial console - they look garbled, unfortunately. Note, everything is fine, if I use recent vanilla kernels on that SLES9 machine. cheers. - Christian -- ----------------------------------------------------------- Phone +49-89-3299-1306 PGP http://www.rzg.mpg.de/~ccg/cg-public_key.asc S/MIME http://ra.rzg.mpg.de ----------------------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test.c Type: text/x-csrc Size: 1260 bytes Desc: not available URL: -------------- next part -------------- Kernel BUG at page_alloc:853 invalid operand: 0000 [1] SMP CPU 0 Pid: 7092, comm: hanger Tainted: PF U (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531) RIP: 0010:[] {__free_pages+30} RSP: 0018:00000100e3fdbbf0 EFLAGS: 00010256 RAX: 0000000000000000 RBX: 00000100e72d1280 RCX: 000001000000d000 RDX: 0000010002a1c4d8 RSI: 0000000000000000 RDI: 0000010002a1c4d8 RBP: 00000100e3fdbcc8 R08: 00000100e3fda000 R09: 0000000000000002 R10: 0000000000000064 R11: 0000000000000001 R12: 0000000000000000 R13: 00000100e72d1280 R14: 000001007e644d90 R15: 00000000000493e0 FS: 0000002a95bb5b00(0000) GS:ffffffff8057dc00(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000041b009 CR3: 0000000000101000 CR4: 00000000000006e0 Process hanger (pid: 7092, threadinfo 00000100e3fda000, task 000001007e644d90) Stack: ffffffff8013bd3f 0000000000000000 ffffffff801395a0 ffffffff803d3400 0000000000000246 00000000000339b3 0000000000000202 0000010002c1c600 000000000000006a 0000010002c1d6e0 Call Trace:{__mmdrop+63} {thread_return+108} {process_timeout+0} {schedule_timeout+246} {process_timeout+0} {:ib_mthca:mthca_cmd_wait+448} {default_wake_function+0} {default_wake_function+0} {:ib_mthca:mthca_cmd_box+66} {:ib_mthca:mthca_HW2SW_MPT+57} {:ib_mthca:mthca_free_mr+67} {:ib_mthca:mthca_dereg_mr+15} {:ib_core:ib_dereg_mr+26} {:ib_uverbs:ib_uverbs_close+611} {__fput+98} {filp_close+126} {sys_close+229} {system_call+124} Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83 RIP {__free_pages+30} RSP <00000100e3fdbbf0> ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at page_alloc:853 invalid operand: 0000 [2] SMP CPU 1 Pid: 1, comm: init Tainted: PF U (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531) RIP: 0010:[] {__free_pages+30} RSP: 0018:000001007ff81c80 EFLAGS: 00010256 RAX: 0000000000000000 RBX: 000001007e1e4980 RCX: 
0000010080000000 RDX: 00000100815b6068 RSI: 0000000000000000 RDI: 00000100815b6068 RBP: 000001007ff81d58 R08: 000001007ff80000 R09: 0000000000000013 R10: 00000000000493e0 R11: 0000000000002710 R12: 0000000000000001 R13: 000001007e1e4980 R14: 00000100e7f3f2c0 R15: 00000000000493e0 FS: 0000002a95bb5b00(0000) GS:ffffffff8057dc80(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000041b009 CR3: 000000007ff82000 CR4: 00000000000006e0 Process init (pid: 1, threadinfo 000001007ff80000, task 00000100e7f3f2c0) Stack: ffffffff8013bd3f 0000000000000040 ffffffff801395a0 00000100e7f3e9a0 000000d07f8a1580 0000000000000246 0000000000000001 00000100816f5580 000000010000007d 00000100816f6660 Call Trace:{__mmdrop+63} {thread_return+108} {schedule_timeout+246} {process_timeout+0} {do_select+1105} {__pollwait+0} {sys_select+902} {system_call+124} Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83 RIP {__free_pages+30} RSP <000001007ff81c80> b-<-0>-K--er--ne--l - p[ancuict : hAertte em] pt-e--d --to-- k-i- ll[p ileniatse! 
ite here B] ad-- p--a-ge-- s--ta roK aert nferl eBe_UhG otat_c poaldge_p_aaglle oc(:in85 p3 0 ceinssv al'hidan ogeper'ra, ndpa: ge00 00000 [0301] 008SM1P5b 6 68)CP U f0 la:0 x0P50id00:0 58025 m9,ap cpionmmg:: 00kl00og00d 00Ta00in00te0d00: 0 PFma ppU ed :(0 2.co6.un5-t:7.0 2p76ri-svampte S:0LxES009_00SP003_00BR ANBCHac-2kt00r6ac07e:24 104 l3C1)al t_RTrIPac: e:00<10ff:[ffad{b9ead>]_p ag{f8__0f16reaae_7fpa>{gefrs+ee30_h}o cRolSPd_: pa00ge18+1:0403}00<014>00 e4 e87 d4 0 E FL{AX__: mm0d0r0o00p+006300}00<04>00 0<0f RffBXff: ff00f800001310950ea072>{d1th28r0ea Rd_CXre: tu0r00n+0010108}000 00 0 0 se RD X:< 04>00{: dp00ut00+30300}00 00<00ff00ff0 ffRDfIf8: 01008900ff0e10>{00fi2alp1c_c4dlo8 10+1RB26P:} 0<400> 00 ff0e 4 e8 7e 18 {00s0ys R_c09lo: se00+022009}000 00<0ff00ff01ff3 0080R1110:07 01e00>0{s00ys00re00t_04c9ar3eef0 ulR+1113: }0<004>00 0 R10 00 02 71 0 2:Tr 0yi00ng00 0to00 f00ix00 i00t 00up , Rbu13t : a0 r00eb00oo10t 0eis72 dne12ed80ed R 02:ha 0ng00er0[01700093e4]:1d sf4egb0f auR1lt5: a 0t 0000000000000200a904579381e03 0 FrSip: 0 0000000000202a9a9575889134b0200 r(s00p 0000) 00GS00:f7fffbfffffffff0f808 5e7drrc0or0( 1004 0) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000041b009 CR3: 0000000000101000 CR4: 00000000000006e0 Process klogd (pid: 5259, threadinfo 00000100e4e86000, task 00000100e41df4b0) Stack: ffffffff8013bd3f 0000000000008040 ffffffff801395a0 00000100e395f5b0 0000000000000002 0000002a9556c010 000000000003ffff 0000000000040000 000000009566b1c0 0000010002c1d6e0 Call Trace:{__mmdrop+63} {thread_return+108} {do_sync_write+173} {do_syslog+384} {autoremove_wake_function+0} {autoremove_wake_function+0} {kmsg_read+66} {vfs_read+244} {sys_read+157} {system_call+124} Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83 RIP {__free_pages+30} RSP <00000100e4e87d40> ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at page_alloc:853 <0>invalid operand: 0000 [4] SMP CPU 1 Pid: 7091, comm: python2.3 
Tainted: PF U B (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531) RIP: 0010:[] {__free_pages+30} RSP: 0000:00000100e32c3c80 EFLAGS: 00010256 RAX: 0000000000000000 RBX: 000001007e1e4980 RCX: 0000010080000000 RDX: 00000100815b6068 RSI: 0000000000000000 RDI: 00000100815b6068 RBP: 00000100e32c3d58 R08: 00000100e32c2000 R09: 0000000000000013 R10: 00000000000493e0 R11: 0000000000002710 R12: 0000000000000001 R13: 000001007e1e4980 R14: 000001007ec47590 R15: 00000000000493e0 FS: 0000002a96202320(0000) GS:ffffffff8057dc80(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000002a95781302 CR3: 000000007ff82000 CR4: 00000000000006e0 Process python2.3 (pid: 7091, threadinfo 00000100e32c2000, task 000001007ec47590) Stack: ffffffff8013bd3f 0000000000000504 ffffffff801395a0 000001007e7cedb0 0000010077509c80 0000000000000256 0000000080004380 00000100816f5580 000000010000007d 00000100816f6660 Call Trace:{__mmdrop+63} {thread_return+108} {schedule_timeout+246} {process_timeout+0} {do_select+1105} {sys_sendto+246} {__pollwait+0} {sys_select+902} {system_call+124} Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83 RIP {__free_pages+30} RSP <00000100e32c3c80> <1>Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: {find_busiest_group+659} PML4 e3371067 PGD e3374067 PMD 0 Oops: 0000 [5] SMP CPU 1 Pid: 7091, comm: python2.3 Tainted: PF U B (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531) RIP: 0010:[] {find_busiest_group+659} RSP: 0000:00000100e7e07df0 EFLAGS: 00010006 RAX: 00000100e7e07eb8 RBX: 0000000000000000 RCX: 0000000000000080 RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000000040 RBP: 00000100e7e07e90 R08: 0000000000000040 R09: ffffffff805c3200 R10: 0000000000000064 R11: 00000000000002ff R12: 00000000000002ff R13: ffffffff804aa7a0 R14: 0000000000000001 R15: 0000000000000000 FS: 0000002a96202320(0000) GS:ffffffff8057dc80(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 
0000000000000018 CR3: 000000007ff82000 CR4: 00000000000006e0 Process python2.3 (pid: 7091, threadinfo 00000100e32c2000, task 000001007ec47590) Stack: 00000100e7e07e50 0000000000000000 0000000000000001 0000000000000080 0000000000000000 0000000000017f80 0000000000000000 ffffffff804aa780 0000000102ebfb80 00000100e7e07eb8 Call Trace: {rebalance_tick+460} {smp_apic_timer_interrupt+52} {apic_timer_interrupt+99} {smp_stop_cpu+31} {smp_really_stop_cpu+9} {smp_call_function_interrupt+64} {call_function_interrupt+99} {oops_end+35} {oops_end+21} {die+59} {do_invalid_op+145} {__free_pages+30} {tcp_transmit_skb+1479} {error_exit+0} {__free_pages+30} {__mmdrop+63} {thread_return+108} {schedule_timeout+246} {process_timeout+0} {do_select+1105} {sys_sendto+246} {__pollwait+0} {sys_select+902} {system_call+124} Code: 48 8b 43 18 48 39 c8 48 0f 47 c1 48 0f af d0 48 c1 ea 07 48 RIP {find_busiest_group+659} RSP <00000100e7e07df0> CR2: 0000000000000018 From sean.hefty at intel.com Sun Sep 3 20:30:20 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sun, 3 Sep 2006 20:30:20 -0700 Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction In-Reply-To: <44FA7CD0.5000506@voltaire.com> Message-ID: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com> >Does this patch protects against the case where an rdma_cm_id is being >destructed while address resolution related to the **same** id attaches >it to a device? > >If yes, why does someone destroys this id? is it legal to do so? Yes - this protects against the user destroying the id while that same id is being attached to a device. This is legal. The user may want to cancel address resolution by destroying the rdma_cm_id.
The issue is that address resolution is asynchronous, with device attachment occurring in the address resolution callback handler. The user isn't aware that the callback handler has been invoked, and may attempt to destroy the rdma_cm_id when this occurs. - Sean From ogerlitz at voltaire.com Sun Sep 3 23:11:00 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 04 Sep 2006 09:11:00 +0300 Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction In-Reply-To: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com> References: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com> Message-ID: <44FBC374.8040709@voltaire.com> Sean Hefty wrote: >> Does this patch protects against the case where an rdma_cm_id is being >> destructed while address resolution related to the **same** id attaches >> it to a device? >> >> If yes, why does someone destroys this id? is it legal to do so? > > Yes - this protects against the user destroying the id while that same id is > being attached to a device. This is legal. The user may want to cancel address > resolution by destroying the rdma_cm_id. ok, thanks for clarifying that, is cancellation allowed only for address resolution or also for route resolving and/or CM calls? also how about documenting this? Or. diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index 402c63d..b9e22c8 100644 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -117,6 +117,14 @@ struct rdma_cm_id { struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, void *context, enum rdma_port_space ps); +/** + * rdma_destroy_id - Destroys an RDMA identifier. + * + * @id: RDMA identifier. + * + * Note: calling this function has the effect of canceling in-flight + * asynchronous operations associated with the id. 
+ */ void rdma_destroy_id(struct rdma_cm_id *id); /** From ogerlitz at voltaire.com Mon Sep 4 02:00:09 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 04 Sep 2006 12:00:09 +0300 Subject: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM In-Reply-To: References: <000001c6ad0d$e483a290$e598070a@amr.corp.intel.com> <44EC8F10.5050806@ichips.intel.com> Message-ID: <44FBEB19.3010606@voltaire.com> Roland Dreier wrote: > Sean> This patch set appears to be the preferred approach. Any > Sean> objection to committing this? > > It's unfortunate that we have to add a special-case event hook for the > CM, but I guess the iWARP CM changes are so ugly anyway it doesn't > matter much. So I think committing this is OK. Hi Sean, My thinking is that this needs to be committed somewhere, or at least please resubmit to the list the version you intend to merge. We will be able to test it with an iser target running on the gen2 stack and provide further feedback. I guess it can also be tested over SDP under a high-rate connection open/close test, or with whatever CM/CMA test supports reconnecting; the data/before/rtu race happens a lot, so the code would be well exercised. You will then be able to push it for 2.6.19. What do you think? Or. From tziporet at mellanox.co.il Mon Sep 4 02:29:55 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 4 Sep 2006 12:29:55 +0300 Subject: [openib-general] [openfabrics-ewg] OFED 1.1-rc3 is ready Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7837@mtlexch01.mtl.com> Thanks, We found the problem; it is caused by the huge pages support we added in RC3.
It was fixed, and the fix will be in RC4. Tziporet -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Woodruff, Robert J Sent: Saturday, September 02, 2006 12:15 AM To: Woodruff, Robert J; Tziporet Koren; EWG Cc: OPENIB Subject: Re: [openfabrics-ewg] [openib-general] OFED 1.1-rc3 is ready Woody wrote, >ib_mthca 0000:0d:00.0: HCA FW version 3.3.2 is old (3.4.0 is current). >ib_mthca 0000:0d:00.0: If you have problems, try updating your HCA FW. >ib_uverbs: Unknown symbol hpage_shift <-----------------------I think this is the problem Just a follow up note on this one. Looks like this is a new bug introduced at RC3, it did not fail at RC2. From ogerlitz at voltaire.com Mon Sep 4 03:54:26 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 4 Sep 2006 13:54:26 +0300 (IDT) Subject: [openib-general] IPoIB fails attaching QP to mcast group Message-ID: Michael, Roland, Any idea what can cause the below? What actually is the error I am running into here? ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0) ib_mthca 0000:02:00.0: modify QP 3->3 returned status 10. ib0: failed to modify QP, ret = -22 ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff This is a kernel.org system (netdev git) with a PCI-X HCA; some details are below. Or. # uname -a Linux dill 2.6.18-rc4-gdac2b3d3-dirty #2 SMP Thu Aug 24 13:21:58 IDT 2006 x86_64 x86_64 x86_64 GNU/Linux # cat /sys/class/infiniband/mthca0/* 3.4.0 MT23108 a1 0008:f104:0396:51dc 1: CA 0008:f104:0396:51df # cat /sys/module/ib_mthca/parameters/* 0 0 0 0 0 From mst at mellanox.co.il Mon Sep 4 04:32:03 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 4 Sep 2006 14:32:03 +0300 Subject: [openib-general] IPoIB fails attaching QP to mcast group In-Reply-To: References: Message-ID: <20060904113203.GM3440@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: [openib-general] IPoIB fails attaching QP to mcast group > > Michael, Roland, > > Any idea what can cause the below, what actually is the error i am running into here? > > ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0) > ib_mthca 0000:02:00.0: modify QP 3->3 returned status 10. > ib0: failed to modify QP, ret = -22 > ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff > > This is kernel.org system (netdev git) with PCI X HCA, some details below. > > Or. > > # uname -a > > Linux dill 2.6.18-rc4-gdac2b3d3-dirty #2 SMP Thu Aug 24 13:21:58 IDT 2006 x86_64 x86_64 x86_64 GNU/Linux > > # cat /sys/class/infiniband/mthca0/* > > 3.4.0 > MT23108 > a1 > 0008:f104:0396:51dc > 1: CA > 0008:f104:0396:51df > > # cat /sys/module/ib_mthca/parameters/* > 0 > 0 > 0 > 0 > 0 Looks like the QP is in error state, so modify QP fails. Is this at all reproducible? If so could you try with latest firmware please? -- MST From ogerlitz at voltaire.com Mon Sep 4 05:45:09 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 04 Sep 2006 15:45:09 +0300 Subject: [openib-general] IPoIB fails attaching QP to mcast group In-Reply-To: <20060904113203.GM3440@mellanox.co.il> References: <20060904113203.GM3440@mellanox.co.il> Message-ID: <44FC1FD5.3030008@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >> ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0) >> ib_mthca 0000:02:00.0: modify QP 3->3 returned status 10. >> ib0: failed to modify QP, ret = -22 >> ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff > Looks like the QP is in error state, so modify QP fails. > Is this at all reproducible? > If so could you try with latest firmware please? 
I am not following you, do you claim that the SW (IPoIB/MTHCA) consider the QP to be in RTS but the FW/HW say that this QP is actually in error state? This happened today in endless loop on a system which i have played with its IB link, specifically, i also saw the "recv port errors" counter was getting incremented. Once i have stopped/reloaded ipoib it does not happen any more. Or. From mst at mellanox.co.il Mon Sep 4 05:48:31 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 4 Sep 2006 15:48:31 +0300 Subject: [openib-general] IPoIB fails attaching QP to mcast group In-Reply-To: <44FC1FD5.3030008@voltaire.com> References: <44FC1FD5.3030008@voltaire.com> Message-ID: <20060904124831.GA28926@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: [openib-general] IPoIB fails attaching QP to mcast group > > Michael S. Tsirkin wrote: > > Quoting r. Or Gerlitz : > > >> ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0) > >> ib_mthca 0000:02:00.0: modify QP 3->3 returned status 10. > >> ib0: failed to modify QP, ret = -22 > >> ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff > > > Looks like the QP is in error state, so modify QP fails. > > Is this at all reproducible? > > If so could you try with latest firmware please? > > I am not following you, do you claim that the SW (IPoIB/MTHCA) consider > the QP to be in RTS but the FW/HW say that this QP is actually in error > state? Seems like this. > This happened today in endless loop on a system which i have played with > its IB link, specifically, i also saw the "recv port errors" counter was > getting incremented. Once i have stopped/reloaded ipoib it does not > happen any more. > > Or. 
> > -- MST From johnt1johnt2 at gmail.com Mon Sep 4 05:56:46 2006 From: johnt1johnt2 at gmail.com (john t) Date: Mon, 4 Sep 2006 18:26:46 +0530 Subject: [openib-general] MPI Brodcast doubt Message-ID: Hi, I have 3 nodes connected via IB as shown below: node1 ---> switch1 ---> node2 |----------> node3 If node1 sends a broadcast message to node2 and node3, I want to know whether the message is delivered to the switch twice (once for node2 and once for node3) or just once (where the switch recognizes from the headers that it is a broadcast message and sends it on all the outgoing ports)? Regards, John T. From tziporet at dev.mellanox.co.il Mon Sep 4 07:35:00 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 04 Sep 2006 17:35:00 +0300 Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3 In-Reply-To: <20060903175345.GA6931@daltons.rzg.mpg.de> References: <44F453FC.4070300@dev.mellanox.co.il> <20060903175345.GA6931@daltons.rzg.mpg.de> Message-ID: <44FC3994.8020509@dev.mellanox.co.il> Christian Guggenberger wrote: > Hi, > On Tue, Aug 29, 2006 at 05:49:32PM +0300, Tziporet Koren wrote: > >> Hi All, >> In testing today we found that on SLES9 SP3 memory locking as a regular >> user fails. >> > has any progress been made regarding this ? > > I'd like to ask if the SLES9 port is really mature yet, because I tried > to go a step ahead and tried some trivial MPI code as root, but failed > and got the involved node locked down hard. > Testing was done on a single x86_64 SMP node (2 CPUs), with a Mellanox > PCI-X HCA (23108, FW-3.5.0). Software Environment SLES9 SP3-latest, > OFED-1.1-rc3 and mvapich2-0.9.5. > Attached is a simple MPI code that causes the hard lock. Also attached > are some Kernel BUGs gathered via serial console - they look garbled, > unfortunately.
> Note, everything is fine, if I use recent vanilla kernels on that SLES9 > machine. > > cheers. > - Christian > Hi, We test here SLES9 but with mvapich1 library 0.9.7 version from OFED. We tried to run here the test you attached on mvapich1 but have not seen this failure. Can you try to reproduce with mvapich1 version? If not please send us detailed instructions how to reproduce with mvapich2 (where to take sources, compile, etc.) BTW when searching the SLES9 sources for the: Kernel BUG at page_alloc:853 We couldn't find it. Which kernel version are you using? We use here 2.6.5-7.244-smp. Tziporet & Eli From christian.guggenberger at rzg.mpg.de Mon Sep 4 07:44:18 2006 From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger) Date: Mon, 4 Sep 2006 16:44:18 +0200 Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3 In-Reply-To: <44FC3994.8020509@dev.mellanox.co.il> References: <44F453FC.4070300@dev.mellanox.co.il> <20060903175345.GA6931@daltons.rzg.mpg.de> <44FC3994.8020509@dev.mellanox.co.il> Message-ID: <20060904144417.GD7576@daltons.rzg.mpg.de> Hi, > >Attached is a simple MPI code that causes the hard lock. Also attached > >are some Kernel BUGs gathered via serial console - they look garbled, > >unfortunately. > >Note, everything is fine, if I use recent vanilla kernels on that SLES9 > >machine. > > > >cheers. > > - Christian > > > Hi, > We test here SLES9 but with mvapich1 library 0.9.7 version from OFED. > We tried to run here the test you attached on mvapich1 but have not seen > this failure. > Can you try to reproduce with mvapich1 version? is it also okay if I tried with plain mvapich1 from OSU ? > If not please send us detailed instructions how to reproduce with > mvapich2 (where to take sources, compile, etc.) > BTW when searching the SLES9 sources for the: Kernel BUG at page_alloc:853 > > We couldn't find it. > Which kernel version are you using? We use here 2.6.5-7.244-smp. > this is with 2.6.5-7.276-smp cheers. 
- Christian -- ----------------------------------------------------------- Phone +49-89-3299-1306 PGP http://www.rzg.mpg.de/~ccg/cg-public_key.asc S/MIME http://ra.rzg.mpg.de ----------------------------------------------------------- From tziporet at dev.mellanox.co.il Mon Sep 4 08:08:27 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 04 Sep 2006 18:08:27 +0300 Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3 In-Reply-To: <20060904144417.GD7576@daltons.rzg.mpg.de> References: <44F453FC.4070300@dev.mellanox.co.il> <20060903175345.GA6931@daltons.rzg.mpg.de> <44FC3994.8020509@dev.mellanox.co.il> <20060904144417.GD7576@daltons.rzg.mpg.de> Message-ID: <44FC416B.6020409@dev.mellanox.co.il> Christian Guggenberger wrote: >> Hi, >> We test here SLES9 but with mvapich1 library 0.9.7 version from OFED. >> We tried to run here the test you attached on mvapich1 but have not seen >> this failure. >> Can you try to reproduce with mvapich1 version? >> > > is it also okay if I tried with plain mvapich1 from OSU ? I guess yes, although we use the one that comes with OFED. >> > this is with 2.6.5-7.276-smp > > > I'll see if we can update our kernel version.
Tziporet From christian.guggenberger at rzg.mpg.de Mon Sep 4 08:24:49 2006 From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger) Date: Mon, 4 Sep 2006 17:24:49 +0200 Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3 In-Reply-To: <44FC416B.6020409@dev.mellanox.co.il> References: <44F453FC.4070300@dev.mellanox.co.il> <20060903175345.GA6931@daltons.rzg.mpg.de> <44FC3994.8020509@dev.mellanox.co.il> <20060904144417.GD7576@daltons.rzg.mpg.de> <44FC416B.6020409@dev.mellanox.co.il> Message-ID: <20060904152449.GF7576@daltons.rzg.mpg.de> > >>We test here SLES9 but with mvapich1 library 0.9.7 version from OFED. > >>We tried to run here the test you attached on mvapich1 but have not seen > >>this failure. > >>Can you try to reproduce with mvapich1 version? > >> > > > >is it also okay if I tried with plain mvapich1 from OSU ? > I guess yes, although we use the one that comes with OFED. hmm. Using plain mvapich-0.9.7 from OSU, the BUGs/Ooops are not reproducible. Using mvapich2-0.9.5 it happens each time... cheers. - Christian From bunk at stusta.de Mon Sep 4 10:03:50 2006 From: bunk at stusta.de (Adrian Bunk) Date: Mon, 4 Sep 2006 19:03:50 +0200 Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups In-Reply-To: <20060901015818.42767813.akpm@osdl.org> References: <20060901015818.42767813.akpm@osdl.org> Message-ID: <20060904170350.GR4416@stusta.de> On Fri, Sep 01, 2006 at 01:58:18AM -0700, Andrew Morton wrote: >... > Changes since 2.6.18-rc4-mm3: >... > git-infiniband.patch >... > git trees. >...
This patch contains the following possible cleanups: - make the following needlessly global functions static: - c2_ae.c: to_qp_state_str() - c2_cq.c: c2_cq_get() - c2_cq.c: c2_cq_put() - c2_qp.c: to_ib_state() - c2_qp.c: to_ib_state_str() - c2_rnic.c: c2_rnic_query() - #if 0 the following unused global function: - c2_mq.c: c2_mq_count() Signed-off-by: Adrian Bunk --- drivers/infiniband/hw/amso1100/c2.h | 1 - drivers/infiniband/hw/amso1100/c2_ae.c | 2 +- drivers/infiniband/hw/amso1100/c2_cq.c | 4 ++-- drivers/infiniband/hw/amso1100/c2_mq.c | 3 ++- drivers/infiniband/hw/amso1100/c2_mq.h | 1 - drivers/infiniband/hw/amso1100/c2_qp.c | 4 ++-- drivers/infiniband/hw/amso1100/c2_rnic.c | 3 +-- 7 files changed, 8 insertions(+), 10 deletions(-) --- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_ae.c.old 2006-09-01 21:02:16.000000000 +0200 +++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_ae.c 2006-09-01 21:02:23.000000000 +0200 @@ -125,7 +125,7 @@ return event_str[event]; } -const char *to_qp_state_str(int state) +static const char *to_qp_state_str(int state) { switch (state) { case C2_QP_STATE_IDLE: --- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_cq.c.old 2006-09-01 21:02:45.000000000 +0200 +++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_cq.c 2006-09-01 21:03:06.000000000 +0200 @@ -41,7 +41,7 @@ #define C2_CQ_MSG_SIZE ((sizeof(struct c2wr_ce) + 32-1) & ~(32-1)) -struct c2_cq *c2_cq_get(struct c2_dev *c2dev, int cqn) +static struct c2_cq *c2_cq_get(struct c2_dev *c2dev, int cqn) { struct c2_cq *cq; unsigned long flags; @@ -57,7 +57,7 @@ return cq; } -void c2_cq_put(struct c2_cq *cq) +static void c2_cq_put(struct c2_cq *cq) { if (atomic_dec_and_test(&cq->refcount)) wake_up(&cq->wait); --- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_mq.h.old 2006-09-01 21:03:23.000000000 +0200 +++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_mq.h 2006-09-01 21:03:30.000000000 +0200 @@ -98,7 +98,6 @@ extern void c2_mq_produce(struct 
c2_mq *q); extern void *c2_mq_consume(struct c2_mq *q); extern void c2_mq_free(struct c2_mq *q); -extern u32 c2_mq_count(struct c2_mq *q); extern void c2_mq_req_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size, u8 __iomem *pool_start, u16 __iomem *peer, u32 type); extern void c2_mq_rep_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size, --- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_mq.c.old 2006-09-01 21:03:37.000000000 +0200 +++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_mq.c 2006-09-01 21:03:49.000000000 +0200 @@ -121,7 +121,7 @@ } } - +#if 0 u32 c2_mq_count(struct c2_mq *q) { s32 count; @@ -138,6 +138,7 @@ return (u32) count; } +#endif /* 0 */ void c2_mq_req_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size, u8 __iomem *pool_start, u16 __iomem *peer, u32 type) --- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_qp.c.old 2006-09-01 21:04:06.000000000 +0200 +++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_qp.c 2006-09-01 21:04:22.000000000 +0200 @@ -75,7 +75,7 @@ } } -int to_ib_state(enum c2_qp_state c2_state) +static int to_ib_state(enum c2_qp_state c2_state) { switch (c2_state) { case C2_QP_STATE_IDLE: @@ -95,7 +95,7 @@ } } -const char *to_ib_state_str(int ib_state) +static const char *to_ib_state_str(int ib_state) { static const char *state_str[] = { "IB_QPS_RESET", --- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.h.old 2006-09-01 21:04:49.000000000 +0200 +++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.h 2006-09-01 21:04:54.000000000 +0200 @@ -485,7 +485,6 @@ extern int c2_rnic_init(struct c2_dev *c2dev); extern void c2_rnic_term(struct c2_dev *c2dev); extern void c2_rnic_interrupt(struct c2_dev *c2dev); -extern int c2_rnic_query(struct c2_dev *c2dev, struct ib_device_attr *props); extern int c2_del_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask); extern int c2_add_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask); --- 
linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_rnic.c.old 2006-09-01 21:05:03.000000000 +0200 +++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_rnic.c 2006-09-01 21:05:17.000000000 +0200 @@ -118,8 +118,7 @@ /* * Query the adapter */ -int c2_rnic_query(struct c2_dev *c2dev, - struct ib_device_attr *props) +static int c2_rnic_query(struct c2_dev *c2dev, struct ib_device_attr *props) { struct c2_vq_req *vq_req; struct c2wr_rnic_query_req wr; From sashak at voltaire.com Mon Sep 4 10:20:06 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 04 Sep 2006 20:20:06 +0300 Subject: [openib-general] [PATCH] opensm: osm_log_init_v2() - new osm_log initializer Message-ID: <20060904172006.10400.62708.stgit@sashak.voltaire.com> There is new osm_log initializer osm_log_init_v2(), this is wrapped by osm_log_init() in order to preserve existing API. Signed-off-by: Sasha Khapyorsky --- diags/src/saquery.c | 4 ++-- osm/complib/cl_event_wheel.c | 2 +- osm/include/opensm/osm_log.h | 29 +++++++++++++++++++++++++---- osm/opensm/libopensm.map | 1 + osm/opensm/osm_db_files.c | 2 +- osm/opensm/osm_log.c | 13 ++++++++++++- osm/opensm/osm_opensm.c | 6 +++--- osm/osmtest/osmtest.c | 4 ++-- 8 files changed, 47 insertions(+), 14 deletions(-) diff --git a/diags/src/saquery.c b/diags/src/saquery.c index 0bb46be..5e4b5f1 100644 --- a/diags/src/saquery.c +++ b/diags/src/saquery.c @@ -442,8 +442,8 @@ get_bind_handle(void) complib_init(); osm_log_construct(&log_osm); - if ((status = osm_log_init( &log_osm, TRUE, - 0x0001, NULL, 0, TRUE )) != IB_SUCCESS) { + if ((status = osm_log_init_v2(&log_osm, TRUE, 0x0001, NULL, + 0, TRUE)) != IB_SUCCESS) { fprintf(stderr, "Failed to init osm_log: %s\n", ib_get_err_str(status)); exit (-1); diff --git a/osm/complib/cl_event_wheel.c b/osm/complib/cl_event_wheel.c index a215f40..e1ab141 100644 --- a/osm/complib/cl_event_wheel.c +++ b/osm/complib/cl_event_wheel.c @@ -610,7 +610,7 @@ main () cl_event_wheel_construct( &event_wheel ); /* init 
*/ - osm_log_init( &log, TRUE, 0xff, NULL, 0, FALSE); + osm_log_init_v2( &log, TRUE, 0xff, NULL, 0, FALSE); cl_event_wheel_init( &event_wheel, &log ); /* Start Playing */ diff --git a/osm/include/opensm/osm_log.h b/osm/include/opensm/osm_log.h index 5bfaef5..6f536f3 100644 --- a/osm/include/opensm/osm_log.h +++ b/osm/include/opensm/osm_log.h @@ -203,18 +203,18 @@ osm_log_destroy( * osm_log_init *********/ -/****f* OpenSM: Log/osm_log_init +/****f* OpenSM: Log/osm_log_init_v2 * NAME -* osm_log_init +* osm_log_init_v2 * * DESCRIPTION -* The osm_log_init function initializes a +* The osm_log_init_v2 function initializes a * Log object for use. * * SYNOPSIS */ ib_api_status_t -osm_log_init( +osm_log_init_v2( IN osm_log_t* const p_log, IN const boolean_t flush, IN const uint8_t log_flags, @@ -249,6 +249,27 @@ osm_log_init( * osm_log_destroy *********/ +/****f* OpenSM: Log/osm_log_init +* NAME +* osm_log_init +* +* DESCRIPTION +* The osm_log_init function initializes a +* Log object for use. It wrapper for osm_log_init_v2() +* +* SYNOPSIS +*/ +ib_api_status_t +osm_log_init( + IN osm_log_t* const p_log, + IN const boolean_t flush, + IN const uint8_t log_flags, + IN const char *log_file, + IN const boolean_t accum_log_file ); +/* +* All as above (osm_log_init_v2()), but without max_size parameters +*/ + /****f* OpenSM: Log/osm_log_get_level * NAME * osm_log_get_level diff --git a/osm/opensm/libopensm.map b/osm/opensm/libopensm.map index c60e3d5..3ac0dc4 100644 --- a/osm/opensm/libopensm.map +++ b/osm/opensm/libopensm.map @@ -3,6 +3,7 @@ OPENSM_1.2 { osm_log; osm_is_debug; osm_log_init; + osm_log_init_v2; osm_mad_pool_construct; osm_mad_pool_destroy; osm_mad_pool_init; diff --git a/osm/opensm/osm_db_files.c b/osm/opensm/osm_db_files.c index 6ae968e..d2f39ac 100644 --- a/osm/opensm/osm_db_files.c +++ b/osm/opensm/osm_db_files.c @@ -712,7 +712,7 @@ main(int argc, char **argv) cl_list_construct( &keys ); cl_list_init( &keys, 10 ); - osm_log_init( &log, TRUE, 0xff, 
"/tmp/test_osm_db.log", FALSE); + osm_log_init_v2( &log, TRUE, 0xff, "/tmp/test_osm_db.log", 0, FALSE); osm_db_construct(&db); if (osm_db_init(&db, &log)) diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c index a5dac10..45acebc 100644 --- a/osm/opensm/osm_log.c +++ b/osm/opensm/osm_log.c @@ -225,7 +225,7 @@ #endif /* defined( _DEBUG_ ) */ } ib_api_status_t -osm_log_init( +osm_log_init_v2( IN osm_log_t* const p_log, IN const boolean_t flush, IN const uint8_t log_flags, @@ -279,3 +279,14 @@ osm_log_init( else return IB_ERROR; } + +ib_api_status_t +osm_log_init( + IN osm_log_t* const p_log, + IN const boolean_t flush, + IN const uint8_t log_flags, + IN const char *log_file, + IN const boolean_t accum_log_file ) +{ + return osm_log_init_v2(p_log, flush, log_flags, log_file, 0, accum_log_file); +} diff --git a/osm/opensm/osm_opensm.c b/osm/opensm/osm_opensm.c index 0b39d13..19d0412 100644 --- a/osm/opensm/osm_opensm.c +++ b/osm/opensm/osm_opensm.c @@ -180,9 +180,9 @@ osm_opensm_init( /* Can't use log macros here, since we're initializing the log. */ osm_opensm_construct( p_osm ); - status = osm_log_init( &p_osm->log, p_opt->force_log_flush, - p_opt->log_flags, p_opt->log_file, - p_opt->log_max_size, p_opt->accum_log_file ); + status = osm_log_init_v2( &p_osm->log, p_opt->force_log_flush, + p_opt->log_flags, p_opt->log_file, + p_opt->log_max_size, p_opt->accum_log_file ); if( status != IB_SUCCESS ) return ( status ); diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c index 4f41e38..7b719a7 100644 --- a/osm/osmtest/osmtest.c +++ b/osm/osmtest/osmtest.c @@ -520,8 +520,8 @@ osmtest_init( IN osmtest_t * const p_osm /* Can't use log macros here, since we're initializing the log. 
*/ osmtest_construct( p_osmt ); - status = osm_log_init( &p_osmt->log, p_opt->force_log_flush, - 0x0001, p_opt->log_file, 0, TRUE ); + status = osm_log_init_v2( &p_osmt->log, p_opt->force_log_flush, + 0x0001, p_opt->log_file, 0, TRUE ); if( status != IB_SUCCESS ) return ( status ); From tziporet at mellanox.co.il Mon Sep 4 12:55:02 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 4 Sep 2006 22:55:02 +0300 Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3 Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7843@mtlexch01.mtl.com> Can you explain me how to run mvapich2-0.9.5? Thanks, Tziporet -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Christian Guggenberger Sent: Monday, September 04, 2006 6:25 PM To: Tziporet Koren Cc: Eli Cohen; Christian Guggenberger; OPENIB Subject: Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3 > >>We test here SLES9 but with mvapich1 library 0.9.7 version from OFED. > >>We tried to run here the test you attached on mvapich1 but have not seen > >>this failure. > >>Can you try to reproduce with mvapich1 version? > >> > > > >is it also okay if I tried with plain mvapich1 from OSU ? > I guess yes, although we use the one that comes with OFED. hmm. Using plain mvapich-0.9.7 from OSU, the BUGs/Ooops are not reproducible. Using mvapich2-0.9.5 it happens each time... cheers. - Christian From mamidala at cse.ohio-state.edu Mon Sep 4 13:15:05 2006 From: mamidala at cse.ohio-state.edu (amith rajith mamidala) Date: Mon, 4 Sep 2006 16:15:05 -0400 (EDT) Subject: [openib-general] rdmacm library In-Reply-To: <42C02E8B.1070506@ichips.intel.com> Message-ID: Hi Sean, I installed the latest kernel:2.6.17.11 and the latest ib stack: rev 9240 When I compile programs with rdmacm library, I get the error: (though the program runs fine...) 
/usr/bin/ld: warning: libibverbs.so.1, needed by /usr/local/lib/librdmacm.so, may conflict with libibverbs.so.2

Does rdmacm use the older version of ibverbs or do I need to install rdmacm differently?

Thanks,
Amith

From christian.guggenberger at rzg.mpg.de Mon Sep 4 13:45:11 2006
From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger)
Date: Mon, 4 Sep 2006 22:45:11 +0200
Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7843@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7843@mtlexch01.mtl.com>
Message-ID: <20060904204511.GA7855@daltons.rzg.mpg.de>

Hi Tziporet,
On Mon, Sep 04, 2006 at 10:55:02PM +0300, Tziporet Koren wrote:
> Can you explain me how to run mvapich2-0.9.5?

at first, simple compiling using the OSU scripts (make.mvapich2.gen2) -
should work out of the box. (except you will use PCI-X HCAs - you'll
have to omit "-DSRQ" in the build script then). Note, python-devel is
needed for the build.

then, assuming you're doing your tests as root on a single box.

- create /etc/mpd.conf

containing the line "secretword=blabla" - just some non-meaningful
passphrase ;)
(you'll probably also need the same file as ~/.mpd.conf and
~/.mpdpasswd, too)

- start mpd ring
# mpdboot -n 1 -f hosts
(hosts should contain the hostname)

- check if mpdring is up and running
# mpdtrace

- start application on 2 CPUs
# mpiexec -n 2 ./a.out

- once tests are over, stop the ring
# mpdallexit

hope that helps,

cheers.
- Christian

From panda at cse.ohio-state.edu Mon Sep 4 14:06:28 2006
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon, 4 Sep 2006 17:06:28 -0400 (EDT)
Subject: [openib-general] problems to regiser memory as a reglar
In-Reply-To: <20060904204511.GA7855@daltons.rzg.mpg.de> from "Christian Guggenberger" at Sep 04, 2006 10:45:11 PM
Message-ID: <200609042106.k84L6S2q025644@xi.cse.ohio-state.edu>

Christian - Thanks for sending instructions for running mvapich2-0.9.5
to Tziporet.

Tziporet - Thanks for looking into this problem on SLES9 environment.

Please note that a detailed user guide for running and tuning MVAPICH2
0.9.5 is available from the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html

DK

> Hi Tziporet,
> On Mon, Sep 04, 2006 at 10:55:02PM +0300, Tziporet Koren wrote:
> > Can you explain me how to run mvapich2-0.9.5?
>
> at first, simple compiling using the OSU scripts (make.mvapich2.gen2) -
> should work out of the box. (except you will use PCI-X HCAs - you'll
> have to omit "-DSRQ" in the build script then). Note, python-devel is
> needed for the build.
>
> then, assuming you're doing your tests as root on a single box.
>
> - create /etc/mpd.conf
>
> containing the line "secretword=blabla" - just some non-meaningful
> passphrase ;)
> (you'll probably also need the same file as ~/.mpd.conf and
> ~/.mpdpasswd, too)
>
> - start mpd ring
> # mpdboot -n 1 -f hosts
> (hosts should contain the hostname)
>
> - check if mpdring is up and running
> # mpdtrace
>
> - start application on 2 CPUs
> # mpiexec -n 2 ./a.out
>
> - once tests are over, stop the ring
> # mpdallexit
>
> hope that helps,
>
> cheers.
> - Christian From sean.hefty at intel.com Mon Sep 4 21:05:31 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 4 Sep 2006 21:05:31 -0700 Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction In-Reply-To: <44FBC374.8040709@voltaire.com> Message-ID: <000001c6d0a0$87bd9050$15248686@amr.corp.intel.com> >ok, thanks for clarifying that, is cancellation allowed only for address >resolution or also for route resolving and/or CM calls? also how about >documenting this? Cancellation is allowed for any asynchronous operation. I will pull in your patch when I get back in the office. Thanks. - Sean From sean.hefty at intel.com Mon Sep 4 21:08:04 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 4 Sep 2006 21:08:04 -0700 Subject: [openib-general] rdmacm library In-Reply-To: Message-ID: <000101c6d0a0$e3207de0$15248686@amr.corp.intel.com> >/usr/bin/ld: warning: libibverbs.so.1, needed by >/usr/local/lib/librdmacm.so, may conflict with libibverbs.so.2 > >Does rdmacm use the older version of ibverbs or do I need to install >rdmacm differently? I keep the RDMA CM updated with the latest version of verbs. There may be an issue with the library's build; I'll look into this. - Sean From eitan at dev.mellanox.co.il Mon Sep 4 22:40:59 2006 From: eitan at dev.mellanox.co.il (eitan at dev.mellanox.co.il) Date: Tue, 5 Sep 2006 08:40:59 +0300 (IDT) Subject: [openib-general] MPI Brodcast doubt In-Reply-To: References: Message-ID: <10677.194.90.237.34.1157434859.squirrel@dev.mellanox.co.il> > > I have 3 nodes connected via IB as shown below: > > node1 ---> switch1 ---> node2 > |----------> node3 > > If node1 sends a brodcast message to node2 and node3, I want to know if > the > message is delivered to the switch twice (first time for node2 and second > time for node3) or just once (where switch will know by looking at some > headers or so that its a brodcast message and will send it on all the > outgoing ports) ? 
Message delivered once. Switch duplicates it. EZ From ogerlitz at voltaire.com Mon Sep 4 23:43:15 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 5 Sep 2006 09:43:15 +0300 (IDT) Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB Message-ID: Hi, While doing some work to have linux bonding driver be able to work on top of IPoIB i have run into LOC_QP_OP_ERR with vendor (mellanox PCIX HCA) error 62. ib0: failed send event (status=2, wrid=52 vend_err 62) What does this vendor error means? its the same system over which i saw the qp modify error. There are some more problematic prints i see here which i will be happy to get some idea on their meaning... ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ??? ib1: timing out; will leak address handles ib1: ib_dealloc_pd failed (the pd dealloc failure is as of the ah leak) but what is the leak cause ??? Below is a more detailed snapshot of the time the problems has occured, I was playing with this HCA 2 IB links, getting one of down for about 45 seconds (by some instrumentation of the SM) and then the other, etc. The ipoib code is unchanged (other then adding the "ipoib_set_mcast_list called" print). The bonding code was changed not to set the slave mac address but rather use the mac address of the active slave and also override the ether_setup() settings with the active slave ones. One thing which i think to see is that the IPoIB attempts to join the IPv4 broadcast group even when the port IB link is down, am i correct? if yes, would it be easy to fix this? Or. 

1 ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
2 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
3 ib0: starting multicast thread
4 ib1: stopping multicast thread
5 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
6 ib1: flushing multicast list
7 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
8 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
9 ib1: starting multicast thread
10 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
11 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
12 ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
13 ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c103c0, LID 0xc000, SL 0
14 ib1: successfully joined all multicast groups
15 bonding: bond0: link status definitely down for interface ib0, disabling it
16 bonding: bond0: making interface ib1 the new active one.
17 ib0: ipoib_set_mcast_list called
18 ib1: ipoib_set_mcast_list called
19 ib0: restarting multicast task
20 ib0: stopping multicast thread
21 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
22 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
23 ib0: starting multicast thread
24 ib1: restarting multicast task
25 ib1: stopping multicast thread
26 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
27 ib1: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
28 ib1: starting multicast thread
29 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
30 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
31 ib1: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
32 ib1: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff810037f91d00, LID 0xc001, SL 0
33 ib1: successfully joined all multicast groups
34 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
35 ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
36 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
37 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
38 ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
39 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
40 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
41 ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
42 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
43 ib0: stopping multicast thread
44 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
45 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
46 ib0: flushing multicast list
47 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
48 ib0: starting multicast thread
49 ib1: stopping multicast thread
50 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
51 ib1: flushing multicast list
52 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
53 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
54 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
55 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
56 ib1: starting multicast thread
57 ib0: stopping multicast thread
58 ib0: flushing multicast list
59 ib0: starting multicast thread
60 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
61 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
62 bonding: bond0: link status definitely down for interface ib1, disabling it
63 ib1: ipoib_set_mcast_list called
64 bonding: bond0: now running without any active interface !
65 ib1: restarting multicast task
66 ib1: stopping multicast thread
67 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
68 ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
69 ib1: starting multicast thread
70 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
71 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
72 ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c10d80, LID 0xc000, SL 0
73 ib0: successfully joined all multicast groups
74 ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
75 ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff81000b8453c0, LID 0xc000, SL 0
76 ib1: successfully joined all multicast groups
77 ib1: dev_queue_xmit failed to requeue packet
78 ib1: dev_queue_xmit failed to requeue packet
79 bonding: bond0: link status definitely up for interface ib0.
80 bonding: bond0: link status definitely up for interface ib1.
81 bonding: bond0: making interface ib0 the new active one.
82 ib0: ipoib_set_mcast_list called
83 bonding: bond0: first active interface up!
84 ib0: restarting multicast task
85 ib0: stopping multicast thread
86 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
87 ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
88 ib0: starting multicast thread
89 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
90 ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
91 ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff81000099c340, LID 0xc001, SL 0
92 ib0: successfully joined all multicast groups
93 ib0: failed send event (status=2, wrid=52 vend_err 62)
94 ib0: ipoib_set_mcast_list called
95 ib0: restarting multicast task
96 ib0: stopping multicast thread
97 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
98 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
99 ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
100 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
101 ib0: starting multicast thread
102 ib0: successfully joined all multicast groups
103 ib0: stopping multicast thread
104 ib0: flushing multicast list
105 ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
106 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
107 ib1: stopping multicast thread
108 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
109 ib1: flushing multicast list
110 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
111 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
112 ib1: timing out; will leak address handles
113 bonding: bond0: released all slaves
114 ib0: stopping multicast thread
115 ib0: flushing multicast list
116 ib1: stopping multicast thread
117 ib1: flushing multicast list
118 ib1: ib_dealloc_pd failed

From mst at mellanox.co.il Tue Sep 5 00:13:02 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 5 Sep 2006 10:13:02 +0300 Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB In-Reply-To: References: Message-ID: <20060905071302.GC5401@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: getting LOC_QP_OP_ERR with IPoIB > > Hi, > > While doing some work to have linux bonding driver be able to work on top > of IPoIB i have run into LOC_QP_OP_ERR with vendor (mellanox PCIX HCA) error 62. > > ib0: failed send event (status=2, wrid=52 vend_err 62) > > What does this vendor error means? its the same system over which i saw the qp modify error. vend_err 0x62 is WQE-fetch failure due to WQE-region non-exists or PD mismatched -- MST From leonid at scalemp.com Tue Sep 5 00:30:41 2006 From: leonid at scalemp.com (Leonid Arsh) Date: Tue, 5 Sep 2006 10:30:41 +0300 Subject: [openib-general] OpenSM - guid2lid cache file questions Message-ID: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com> Hi list, I have a question regarding the guid2lid cache file. The file is read by OpenSM on the start up. OpenSM may reassign LIDs according to the LIDs saved in this file. It isn't always acceptable. Is it a right policy? Am I missing anything here? Is there a way to disable the file reading on start up? Regards, Leonid From ogerlitz at voltaire.com Tue Sep 5 00:40:56 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 05 Sep 2006 10:40:56 +0300 Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB In-Reply-To: <20060905071302.GC5401@mellanox.co.il> References: <20060905071302.GC5401@mellanox.co.il> Message-ID: <44FD2A08.1040708@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >> While doing some work to have linux bonding driver be able to work on top >> of IPoIB i have run into LOC_QP_OP_ERR with vendor (mellanox PCIX HCA) error 62. >> ib0: failed send event (status=2, wrid=52 vend_err 62) >> What does this vendor error means? its the same system over which i saw the qp modify error. 
> vend_err 0x62 is WQE-fetch failure due to WQE-region non-exists or PD mismatched Thanks. So what's your thinking, am i running into some ipoib bogus scenario? Or. From mst at mellanox.co.il Tue Sep 5 00:48:34 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 5 Sep 2006 10:48:34 +0300 Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB In-Reply-To: <44FD2A08.1040708@voltaire.com> References: <20060905071302.GC5401@mellanox.co.il> <44FD2A08.1040708@voltaire.com> Message-ID: <20060905074834.GD5401@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: getting LOC_QP_OP_ERR with IPoIB > > Michael S. Tsirkin wrote: > > Quoting r. Or Gerlitz : > > >> While doing some work to have linux bonding driver be able to work on top > >> of IPoIB i have run into LOC_QP_OP_ERR with vendor (mellanox PCIX HCA) error 62. > >> ib0: failed send event (status=2, wrid=52 vend_err 62) > >> What does this vendor error means? its the same system over which i saw the qp modify error. > > > > vend_err 0x62 is WQE-fetch failure due to WQE-region non-exists or PD mismatched > > Thanks. > > So what's your thinking, am i running into some ipoib bogus scenario? > > Or. Donnu, it looks really weird. Could you try firmware 3.5.0 please? -- MST From halr at voltaire.com Tue Sep 5 03:57:53 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Sep 2006 06:57:53 -0400 Subject: [openib-general] OpenSM - guid2lid cache file questions In-Reply-To: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com> References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com> Message-ID: <1157453867.26953.176326.camel@hal.voltaire.com> Hi Leonid, On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote: > Hi list, > > I have a question regarding the guid2lid cache file. > > The file is read by OpenSM on the start up. > OpenSM may reassign LIDs according to the LIDs saved in this file. > It isn't always acceptable. > > Is it a right policy? Am I missing anything here? 
> Is there a way to disable the file reading on start up? There is the -r (--reassign_lids) option for this but it is not the default behavior of OpenSM. -- Hal > > Regards, > Leonid > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From halr at voltaire.com Tue Sep 5 03:58:27 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Sep 2006 06:58:27 -0400 Subject: [openib-general] MPI Brodcast doubt In-Reply-To: References: Message-ID: <1157453896.26953.176365.camel@hal.voltaire.com> John, On Mon, 2006-09-04 at 08:56, john t wrote: > Hi, > > I have 3 nodes connected via IB as shown below: > > node1 ---> switch1 ---> node2 > |----------> node3 > > If node1 sends a brodcast message to node2 and node3, I want to know > if the message is delivered to the switch twice (first time for node2 > and second time for node3) or just once (where switch will know by > looking at some headers or so that its a brodcast message and will > send it on all the outgoing ports) ? Assuming nodes 1, 2, and 3 are part of the same multicast group, the multicast send is sent once from node 1. When received at the switch, it is replicated to all ports which have members in the same group (in this case, nodes 2 and 3). The switch knows by the header (specifically the LRH:DLID which is a multicast LID) and uses the MulticastForwardingTable to determine on which ports to forward it. However, IB multicast is unreliable so to create reliable multicast, it is sometimes "emulated" in that the sender tracks the group members and may use serial unicast sends or augment a multicast send with unicast sends to the receivers and track their acknowledgements of receipt. -- Hal > Regards, > John T. 
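
The forwarding step Hal describes can be sketched in C. This is a simplified, hypothetical model rather than real switch or OpenSM code: struct mft, mft_out_ports() and the flat one-mask-per-MLID layout are assumptions made for illustration (a real MulticastForwardingTable is organised in blocks of port masks). It relies only on multicast LIDs occupying the range 0xC000-0xFFFE, which is consistent with the LID 0xc000 and 0xc001 values in the IPoIB log earlier in this archive, and on a switch not sending a multicast packet back out of the port it arrived on.

```c
#include <stdint.h>

/* InfiniBand multicast LIDs occupy 0xC000..0xFFFE. */
#define MLID_FIRST 0xC000u
#define MLID_LAST  0xFFFEu
#define MLID_COUNT (MLID_LAST - MLID_FIRST + 1u)

/* Hypothetical, flattened multicast forwarding table:
 * one 16-bit port mask per multicast LID (bit N set = port N has members). */
struct mft {
    uint16_t port_mask[MLID_COUNT];
};

/* Nonzero when the destination LID from the LRH is a multicast LID. */
static int is_multicast_lid(uint16_t dlid)
{
    return dlid >= MLID_FIRST && dlid <= MLID_LAST;
}

/* Ports on which a packet with this DLID, received on in_port (0..15),
 * is replicated. Unicast DLIDs are not modelled here and yield 0. */
static uint16_t mft_out_ports(const struct mft *t, uint16_t dlid,
                              unsigned in_port)
{
    if (!is_multicast_lid(dlid))
        return 0;
    /* Forward to every member port except the one the packet came in on. */
    return t->port_mask[dlid - MLID_FIRST] & (uint16_t)~(1u << in_port);
}
```

With node1, node2 and node3 on ports 1, 2 and 3 and all three joined to group 0xC000, mft_out_ports(&t, 0xC000, 1) yields the mask for ports 2 and 3: one packet arrives at the switch and two copies leave it.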

From leonid at scalemp.com Tue Sep 5 05:11:33 2006
From: leonid at scalemp.com (Leonid Arsh)
Date: Tue, 5 Sep 2006 15:11:33 +0300
Subject: [openib-general] OpenSM - guid2lid cache file questions
In-Reply-To: <1157453867.26953.176326.camel@hal.voltaire.com>
References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com> <1157453867.26953.176326.camel@hal.voltaire.com>
Message-ID: <10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com>

Hi Hal,

Thank you for your reply.

Probably I wasn't clear.

I have a problem when OpenSM, being started, reads an out-of-date guid2lid file.
OpenSM changes LIDs in this case.
I don't want the LIDs to be changed.

As I understand it, the '-r' option, on the contrary, causes the SM to
reassign all the LIDs.

I could just remove the file to handle the problem.
I'd like to know if there is a way to do it without touching the file.

Thanks,
Leonid

On 05 Sep 2006 06:57:53 -0400, Hal Rosenstock wrote:
> Hi Leonid,
>
> On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
> > Hi list,
> >
> > I have a question regarding the guid2lid cache file.
> >
> > The file is read by OpenSM on the start up.
> > OpenSM may reassign LIDs according to the LIDs saved in this file.
> > It isn't always acceptable.
> >
> > Is it a right policy? Am I missing anything here?
> > Is there a way to disable the file reading on start up?
>
> There is the -r (--reassign_lids) option for this but it is not the
> default behavior of OpenSM.
> > -- Hal > > > > > Regards, > > Leonid > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From dotanb at dev.mellanox.co.il Tue Sep 5 05:26:33 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 05 Sep 2006 15:26:33 +0300 Subject: [openib-general] MPI Brodcast doubt In-Reply-To: <1157453896.26953.176365.camel@hal.voltaire.com> References: <1157453896.26953.176365.camel@hal.voltaire.com> Message-ID: <44FD6CF9.6090805@dev.mellanox.co.il> Hal Rosenstock wrote: > John, > > On Mon, 2006-09-04 at 08:56, john t wrote: > >> Hi, >> >> I have 3 nodes connected via IB as shown below: >> >> node1 ---> switch1 ---> node2 >> |----------> node3 >> >> If node1 sends a brodcast message to node2 and node3, I want to know >> if the message is delivered to the switch twice (first time for node2 >> and second time for node3) or just once (where switch will know by >> looking at some headers or so that its a brodcast message and will >> send it on all the outgoing ports) ? >> > > Assuming nodes 1, 2, and 3 are part of the same multicast group, the > multicast send is sent once from node 1. When received at the switch, it > is replicated to all ports which have members in the same group (in this > case, nodes 2 and 3). The switch knows by the header (specifically the > LRH:DLID which is a multicast LID) and uses the MulticastForwardingTable > to determine on which ports to forward it. 
However, IB multicast is > unreliable so to create reliable multicast, it is sometimes "emulated" > in that the sender tracks the group members and may use serial unicast > sends or augment a multicast send with unicast sends to the receivers > and track their acknowledgements of receipt. > > -- Hal > All of the above is true for IB multicast (there isn't any broadcast in IB). If the question was "what happens when one send a message using MPI_broadcast?" then the answer will be: it depends on the MPI implementation. I know that in MVAPICH the MPI handles the duplications by itself by default (and the switch will get two messages and not one). There is an option in that MPI to use IB multicast but it is disabled by default. Dotan From halr at voltaire.com Tue Sep 5 05:46:22 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Sep 2006 08:46:22 -0400 Subject: [openib-general] OpenSM - guid2lid cache file questions In-Reply-To: <10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com> References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com> <1157453867.26953.176326.camel@hal.voltaire.com> <10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com> Message-ID: <1157460382.26953.179764.camel@hal.voltaire.com> Hi Leonid, On Tue, 2006-09-05 at 08:11, Leonid Arsh wrote: > Hi Hal, > > Thank you for your reply. > > Probably I wasn't clear. > > I have a problem when OpenSM, being started, reads an out-if-date guid2lid file. > OpenSM changes LIDs in this case. How do you know the file is "out of date" ? > I don't want the LIDs to be changed. Oh, it's the other way you were asking about. > As I understand it, the '-r' option, on the contrary, causes the SM to > reassign all the LIDs. > > I could just remove the file to handle the problem. or move it aside. > I'd like to know if there is a way to do it without touching the file. Not currently. 
There is the -x (--honor_guid2lid) option, which makes OpenSM honor the guid2lid file when it comes out of STANDBY (by default the file is ignored in that case), though.

-- Hal

> Thanks,
> Leonid
>
> On 05 Sep 2006 06:57:53 -0400, Hal Rosenstock wrote:
> > Hi Leonid,
> >
> > On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
> > > Hi list,
> > >
> > > I have a question regarding the guid2lid cache file.
> > >
> > > The file is read by OpenSM on the start up.
> > > OpenSM may reassign LIDs according to the LIDs saved in this file.
> > > It isn't always acceptable.
> > >
> > > Is it a right policy? Am I missing anything here?
> > > Is there a way to disable the file reading on start up?
> >
> > There is the -r (--reassign_lids) option for this but it is not the
> > default behavior of OpenSM.
> >
> > -- Hal
> >
> > >
> > > Regards,
> > > Leonid

From ogerlitz at voltaire.com Tue Sep 5 05:51:43 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 05 Sep 2006 15:51:43 +0300
Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question
In-Reply-To: <20060905074834.GD5401@mellanox.co.il>
References: <20060905071302.GC5401@mellanox.co.il> <44FD2A08.1040708@voltaire.com> <20060905074834.GD5401@mellanox.co.il>
Message-ID: <44FD72DF.4000708@voltaire.com>

Michael S. Tsirkin wrote:
> Donnu, it looks really weird. Could you try firmware 3.5.0 please?

I just noticed that you cannot work with mstflint if the mthca driver is not loaded. I think this was not the case with the gen1 tools, am I correct?

Is this connected to this print

ACPI: PCI interrupt for device 0000:02:00.0 disabled

that I see once the mthca driver is unloaded?

Or.

> dill:/tmp # modprobe -r ib_mthca
> dill:/tmp # ./mstflint -d 00:02:00.0 q
> *** ERROR *** Read a corrupted device id (0xffff). Probably HW/PCI access problem
> *** ERROR *** Device type 65535 not supported.
> *** ERROR *** Can not get flash type using device 00:02:00.0
> dill:/tmp # modprobe ib_mthca
> dill:/tmp # ./mstflint -d 00:02:00.0 q
> Image type:    Failsafe
> I.S. Version:  1
> Chip Revision: A1
> GUID Des:      Node             Port1            Port2            Sys image
> GUIDs:         0008f104039651dc 0008f104039651dd 0008f104039651de 0008f104039651df
> Board ID:      (VLT0010010001)
> VSD:
> PSID:          VLT0010010001
> dill:/tmp # dmesg
> ACPI: PCI interrupt for device 0000:02:00.0 disabled
> ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
> ib_mthca: Initializing 0000:02:00.0
> PCI: Enabling device 0000:02:00.0 (0110 -> 0112)
> ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 29 (level, low) -> IRQ 193

From tziporet at dev.mellanox.co.il Tue Sep 5 05:57:17 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 05 Sep 2006 15:57:17 +0300
Subject: [openib-general] problems to regiser memory as a reglar
In-Reply-To: <200609042106.k84L6S2q025644@xi.cse.ohio-state.edu>
References: <200609042106.k84L6S2q025644@xi.cse.ohio-state.edu>
Message-ID: <44FD742D.10506@dev.mellanox.co.il>

Dhabaleswar Panda wrote:
> Christian - Thanks for sending instructions for running mvapich2-0.9.5
> to Tziporet.
>
> Tziporet - Thanks for looking into this problem on SLES9 environment.
> > Please note that a detailed user guide for running and tuning MVAPICH2 > 0.9.5 is available from the following URL: > > http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html > > DK > Thanks to all, We found the bug that was in memory registration flow of SLES9 only. A fix will be available in OFED 1.1 RC4 Tziporet From oibleo at gmail.com Tue Sep 5 06:13:00 2006 From: oibleo at gmail.com (Leonid Arsh) Date: Tue, 5 Sep 2006 16:13:00 +0300 Subject: [openib-general] OpenSM - guid2lid cache file questions In-Reply-To: <1157460382.26953.179764.camel@hal.voltaire.com> References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com> <1157453867.26953.176326.camel@hal.voltaire.com> <10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com> <1157460382.26953.179764.camel@hal.voltaire.com> Message-ID: <10e223bf0609050613w68de2491r9f4e1c0a37f5a5b6@mail.gmail.com> Thanks, On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock wrote: > > I have a problem when OpenSM, being started, reads an out-if-date guid2lid file. > > OpenSM changes LIDs in this case. > > How do you know the file is "out of date" ? > Actually, the LIDs were assigned by another SM. When I start my new OpenSM, the old SM is already dead. Before starting the new OpenSM, the ibnetdiscover utility shows LIDs different from ones in the file. When I start OpenSM, the LIDs are reassigned on the fabric. 
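A rough sketch of the check described above (does the LID cached for a port GUID still match what the fabric reports?). It assumes each line of the guid2lid cache has the form "0x<port guid> 0x<min lid> 0x<max lid>"; that layout, and the names guid2lid_entry, parse_guid2lid_line and entry_is_stale, are assumptions made for illustration and are not taken from OpenSM itself.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One cached entry: a port GUID and the LID range cached for it.
 * Hypothetical type, not an OpenSM structure. */
struct guid2lid_entry {
    uint64_t guid;
    unsigned lid_min;
    unsigned lid_max;
};

/* Parse one line of the ASSUMED form "0x<guid> 0x<min lid> 0x<max lid>".
 * Returns 1 if all three fields were read, 0 otherwise. */
static int parse_guid2lid_line(const char *line, struct guid2lid_entry *e)
{
    return sscanf(line, "%" SCNx64 " %x %x",
                  &e->guid, &e->lid_min, &e->lid_max) == 3;
}

/* Returns 1 if the LID currently seen on the fabric (for example, as
 * reported by ibnetdiscover) lies outside the cached range, i.e. the
 * cached entry no longer describes the fabric. */
static int entry_is_stale(const struct guid2lid_entry *e, unsigned fabric_lid)
{
    return fabric_lid < e->lid_min || fabric_lid > e->lid_max;
}
```

A caller would read the cache file line by line, look up the LID that the fabric currently reports for each GUID, and flag entries for which entry_is_stale() returns 1 before deciding whether to start OpenSM with that cache.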
From bugzilla-daemon at openib.org Tue Sep 5 06:16:24 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Tue, 5 Sep 2006 06:16:24 -0700 (PDT) Subject: [openib-general] [Bug 131] working with huge pages may crash the kernel on Suse10 Message-ID: <20060905131624.21B162283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=131 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from tziporet at mellanox.co.il 2006-09-05 06:16 ------- was fixed in 1.1-rc3 ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Tue Sep 5 06:18:58 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Tue, 5 Sep 2006 06:18:58 -0700 (PDT) Subject: [openib-general] [Bug 145] IB Core unable to communicate IPoIB on Fedora Core 4 Message-ID: <20060905131858.0ED0E228423@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=145 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #2 from tziporet at mellanox.co.il 2006-09-05 06:18 ------- this is not a bug in OFED ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From halr at voltaire.com Tue Sep 5 06:18:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Sep 2006 09:18:03 -0400 Subject: [openib-general] OpenSM - guid2lid cache file questions In-Reply-To: <10e223bf0609050613w68de2491r9f4e1c0a37f5a5b6@mail.gmail.com> References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com> <1157453867.26953.176326.camel@hal.voltaire.com> <10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com> <1157460382.26953.179764.camel@hal.voltaire.com> <10e223bf0609050613w68de2491r9f4e1c0a37f5a5b6@mail.gmail.com> Message-ID: <1157462283.26953.180804.camel@hal.voltaire.com> Leonid, On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote: > Thanks, > > On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock wrote: > > > I have a problem when OpenSM, being started, reads an out-if-date guid2lid file. > > > OpenSM changes LIDs in this case. > > > > How do you know the file is "out of date" ? > > > Actually, the LIDs were assigned by another SM. Different (vendor) SMs have different LID assignment and pathing (routing) policies. It is inadvisable to failover across vendor SMs for this and other reasons. -- Hal > When I start my new OpenSM, the old SM is already dead. > Before starting the new OpenSM, the ibnetdiscover utility shows LIDs different > from ones in the file. > When I start OpenSM, the LIDs are reassigned on the fabric. From halr at voltaire.com Tue Sep 5 06:25:28 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Sep 2006 09:25:28 -0400 Subject: [openib-general] [PATCH] opensm: osm_log_init_v2() - new osm_log initializer In-Reply-To: <20060904172006.10400.62708.stgit@sashak.voltaire.com> References: <20060904172006.10400.62708.stgit@sashak.voltaire.com> Message-ID: <1157462724.26953.181066.camel@hal.voltaire.com> On Mon, 2006-09-04 at 13:20, Sasha Khapyorsky wrote: > There is new osm_log initializer osm_log_init_v2(), this is wrapped > by osm_log_init() in order to preserve existing API. 
> > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to trunk and 1.1). -- Hal From eitan at dev.mellanox.co.il Tue Sep 5 06:25:44 2006 From: eitan at dev.mellanox.co.il (Eitan Zahavi) Date: Tue, 5 Sep 2006 16:25:44 +0300 Subject: [openib-general] OpenSM - guid2lid cache file questions In-Reply-To: <1157462283.26953.180804.camel@hal.voltaire.com> References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com> <1157453867.26953.176326.camel@hal.voltaire.com> <10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com> <1157460382.26953.179764.camel@hal.voltaire.com> <10e223bf0609050613w68de2491r9f4e1c0a37f5a5b6@mail.gmail.com> <1157462283.26953.180804.camel@hal.voltaire.com> Message-ID: <000001c6d0ee$cb29dcb0$617d9610$@mellanox.co.il> Hi Leonid, The best approach when switching from another vendor SM to OpenSM is to delete the /var/cache/osm/guid2lid file. > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Hal Rosenstock > Sent: Tuesday, September 05, 2006 4:18 PM > To: Leonid Arsh > Cc: openib-general at openib.org > Subject: Re: [openib-general] OpenSM - guid2lid cache file questions > > Leonid, > > On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote: > > Thanks, > > > > On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock wrote: > > > > I have a problem when OpenSM, being started, reads an out-if-date > guid2lid file. > > > > OpenSM changes LIDs in this case. > > > > > > How do you know the file is "out of date" ? > > > > > Actually, the LIDs were assigned by another SM. > > Different (vendor) SMs have different LID assignment and pathing > (routing) policies. It is inadvisable to failover across vendor SMs for this and > other reasons. > > -- Hal > > > When I start my new OpenSM, the old SM is already dead. > > Before starting the new OpenSM, the ibnetdiscover utility shows LIDs > > different from ones in the file. > > When I start OpenSM, the LIDs are reassigned on the fabric. 
From mst at mellanox.co.il Tue Sep 5 06:36:50 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 5 Sep 2006 16:36:50 +0300 Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question In-Reply-To: <44FD72DF.4000708@voltaire.com> References: <44FD72DF.4000708@voltaire.com> Message-ID: <20060905133650.GL5401@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: getting LOC_QP_OP_ERR with IPoIB - mstflint question > > Michael S. Tsirkin wrote: > > Donnu, it looks really weird. Could you try firmware 3.5.0 please? > > I just noted that you can not work with mstflint if the mthca driver is > not loaded, i think it was not the case in the gen1 tools, am i correct. Yes, recent kernels disable device access once the driver is unloaded: mstflint -d 08:00.0 q *** ERROR *** Read a corrupted device id (0xffff). Probably HW/PCI access problem *** ERROR *** Device type 65535 not supported. *** ERROR *** Can not get flash type using device 08:00.0 mstflint should work without the driver when pointed at /proc: mstflint -d /proc/bus/pci/08/00.0 q Image type: Failsafe I.S. Version: 1 Chip Revision: A0 In gen1, flint had a separate driver which you had to load. I am not sure whether this would work on 2.6.18. > Is this connected to this print > > ACPI: PCI interrupt for device 0000:02:00.0 disabled > > i see once the mthca driver is unloaded? > > Or. Probably not. -- MST From thomas.bub at thomson.net Tue Sep 5 07:22:28 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Tue, 5 Sep 2006 16:22:28 +0200 Subject: [openib-general] libibcm can't connect/talk to libicm on other machine. Message-ID: I'm still in the process of migrating my gen1 application to gen2.
Actually, I CAN connect a gen2 application to a gen2 listener application on the same machine, but NOT to a gen2 listener on another machine. Any hints on where to look? Is there anything in the architecture that might prevent a libibcm connection to another machine? I'm using an old Voltaire switch to connect the machines. Can this be the reason? The switch didn't cause problems using gen1 clients. Thanks Thomas Bub From dotanb at dev.mellanox.co.il Tue Sep 5 08:12:08 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 05 Sep 2006 18:12:08 +0300 Subject: [openib-general] libibcm can't connect/talk to libicm on other machine. In-Reply-To: References: Message-ID: <44FD93C8.1020604@dev.mellanox.co.il> Hi bub. Bub Thomas wrote: > > I’m still in the process of migrating my gen1 application to gen2. > > Actually I CAN connect a gen2 application to a gen2 listener > application on the same machine but NOT to a gen 2 listener on another > machine. > > Any hints where to look at? > > Is there anything in the architecture that might prevent a libibcm > connection to another machine? > > I’m using an old Voltaire switch to connect the machines. Can this be > the reason? > > The switch didn’t cause problems using gen1 clients. > What is the problem that you see? There are some examples that come with libibcm that show how to use the library. There can be several reasons for your problem: 1) side A sends a REQ when side B is not ready, and there is a timeout failure 2) the ib_ucm kernel module is enabled on side A only 3) the SM is not working (well) 4) host A cannot reach host B over IB 5) endianness issues? I tried libibcm and did not have any problems (but I don't have a Voltaire switch, so I can't check your scenario).
Dotan From halr at voltaire.com Tue Sep 5 08:20:00 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Sep 2006 11:20:00 -0400 Subject: [openib-general] libibcm can't connect/talk to libicm on other machine. In-Reply-To: References: Message-ID: <1157469599.26953.184762.camel@hal.voltaire.com> Hi Bub, On Tue, 2006-09-05 at 10:22, Bub Thomas wrote: > I’m still in the process of migrating my gen1 application to gen2. > > Actually I CAN connect a gen2 application to a gen2 listener > application on the same machine but NOT to a gen 2 listener on another > machine. > > Any hints where to look at? What are you using for SM ? OpenSM or vendor SM ? > Is there anything in the architecture that might prevent a libibcm > connection to another machine? I don't think this is an architectural issue. -- Hal > I’m using an old Voltaire switch to connect the machines. Can this be > the reason? > > The switch didn’t cause problems using gen1 clients. > > Thanks > > Thomas Bub > > > > ______________________________________________________________________ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From thomas.bub at thomson.net Tue Sep 5 09:11:13 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Tue, 5 Sep 2006 18:11:13 +0200 Subject: [openib-general] libibcm can't connect/talk to libicm on other machine. Message-ID: Dotan, the ibv_rc_pingpong example works for me so I can exclude the architecture. I never got the libibcm example compiled. Which is your example and which architecture x86 vs. x86_64 did you compile it for? Can you share your libibcm the example code? 
(if it is not the standard that I can't get compiled) Thomas -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Dotan Barak Sent: Tuesday, September 05, 2006 5:12 PM To: Bub Thomas Cc: openib-general at openib.org Subject: Re: [openib-general] libibcm can't connect/talk to libicm on other machine. Hi bub. Bub Thomas wrote: > > I'm still in the process of migrating my gen1 application to gen2. > > Actually I CAN connect a gen2 application to a gen2 listener > application on the same machine but NOT to a gen 2 listener on another > machine. > > Any hints where to look at? > > Is there anything in the architecture that might prevent a libibcm > connection to another machine? > > I'm using an old Voltaire switch to connect the machines. Can this be > the reason? > > The switch didn't cause problems using gen1 clients. > What is the problem that you see? there are some examples that comes with the libibcm that can show you how to use the library. there can be several reasons for your problem: 1) side A send a req when side B is not ready and there is a timeout failure 2) only in side A the ib_ucm kernel module enabled 3) SM is not working (well) 4) host A cannot be reached to host B using IB 5) endianess issues? i tried to use the libibcm and i don't have any problem (but i don't have any Voltaire switch, so i can't check your scenario). 
Dotan _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From dlpaktor at us.ibm.com Tue Sep 5 10:30:43 2006 From: dlpaktor at us.ibm.com (David L Paktor) Date: Tue, 5 Sep 2006 10:30:43 -0700 Subject: [openib-general] New development tool for boot-time drivers (FCode, IEE-1275, IBM/Sun) Message-ID: If anyone is interested in developing boot-time device drivers for plug-in devices, conformant to the IEEE-1275 (Open Firmware) specification, using FCode (tokenized Forth source), which is compatible with both IBM and Sun platforms (and is platform-independent, so that a driver written once is compatible with all Open Firmware platforms ... but you already know all this if you're using Open Firmware), then you will need a Tokenizer to translate from your Forth source to FCode tokens, which are the "medium of exchange" between the device and the platform. I am writing to announce that a new FCode Tokenizer, capable of running on IBM equipment (and that can be compiled on any other host that supports the GnuCC compiler, and others as well) is freely available at the web-site of the OpenBIOS project, www.openbios.org (and just follow the links about the New FCODE suite) If you have any questions, please direct them to the OpenBIOS Mailing List. Thank you. ----- David L. Paktor System Firmware Developer System and Technology Group Global Firmware Division dlpaktor at us.ibm.com David L Paktor/Almaden/IBM at IBMUS 18880 Homestead Rd. Building 9945 Cupertino CA 95014 Room 1026 408-342-6110 T/L 560-6110 "The Bug Stops Here" -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mshefty at ichips.intel.com Tue Sep 5 11:14:41 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 05 Sep 2006 11:14:41 -0700 Subject: [openib-general] libibcm can't connect/talk to libicm on other machine. In-Reply-To: References: Message-ID: <44FDBE91.4020005@ichips.intel.com> Bub Thomas wrote: > Dotan, > the ibv_rc_pingpong example works for me so I can exclude the > architecture. > I never got the libibcm example compiled. > Which is your example and which architecture x86 vs. x86_64 did you > compile it for? > Can you share your libibcm the example code? (if it is not the standard > that I can't get compiled) > Thomas Did you try applying the following patch? http://openib.org/pipermail/openib-general/2006-August/025005.html I should also mention that I have a version of cmpost that works with the new libibsa, but I am waiting for the review of the kernel sa_query changes before committing. - Sean From jwm at systemfabricworks.com Tue Sep 5 11:43:25 2006 From: jwm at systemfabricworks.com (JWM) Date: Tue, 5 Sep 2006 13:43:25 -0500 Subject: [openib-general] libibcm can't connect/talk to libicm on other machine. References: Message-ID: <004201c6d11b$2be1d360$7401a8c0@Maelstrom> libibcm can't connect/talk to libicm on other machine. I know this sounds simple, but have you checked the routing tables? ....JW ----- Original Message ----- From: Bub Thomas To: openib-general at openib.org Sent: Tuesday, September 05, 2006 9:22 AM Subject: [openib-general] libibcm can't connect/talk to libicm on other machine. I'm still in the process of migrating my gen1 application to gen2. Actually I CAN connect a gen2 application to a gen2 listener application on the same machine but NOT to a gen 2 listener on another machine. Any hints where to look at? Is there anything in the architecture that might prevent a libibcm connection to another machine? I'm using an old Voltaire switch to connect the machines. Can this be the reason? 
The switch didn't cause problems using gen1 clients. Thanks Thomas Bub ------------------------------------------------------------------------------ _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From arlin.r.davis at intel.com Tue Sep 5 14:16:15 2006 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 5 Sep 2006 14:16:15 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <44F8E006.5030607@pathscale.com> Message-ID: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> Robert, Here is a slightly modified patch for your attributes issue. Can you give it a try? Signed-off by: Arlin Davis ardavis at ichips.intel.com Index: dapl/openib/dapl_ib_util.c =================================================================== --- dapl/openib/dapl_ib_util.c (revision 9106) +++ dapl/openib/dapl_ib_util.c (working copy) @@ -446,6 +446,7 @@ return(dapl_convert_errno(errno,"ib_query_hca")); if (ia_attr != NULL) { + (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr)); ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0'; ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0'; ia_attr->ia_address_ptr = @@ -470,7 +471,12 @@ /* ia_attr->hardware_version_minor = dev_attr.fw_ver; */ ia_attr->max_eps = dev_attr.max_qp; ia_attr->max_dto_per_ep = dev_attr.max_qp_wr; - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_out = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE; + ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE; ia_attr->max_evds = 
dev_attr.max_cq; ia_attr->max_evd_qlen = dev_attr.max_cqe; ia_attr->max_iov_segments_per_dto = dev_attr.max_sge; @@ -501,6 +507,7 @@ } if (ep_attr != NULL) { + (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr)); ep_attr->max_mtu_size = port_attr.max_msg_sz; ep_attr->max_rdma_size = port_attr.max_msg_sz; ep_attr->max_recv_dtos = dev_attr.max_qp_wr; Index: dapl/openib_cma/dapl_ib_util.c =================================================================== --- dapl/openib_cma/dapl_ib_util.c (revision 9106) +++ dapl/openib_cma/dapl_ib_util.c (working copy) @@ -424,6 +424,7 @@ return(dapl_convert_errno(errno,"ib_query_hca")); if (ia_attr != NULL) { + (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr)); ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0'; ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0'; ia_attr->ia_address_ptr = @@ -446,6 +447,8 @@ ia_attr->hardware_version_major = dev_attr.hw_ver; ia_attr->max_eps = dev_attr.max_qp; ia_attr->max_dto_per_ep = dev_attr.max_qp_wr; + ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_out = dev_attr.max_qp_rd_atom; ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom; ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom; ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE; @@ -481,6 +484,7 @@ } if (ep_attr != NULL) { + (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr)); ep_attr->max_mtu_size = port_attr.max_msg_sz; ep_attr->max_rdma_size = port_attr.max_msg_sz; ep_attr->max_recv_dtos = dev_attr.max_qp_wr; Index: dapl/openib_scm/dapl_ib_util.c =================================================================== --- dapl/openib_scm/dapl_ib_util.c (revision 9106) +++ dapl/openib_scm/dapl_ib_util.c (working copy) @@ -373,6 +373,7 @@ return(dapl_convert_errno(errno,"ib_query_hca")); if (ia_attr != NULL) { + (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr)); ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0'; ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0'; ia_attr->ia_address_ptr 
= (DAT_IA_ADDRESS_PTR)&hca_ptr->hca_address; @@ -390,7 +391,12 @@ /* ia_attr->hardware_version_minor = dev_attr.fw_ver; */ ia_attr->max_eps = dev_attr.max_qp; ia_attr->max_dto_per_ep = dev_attr.max_qp_wr; - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_out = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE; + ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE; ia_attr->max_evds = dev_attr.max_cq; ia_attr->max_evd_qlen = dev_attr.max_cqe; ia_attr->max_iov_segments_per_dto = dev_attr.max_sge; @@ -422,6 +428,7 @@ } if (ep_attr != NULL) { + (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr)); ep_attr->max_mtu_size = port_attr.max_msg_sz; ep_attr->max_rdma_size = port_attr.max_msg_sz; ep_attr->max_recv_dtos = dev_attr.max_qp_wr; From rjwalsh at pathscale.com Tue Sep 5 14:30:05 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 05 Sep 2006 14:30:05 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> Message-ID: <44FDEC5D.8050508@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Arlin Davis wrote: > Robert, > > Here is a slightly modified patch for your attributes issue. Can you give it a try? > I'll give it a spin this afternoon: it looks quite a bit more comprehensive than the small patch I did. Regards, Robert. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRP3sXfzvnpzTd9fxAQLwwAf+IOIsC+gqb9Juzt8rwJJlnSW1PjZFrRGi NrCnRXvn52tsgclNNHGSzqOgkIntZ2TqxwEJJeTou3UhUQ5laJWEkQgwrvFTazcn +IQH3BGDLFyZJJQO0WSi2685dEKOH5by6Zp9yVo9sy3Odu6jod2v/uCOjdGkR8ys CvQW+y70qDmom1SJ9P2XQ4/dxxX/v2IFYOWMoVzMlDZsNnvnti/Uspwc1KpQeP6F RRwWImlDyuuAW6+JX6atM5Lne797T5IO7MugW6d/+0oAMVU7H3oiDBdX+9tVwBci IBJJ/PdQ8e7a7x4uOg+LKOSDH16IFVNaua4XhBfVmQEjf1y41KepDw== =1zt8 -----END PGP SIGNATURE----- From mvharish at gmail.com Tue Sep 5 14:49:27 2006 From: mvharish at gmail.com (harish) Date: Tue, 5 Sep 2006 14:49:27 -0700 Subject: [openib-general] Question about interrupt generation Message-ID: Hi All, I tried the following simple experiment and am not able to understand the results: I counted the interrupts generated by the InfiniBand NIC [with little or no traffic to the NIC] over a period of 10 seconds and saw around 10-20 interrupts/sec. I then ran a netperf test and saw 100K+ interrupts/sec, which overwhelmed my host machine. To reduce the impact of the interrupts, I added a callback, scheduled to run periodically every few microseconds, that masks the IRQ line used by the NIC and unmasks it a little later. With this callback in place, I see anywhere between 30-50K interrupts/sec with no traffic, and 120K+ interrupts/sec with the netperf traffic. I am a newbie to InfiniBand technology and so do not understand why so many interrupts are generated when my callback is called periodically. Could it be that the InfiniBand device supports MSI? Or is what I am seeing IPIs? Or does InfiniBand generate interrupts based on types of events, and what I am doing by masking/unmasking the interrupt line is one such event? Any information/suggestions would be useful. Thanks in advance, harish
From robert.j.woodruff at intel.com Tue Sep 5 14:51:37 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Tue, 5 Sep 2006 14:51:37 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready Message-ID: Robert Walsh wrote, >I'll give it a spin this afternoon: it looks quite a bit more >comprehensive than the small patch I did. I also just tried running the ib_rdma_bw test and it seems to be flaky if you stress it. If you just run the defaults, it seems to work, but if you crank up the iterations and the message size, it sometimes fails with..... [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | iters=10000 | duplex=0 | cma=0 | 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x00002a95dd3480 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x00002a95c85480 4730:main: Completion with error at client: 4730:main: Failed status 9: wr_id 3 4730:main: scnt=7584, ccnt=6584 [woody at rkl-13 bin]$ woody From rdreier at cisco.com Tue Sep 5 14:57:29 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 05 Sep 2006 14:57:29 -0700 Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups In-Reply-To: <20060904170350.GR4416@stusta.de> (Adrian Bunk's message of "Mon, 4 Sep 2006 19:03:50 +0200") References: <20060901015818.42767813.akpm@osdl.org> <20060904170350.GR4416@stusta.de> Message-ID: Thanks, I've rolled this up in the amso1100 patch I have queued up. > - #if 0 the following unused global function: > - c2_mq.c: c2_mq_count() Tom/Steve, any reason to keep c2_mq_count() at all? - R.
From rdreier at cisco.com Tue Sep 5 14:58:56 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 05 Sep 2006 14:58:56 -0700 Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction In-Reply-To: <000101c6ce16$b6057090$e598070a@amr.corp.intel.com> (Sean Hefty's message of "Fri, 1 Sep 2006 15:33:55 -0700") References: <000101c6ce16$b6057090$e598070a@amr.corp.intel.com> Message-ID: Thanks, queued for 2.6.19. From ardavis at ichips.intel.com Tue Sep 5 15:07:59 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Tue, 05 Sep 2006 15:07:59 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <44FDEC5D.8050508@pathscale.com> References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> <44FDEC5D.8050508@pathscale.com> Message-ID: <44FDF53F.3040601@ichips.intel.com> Robert Walsh wrote: >-----BEGIN PGP SIGNED MESSAGE----- >Hash: SHA1 > >Arlin Davis wrote: > > >>Robert, >> >>Here is a slightly modified patch for your attributes issue. Can you give it a try? >> >> >> > >I'll give it a spin this afternoon: it looks quite a bit more >comprehensive than the small patch I did. > >Regards, > Robert. > > Just added all appropriate RDMA in/out fields and some code to zero out the structure to avoid uninitialized data fields. -arlin From rjwalsh at pathscale.com Tue Sep 5 15:13:25 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 05 Sep 2006 15:13:25 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <44FDF53F.3040601@ichips.intel.com> References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> <44FDEC5D.8050508@pathscale.com> <44FDF53F.3040601@ichips.intel.com> Message-ID: <44FDF685.8020205@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Just added all appropriate RDMA in/out fields and some code to zero out > the structure to avoid uninitialized data fields. Yup. 
By "comprehensive", I meant "better" :-) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRP32hfzvnpzTd9fxAQJnMwgAgcyxQpxdbk/eLEECXTnAOAYjyv3seTpE Ir1s+K7JEYL2Rbyk9h9CzbK67YSYe4QeIE52pTopEVFw8mnSLaz+ZIOmvdRUiHSS FiwEyfbXEPrFKZfyXu/REsigWx5vn7vCZid3hUIdx1vbt9eVAiVPGbAO1ALI8en9 /xc7iTGpYxwBwNOYbdhW0cOCjvobV98Fp6UJebvxd9xiRUS6c2JeZKLYdQyRO5rm JV7L8HqJr1dS8nbAiPG7DSjCv7/3SFdQVr+Tgt5MQpVfD56z41eBBuXzEfeqsg5E HHSxUOTdqizpscMyLudAWGAr5DZwOAQ4Z90zAL8gc2YYbjbOT3D6bA== =JKRU -----END PGP SIGNATURE----- From swise at opengridcomputing.com Tue Sep 5 15:14:25 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 05 Sep 2006 17:14:25 -0500 Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups In-Reply-To: References: <20060901015818.42767813.akpm@osdl.org> <20060904170350.GR4416@stusta.de> Message-ID: <1157494465.9086.45.camel@stevo-desktop> Its old debug code that isn't used anywhere. It would be nice to keep it around, but if you really don't want it, nuke it... On Tue, 2006-09-05 at 14:57 -0700, Roland Dreier wrote: > Thanks, I've rolled this up in the amso1100 patch I have queued up. > > > - #if 0 the following unused global function: > > - c2_mq.c: c2_mq_count() > > Tom/Steve, any reason to keep c2_mq_count() at all? > > - R. From rjwalsh at pathscale.com Tue Sep 5 15:33:38 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 05 Sep 2006 15:33:38 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> Message-ID: <44FDFB42.3040305@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Arlin Davis wrote: > Robert, > > Here is a slightly modified patch for your attributes issue. Can you give it a try? 
Oddly enough, I'm back to the same problem with your new patch as I saw with the unpatched version: $ mpiexec -n 2 ./a.out I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration will use rdma configuration [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6) Hello world: rank 0 of 2 running on ib-idev-05 rank 1 in job 1 ib-idev-05_51891 caused collective abort of all ranks exit status of rank 1: killed by signal 9 Still tracking this one down. I noticed in the patch you removed a couple of lines, too: - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom; Any particular reason why you did this? Regards, Robert. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRP37QvzvnpzTd9fxAQI79wf6Anc3/Ve7tg3x31hE4i5qa9bB01qEYmEv 9xx4FQqXNbhMos9hHEQAWJ9S0sKccr+yCNekkIX6GzlaVDv+AKDzZF6uzA8Prrhr CEcf28c1Pw7gflg8MMfVcnAHr2YG/hXyd+ve9m6cGv0rxgPqY6lWmHjghKDxKO7h f/SaDOaVAuN6kEJMRgIrKIxDyFSVl4z1tGXAK3yHVhslvPqNqGwDqNfFMV6UQK+V NNfKVVKVCttUWdzcVELzi3zkiat5xDdqIcwQr8xs2YaXHfAGeD4NurWowil887Sn bRuh5soVdBaKW9mAtQWuAECt9VLDvyYReLWkEq6ikgilPGCeJluDEw== =TNaE -----END PGP SIGNATURE----- From rdreier at cisco.com Tue Sep 5 15:39:50 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 05 Sep 2006 15:39:50 -0700 Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups In-Reply-To: <1157494465.9086.45.camel@stevo-desktop> (Steve Wise's message of "Tue, 05 Sep 2006 17:14:25 -0500") References: <20060901015818.42767813.akpm@osdl.org> <20060904170350.GR4416@stusta.de> <1157494465.9086.45.camel@stevo-desktop> Message-ID: Steve> Its old debug code that isn't used anywhere. 
It would be Steve> nice to keep it around, but if you really don't want it, Steve> nuke it... No, that's fine, I'll leave it inside the #if 0. - R. From arlin.r.davis at intel.com Tue Sep 5 15:51:46 2006 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 5 Sep 2006 15:51:46 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <44FDFB42.3040305@pathscale.com> Message-ID: <000101c6d13d$dd9975f0$bb97070a@amr.corp.intel.com> >Oddly enough, I'm back to the same problem with your new patch as I saw >with the unpatched version: Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your adapter and it worked. Did you ever pick up the Intel MPI 3.0 beta? > > $ mpiexec -n 2 ./a.out > I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from >registry: OpenIB-cma > I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from >registry: OpenIB-cma > I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use >rdma configuration > will use rdma configuration > [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: >could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6) > Hello world: rank 0 of 2 running on ib-idev-05 > rank 1 in job 1 ib-idev-05_51891 caused collective abort of all ranks > exit status of rank 1: killed by signal 9 > >Still tracking this one down. I noticed in the patch you removed a >couple of lines, too: > > - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom; > >Any particular reason why you did this? max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. 
Look at dat.h line #369 /* To support backwards compatibility for DAPL-1.0 */ #define max_rdma_read_per_ep max_rdma_read_per_ep_in #define DAT_IA_FIELD_IA_MAX_DTO_PER_OP DAT_IA_FIELD_IA_MAX_DTO_PER_EP_IN /* To support backwards compatibility for DAPL-1.0 & DAPL-1.1 */ #define max_mtu_size max_message_size -arlin From rjwalsh at pathscale.com Tue Sep 5 16:07:24 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 05 Sep 2006 16:07:24 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <000101c6d13d$dd9975f0$bb97070a@amr.corp.intel.com> References: <000101c6d13d$dd9975f0$bb97070a@amr.corp.intel.com> Message-ID: <44FE032C.4010108@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> Oddly enough, I'm back to the same problem with your new patch as I saw >> with the unpatched version: > > Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your adapter and it worked. Weird - it's not working for me at all. Maybe I'm messing up somewhere. I've got a meeting for the next hour or so - I'll check again when I get back. > Did you ever pick up the Intel MPI 3.0 beta? Yup. > max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. Ah - fair enough. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRP4DLPzvnpzTd9fxAQJ3nwgAiO+dLDRQv22RrBHYqHcodDwC2ZakxzFh pXBn9j5kwzA2EmnXCvex14v7K168Alqr9lgUpfaGr6StZsCdBU0FY2TRjok41VFl h+fYu78QFgDjleTMkp17Hl7RG9/r8AWzKzTG1LDn1YqwHrn9ngeZlqFfy1BP1tfB pkkW+Nj7HQXbXUNiDc/V9HKW7eBOjwCvkfDI7Knbrfp2QVBI/9ABpWGO4bJf3P7X n9ZzlEBN0SCOHKtGAa1gspQrmJGMHw0qyajUA6Yuyp1dWRygbl8L+ahF2BJFwZSx KGyhoBRZexpP8m0AJASnKgAVjGf6JR31dL7O8WAOjD4QpFEofMSqqA== =yDmH -----END PGP SIGNATURE----- From bugzilla-daemon at openib.org Tue Sep 5 16:22:29 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Tue, 5 Sep 2006 16:22:29 -0700 (PDT) Subject: [openib-general] [Bug 218] New: Call usage verifier is detecting reinitialization of spinlocks already in use Message-ID: <20060905232229.F083B2283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=218 Summary: Call usage verifier is detecting reinitialization of spinlocks already in use Product: OpenFabrics Windows Version: unspecified Platform: X86 OS/Version: Other Status: NEW Severity: major Priority: P2 Component: mthca driver AssignedTo: bugzilla at openib.org ReportedBy: jbottorff at xsigo.com I built a debug version of revision 467 and turned on call usage verifier (CUV) for the mthca driver. It's detecting many cases of spinlocks being initialized after they have already been used. This is usually bad. To build with CUV all you have to do is add the following line to the sources file. VERIFIER_DDK_EXTENSIONS=1 My experience is CUV tends to detect a different set of bugs from driver verifier, and it might be useful to turn on CUV for all the drivers and see what's reported. CUV Driver Error: Calling KeInitializeSpinLock(...) 
at File k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_spinlock.h, Line 57 The Spin lock specified as parameter 1 [0x87840EDC] has been previously initialized and used as a In-Stack Queued Spin lock by this driver. Break, Ignore, Zap, Remove, Disable all, H for help (bizrdh)? b b Breaking in... (press g to return to assert menu) Break instruction exception - code 80000003 (first chance) nt!DbgBreakPoint: 8075cc00 cc int 3 0: kd> k 50 ChildEBP RetAddr f7926438 baeab189 nt!DbgBreakPoint f7926450 baeaa814 mthca!DDKExtPrompt+0x10a [d:\dnsrv\sdktools\ddk\ddk_ext\verifier\messages.cpp @ 709] f7926468 baea990e mthca!DDKExtVInitializeItem+0x98 [d:\dnsrv\sdktools\ddk\ddk_ext\verifier\validate.cpp @ 195] f7926490 bae81635 mthca!DDK_KeInitializeSpinLock+0x35 [d:\dnsrv\sdktools\ddk\ddk_ext\verifier\locks.cpp @ 298] f79264a4 baea42ee mthca!spin_lock_init+0x15 [k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_spinlock.h @ 58] f79264b0 baea4057 mthca!mthca_wq_init+0xe [k:\windows-openib\src\winib-467b\hw\mthca\kernel\mthca_qp.c @ 383] f792653c bae7eaac mthca!mthca_modify_qp+0xe97 [k:\windows-openib\src\winib-467b\hw\mthca\kernel\mthca_qp.c @ 853] f7926550 bae76eaa mthca!ibv_modify_qp+0x1c [k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_verbs.c @ 467] f7926628 ba99e0f3 mthca!mlnx_modify_qp+0x11a [k:\windows-openib\src\winib-467b\hw\mthca\kernel\hca_verbs.c @ 955] f792673c ba99df12 ibbus!al_modify_qp+0x113 [k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1346] f7926760 ba99d7b8 ibbus!modify_qp+0x502 [k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1313] f7926778 ba99eef5 ibbus!ib_modify_qp+0x18 [k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1288] f7926848 ba99ec9e ibbus!init_dgrm_svc+0x175 [k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1453] f7926870 ba96d005 ibbus!ib_init_dgrm_svc+0x73e [k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1395] f7926c4c ba969fd8 ibbus!create_spl_qp_svc+0x18a5 [k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 
718] f7926c78 ba969a45 ibbus!spl_qp_agent_pnp+0x128 [k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 476] f7926c8c ba98f071 ibbus!spl_qp0_agent_pnp_cb+0x95 [k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 429] f7926cf4 ba98f2e8 ibbus!__pnp_notify_user+0x561 [k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 523] f7926d38 ba990e7c ibbus!__pnp_port_notify+0x118 [k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 612] f7926d70 ba94d8a4 ibbus!__pnp_process_add_ca+0x2dc [k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 943] f7926d8c ba953b94 ibbus!__cl_async_proc_worker+0x94 [k:\windows-openib\src\winib-467b\core\complib\cl_async_proc.c @ 153] f7926da0 ba955c4c ibbus!__cl_thread_pool_routine+0x54 [k:\windows-openib\src\winib-467b\core\complib\cl_threadpool.c @ 67] f7926dac 80a07678 ibbus!__thread_callback+0x2c [k:\windows-openib\src\winib-467b\core\complib\kernel\cl_thread.c @ 49] f7926ddc 80781346 nt!PspSystemThreadStartup+0x2e 00000000 00000000 nt!KiThreadStartup+0x16 0: kd> g ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mvharish at gmail.com Tue Sep 5 16:53:10 2006 From: mvharish at gmail.com (harish) Date: Tue, 5 Sep 2006 16:53:10 -0700 Subject: [openib-general] Question about interrupt generation In-Reply-To: References: Message-ID: Hi, One more question. What kind of event mask helps mask the interrupts? thanks harish On 9/5/06, harish wrote: > > Hi All, > > I tried the following simple experiment and am not able to understand the > results: > > Calcualted the number of interrupts generated by the infiniband [with > little or no traffic to the NIC] over a period of 10seconds and saw around > 10-20 interrupts/sec. Then ran a netperf test and saw around 100+ K > interrupts/sec. This screwed up my host machine. 
To reduce the impact of > the interrupts, I add a call back that is scheduled to be periodically > called every few microseconds that masks the irq line used by the NIC and a > little later unmasks the same. Noticed that with no traffic, I see anywhere > between 30-50K interrupts/sec. With the netperf traffic, I see around 120K+ > interrupts/sec. > > Am a newbie to infiniband technology and so do not understand why so many > interrupts are getting generated when I have my call back periodically > called. Could it be that the Infiniband supports MSI? Or is what I am seeing > IPIs? Or does Infiniband generate interrupts based on types of events and > what I am doing by masking/unmasking the interrupt line is one such event? > Any information/suggestions would be useful. > > Thanks in advance, > harish > From rjwalsh at pathscale.com Tue Sep 5 17:45:45 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 05 Sep 2006 17:45:45 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: References: Message-ID: <44FE1A39.3060108@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Woodruff, Robert J wrote: > Robert Walsh wrote, >> I'll give it a spin this afternoon: it looks quite a bit more >> comprehensive than the small patch I did. > > I also just tried running the ib_rdma_bw test and it seems to > be flaky if you stress it. If you just run the defaults, it seems to > work, but if you crank up the iterations and the message size, > it sometimes fails with.....
> > [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 > 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | > iters=10000 | duplex=0 | cma=0 | > 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 > VAddr 0x00002a95dd3480 > 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 > VAddr 0x00002a95c85480 > 4730:main: Completion with error at client: > 4730:main: Failed status 9: wr_id 3 > 4730:main: scnt=7584, ccnt=6584 > [woody at rkl-13 bin]$ This looks like a known bug, the fix to which didn't make it into OFED 1.1-RC3. Hopefully we can still get this into 1.1-RC4. Regards, Robert. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRP4aOfzvnpzTd9fxAQKAEggAlZC5hYi9kdxLkj9Mfl/BwHJQxWUwsKcG K2ck3jtrP6PVa04FdVI/TNL2XE7R3eu69vTfBaTS26pw2CVM6av0ztFiWEV2r5Fu 8FXGJBOuDOYxnwuA0o3yHSMVFtrRW6Jgn2G/JQPZ8IDAK7GrPj3VebvyclPwF5+d KMPIFXJaTzjoJl2JEGFLiSlf+tFMOEs3vazrRwkZpQezKRcs3F1E6TQImtN7kuYK 0/IKxeS4ZOduXpczsJZgsPs6Y9kYi94XN0E4JeJJAh9Miq+bXkxhxbrafieNl7xW n9m7i/phcFcngSzDwjBNXE2ZuQjujDpz94SRnkVedomYNbr5zKXBgQ== =NurT -----END PGP SIGNATURE----- From rjwalsh at pathscale.com Tue Sep 5 18:21:18 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 05 Sep 2006 18:21:18 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> Message-ID: <44FE228E.9050402@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Here is a slightly modified patch for your attributes issue. Can you give it a try? I rebuilt OFED from scratch with the patch, and ran successfully on Intel MPI 2.0.1 with the refresh patch. I could not get it to run on Intel MPI 3.0b. If you could verify that the fix you mentioned that is in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it. 
If you have a later beta version you could send me, that would be great, too. Regards, Robert. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRP4ijvzvnpzTd9fxAQIqeggAkJ4OQ3GrkpqyJUbHImgqbob6npINOv5L lBUANcHZZ8DMFIq5hP4H+OYX2s/yoS3AKDGf0x8kHoVsTDFTFNe69bsGzJMT3znP YDmq3ETN4aSGOgKX2NFzWs+mYG0pEN9uDt/SmEYmccYiIuK3lTlb8jxON6mqqJFL nfitAp7WaLn7OS8A3CfVrAbWwYJ4U6UWPD/rB5sJTg8nTxECc94JaOhPZ90smB6H 9xk8OihEoTxodFLzcpaz/ORS4EPAle69Uw2tP3myjr/4w/SzLGJT6DFVpGQ0BaWC jVXFYVKyVW4JmFMcW1X29ogmVNH8gEDBUfbG1P5Wd8sLzMMB18tINA== =X/q7 -----END PGP SIGNATURE----- From sweitzen at cisco.com Tue Sep 5 21:57:30 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 5 Sep 2006 21:57:30 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1-rc2 is ready (how do I enable madeye)? Message-ID: > 5. Added Madeye utility How do I build madeye? I don't see any reference to it in install.sh. Is there any documentation for madeye? Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems From bugzilla-daemon at openib.org Tue Sep 5 23:39:21 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Tue, 5 Sep 2006 23:39:21 -0700 (PDT) Subject: [openib-general] [Bug 218] Call usage verifier is detecting reinitialization of spinlocks already in use Message-ID: <20060906063921.1E4AC2283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=218 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |rolandd at cisco.com ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
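[Editorial sketch for Bug 218] The original report says CUV caught KeInitializeSpinLock() being called on a lock that the driver had already used, and its stack trace shows the call arriving through mthca_modify_qp -> mthca_wq_init -> spin_lock_init. One possible reading, and it is only an assumption here, is that a per-transition reset path repeats one-time lock setup. The userspace C sketch below illustrates that pattern next to the usual alternative (initialize the lock once at creation, reset only the indices afterwards). Every demo_* name is a hypothetical stand-in invented for illustration; this is not the mthca code and not necessarily how the bug was fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a kernel spinlock. It records just enough
 * state to notice the misuse described in the bug report. */
struct demo_lock {
    bool initialized;
    bool used;   /* set the first time the lock is acquired */
    bool held;
};

/* Refuses (returns false) to initialize a lock that has already been used. */
static bool demo_lock_init(struct demo_lock *l)
{
    if (l->initialized && l->used)
        return false;
    l->initialized = true;
    l->used = false;
    l->held = false;
    return true;
}

static void demo_lock_acquire(struct demo_lock *l)
{
    assert(l->initialized && !l->held);
    l->held = true;
    l->used = true;
}

static void demo_lock_release(struct demo_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Hypothetical work queue: a lock plus ring indices. */
struct demo_wq {
    struct demo_lock lock;
    unsigned int head;
    unsigned int tail;
};

/* One-time setup: the only place the lock is initialized. */
static bool demo_wq_create(struct demo_wq *wq)
{
    memset(wq, 0, sizeof(*wq));
    return demo_lock_init(&wq->lock);
}

/* Per-transition reset: clears the indices under the lock and
 * leaves the lock object itself alone. */
static void demo_wq_reset(struct demo_wq *wq)
{
    demo_lock_acquire(&wq->lock);
    wq->head = 0;
    wq->tail = 0;
    demo_lock_release(&wq->lock);
}

/* The pattern a verifier would flag: a reset that also tries to
 * re-initialize a lock that has already been acquired. */
static bool demo_wq_reset_with_reinit(struct demo_wq *wq)
{
    bool ok = demo_lock_init(&wq->lock);
    wq->head = 0;
    wq->tail = 0;
    return ok;
}
```

In this sketch demo_wq_create() would be called once per queue and demo_wq_reset() on every state change; demo_wq_reset_with_reinit() reports failure as soon as the lock has been taken at least once, which is roughly the condition the CUV message describes.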
From moshek at voltaire.com Wed Sep 6 02:26:47 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Wed, 6 Sep 2006 12:26:47 +0300 Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question Message-ID: I have tested the mstflint problem with two different ppc64 machines: - On sles 10 PPC64 PowerMac G5 -> mstflint -d 0001:07:00.0 q works o.k. with and without the ib_mthca loaded - On sles10 PPC64 IBM JS21 -> mstflint -d 0001:07:00.0 q DOESN'T work with and without the ib_mthca loaded and I have to use /proc/bus/pci/..... Is it time to create a workaround that opens /proc/bus/pci/ .... and always works? Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Michael S. Tsirkin Sent: Tuesday, September 05, 2006 4:37 PM To: Or Gerlitz Cc: Roland Dreier; openib-general at openib.org Subject: Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question Quoting r. Or Gerlitz : > Subject: Re: getting LOC_QP_OP_ERR with IPoIB - mstflint question > > Michael S. Tsirkin wrote: > > Donnu, it looks really weird. Could you try firmware 3.5.0 please? > > I just noted that you can not work with mstflint if the mthca driver > is > not loaded, i think it was not the case in the gen1 tools, am i correct. Yes, recent kernels disable device access once driver is unloaded: mstflint -d 08:00.0 q *** ERROR *** Read a corrupted device id (0xffff). Probably HW/PCI access problem *** ERROR *** Device type 65535 not supported. *** ERROR *** Can not get flash type using device 08:00.0 mstflint should work without driver using /proc: mstflint -d /proc/bus/pci/08/00.0 q Image type: Failsafe I.S. Version: 1 Chip Revision: A0 In gen1 flint had a separate driver which you had to load.
I am not sure whether this would work on 2.6.18 > Is this connected to this print > > ACPI: PCI interrupt for device 0000:02:00.0 disabled > > i see once the mthca driver is unloaded? > > Or. Probably not. -- MST _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From tziporet at dev.mellanox.co.il Wed Sep 6 04:23:29 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 06 Sep 2006 14:23:29 +0300 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <44FE228E.9050402@pathscale.com> References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> <44FE228E.9050402@pathscale.com> Message-ID: <44FEAFB1.3040902@dev.mellanox.co.il> Robert Walsh wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > >> Here is a slightly modified patch for your attributes issue. Can you give it a try? >> > > I rebuilt OFED from scratch with the patch, and ran successfully on > Intel MPI 2.0.1 with the refresh patch. I could not get it to run on > Intel MPI 3.0b. If you could verify that the fix you mentioned that is > in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it. > If you have a later beta version you could send me, that would be great, > too. > > Regards, > Robert. > I added this patch under fixes to OFED 1.1. Will be in RC4 Tziporet From tzachid at mellanox.co.il Wed Sep 6 04:37:17 2006 From: tzachid at mellanox.co.il (Tzachi Dar) Date: Wed, 6 Sep 2006 14:37:17 +0300 Subject: [openib-general] [Openib-windows] File transfer performance options Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302D8674E@mtlexch01.mtl.com> Hi Paul, In the beginning of this mail thread you have described a problem of passing files from a Linux server to windows server. 
You described many experiments and the fact that the performance you measured was not as good as expected. In reply I advised you to consider using SDP for these file transfers. To summarize your answer in one sentence, you said that SDP is still not ready. I would have loved to tell you that SDP is ready, but unfortunately the windows SDP is not a product yet. However, it is mature enough to start doing some measurements, which is what I did. I changed a simple benchmark program that I had so that it also writes its data to disk. As a disk I used AMT Ramdisk (512 MB). I ran two instances of this program and got 578 MB/sec, which is considerably higher than the results you achieved in your other experiments. (One client gave me 450 MB/sec.) Please note that since data is being copied 3 times in this scenario, we are standing near the theoretical speed of the machine (one copy is from the HCA to the kernel buffer, another is from the kernel buffer to the application buffer, and the last copy is from the application buffer to the Ram Disk). It is true that the development road of your application might force you not to use SDP, as SDP is not in production right now, but if you can wait the extra time then please note that SDP can supply the BW. Thanks Tzachi > -----Original Message----- > From: Paul Baxter [mailto:paul.baxter at dsl.pipex.com] > Sent: Friday, September 01, 2006 1:11 AM > To: openib-windows at openib.org; Tzachi Dar > Subject: Re: [Openib-windows] File transfer performance options > > > > From: "Tzachi Dar" There is one > thing that is missing from your mail, and that is if you want > to see the windows machine as some file server (for example > SAMBA, NFS, SRP), or are you ready to accept it as a normal > server. The big difference is that on the second option the > server can be running at user mode (for example FTP server).
> < > > The windows machine has to list and then choose amongst a set > of files from our Linux system and retrieve only relevant > files e.g. those whose filename relates to particular time slots. > We prefer not to write a Linux 'client' application to do > this explicitly but would rather have the windows machine's > application access our data files directly. > A few application-level locks are in place so that we won't > be writing new files to our local disks at the same time as > the remote archiving accesses them. > > Other than that the main goal is to make the inter-OS (and > inter-company) interface as simple as possible. It currently > doesn't seem that there is a proven solution to support this > at any transfer rate that takes significant advantage of Infiniband. > > I've specced my disks for 200 MB/s and we have DDR cards etc. > (for other reasons!), just no means to flex their muscles too > easily using existing COTS infrastructure. > > > > When (the server application is) running at user mode, SDP > can be used as a socket provider. This means that > theoretically every socket application should run and enjoy > the speed of Infiniband. Currently there are two projects of > SDP under development: one is for Linux and the other for > Windows, so SDP can be used to allow machines from both types > to connect. > < > > The key here is 'theoretical'. IMHO, Linux-Linux and > Windows-Windows get a lot more testing and priority than a > Linux-Windows combination. (Which is fair enough if that's > where the market is.) > > We've been burnt by this not being robustly tested and proven > in reality in cross-platform cases. (Note that this was > before the current openfabrics windows driver initiative). > > > > Performance that we have measured on the windows platform, > using DDR cards was bigger than 1200 MB/Sec. (of course, this > data was from host memory, and not from disks). 
> < > > We've used SDP previously in our Linux message interface and > were very happy with the results. Then someone included an > old (v9 ) Solaris machine into the mix so even before we > tested on Windows, we ended up using sockets/gigabit ethernet > for command transfers. > > SDP as an option for other parts of our application (large > data transfers) took a big turn for the worse when the > previous Linux SDP implementation was mothballed without a > mature replacement. We've ended up writing our application to > use RDMA write directly now. > > Note that I'm not too critical of the way SDP went away since > I can appreciate the need to greatly simplify the Linux SDP > implementation, it did leave people like me in the lurch > however. I really appreciate the effort put into these things > by Michael Tsirkin et al. and look forward to the new code in OFED 1.1 > > > I'm also not sure that cross-platform operation of > high-performance Infiniband is near the top of anyone's > agenda. Inside the windows world and inside the Linux world > things are looking rosey, but I'm largely stuck with IPoIB or > low-level verbs for cross-platform use. > > SRP looks promising, but as a user, I see lots of statements > that this SRP initiator only works with that SRP target. > Support for cross-platform high-speed operation is 'coming soon'. > > I'd love to know whether there has been significant testing > between the windows openfabrics SRP initiator and the openIB > Linux SRP target? Is this on anyone's agenda. (Distinct from > any windows SRP 'WHQL-certification > issues.) Is their even an 'inter-operable' standard that both > implementations can aspire to match? > > > > So, if all you need to do is to pass files from one side to > the other, I would recommend that you will check this option. > < > Thanks for the tip. Maybe now the dust is settling on Linux > SDP we may well revisit this option. 
> > > > One note about your experiments: when using ram disks, this > probably means that there is one more copy from the ram disk > to the application buffer. A real disk, has it's DMA engine, > while a ram disk doesn't. > Another copy is probably not a problem when you are talking > about 100MB/sec, but it would become a problem once you will > use SDP (I hope). > < > We were only using these as a sanity check that physical > disks weren't the cause of the bottleneck. > > > > Thanks > Tzachi > < > Thanks to you, Tzachi, and everyone helping to develop robust > infiniband support across a range of platforms. > From moshek at voltaire.com Wed Sep 6 06:01:44 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Wed, 6 Sep 2006 16:01:44 +0300 Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3 Message-ID: Hi Tziporet, I'm trying Ofed 1.1 rc3 on IBM js21 sles9sp3 ppc64. Install is stopped at the very beginning as 64-bit udev is missing. I tried to compile the udev...src.rpm supplied in sls9sp3 cd3 and failed as result of compilation error. Did you test ofed 1.1 rc3 on ppc64. Can you advice me how to get 64-bit udev ? Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Tuesday, August 29, 2006 5:50 PM To: OPENIB Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3 Hi All, In testing today we found that on SLES9 SP3 memory locking as a regular user fails. Although I changed /etc/security/limits.conf and added the following two lines: * soft memlock * hard memlock Note that same change does work in SLES10. Another change I tried (that worked in gen1) was to add the following line to the file/etc/sysctl.conf: vm.disable_cap_mlock=1. 
However nothing helped in SLES9 Does anyone have any idea how to solve this? Thanks, Tziporet From mst at mellanox.co.il Wed Sep 6 06:24:48 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 6 Sep 2006 16:24:48 +0300 Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question In-Reply-To: References: Message-ID: <20060906132448.GA6928@mellanox.co.il> Quoting r. Moshe Kazir : > Is it time to create a work arround that opens /proc/bus/pci/ .... And > always work ? But why isn't the driver loaded? -- MST From halr at voltaire.com Wed Sep 6 06:21:25 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2006 09:21:25 -0400 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace Message-ID: <1157548884.12940.5019.camel@hal.voltaire.com> OpenSM/osm_log API: Rather than polluting the namespace with needless symbols, use symbol versions and have a versioned osm_log_init rather than adding osm_log_init_v2 as an additional API This patch is intended to be applied to both trunk and 1.1 versions.
Signed-off-by: Doug Ledford Signed-off-by: Hal Rosenstock Index: osm/opensm/libopensm.map =================================================================== --- osm/opensm/libopensm.map (revision 9253) +++ osm/opensm/libopensm.map (working copy) @@ -3,7 +3,6 @@ OPENSM_1.3 { osm_log; osm_is_debug; osm_log_init; - osm_log_init_v2; osm_mad_pool_construct; osm_mad_pool_destroy; osm_mad_pool_init; @@ -55,3 +54,8 @@ OPENSM_1.3 { osm_get_sm_mgr_state_str; local: *; }; + +OPENSM_1.3.1 { + global: + osm_log_init; +} OPENSM_1.3; Index: osm/opensm/libopensm.ver =================================================================== --- osm/opensm/libopensm.ver (revision 9158) +++ osm/opensm/libopensm.ver (working copy) @@ -6,4 +6,4 @@ # API_REV - advance on any added API # RUNNING_REV - advance any change to the vendor files # AGE - number of backward versions the API still supports -LIBVERSION=2:0:1 +LIBVERSION=2:1:1 Index: osm/include/opensm/osm_log.h =================================================================== --- osm/include/opensm/osm_log.h (revision 9251) +++ osm/include/opensm/osm_log.h (working copy) @@ -152,13 +152,13 @@ osm_log_construct( * This function does not return a value. * * NOTES -* Allows calling osm_log_init, osm_log_init_v2, osm_log_destroy +* Allows calling osm_log_init, osm_log_destroy * * Calling osm_log_construct is a prerequisite to calling any other -* method except osm_log_init or osm_log_init_v2. +* method except osm_log_init. * * SEE ALSO -* Log object, osm_log_init, osm_log_init_v2, +* Log object, osm_log_init, * osm_log_destroy *********/ @@ -196,25 +196,25 @@ osm_log_destroy( * Log object. * Further operations should not be attempted on the destroyed object. * This function should only be called after a call to -* osm_log_construct, osm_log_init, or osm_log_init_v2. +* osm_log_construct, osm_log_init. 
* * SEE ALSO * Log object, osm_log_construct, -* osm_log_init, osm_log_init_v2 +* osm_log_init *********/ -/****f* OpenSM: Log/osm_log_init_v2 +/****f* OpenSM: Log/osm_log_init * NAME -* osm_log_init_v2 +* osm_log_init * * DESCRIPTION -* The osm_log_init_v2 function initializes a +* The osm_log_init function initializes a * Log object for use. * * SYNOPSIS */ ib_api_status_t -osm_log_init_v2( +osm_log_init( IN osm_log_t* const p_log, IN const boolean_t flush, IN const uint8_t log_flags, @@ -249,27 +249,6 @@ osm_log_init_v2( * osm_log_destroy *********/ -/****f* OpenSM: Log/osm_log_init -* NAME -* osm_log_init -* -* DESCRIPTION -* The osm_log_init function initializes a -* Log object for use. It is a wrapper for osm_log_init_v2(). -* -* SYNOPSIS -*/ -ib_api_status_t -osm_log_init( - IN osm_log_t* const p_log, - IN const boolean_t flush, - IN const uint8_t log_flags, - IN const char *log_file, - IN const boolean_t accum_log_file ); -/* - * Same as osm_log_init_v2() but without max_size parameter - */ - /****f* OpenSM: Log/osm_log_get_level * NAME * osm_log_get_level Index: osm/opensm/osm_log.c =================================================================== --- osm/opensm/osm_log.c (revision 9257) +++ osm/opensm/osm_log.c (working copy) @@ -225,7 +225,7 @@ osm_is_debug(void) } ib_api_status_t -osm_log_init_v2( +osm_log_init_1_3_1( IN osm_log_t* const p_log, IN const boolean_t flush, IN const uint8_t log_flags, @@ -280,13 +280,18 @@ osm_log_init_v2( return IB_ERROR; } +__asm__(".symver osm_log_init_1_3_1, osm_log_init@@OPENSM_1.3.1"); + ib_api_status_t -osm_log_init( +osm_log_init_1_3( IN osm_log_t* const p_log, IN const boolean_t flush, IN const uint8_t log_flags, IN const char *log_file, IN const boolean_t accum_log_file ) { - return osm_log_init_v2( p_log, flush, log_flags, log_file, 0, accum_log_file ); + return osm_log_init_1_3_1( p_log, flush, log_flags, log_file, 0, accum_log_file ); } + +__asm__(".symver osm_log_init_1_3, osm_log_init@OPENSM_1.3"); +
Index: osm/opensm/osm_opensm.c
===================================================================
--- osm/opensm/osm_opensm.c (revision 9251)
+++ osm/opensm/osm_opensm.c (working copy)
@@ -180,9 +180,9 @@ osm_opensm_init(
   /* Can't use log macros here, since we're initializing the log. */
   osm_opensm_construct( p_osm );
-  status = osm_log_init_v2( &p_osm->log, p_opt->force_log_flush,
-                            p_opt->log_flags, p_opt->log_file,
-                            p_opt->log_max_size, p_opt->accum_log_file );
+  status = osm_log_init( &p_osm->log, p_opt->force_log_flush,
+                         p_opt->log_flags, p_opt->log_file,
+                         p_opt->log_max_size, p_opt->accum_log_file );
   if( status != IB_SUCCESS )
     return ( status );
Index: osm/opensm/osm_db_files.c
===================================================================
--- osm/opensm/osm_db_files.c (revision 9275)
+++ osm/opensm/osm_db_files.c (working copy)
@@ -712,7 +712,7 @@ main(int argc, char **argv)
   cl_list_construct( &keys );
   cl_list_init( &keys, 10 );
-  osm_log_init_v2( &log, TRUE, 0xff, "/var/log/osm_db_test.log", 0, FALSE);
+  osm_log_init( &log, TRUE, 0xff, "/var/log/osm_db_test.log", 0, FALSE );
   osm_db_construct(&db);
   if (osm_db_init(&db, &log))
Index: osm/osmtest/osmtest.c
===================================================================
--- osm/osmtest/osmtest.c (revision 9251)
+++ osm/osmtest/osmtest.c (working copy)
@@ -520,8 +520,8 @@ osmtest_init( IN osmtest_t * const p_osm
   /* Can't use log macros here, since we're initializing the log. */
   osmtest_construct( p_osmt );
-  status = osm_log_init_v2( &p_osmt->log, p_opt->force_log_flush,
-                            0x0001, p_opt->log_file, 0, TRUE );
+  status = osm_log_init( &p_osmt->log, p_opt->force_log_flush,
+                         0x0001, p_opt->log_file, 0, TRUE );
   if( status != IB_SUCCESS )
     return ( status );
Index: osm/complib/cl_event_wheel.c
===================================================================
--- osm/complib/cl_event_wheel.c (revision 9251)
+++ osm/complib/cl_event_wheel.c (working copy)
@@ -610,7 +610,7 @@ main ()
   cl_event_wheel_construct( &event_wheel );
   /* init */
-  osm_log_init_v2( &log, TRUE, 0xff, NULL, 0, FALSE);
+  osm_log_init( &log, TRUE, 0xff, NULL, 0, FALSE );
   cl_event_wheel_init( &event_wheel, &log );
   /* Start Playing */
Index: diags/src/saquery.c
===================================================================
--- diags/src/saquery.c (revision 9251)
+++ diags/src/saquery.c (working copy)
@@ -442,8 +442,8 @@ get_bind_handle(void)
   complib_init();
   osm_log_construct(&log_osm);
-  if ((status = osm_log_init_v2(&log_osm, TRUE, 0x0001, NULL,
-                                0, TRUE)) != IB_SUCCESS) {
+  if ((status = osm_log_init(&log_osm, TRUE, 0x0001, NULL,
+                             0, TRUE)) != IB_SUCCESS) {
     fprintf(stderr, "Failed to init osm_log: %s\n",
             ib_get_err_str(status));
     exit(-1);

From tziporet at dev.mellanox.co.il Wed Sep 6 06:33:56 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Wed, 06 Sep 2006 16:33:56 +0300
Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
In-Reply-To:
References:
Message-ID: <44FECE44.4070003@dev.mellanox.co.il>

Moshe Kazir wrote:
> Hi Tziporet,
>
> I'm trying Ofed 1.1 rc3 on IBM js21 sles9sp3 ppc64.
>
> Install is stopped at the very beginning as 64-bit udev is missing.
>
> I tried to compile the udev...src.rpm supplied in sls9sp3 cd3 and failed
> as result of compilation error.
>
> Did you test ofed 1.1 rc3 on ppc64. Can you advice me how to get 64-bit
> udev ?
> > We have here only one MAC PPC64 machine that can run only Fedora C4 thus this is the only system we check. Maybe Vlad can help but I think best if you approach Novel (Mois is their contact for OFED) Tziporet From mst at mellanox.co.il Wed Sep 6 06:42:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 6 Sep 2006 16:42:06 +0300 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace In-Reply-To: <1157548884.12940.5019.camel@hal.voltaire.com> References: <1157548884.12940.5019.camel@hal.voltaire.com> Message-ID: <20060906134206.GB6928@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > OpenSM/osm_log API: Rather than polluting the namespace with needless > symbols, use symbol versions and have a versioned osm_log_init rather > than adding osm_log_init_v2 as an additional API > > This patch is intended to be applied to both trunk and 1.1 versions. > > Signed-off-by: Doug Ledford > Signed-off-by: Hal Rosenstock This preserves the ABI, but would this not break the API? Anyway, frankly, I do not see why was osm_log_init_v2 added into 1.1 at all. We are in code freeze, only critical fixes are supposed to be applied to branch at this stage. How was adding osm_log_init_v2 critical? Nor is this feature uncontroversial. Would not support for log rotation be better? So - why are all these changes going into 1.1 branch? -- MST From halr at voltaire.com Wed Sep 6 07:14:30 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2006 10:14:30 -0400 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace In-Reply-To: <20060906134206.GB6928@mellanox.co.il> References: <1157548884.12940.5019.camel@hal.voltaire.com> <20060906134206.GB6928@mellanox.co.il> Message-ID: <1157552070.12940.6861.camel@hal.voltaire.com> On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote: > Quoting r. 
Hal Rosenstock : > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > > > OpenSM/osm_log API: Rather than polluting the namespace with needless > > symbols, use symbol versions and have a versioned osm_log_init rather > > than adding osm_log_init_v2 as an additional API > > > > This patch is intended to be applied to both trunk and 1.1 versions. > > > > Signed-off-by: Doug Ledford > > Signed-off-by: Hal Rosenstock > > This preserves the ABI, but would this not break the API? Yes, this patch changes the API (in a most trivial way). > Anyway, frankly, I do not see why was osm_log_init_v2 added into 1.1 at all. > We are in code freeze, only critical fixes are supposed to be applied to branch > at this stage. How was adding osm_log_init_v2 critical? There was a bug reported when the log filled up which started motivating these changes. We had just missed the rc3 window for this. It is an upward compatible change so is low risk. > Nor is this feature uncontroversial. Would not support for log rotation > be better? Were there comments on the list before to this effect ? > So - why are all these changes going into 1.1 branch? See answers above. -- Hal From dledford at redhat.com Wed Sep 6 07:58:09 2006 From: dledford at redhat.com (Doug Ledford) Date: Wed, 06 Sep 2006 10:58:09 -0400 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace In-Reply-To: <1157552070.12940.6861.camel@hal.voltaire.com> References: <1157548884.12940.5019.camel@hal.voltaire.com> <20060906134206.GB6928@mellanox.co.il> <1157552070.12940.6861.camel@hal.voltaire.com> Message-ID: <1157554690.2569.6.camel@fc6.xsintricity.com> On Wed, 2006-09-06 at 10:14 -0400, Hal Rosenstock wrote: > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote: > > Nor is this feature uncontroversial. Would not support for log rotation > > be better? 
If you are just going to do log rotation, then no need to change opensm, just add an appropriate logrotate.d/opensm file to the distribution. But, that doesn't address what to do if you hit a full filesystem condition, nor how to limit the size of a log file between rotations (which, as I understand it, is really only an issue because opensm can log so much), which is what this entire patch series was designed to address. They are two different problem spaces. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From dotanb at dev.mellanox.co.il Wed Sep 6 08:07:54 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 06 Sep 2006 18:07:54 +0300 Subject: [openib-general] [librdmacm] execuation of the the test udaddy is failing Message-ID: <44FEE44A.60308@dev.mellanox.co.il> Here are the machine/driver props: ************************************************************* Host Architecture : x86_64 Linux Distribution: SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10 Kernel Version : 2.6.16.21-0.8-smp GCC Version : gcc (GCC) 4.1.0 (SUSE Linux) Memory size : 4045720 kB Driver Version : gen2_linux-20060905-1700 (REV=9264) HCA ID(s) : mthca0 HCA model(s) : 25218 FW version(s) : 5.1.927 Board(s) : MT_0150000001 ************************************************************* Here is the output of the test: # udaddy udaddy: starting server librdmacm: Kernel ABI does not support requested port space. udaddy: listen request failed test complete return status -93 executing the test mckey fails too because of the same problem: # mckey recv 239.0.0.2 librdmacm: Kernel ABI does not support requested port space. The tests rping and the ucmatose are passing with no problem. 
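For reference, the "return status -93" reported above looks like a negated errno value. The following is only a minimal sketch, assuming a Linux system (where errno 93 is EPROTONOSUPPORT, "Protocol not supported") and assuming udaddy prints the negative errno it got back from librdmacm. describe_status() is a hypothetical helper for illustration, not part of librdmacm:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a librdmacm call): interpret a negative
 * return status as -errno and return the matching error text.
 */
static const char *describe_status(int status)
{
	if (status >= 0)
		return "success";
	return strerror(-status);
}
```

Calling describe_status(-93) on Linux should give the text for EPROTONOSUPPORT, which would be consistent with the "does not support requested port space" message printed by librdmacm.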
thanks
Dotan

From halr at voltaire.com Wed Sep 6 08:09:06 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Sep 2006 11:09:06 -0400
Subject: [openib-general] [PATCH] OpenSM/osm_base.h: Change OSM_DEFAULT_TMP_DIR to /var/log for Linux
Message-ID: <1157555341.12940.8796.camel@hal.voltaire.com>

OpenSM/osm_base.h: Change OSM_DEFAULT_TMP_DIR to /var/log for Linux

Signed-off-by: Hal Rosenstock

Index: include/opensm/osm_base.h
===================================================================
--- include/opensm/osm_base.h (revision 9158)
+++ include/opensm/osm_base.h (working copy)
@@ -177,15 +177,14 @@ BEGIN_C_DECLS
 *
 * DESCRIPTION
 * Specifies the default temporary directory for the log file, subnet.lst
-* and the other log files (with the exception of osm.log for Linux being
-* in /var/log).
+* and the other log files.
 *
 * SYNOPSIS
 */
 #ifdef __WIN__
 #define OSM_DEFAULT_TMP_DIR GetOsmTempPath()
 #else
-#define OSM_DEFAULT_TMP_DIR "/tmp/"
+#define OSM_DEFAULT_TMP_DIR "/var/log/"
 #endif
 /***********/

From mst at mellanox.co.il Wed Sep 6 08:16:59 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 6 Sep 2006 18:16:59 +0300
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace
In-Reply-To: <1157552070.12940.6861.camel@hal.voltaire.com>
References: <1157548884.12940.5019.camel@hal.voltaire.com> <20060906134206.GB6928@mellanox.co.il> <1157552070.12940.6861.camel@hal.voltaire.com>
Message-ID: <20060906151659.GC6928@mellanox.co.il>

Quoting r. Hal Rosenstock :
> Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
>
> On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > Quoting r.
Hal Rosenstock : > > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > > > > > OpenSM/osm_log API: Rather than polluting the namespace with needless > > > symbols, use symbol versions and have a versioned osm_log_init rather > > > than adding osm_log_init_v2 as an additional API > > > > > > This patch is intended to be applied to both trunk and 1.1 versions. > > > > > > Signed-off-by: Doug Ledford > > > Signed-off-by: Hal Rosenstock > > > > This preserves the ABI, but would this not break the API? > > Yes, this patch changes the API (in a most trivial way). So all users need to change code or they won't compile against the new library? > > Anyway, frankly, I do not see why was osm_log_init_v2 added into 1.1 at all. > > We are in code freeze, only critical fixes are supposed to be applied to branch > > at this stage. How was adding osm_log_init_v2 critical? > > There was a bug reported when the log filled up which started motivating > these changes. As I see it, a bugzilla ticket does not automatically convert feature request into a bug report. The issue is not exactly new, and people seem to have been able to live with this. The enhancement will keep opensm friendly to appliance like devices that are single task subnet managers. fine, but OFED by default will activate opensm without this switch. Given all of the above, I don't see how can this be considered a critical bug fix. > We had just missed the rc3 window for this. So that's a reason not to apply on branch unless it is critical. > It is an upward compatible change so is low risk. Not sure what do you mean by upward compatible. This API change does not seem to be backward compatible - won't it break building dependent applications? If so is not something you should do after code freeze. If we care about namespace pollution that much (and I don't really see an issue) do the changes on trunk. > > Nor is this feature uncontroversial. 
Would not support for log rotation > > be better? > > Were there comments on the list before to this effect ? Hmm. Not explicitly. There were comments this is non-standard and will surprise system administrators if activated. http://thread.gmane.org/gmane.linux.drivers.openib/29195/focus=29199 -- MST From eitan at mellanox.co.il Wed Sep 6 08:28:42 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 6 Sep 2006 18:28:42 +0300 Subject: [openib-general] [PATCH] OpenSM/osm_base.h: Change OSM_DEFAULT_TMP_DIR to /var/logfor Linux Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302D8686F@mtlexch01.mtl.com> OK. I will need to update the ibdmchk utility to look by default for osm.{fdbs,mcfdbs} and subnet.lst in the /var/tmp ... I hope this is not targeting the OFED 1.1 as it must not be critical ... Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, September 06, 2006 6:09 PM > To: openib-general at openib.org > Cc: Eitan Zahavi > Subject: [PATCH] OpenSM/osm_base.h: Change OSM_DEFAULT_TMP_DIR to > /var/logfor Linux > > OpenSM/osm_base.h: Change OSM_DEFAULT_TMP_DIR to /var/log for Linux > > Signed-off-by: Hal Rosenstock > > Index: include/opensm/osm_base.h > ================================================================ > === > --- include/opensm/osm_base.h (revision 9158) > +++ include/opensm/osm_base.h (working copy) > @@ -177,15 +177,14 @@ BEGIN_C_DECLS > * > * DESCRIPTION > * Specifies the default temporary directory for the log file, subnet.lst > -* and the other log files (with the exception of osm.log for Linux being > -* in /var/log). > +* and the other log files. 
> * > * SYNOPSIS > */ > #ifdef __WIN__ > #define OSM_DEFAULT_TMP_DIR GetOsmTempPath() #else -#define > OSM_DEFAULT_TMP_DIR "/tmp/" > +#define OSM_DEFAULT_TMP_DIR "/var/log/" > #endif > /***********/ > > From mst at mellanox.co.il Wed Sep 6 08:27:29 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 6 Sep 2006 18:27:29 +0300 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace In-Reply-To: <1157554690.2569.6.camel@fc6.xsintricity.com> References: <1157548884.12940.5019.camel@hal.voltaire.com> <20060906134206.GB6928@mellanox.co.il> <1157552070.12940.6861.camel@hal.voltaire.com> <1157554690.2569.6.camel@fc6.xsintricity.com> Message-ID: <20060906152729.GD6928@mellanox.co.il> Quoting r. Doug Ledford : > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > On Wed, 2006-09-06 at 10:14 -0400, Hal Rosenstock wrote: > > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote: > > > > Nor is this feature uncontroversial. Would not support for log rotation > > > be better? > > If you are just going to do log rotation, then no need to change opensm, > just add an appropriate logrotate.d/opensm file to the distribution. I guess opensm will need to be signalled to close/reopen the log file though. No? > But, that doesn't address what to do if you hit a full filesystem > condition, Since logs are compressed this should at least alleviate that. what do other daemons do? > nor how to limit the size of a log file between rotations again, what do other daemons do? > (which, as I understand it, is really only an issue because opensm can > log so much), > which is what this entire patch series was designed to > address. They are two different problem spaces. So ... wouldn't it be better to address the real issue? As I see it, the problem only appears if you activate opensm in the verbose mode. 
And the reason to run so for a long time is only if you suspect you'll want to debug something later, without killing opensm. So the ability to control verbosity at runtime will be a better solution it seems, and there are patches that do that. -- MST From thomas.bub at thomson.net Wed Sep 6 08:29:44 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Wed, 6 Sep 2006 17:29:44 +0200 Subject: [openib-general] ibv_poll_cq after ibv_post_send does not work Message-ID: I'm still in the process of porting my gen1 code to gen2. As mentioned yesterday I can connect to a listener on the same machine using libibcm. Doing this I have to do the ibv_modify_qp by myself to get the qp's from INIT via RTR to RTS on both sides. At least the ibv_modify_qp doesn not complain when having done the connection via the libibcm. So my assumption is I have my two qp's successfully connected. First action after the connection is the listener to wait on it's receive cq for an IBV_WR_SEND done by the connector. Here is now the problem: * The listener never gets a completion * The connector doing the IBV_WR_SEND does get error on the send cq like opcode=0x7f status=0x5 vendor_err=129 for the first IBV_WR_SEND and opcode=0x7f status=0xc vendor_err=129 for all sub-sequent attempt to send the data Is there anyone out there who can help me out to understand the error codes and or to understand what is wrong? Thanks in advance from Germany Thomas ............................................................ Thomas Bub Grass Valley Germany GmbH Brunnenweg 9 64331 Weiterstadt, Germany Tel: +49 6150 104 147 Fax: +49 6150 104 656 Email: Thomas.Bub at thomson.net www.GrassValley.com ............................................................ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Wed Sep 6 08:34:25 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2006 11:34:25 -0400 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace In-Reply-To: <20060906151659.GC6928@mellanox.co.il> References: <1157548884.12940.5019.camel@hal.voltaire.com> <20060906134206.GB6928@mellanox.co.il> <1157552070.12940.6861.camel@hal.voltaire.com> <20060906151659.GC6928@mellanox.co.il> Message-ID: <1157556861.12940.9754.camel@hal.voltaire.com> On Wed, 2006-09-06 at 11:16, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > > > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock : > > > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > > > > > > > OpenSM/osm_log API: Rather than polluting the namespace with needless > > > > symbols, use symbol versions and have a versioned osm_log_init rather > > > > than adding osm_log_init_v2 as an additional API > > > > > > > > This patch is intended to be applied to both trunk and 1.1 versions. > > > > > > > > Signed-off-by: Doug Ledford > > > > Signed-off-by: Hal Rosenstock > > > > > > This preserves the ABI, but would this not break the API? > > > > Yes, this patch changes the API (in a most trivial way). > > So all users need to change code or they won't compile against the new > library? > > > > Anyway, frankly, I do not see why was osm_log_init_v2 added into 1.1 at all. > > > We are in code freeze, only critical fixes are supposed to be applied to branch > > > at this stage. How was adding osm_log_init_v2 critical? > > > > There was a bug reported when the log filled up which started motivating > > these changes. > > As I see it, a bugzilla ticket does not automatically convert feature request > into a bug report. 
The issue is not exactly new, and people seem to have been > able to live with this. > > The enhancement will keep opensm friendly to appliance like devices that are > single task subnet managers. fine, but OFED by default will activate opensm > without this switch. It is another feature when this situation is encountered. It has been encountered and will be again. > Given all of the above, I don't see how can this be considered a critical bug > fix. > > > We had just missed the rc3 window for this. > > So that's a reason not to apply on branch unless it is critical. I've also seen other patches which do not meet this criteria go into 1.1. I know that's not a reason either. > > It is an upward compatible change so is low risk. > > Not sure what do you mean by upward compatible. This API change does not seem to > be backward compatible - won't it break building dependent applications? We are talking about 2 different changes. I was responding to your comment about the addition of osm_log_init_v2 not being a bug fix, not the symver patch on top of that. > If so is not something you should do after code freeze. > > If we care about namespace pollution that much (and I don't really see an issue) > do the changes on trunk. > > > > Nor is this feature uncontroversial. Would not support for log rotation > > > be better? > > > > Were there comments on the list before to this effect ? > > Hmm. Not explicitly. There were comments this is non-standard and > will surprise system administrators if activated. > http://thread.gmane.org/gmane.linux.drivers.openib/29195/focus=29199 Not sure exactly what email (and comment) you are referring to here. -- Hal From mst at mellanox.co.il Wed Sep 6 08:46:26 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 6 Sep 2006 18:46:26 +0300 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than pollutingnamespace In-Reply-To: <1157556861.12940.9754.camel@hal.voltaire.com> References: <1157556861.12940.9754.camel@hal.voltaire.com> Message-ID: <20060906154626.GE6928@mellanox.co.il> Quoting r. Hal Rosenstock : > > > It is an upward compatible change so is low risk. > > > > Not sure what do you mean by upward compatible. This API change does not > > seem to be backward compatible - won't it break building dependent > > applications? > > We are talking about 2 different changes. I was responding to your > comment about the addition of osm_log_init_v2 not being a bug fix, not > the symver patch on top of that. I'm mostly concerned with the symver patch. I think we can't do API changes at this stage in the release process. -- MST From halr at voltaire.com Wed Sep 6 08:51:58 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2006 11:51:58 -0400 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace In-Reply-To: <20060906152729.GD6928@mellanox.co.il> References: <1157548884.12940.5019.camel@hal.voltaire.com> <20060906134206.GB6928@mellanox.co.il> <1157552070.12940.6861.camel@hal.voltaire.com> <1157554690.2569.6.camel@fc6.xsintricity.com> <20060906152729.GD6928@mellanox.co.il> Message-ID: <1157557918.12940.10426.camel@hal.voltaire.com> On Wed, 2006-09-06 at 11:27, Michael S. Tsirkin wrote: > Quoting r. Doug Ledford : > > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > > > On Wed, 2006-09-06 at 10:14 -0400, Hal Rosenstock wrote: > > > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote: > > > > > > Nor is this feature uncontroversial. Would not support for log rotation > > > > be better? > > > > If you are just going to do log rotation, then no need to change opensm, > > just add an appropriate logrotate.d/opensm file to the distribution. 
> > I guess opensm will need to be signalled to close/reopen the log file though. > No? > > > But, that doesn't address what to do if you hit a full filesystem > > condition, > > Since logs are compressed this should at least alleviate that. > what do other daemons do? > > > nor how to limit the size of a log file between rotations > > again, what do other daemons do? > > > (which, as I understand it, is really only an issue because opensm can > > log so much), > > which is what this entire patch series was designed to > > address. They are two different problem spaces. > > So ... wouldn't it be better to address the real issue? > As I see it, the problem only appears if you activate opensm in the verbose > mode. And the reason to run so for a long time is only if you suspect you'll > want to debug something later, without killing opensm. Those patches are still pending and won't be in OFED 1,1, right ? > So the ability to control verbosity at runtime There already is a way to do that. -- Hal > will be a better solution > it seems, and there are patches that do that. From mst at mellanox.co.il Wed Sep 6 09:10:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 6 Sep 2006 19:10:54 +0300 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace In-Reply-To: <1157557918.12940.10426.camel@hal.voltaire.com> References: <1157548884.12940.5019.camel@hal.voltaire.com> <20060906134206.GB6928@mellanox.co.il> <1157552070.12940.6861.camel@hal.voltaire.com> <1157554690.2569.6.camel@fc6.xsintricity.com> <20060906152729.GD6928@mellanox.co.il> <1157557918.12940.10426.camel@hal.voltaire.com> Message-ID: <20060906161054.GF6928@mellanox.co.il> Quoting r. Hal Rosenstock : > > > (which, as I understand it, is really only an issue because opensm can > > > log so much), > > > which is what this entire patch series was designed to > > > address. They are two different problem spaces. > > > > So ... 
wouldn't it be better to address the real issue? > > As I see it, the problem only appears if you activate opensm in the verbose > > mode. And the reason to run so for a long time is only if you suspect you'll > > want to debug something later, without killing opensm. > > Those patches are still pending and won't be in OFED 1,1, right ? Well, I donnu. If it's assumed reducing log size is important enough for 1.1, then maybe applying these patches are the way to go? -- MST From halr at voltaire.com Wed Sep 6 09:13:22 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2006 12:13:22 -0400 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace In-Reply-To: <20060906161054.GF6928@mellanox.co.il> References: <1157548884.12940.5019.camel@hal.voltaire.com> <20060906134206.GB6928@mellanox.co.il> <1157552070.12940.6861.camel@hal.voltaire.com> <1157554690.2569.6.camel@fc6.xsintricity.com> <20060906152729.GD6928@mellanox.co.il> <1157557918.12940.10426.camel@hal.voltaire.com> <20060906161054.GF6928@mellanox.co.il> Message-ID: <1157559199.12940.11246.camel@hal.voltaire.com> On Wed, 2006-09-06 at 12:10, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > > > (which, as I understand it, is really only an issue because opensm can > > > > log so much), > > > > which is what this entire patch series was designed to > > > > address. They are two different problem spaces. > > > > > > So ... wouldn't it be better to address the real issue? > > > As I see it, the problem only appears if you activate opensm in the verbose > > > mode. And the reason to run so for a long time is only if you suspect you'll > > > want to debug something later, without killing opensm. > > > > Those patches are still pending and won't be in OFED 1,1, right ? > > Well, I donnu. If it's assumed reducing log size is important enough > for 1.1, then maybe applying these patches are the way to go? 
Right now these also involve an API change and that patch is being reworked (so they couldn't possibly make OFED 1.1 rc4). -- Hal From mst at mellanox.co.il Wed Sep 6 09:34:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 6 Sep 2006 19:34:01 +0300 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than pollutingnamespace In-Reply-To: <1157559199.12940.11246.camel@hal.voltaire.com> References: <1157559199.12940.11246.camel@hal.voltaire.com> Message-ID: <20060906163401.GG6928@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than pollutingnamespace > > On Wed, 2006-09-06 at 12:10, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > > > (which, as I understand it, is really only an issue because opensm can > > > > > log so much), > > > > > which is what this entire patch series was designed to > > > > > address. They are two different problem spaces. > > > > > > > > So ... wouldn't it be better to address the real issue? > > > > As I see it, the problem only appears if you activate opensm in the verbose > > > > mode. And the reason to run so for a long time is only if you suspect you'll > > > > want to debug something later, without killing opensm. > > > > > > Those patches are still pending and won't be in OFED 1,1, right ? > > > > Well, I donnu. If it's assumed reducing log size is important enough > > for 1.1, then maybe applying these patches are the way to go? > > Right now these also involve an API change and that patch is being > reworked (so they couldn't possibly make OFED 1.1 rc4). Actually I was under impression that that patch was preserving the exiting API (only extension). I hope we all agree API breakage isn't an option at this point. 
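To make the "extension only" case concrete, here is a minimal sketch of an API that grows a new entry point while the old one keeps its signature as a thin wrapper. All sketch_* names and the struct layout are hypothetical stand-ins, not the real osm_log definitions. The point is only that existing callers of the old function keep compiling, which would not hold if the old name were given the new argument list:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the OpenSM types; not the real definitions. */
typedef int sketch_status_t;
#define SKETCH_SUCCESS 0

typedef struct sketch_log {
	unsigned char level;
	unsigned long max_size;	/* 0 means "no limit" in this sketch */
	int accumulate;
} sketch_log_t;

/* New entry point: adds max_size, everything else is unchanged. */
sketch_status_t sketch_log_init_v2(sketch_log_t *log, unsigned char flags,
				   const char *file, unsigned long max_size,
				   int accumulate)
{
	(void)file;	/* a real implementation would open the file here */
	log->level = flags;
	log->max_size = max_size;
	log->accumulate = accumulate;
	return SKETCH_SUCCESS;
}

/* Old entry point keeps its signature, so existing callers still compile. */
sketch_status_t sketch_log_init(sketch_log_t *log, unsigned char flags,
				const char *file, int accumulate)
{
	return sketch_log_init_v2(log, flags, file, 0, accumulate);
}
```

A caller written against the old four-argument form gets max_size 0 through the wrapper, and only code that wants a size limit needs to switch to the new entry point.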
-- MST From sean.hefty at intel.com Wed Sep 6 09:34:54 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 6 Sep 2006 09:34:54 -0700 Subject: [openib-general] [librdmacm] execuation of the the test udaddy is failing In-Reply-To: <44FEE44A.60308@dev.mellanox.co.il> Message-ID: <000201c6d1d2$625590f0$51c8180a@amr.corp.intel.com> > # udaddy >udaddy: starting server >librdmacm: Kernel ABI does not support requested port space. >udaddy: listen request failed >test complete >return status -93 UD QP and multicast support requires kernel ABI version 2. It appears that the kernel version running is 1. - Sean From Don.Dhondt at Bull.com Wed Sep 6 10:56:37 2006 From: Don.Dhondt at Bull.com (Don.Dhondt at Bull.com) Date: Wed, 6 Sep 2006 10:56:37 -0700 Subject: [openib-general] Latency Problem with MT25204 HCAs Message-ID: We are seeing a latency problem that seems to be specific to the Mellanox MT25204 HCA. We do not see the same problem with MT25208 HCAs running in MT23108 compatibility mode. The problem is demonstrated running multiple streams of ib_rdma_lat. On the SDR MT25208 HCA: typical latency 1 stream 3.70 usec 2 streams 4.47 usec 4 streams 6.74 usec On the DDR MT25204 HCA: typical latency 1 stream 3.03 usec 2 streams 7.36 usec 4 streams 22.4 usec Can anyone explain this behavior? We are running OFED 1.0 release on a pair of EM64T dual CPU nodes. 
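Put as ratios against the single-stream figure, the numbers above work out to roughly 1.2x and 1.8x for 2 and 4 streams on the MT25208, versus roughly 2.4x and 7.4x on the MT25204. A small sketch of that arithmetic follows; the struct and function names are made up for illustration, and the values are simply the ones quoted above:

```c
/* Latencies in usec as quoted above, for 1, 2 and 4 concurrent streams. */
struct lat_sample {
	const char *hca;
	double one_usec;
	double two_usec;
	double four_usec;
};

static const struct lat_sample samples[] = {
	{ "MT25208 (SDR)", 3.70, 4.47, 6.74 },
	{ "MT25204 (DDR)", 3.03, 7.36, 22.4 },
};

/* Latency with n streams relative to the single-stream latency. */
static double slowdown(double lat_n_usec, double lat_1_usec)
{
	return lat_n_usec / lat_1_usec;
}
```

For example, slowdown(samples[1].four_usec, samples[1].one_usec) gives the 4-stream figure for the MT25204.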
ibstat output: CA 'mthca0' CA type: MT25208 (MT23108 compat mode) Number of ports: 2 Firmware version: 4.7.400 Hardware version: a0 Node GUID: 0x0005ad0000035950 System image GUID: 0x0005ad000100d050 Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 21 LMC: 0 SM lid: 16 Capability mask: 0x02500a68 Port GUID: 0x0005ad0000035951 Port 2: State: Down Physical state: Polling Rate: 10 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x02500a68 Port GUID: 0x0005ad0000035952 CA 'mthca1' CA type: MT25204 Number of ports: 1 Firmware version: 1.0.800 Hardware version: a0 Node GUID: 0x0002c90200216e40 System image GUID: 0x0002c90200216e43 Port 1: State: Active Physical state: LinkUp Rate: 20 Base lid: 22 LMC: 0 SM lid: 22 Capability mask: 0x02500a6a Port GUID: 0x0002c90200216e41 -Don -------------- next part -------------- An HTML attachment was scrubbed... URL: From rjwalsh at pathscale.com Wed Sep 6 11:39:29 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 06 Sep 2006 11:39:29 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <44FEAFB1.3040902@dev.mellanox.co.il> References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com> <44FE228E.9050402@pathscale.com> <44FEAFB1.3040902@dev.mellanox.co.il> Message-ID: <44FF15E1.4040704@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tziporet Koren wrote: > Robert Walsh wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> >>> Here is a slightly modified patch for your attributes issue. Can you >>> give it a try? >>> >> >> I rebuilt OFED from scratch with the patch, and ran successfully on >> Intel MPI 2.0.1 with the refresh patch. I could not get it to run on >> Intel MPI 3.0b. If you could verify that the fix you mentioned that is >> in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it. >> If you have a later beta version you could send me, that would be great, >> too. >> >> Regards, >> Robert. >> > I added this patch under fixes to OFED 1.1. 
Will be in RC4 Excellent. Thanks, Tziporet. Regards, Robert. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRP8V4fzvnpzTd9fxAQLZVAf+IYtLA2c7cBCbzih2Suy4AHUdD1CghC0U XL+iWjLo4TFbcUhBIrzwG4M72VQanqhNr2Qs3ZtfU2+qN6qKnSZXdejd7nYYOAsz 5LnrWa6Y+9Jfy3K/JOQ4wpjc3lWs3rvuzPTBhmEPcNHZk5+/m0gbfzYLdrc2djPp soyFSQpyLdpF0J5iY12EWiPYnFK7ConoqYHkTODZV8IjBJIImvDoScouIC+Uzi+x HlANIlneKa4/zQHNaK+3vZ6N7ZUq30quMZU6ICMI2gzFEzsEe/HxbtnraXfnXH1J NQ4mMOJNXwPVveNn1E9zA7IgFTMYsnGH080O5saloj2S6P6jb3PLXw== =mDD0 -----END PGP SIGNATURE----- From arlin.r.davis at intel.com Wed Sep 6 13:43:47 2006 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Wed, 6 Sep 2006 13:43:47 -0700 Subject: [openib-general] uDAPL patch Message-ID: >Hi James, > >I don't know if you've been following the back-and-forth on >openib-general concerning the problems we've had running Intel MPI on >QLogic's adapters. Basically, between Arlin Davis and myself, we've >come up with a patch to uDAPL to fix some uninitialized fields returned >by dat_ia_query() that allows the InfiniPath adapters to work correctly. Committed in OpenFabrics (svn9315) and SourceForge (svn1411). Thanks, -arlin From bos at pathscale.com Wed Sep 6 15:54:12 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 06 Sep 2006 15:54:12 -0700 Subject: [openib-general] [PATCH] Reduce packet loss in receive path, OFED 1.1 Message-ID: <1157583252.22887.62.camel@sardonyx> Hi, Tziporet - This is another patch for RC4, which reduces the likelihood of packet loss when the receiver is being saturated with packets. Please apply. 
Thanks,

Signed-off-by: Bryan O'Sullivan

diff -r d8eed27eaaa2 drivers/infiniband/hw/ipath/Makefile
--- a/drivers/infiniband/hw/ipath/Makefile	Wed Sep 06 13:26:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/Makefile	Wed Sep 06 15:48:34 2006 -0700
@@ -31,4 +31,5 @@ ib_ipath-y := \
 	ipath_verbs.o
 
 ib_ipath-$(CONFIG_X86_64) += ipath_wc_x86_64.o
+ib_ipath-$(CONFIG_X86_64) += memcpy_cachebypass_x86_64.o
 ib_ipath-$(CONFIG_PPC64) += ipath_wc_ppc64.o
diff -r d8eed27eaaa2 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Wed Sep 06 13:26:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c	Wed Sep 06 15:48:45 2006 -0700
@@ -40,6 +40,12 @@
 #include "ipath_verbs.h"
 #include "ipath_common.h"
 
+#ifdef __x86_64__
+void *memcpy_cachebypass(void *, const void *, __kernel_size_t);
+#else
+#define memcpy_cachebypass(a,b,c) memcpy((a),(b),(c))
+#endif
+
 static unsigned int ib_ipath_qp_table_size = 251;
 module_param_named(qp_table_size, ib_ipath_qp_table_size, uint, S_IRUGO);
 MODULE_PARM_DESC(qp_table_size, "QP table size");
@@ -167,7 +173,7 @@ void ipath_copy_sge(struct ipath_sge_sta
 		BUG_ON(len == 0);
 		if (len > length)
 			len = length;
-		memcpy(sge->vaddr, data, len);
+		memcpy_cachebypass(sge->vaddr, data, len);
 		sge->vaddr += len;
 		sge->length -= len;
 		sge->sge_length -= len;
diff -r d8eed27eaaa2 drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S	Wed Sep 06 15:48:34 2006 -0700
@@ -0,0 +1,115 @@
+	.text
+	.p2align 4,,15
+	/* rdi destination, rsi source, rdx count */
+	.globl memcpy_cachebypass
+	.type memcpy_cachebypass, @function
+memcpy_cachebypass:
+	movq %rdi, %rax
+.L5:
+	cmpq $15, %rdx
+	ja .L34
+.L3:
+	cmpl $8, %edx /* rdx is 0..15 */
+	jbe .L9
+.L6:
+	testb $8, %dxl /* rdx is 3,5,6,7,9..15 */
+	je .L13
+	movq (%rsi), %rcx
+	addq $8, %rsi
+	movq %rcx, (%rdi)
+	addq $8, %rdi
+.L13:
+	testb $4, %dxl
+	je .L15
+	movl (%rsi), %ecx
+	addq $4, %rsi
+	movl %ecx, (%rdi)
+	addq $4, %rdi
+.L15:
+	testb $2, %dxl
+	je .L17
+	movzwl (%rsi), %ecx
+	addq $2, %rsi
+	movw %cx, (%rdi)
+	addq $2, %rdi
+.L17:
+	testb $1, %dxl
+	je .L33
+.L1:
+	movzbl (%rsi), %ecx
+	movb %cl, (%rdi)
+.L33:
+	ret
+.L34:
+	cmpq $63, %rdx /* rdx is > 15 */
+	ja .L64
+	movl $16, %ecx /* rdx is 16..63 */
+.L25:
+	movq 8(%rsi), %r8
+	movq (%rsi), %r9
+	addq %rcx, %rsi
+	movq %r8, 8(%rdi)
+	movq %r9, (%rdi)
+	addq %rcx, %rdi
+	subq %rcx, %rdx
+	cmpl %edx, %ecx /* is rdx >= 16? */
+	jbe .L25
+	jmp .L3 /* rdx is 0..15 */
+	.p2align 4,,7
+.L64:
+	movl $64, %ecx
+.L42:
+	prefetchnta 128(%rsi)
+	movq (%rsi), %r8
+	movq 8(%rsi), %r9
+	movq 16(%rsi), %r10
+	movq 24(%rsi), %r11
+	subq %rcx, %rdx
+	movq %r8, (%rdi)
+	movq 32(%rsi), %r8
+	movq %r9, 8(%rdi)
+	movq 40(%rsi), %r9
+	movq %r10, 16(%rdi)
+	movq 48(%rsi), %r10
+	movq %r11, 24(%rdi)
+	movq 56(%rsi), %r11
+	addq %rcx, %rsi
+	movq %r8, 32(%rdi)
+	movq %r9, 40(%rdi)
+	movq %r10, 48(%rdi)
+	movq %r11, 56(%rdi)
+	addq %rcx, %rdi
+	cmpq %rdx, %rcx /* is rdx >= 64? */
+	jbe .L42
+	sfence
+	orl %edx, %edx
+	je .L33
+	jmp .L5
+.L9:
+	jmp *.L12(,%rdx,8) /* rdx is 0..8 */
+	.section .rodata
+	.align 8
+	.align 4
+.L12:
+	.quad .L33
+	.quad .L1
+	.quad .L2
+	.quad .L6
+	.quad .L4
+	.quad .L6
+	.quad .L6
+	.quad .L6
+	.quad .L8
+	.text
+.L2:
+	movzwl (%rsi), %ecx
+	movw %cx, (%rdi)
+	ret
+.L4:
+	movl (%rsi), %ecx
+	movl %ecx, (%rdi)
+	ret
+.L8:
+	movq (%rsi), %rcx
+	movq %rcx, (%rdi)
+	ret

From bugzilla-daemon at openib.org Wed Sep 6 16:01:45 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Wed, 6 Sep 2006 16:01:45 -0700 (PDT)
Subject: [openib-general] [Bug 222] New: ib_uverbs fails to load on ia64, OFED 1.1 - rc3
Message-ID: <20060906230145.640032283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=222

Summary: ib_uverbs fails to load on ia64, OFED 1.1 - rc3
Product: OpenFabrics Linux
Version: 1.1rc3
Platform: IA64
OS/Version: RHEL 4
Status: NEW
Severity: blocker
Priority: P1
Component: Verbs
AssignedTo: bugzilla at openib.org
ReportedBy: robert.j.woodruff at intel.com

OFED 1.1-rc3 ib_uverbs fails to load on Itanium on RHEL4-U3, due to unknown symbol hpage_shift. This is a new bug that did not happen with OFED 1.1-rc2.

/etc/init.d/openibd start
Loading HCA driver and Access Layer: [FAILED]
Please open an issue in the http://openib.org/bugzilla and attach /tmp/ib_debug_info.log

> dmesg
ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca: Initializing 0000:0d:00.0
ACPI: PCI interrupt 0000:0d:00.0[A] -> GSI 76 (level, low) -> IRQ 58
ib_mthca 0000:0d:00.0: HCA FW version 3.3.2 is old (3.4.0 is current).
ib_mthca 0000:0d:00.0: If you have problems, try updating your HCA FW.
ib_uverbs: Unknown symbol hpage_shift divert: not allocating divert_blk for non-ethernet device ib0 divert: not allocating divert_blk for non-ethernet device ib1 ip_tables: (C) 2000-2002 Netfilter core team ib0: no IPv6 routers present ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From robert.j.woodruff at intel.com Wed Sep 6 16:56:55 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Wed, 6 Sep 2006 16:56:55 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready Message-ID: Robert Walsh wrote, >> I rebuilt OFED from scratch with the patch, and ran successfully on >> Intel MPI 2.0.1 with the refresh patch. I could not get it to run on >> Intel MPI 3.0b. If you could verify that the fix you mentioned that is >> in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it. >> If you have a later beta version you could send me, that would be great, >> too. >> >> Regards, >> Robert. I spoke with our MPI team lead and it is very likely that the fix that is in 2.0.1-refresh did not make it into 3.0 beta, but it should be in the 3.0 release schedule to be completed in a couple of weeks. woody From rjwalsh at pathscale.com Wed Sep 6 17:16:09 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 06 Sep 2006 17:16:09 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: References: Message-ID: <44FF64C9.4090608@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > I spoke with our MPI team lead and it is very likely that the fix that > is in 2.0.1-refresh did not make it into 3.0 beta, but it should be > in the 3.0 release schedule to be completed in a couple of weeks. OK then - I'll wait for that. Regards, Robert. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRP9kyfzvnpzTd9fxAQJu/wf+PEjyS1xAKzmXD+oZJxUNNeaW7QpqKz3h zc370m74yIWjI+8GianGN4VM6Zx4InPdsRbGNGTd+FRhmZvYDhuuo8VBQUDdAZdB Tkm+PomDIWdftj8cWCsiah4UkhzRv//83TiIkGZ5+zk25qOvQ6VAW4fy6vpJhKvo uTW9Sow/G/BAIuMZ8wwg5Jyz5kbYxDxr+21jzQ+nblM/6YdGVco3GI1/z/dXwK5V JEPIEu4ZxExOU9yGqS/hculq2Z9WFyGTBYoll67KkhpOuLUxiCxCxStA8Z0x52fG OIhL0vKYgiOWLZnxZONRsy89OR/mUV7SNZeOZVqJSqMh7SpeLWWYHQ== =SRiy -----END PGP SIGNATURE----- From tom at opengridcomputing.com Wed Sep 6 20:51:11 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Wed, 06 Sep 2006 22:51:11 -0500 Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups In-Reply-To: Message-ID: Roland: Is there anything we know about that is still unresolved at this point? We've got a bunch of balls up in the air here and I want to make sure we haven't dropped one. Thanks, Tom On 9/5/06 5:39 PM, "Roland Dreier" wrote: > Steve> Its old debug code that isn't used anywhere. It would be > Steve> nice to keep it around, but if you really don't want it, > Steve> nuke it... > > No, that's fine, I'll leave it inside the #if 0. > > - R. From dledford at redhat.com Wed Sep 6 21:16:00 2006 From: dledford at redhat.com (Doug Ledford) Date: Thu, 07 Sep 2006 00:16:00 -0400 Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather than polluting namespace In-Reply-To: <20060906151659.GC6928@mellanox.co.il> References: <1157548884.12940.5019.camel@hal.voltaire.com> <20060906134206.GB6928@mellanox.co.il> <1157552070.12940.6861.camel@hal.voltaire.com> <20060906151659.GC6928@mellanox.co.il> Message-ID: <1157602561.4652.53.camel@fc6.xsintricity.com> On Wed, 2006-09-06 at 18:16 +0300, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > > > On Wed, 2006-09-06 at 09:42, Michael S. 
Tsirkin wrote: > > > Quoting r. Hal Rosenstock : > > > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > > > > > > > OpenSM/osm_log API: Rather than polluting the namespace with needless > > > > symbols, use symbol versions and have a versioned osm_log_init rather > > > > than adding osm_log_init_v2 as an additional API > > > > > > > > This patch is intended to be applied to both trunk and 1.1 versions. > > > > > > > > Signed-off-by: Doug Ledford > > > > Signed-off-by: Hal Rosenstock > > > > > > This preserves the ABI, but would this not break the API? > > > > Yes, this patch changes the API (in a most trivial way). > > So all users need to change code or they won't compile against the new > library? Yes, and that is the correct way to handle this change. I could see leaving the whole log init change out entirely, but if it's going to go in, this is the right way to do it. > Not sure what do you mean by upward compatible. This API change does not seem to > be backward compatible - won't it break building dependent applications? > If so is not something you should do after code freeze. APIs change. Any app you can build can compensate. The goal is to keep apps that aren't recompiled working, and to make apps that are recompiled compliant with the latest version of the function. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From dotanb at dev.mellanox.co.il Wed Sep 6 22:31:40 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 07 Sep 2006 08:31:40 +0300 Subject: [openib-general] ibv_poll_cq after ibv_post_send does not work In-Reply-To: References: Message-ID: <44FFAEBC.9020109@dev.mellanox.co.il> Hi Thomas. 
Bub Thomas wrote:
> I'm still in the process of porting my gen1 code to gen2.
>
> As mentioned yesterday I can connect to a listener on the same machine
> using libibcm.
>
> Doing this I have to do the ibv_modify_qp by myself to get the qp's
> from INIT via RTR to RTS on both sides.
>
> At least the ibv_modify_qp does not complain when having done the
> connection via the libibcm.
>
> So my assumption is I have my two qp's successfully connected.
>
> First action after the connection is the listener to wait on its
> receive cq for an IBV_WR_SEND done by the connector.
>
> Here is now the problem:
>
> * The listener never gets a completion
>
> * The connector doing the IBV_WR_SEND does get error on the send cq like
> opcode=0x7f status=0x5 vendor_err=129 for the first IBV_WR_SEND and
> opcode=0x7f status=0xc vendor_err=129 for all subsequent attempts to
> send the data
>
> Is there anyone out there who can help me out to understand the error
> codes and or to understand what is wrong?
>
> Thanks in advance from Germany
>
> Thomas

Which QP do you use (RC/UC/UD)?
Do you get any completion on the connector side?

If you are using an RC QP, the reason for not getting any completion in the CQ may be a missing receive:
did you post any RR (Receive Request) at the listener side?

rnr_retry = 7 means that in case of RNR there will be infinite retries.
If the timeout = 0 and the remote QP is not ready, then there won't be any retransmission.
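
As a reading aid for the status values quoted above, here is a minimal sketch in plain C. It assumes the reported numbers are raw enum ibv_wc_status values as numbered in the libibverbs header; the helper names wc_status_name and rnr_retry_is_infinite are made up for illustration and are not part of libibverbs. Under that assumption, 0x5 would be a flushed work request (the QP is already in the error state) and 0xc a transport retry counter exceeded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Names in the order of enum ibv_wc_status; the array index is the
 * numeric status. Assumption: the poster's status=0x.. values are these
 * raw enum values.
 */
static const char *const wc_status_names[] = {
	"IBV_WC_SUCCESS",             /* 0x00 */
	"IBV_WC_LOC_LEN_ERR",         /* 0x01 */
	"IBV_WC_LOC_QP_OP_ERR",       /* 0x02 */
	"IBV_WC_LOC_EEC_OP_ERR",      /* 0x03 */
	"IBV_WC_LOC_PROT_ERR",        /* 0x04 */
	"IBV_WC_WR_FLUSH_ERR",        /* 0x05 */
	"IBV_WC_MW_BIND_ERR",         /* 0x06 */
	"IBV_WC_BAD_RESP_ERR",        /* 0x07 */
	"IBV_WC_LOC_ACCESS_ERR",      /* 0x08 */
	"IBV_WC_REM_INV_REQ_ERR",     /* 0x09 */
	"IBV_WC_REM_ACCESS_ERR",      /* 0x0a */
	"IBV_WC_REM_OP_ERR",          /* 0x0b */
	"IBV_WC_RETRY_EXC_ERR",       /* 0x0c */
	"IBV_WC_RNR_RETRY_EXC_ERR",   /* 0x0d */
	"IBV_WC_LOC_RDD_VIOL_ERR",    /* 0x0e */
	"IBV_WC_REM_INV_RD_REQ_ERR",  /* 0x0f */
	"IBV_WC_REM_ABORT_ERR",       /* 0x10 */
	"IBV_WC_INV_EECN_ERR",        /* 0x11 */
	"IBV_WC_INV_EEC_STATE_ERR",   /* 0x12 */
	"IBV_WC_FATAL_ERR",           /* 0x13 */
	"IBV_WC_RESP_TIMEOUT_ERR",    /* 0x14 */
	"IBV_WC_GENERAL_ERR",         /* 0x15 */
};

/* Hypothetical helper: map a numeric completion status to its enum name. */
const char *wc_status_name(unsigned int status)
{
	size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

	return status < n ? wc_status_names[status] : "unknown";
}

/*
 * Hypothetical helper: rnr_retry is a 3-bit QP attribute. Values 0..6
 * are a retry count, 7 means "retry indefinitely".
 */
int rnr_retry_is_infinite(unsigned int rnr_retry)
{
	assert(rnr_retry <= 7);
	return rnr_retry == 7;
}
```

Calling wc_status_name(wc.status) in the polling loop of a test program would print the symbolic name next to the raw number, which makes reports like the one above easier to compare.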
Dotan From bugzilla-daemon at openib.org Wed Sep 6 23:04:17 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 6 Sep 2006 23:04:17 -0700 (PDT) Subject: [openib-general] [Bug 222] ib_uverbs fails to load on ia64, OFED 1.1 - rc3 Message-ID: <20060907060417.DBC4C2283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=222 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from tziporet at mellanox.co.il 2006-09-06 23:04 ------- A fix was done in the way page shift are calculated in Itanium. Will be part of RC4 ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sweitzen at cisco.com Wed Sep 6 23:22:48 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 6 Sep 2006 23:22:48 -0700 Subject: [openib-general] Cisco SQA results so far for OFED 1.1 rc3 Message-ID: Testing is still continuing, we have not started testing RHEL4 U4, SLES 10, IPoIB HA, or SRP HA yet. Some high points of rc3 (all testing done with Mellanox HCAs and Cisco switches): * We have migrated from Intel 9.0 compilers to 9.1. * Are seeing 2 million msg/sec with MVAPICH. * Sinai 1.1.000 firmware fixes SDP scalability. * Open MPI 1.1.1 is working better than 1.1. * We see up to 3.5 Gb/sec max throughput with IPoIB on latest Intel Xeon and AMD Opteron processors. Many bug fixes have been tested: * 193 OFED 1.1 rc1: openib-diags should not be linked with opensm libs * 197 OFED 1.1rc1: Open MPI fails on RHEL4 64-bit * 74 OFED 1.0 rc4: Open MPI Pallas test hangs * 109 OFED 1.0 rc5: SDP can't sustain 100+ concurrent SDP connections (mem leak?) 
* 101 OFED 1.0: need documentation on openib-diags * 80 OFED 1.0 rc4: Open MPI fails on RHEL4 U3 ppc64 * 135 OFED 1.0: MVAPICH doesn't work on RHEL4 U3 ppc64 * 176 OFED 1.0: mpicc fails with Intel C on RHEL4 IA64 * 179 move /usr/local/ofed/sbin binaries to /usr/local/ofed/bin * 103 OFED 1.0: change ibutils to not depend on opensm Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ofed_sqa_results.xls Type: application/vnd.ms-excel Size: 81408 bytes Desc: ofed_sqa_results.xls URL: From mst at mellanox.co.il Wed Sep 6 23:22:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 7 Sep 2006 09:22:43 +0300 Subject: [openib-general] OpenSM/osm_log API: Use symbol versionsrather than polluting namespace In-Reply-To: <1157602561.4652.53.camel@fc6.xsintricity.com> References: <1157602561.4652.53.camel@fc6.xsintricity.com> Message-ID: <20060907062243.GH6928@mellanox.co.il> Quoting r. Doug Ledford : > Subject: Re: [openib-general] OpenSM/osm_log API: Use symbol versionsrather than polluting namespace > > On Wed, 2006-09-06 at 18:16 +0300, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > > > > > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote: > > > > Quoting r. Hal Rosenstock : > > > > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace > > > > > > > > > > OpenSM/osm_log API: Rather than polluting the namespace with needless > > > > > symbols, use symbol versions and have a versioned osm_log_init rather > > > > > than adding osm_log_init_v2 as an additional API > > > > > > > > > > This patch is intended to be applied to both trunk and 1.1 versions. 
> > > > >
> > > > > Signed-off-by: Doug Ledford
> > > > > Signed-off-by: Hal Rosenstock
> > > >
> > > > This preserves the ABI, but would this not break the API?
> > >
> > > Yes, this patch changes the API (in a most trivial way).
> >
> > So all users need to change code or they won't compile against the new
> > library?
>
> Yes, and that is the correct way to handle this change.

I disagree. In my opinion, asking all users to add a parameter they don't care about is worse than having multiple functions with a convenient set of options. And if there is a low cost way to help apps compile without code change, I don't see why it makes sense to create work.

Even if this were a good idea, I don't think introducing a flag day for all users without warning is a good way to extend library APIs. I would expect at least one release where both new and old functions are available.

> I could see
> leaving the whole log init change out entirely, but if it's going to go
> in, this is the right way to do it.

Maybe it should be left out. Whether the issue this addresses is critical for release is Hal's call. But if the change affects other modules I think it's clear we won't be able to take the fix.

> > Not sure what do you mean by upward compatible. This API change does not seem to
> > be backward compatible - won't it break building dependent applications?
> > If so is not something you should do after code freeze.
>
> APIs change.

APIs should not change with every release.

> Any app you can build can compensate.

Sure it seems simple if you are RedHat and rebuild the whole OS. However, let us look at an application vendor that cares about portability. What this "trivial" change involves is:

1. add a configure hook to check library version installed
2. define an appropriate macro
3. add a wrapper in header file to call the appropriate function
4. update the application to use the wrapper instead of the function directly

All this after a supposed code freeze.
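
For readers who want to see what steps 2 and 3 of that list look like in practice, here is a minimal, self-contained sketch. It deliberately does not use the real OpenSM functions, whose signatures are not shown in this thread: demo_log_init_old, demo_log_init_new, the max_size parameter and the HAVE_DEMO_LOG_INIT_MAX_SIZE macro are all hypothetical stand-ins, and the macro is defaulted in the file only so that it builds without a configure script.

```c
#include <assert.h>
#include <stddef.h>

/* Records which stand-in was called, so the effect of the wrapper is visible. */
static int last_variant_called;

/* Stand-in for the old library entry point (hypothetical signature). */
int demo_log_init_old(const char *path)
{
	assert(path != NULL);
	last_variant_called = 1;
	return 0;
}

/* Stand-in for the new entry point with one extra parameter (hypothetical). */
int demo_log_init_new(const char *path, unsigned long max_size)
{
	assert(path != NULL);
	(void)max_size;
	last_variant_called = 2;
	return 0;
}

/*
 * Steps 1 and 2: a configure check would normally define this macro
 * according to the installed library version. It is defaulted here only
 * so that the sketch compiles on its own.
 */
#ifndef HAVE_DEMO_LOG_INIT_MAX_SIZE
#define HAVE_DEMO_LOG_INIT_MAX_SIZE 1
#endif

/* Step 3: the wrapper the application calls instead of the library function. */
int app_log_init(const char *path)
{
#if HAVE_DEMO_LOG_INIT_MAX_SIZE
	/* 0 stands for "no size limit" in this sketch. */
	return demo_log_init_new(path, 0UL);
#else
	return demo_log_init_old(path);
#endif
}

/* Accessor used to check which variant the wrapper picked. */
int app_last_variant(void)
{
	return last_variant_called;
}
```

Step 4 is then a mechanical replacement of every direct call in the application with app_log_init(). The sketch is small, but it has to be written, configured and tested by every dependent project, which is the cost being argued about here.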
> The goal is to keep > apps that aren't recompiled working, and to make apps that are > recompiled compliant with the latest version of the function. We are past code freeze. I agree with Hal that it might be hard to draw a line between a critical and a non-critical bugfix. However, an API change that 1. is purely cosmetical 2. requires code changes in dependent applications 3. is not uncontroversial is, for me, obviously beyond that line. -- MST From eli at dev.mellanox.co.il Wed Sep 6 23:28:58 2006 From: eli at dev.mellanox.co.il (Eli cohen) Date: Thu, 07 Sep 2006 09:28:58 +0300 Subject: [openib-general] PXE + infiniband? In-Reply-To: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com> References: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com> Message-ID: <1157610538.30038.35.camel@localhost> On Fri, 2006-09-01 at 14:51 -0400, Cain, Brian (GE Healthcare) wrote: > A while back > (http://openib.org/pipermail/openib-general/2005-September/010801.html) > there was mention of putting PXE stuff on an HCA. Has anyone done this > with PXELINUX? It doesn't seem like it's as straightforward as just > putting the stock PXELINUX image on your HCA. I'm assuming this image > would have to recognize the HCA and bring up IPoIB in order to use the > conventional TFTP transport? There is an implementation of PXE for Mellanox's HCAs that can be found here: http://sourceforge.net/forum/forum.php?forum_id=494529 From paul.baxter at dsl.pipex.com Thu Sep 7 00:19:15 2006 From: paul.baxter at dsl.pipex.com (Paul Baxter) Date: Thu, 7 Sep 2006 08:19:15 +0100 Subject: [openib-general] PXE + infiniband? References: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com> <1157610538.30038.35.camel@localhost> Message-ID: <003e01c6d24d$f19caed0$8000a8c0@blorp> > There is an implementation of PXE for Mellanox's HCAs that can be found > here: http://sourceforge.net/forum/forum.php?forum_id=494529 Thanks for the tip I, too, am interested in this. 
Do you have a more direct link as I wandered around etherboot's project site and couldn't find anything IB-specific. Paul Baxter From paul.baxter at dsl.pipex.com Thu Sep 7 00:28:39 2006 From: paul.baxter at dsl.pipex.com (Paul Baxter) Date: Thu, 7 Sep 2006 08:28:39 +0100 Subject: [openib-general] PXE + infiniband? References: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com> <1157610538.30038.35.camel@localhost> <003e01c6d24d$f19caed0$8000a8c0@blorp> Message-ID: <005701c6d24f$3cf2c3f0$8000a8c0@blorp> >> There is an implementation of PXE for Mellanox's HCAs that can be found >> here: http://sourceforge.net/forum/forum.php?forum_id=494529 > > Thanks for the tip > > I, too, am interested in this. > > Do you have a more direct link as I wandered around etherboot's project > site > and couldn't find anything IB-specific. I must have been having a 'special moment' before, because I couldn't find the mailing lists Here they are! http://sourceforge.net/search/?ml_name=etherboot-developers&type_of_search=mlists&group_id=4233&words=infiniband From dotanb at dev.mellanox.co.il Thu Sep 7 01:01:53 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 07 Sep 2006 11:01:53 +0300 Subject: [openib-general] [librdmacm] execuation of the the test udaddy is failing In-Reply-To: <000201c6d1d2$625590f0$51c8180a@amr.corp.intel.com> References: <000201c6d1d2$625590f0$51c8180a@amr.corp.intel.com> Message-ID: <44FFD1F1.7050204@dev.mellanox.co.il> Sean Hefty wrote: >> # udaddy >> udaddy: starting server >> librdmacm: Kernel ABI does not support requested port space. >> udaddy: listen request failed >> test complete >> return status -93 >> > > UD QP and multicast support requires kernel ABI version 2. It appears that the > kernel version running is 1. > > Thanks, that was the problem. 
Dotan From dotanb at dev.mellanox.co.il Thu Sep 7 01:11:38 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 07 Sep 2006 11:11:38 +0300 Subject: [openib-general] libibcm can't connect/talk to libicm on other machine. In-Reply-To: References: Message-ID: <44FFD43A.4020108@dev.mellanox.co.il> Bub Thomas wrote: > Dotan, > the ibv_rc_pingpong example works for me so I can exclude the > architecture. > I never got the libibcm example compiled. > Which is your example and which architecture x86 vs. x86_64 did you > compile it for? > Can you share your libibcm the example code? (if it is not the standard > that I can't get compiled) > Thomas > I started to modify the qp_test (a test that can be found in https://openib.org/svn/trunk/contrib/mellanox/ibtp/gen2/userspace/useraccess/qp_test/) here is the main file that deals with the libibcm. I'm sorry, but if you'll add this file to the qp_test it won't compile (because of some more changed in the code). When I'll finish to clean the code i will commit the full version (with the libibcm support) to the openib svn. I hope that this code will help you ... Dotan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: connect_qp.c URL: From moshek at voltaire.com Thu Sep 7 02:31:59 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 7 Sep 2006 12:31:59 +0300 Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question Message-ID: Let assume that the HCA has wrong FWR and/or other reason that cause driver load failure ? We have to check what's going on in this case. -> mstflint is one of our tools. Moshe. ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S. 
Tsirkin [mailto:mst at mellanox.co.il] Sent: Wednesday, September 06, 2006 4:25 PM To: Moshe Kazir Cc: Or Gerlitz; Roland Dreier; openib-general at openib.org; Yiftah Shahar; Tseng-hui Lin Subject: Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question Quoting r. Moshe Kazir : > Is it time to create a work arround that opens /proc/bus/pci/ .... > And always work ? But why isn't the driver loaded? -- MST From tziporet at dev.mellanox.co.il Thu Sep 7 03:50:11 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 07 Sep 2006 13:50:11 +0300 Subject: [openib-general] [PATCH] Reduce packet loss in receive path, OFED 1.1 In-Reply-To: <1157583252.22887.62.camel@sardonyx> References: <1157583252.22887.62.camel@sardonyx> Message-ID: <44FFF963.6030508@dev.mellanox.co.il> Bryan O'Sullivan wrote: > Hi, Tziporet - > > This is another patch for RC4, which reduces the likelihood of packet > loss when the receiver is being saturated with packets. Please apply. > > this patch is in for RC4 Tziporet From ishai at mellanox.co.il Thu Sep 7 04:05:39 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Thu, 7 Sep 2006 14:05:39 +0300 Subject: [openib-general] FW: OFED 1.1 rc3 srp driver panic Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302D86A99@mtlexch01.mtl.com> Here is an Oops in ib_srp ________________________________ From: Dachepalli, Sudhir [mailto:Sudhir.Dachepalli at lsil.com] Sent: Tuesday, September 05, 2006 11:09 PM To: Vu Pham Cc: Richard, Bill Subject: OFED 1.1 rc3 srp driver panic Hello Vu, I am trying to integrate MPP and OFED 1.1 rc3 srp. Status on following 2 issues. * New Host number allocation for controller offline / online - MPP will handle this with out the need to run hot_add. we need to use srp_daemon. * scsi error handler invocation - we need to figure out how to cleanly exit out of error handler after cleaning up all the IO's - THIS IS THE BIGGEST ISSUE NOW. 
Panic I noticed the following panic when I performed sysreboot on controller A while IO is going on : ib_srp: SRP reset_host called ib_srp: QP event 1 ib_srp: QP event 1 ib_srp: connection closed Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: [<0000000000000000>] PML4 214f0d067 PGD 214657067 PMD 0 Oops: 0010 [1] SMP CPU 1 Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc rdma_ucm(U) rdma_cm(U) ib_addr(U) ib_srp(U) ds yenta_socket pcmcia_core dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib _sa(U) ib_mad(U) ib_core(U) e1000 ext3 jbd mppVhba(U) qla2400 qla2322 qla2xxx scsi_transport_fc mptscsih mptsas mptspi mptfc mptscsi mptbase ata_piix libata mppUpper(U) sg sd_mod scsi_mod Pid: 4991, comm: scsi_eh_7 Not tainted 2.6.9-34.ELsmp RIP: 0010:[<0000000000000000>] [<0000000000000000>] RSP: 0018:000001021202dd70 EFLAGS: 00010006 RAX: 0000010210234100 RBX: 00000102114b0a28 RCX: 00000102114b08a0 RDX: 00000102114b08b0 RSI: 00000102114b0a28 RDI: 0000010210234100 RBP: 00000102114b0c08 R08: 0000000000000000 R09: 0000000210d6c000 R10: 00000102114b03c8 R11: 00000102114b03c8 R12: 00000102114b03c8 R13: 000001021202dee8 R14: 00000102114b0000 R15: 000001021202ded8 FS: 0000002a9589a760(0000) GS:ffffffff804d7b80(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000cfe58000 CR4: 00000000000006e0 Process scsi_eh_7 (pid: 4991, threadinfo 000001021202c000, task 0000010211e9b030) Stack: ffffffffa02a1792 00000102114b08b0 00000102114b03c8 0000000000000000 ffffffffa02a18bf 0000003000000008 000001021202de78 000001021202ddb8 0000010211e9b030 ffffffff801333c8 Call Trace:{:ib_srp:srp_reset_req+37} {:ib_srp:srp_reconnect_target+288} {default_wake_function+0} {kobject_release+0} {:ib_srp:srp_reset_host+51} {:ib_srp:srp_reset_host+59} {:scsi_mod:scsi_try_host_reset+118} {:scsi_mod:scsi_error_handler+2347} {child_rip+8} 
{:scsi_mod:scsi_error_handler+0} {child_rip+0} Code: Bad RIP value. RIP [<0000000000000000>] RSP <000001021202dd70> CR2: 0000000000000000 <0>Kernel panic - not syncing: Oops Sudhir Dachepalli Engenio Storage Group LSI Logic Corporation 12331 Riata Trace Parkway Suite B200 Austin , Texas 78727 512 794 3706 phone 512 794 3702 fax sudhir.dachepalli at lsil.com www.lsilogic.com/engenio -------------- next part -------------- An HTML attachment was scrubbed... URL: From tziporet at mellanox.co.il Thu Sep 7 04:24:05 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 7 Sep 2006 14:24:05 +0300 Subject: [openib-general] [openfabrics-ewg] Cisco SQA results so far for OFED 1.1 rc3 Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA787B@mtlexch01.mtl.com> Hi Scott, Thanks for the details report. This is the status of bugs that are not Cisco specific (e.g. tvflash) for RC4: 219 OFED 1.1rc3 contains prerelease unstable libibverbs code - fixed 221 SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - open and Roland should work on this 222 ib_uverbs fails to load on ia64, OFED 1.1 - rc3 (opened by Bob Woodruff, but we saw it too) - fixed Tziporet -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Scott Weitzenkamp (sweitzen) Sent: Thursday, September 07, 2006 9:23 AM To: EWG Cc: OPENIB Subject: [openfabrics-ewg] Cisco SQA results so far for OFED 1.1 rc3 Testing is still continuing, we have not started testing RHEL4 U4, SLES 10, IPoIB HA, or SRP HA yet. Some high points of rc3 (all testing done with Mellanox HCAs and Cisco switches): * We have migrated from Intel 9.0 compilers to 9.1. * Are seeing 2 million msg/sec with MVAPICH. * Sinai 1.1.000 firmware fixes SDP scalability. * Open MPI 1.1.1 is working better than 1.1. * We see up to 3.5 Gb/sec max throughput with IPoIB on latest Intel Xeon and AMD Opteron processors. 
Many bug fixes have been tested:

* 193 OFED 1.1 rc1: openib-diags should not be linked with opensm libs
* 197 OFED 1.1rc1: Open MPI fails on RHEL4 64-bit
* 74 OFED 1.0 rc4: Open MPI Pallas test hangs
* 109 OFED 1.0 rc5: SDP can't sustain 100+ concurrent SDP connections (mem leak?)
* 101 OFED 1.0: need documentation on openib-diags
* 80 OFED 1.0 rc4: Open MPI fails on RHEL4 U3 ppc64
* 135 OFED 1.0: MVAPICH doesn't work on RHEL4 U3 ppc64
* 176 OFED 1.0: mpicc fails with Intel C on RHEL4 IA64
* 179 move /usr/local/ofed/sbin binaries to /usr/local/ofed/bin
* 103 OFED 1.0: change ibutils to not depend on opensm

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ishai at dev.mellanox.co.il Thu Sep 7 04:18:22 2006
From: ishai at dev.mellanox.co.il (ishai at dev.mellanox.co.il)
Date: Thu, 7 Sep 2006 14:18:22 +0300 (IDT)
Subject: [openib-general] FW: OFED 1.1 rc3 srp driver panic
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E302D86A99@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E302D86A99@mtlexch01.mtl.com>
Message-ID: <58551.194.90.237.34.1157627902.squirrel@dev.mellanox.co.il>

I think I found the race that causes this NULL Dereference.

1) There is a connection error.

2) srp_completion gets bad status and schedules a call to srp_reconnect_work.

3) srp_reconnect_work is scheduled to run and calls srp_reconnect_target.

4) srp_reconnect_target starts to run, changes the target state to SRP_TARGET_CONNECTING but there is a context switch before it gets to execute srp_reset_req.

5) The scsi error handling calls to srp_reset_host.

6) srp_reset_host calls srp_reconnect_target that returns -EAGAIN (because the target state is not SRP_TARGET_LIVE).

7) srp_reset_host returns FAILED and therefore the device goes offline.

8) Because the device goes offline the commands are being freed (In the scsi mid-layer).

9) The first execution of srp_reconnect_target resumes and calls to srp_reset_req that tries to access the commands that were freed.

10) NULL deref.

Ishai

From mst at mellanox.co.il Thu Sep 7 05:00:01 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 7 Sep 2006 15:00:01 +0300
Subject: [openib-general] [PATCH] IB/srp: don't schedule reconnect from srp, scsi does it for us (was Re: FW: OFED 1.1 rc3 srp driver panic)
In-Reply-To: <58551.194.90.237.34.1157627902.squirrel@dev.mellanox.co.il>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E302D86A99@mtlexch01.mtl.com> <58551.194.90.237.34.1157627902.squirrel@dev.mellanox.co.il>
Message-ID: <20060907120001.GO6928@mellanox.co.il>

Quoting r. ishai at dev.mellanox.co.il :
> Subject: Re: FW: OFED 1.1 rc3 srp driver panic
>
> I think I found the race that causes this NULL Dereference.
>
> 1) There is a connection error.
>
> 2) srp_completion gets bad status and schedules a call to srp_reconnect_work.
>
> 3) srp_reconnect_work is scheduled to run and calls srp_reconnect_target.
>
> 4) srp_reconnect_target starts to run, changes the target state to
> SRP_TARGET_CONNECTING but there is a context switch before it gets to
> execute srp_reset_req.
>
> 5) The scsi error handling calls to srp_reset_host.
>
> 6) srp_reset_host calls srp_reconnect_target that returns -EAGAIN
> (because the target state is not SRP_TARGET_LIVE).
>
> 7) srp_reset_host returns FAILED and therefore the device goes offline.
>
> 8) Because the device goes offline the commands are being freed (In the
> scsi mid-layer).
>
> 9) The first execution of srp_reconnect_target resumes and calls to
> srp_reset_req that tries to access the commands that were freed.
>
> 10) NULL deref.
>
> Ishai

It seems that we don't really need to schedule srp_reconnect_work on error since it will be called later anyway.
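
The interleaving in steps 4 to 7 of the quoted analysis can be modelled in a few lines of plain C. This is only an illustrative sketch and not ib_srp code: every type and function name below is made up, the two halves of the reconnect are split into separate functions so the "context switch" can be reproduced deterministically in a single thread, and the freed commands of steps 8 to 10 are not modelled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins; these are not the ib_srp definitions. */
enum model_state { MODEL_LIVE, MODEL_CONNECTING };
enum model_eh_result { MODEL_EH_SUCCESS, MODEL_EH_FAILED };

struct model_target {
	enum model_state state;
	int requests_reset;	/* how many times outstanding requests were reset */
};

/* First half of a reconnect: claim the target, or refuse if it is not LIVE. */
int model_reconnect_begin(struct model_target *t)
{
	assert(t != NULL);
	if (t->state != MODEL_LIVE)
		return -EAGAIN;
	t->state = MODEL_CONNECTING;
	return 0;
}

/* Second half: reset outstanding requests and return the target to LIVE. */
void model_reconnect_finish(struct model_target *t)
{
	assert(t != NULL);
	t->requests_reset++;
	t->state = MODEL_LIVE;
}

/* What a host reset from the error handler would observe in this model. */
enum model_eh_result model_reset_host(struct model_target *t)
{
	if (model_reconnect_begin(t) != 0)
		return MODEL_EH_FAILED;
	model_reconnect_finish(t);
	return MODEL_EH_SUCCESS;
}
```

If the scheduled work item has run model_reconnect_begin() but not yet model_reconnect_finish(), a call to model_reset_host() in that window returns MODEL_EH_FAILED, which corresponds to step 7. With only one caller of the reconnect path, that window does not exist in the model.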
So it seems we can address these crashes and simplify srp in the following way:

---

IB/srp: don't schedule reconnect from srp, scsi does it for us

If there is a problem in the connection, the scsi mid-layer will eventually
call reset_host that will call srp_reconnect, so we do not need to schedule
srp_reconnect_work from srp_completion.

Removing this prevents srp_reset_host from failing if srp_completion was in
progress, which in turn was causing crashes as both scsi midlayer and
srp_reconnect were cancelling commands.

Signed-off-by: Ishai Rabinovitz
Signed-off-by: Michael S. Tsirkin

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-06 15:37:50.000000000 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-07 11:16:28.000000000 +0300
@@ -799,13 +799,6 @@ static void srp_process_rsp(struct srp_t
 	spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
 }
 
-static void srp_reconnect_work(void *target_ptr)
-{
-	struct srp_target_port *target = target_ptr;
-
-	srp_reconnect_target(target);
-}
-
 static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 {
 	struct srp_iu *iu;
@@ -858,7 +851,6 @@ static void srp_completion(struct ib_cq
 {
 	struct srp_target_port *target = target_ptr;
 	struct ib_wc wc;
-	unsigned long flags;
 
 	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 	while (ib_poll_cq(cq, 1, &wc) > 0) {
@@ -866,10 +858,6 @@ static void srp_completion(struct ib_cq
 			printk(KERN_ERR PFX "failed %s status %d\n",
 			       wc.wr_id & SRP_OP_RECV ? "receive" : "send",
 			       wc.status);
-			spin_lock_irqsave(target->scsi_host->host_lock, flags);
-			if (target->state == SRP_TARGET_LIVE)
-				schedule_work(&target->work);
-			spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
 			break;
 		}
 
@@ -1724,8 +1712,6 @@ static ssize_t srp_create_target(struct
 	target->scsi_host = target_host;
 	target->srp_host = host;
 
-	INIT_WORK(&target->work, srp_reconnect_work, target);
-
 	INIT_LIST_HEAD(&target->free_reqs);
 	INIT_LIST_HEAD(&target->req_queue);
 	for (i = 0; i < SRP_SQ_SIZE; ++i) {

-- 
MST

From vlad at mellanox.co.il Thu Sep 7 05:05:08 2006
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 07 Sep 2006 15:05:08 +0300
Subject: [openib-general] [openfabrics-ewg] OFED 1.1-rc2 is ready (how do I enable madeye)?
In-Reply-To:
References:
Message-ID: <45000AF4.8040103@mellanox.co.il>

Madeye build will be available in OFED-1.1-rc4.

To build madeye run:

export OPENIB_PARAMS="--with-madeye-mod"
(or put it into ofed.conf file for unattended installation)

Run install.sh (or ./install.sh -c openib.conf for unattended installation)

Regards,
Vladimir

Scott Weitzenkamp (sweitzen) wrote:
>> 5. Added Madeye utility
>>
>
> How do I build madeye? I don't see any reference to it in install.sh.
> Is there any documentation for madeye?
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From Brian.Cain at ge.com Thu Sep 7 06:32:10 2006
From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare))
Date: Thu, 7 Sep 2006 09:32:10 -0400
Subject: [openib-general] PXE + infiniband?
In-Reply-To: <005701c6d24f$3cf2c3f0$8000a8c0@blorp>
Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033DC1296@CINMLVEM11.e2k.ad.ge.com>

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Paul Baxter
> Sent: Thursday, September 07, 2006 2:29 AM
> To: openib-general at openib.org; Eli cohen
> Subject: Re: [openib-general] PXE + infiniband?
>
> >> There is an implementation of PXE for Mellanox's HCAs that can be found
> >> here: http://sourceforge.net/forum/forum.php?forum_id=494529
> >
> > Thanks for the tip
> >
> > I, too, am interested in this.
> >
> > Do you have a more direct link as I wandered around etherboot's project
> > site and couldn't find anything IB-specific.
> >
> I must have been having a 'special moment' before, because I couldn't find
> the mailing lists
>
> Here they are!
>
> http://sourceforge.net/search/?ml_name=etherboot-developers&type_of_search=mlists&group_id=4233&words=infiniband

I was able to follow the procedure outlined in Eli's README and I
achieved some mixed results. On one hand, lspci now shows "Expansion ROM
at ed700000 [disabled] [size=1M]" whereas it didn't indicate that before
("disabled" means it's zeroed out, maybe?). The BIOS seems to confirm
the whole disabled thing since it doesn't list the HCA in the boot
priority list.

After making this change, IPoIB seems to work via this HCA, but SRP
(initiation, anyways) no longer does. "ibsrpdm -c" no longer produces
any output, even though I can see the target via the ibnetdiscover.
Accessing the SRP target from another host on the fabric works fine.

-Brian

From rdreier at cisco.com Thu Sep 7 07:20:22 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 07 Sep 2006 07:20:22 -0700
Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups
In-Reply-To: (Tom Tucker's message of "Wed, 06 Sep 2006 22:51:11 -0500")
References:
Message-ID:

    Tom> Is there anything we know about that is still unresolved at
    Tom> this point? We've got a bunch of balls up in the air here
    Tom> and I want to make sure we haven't dropped one.

nope, I think we're good.

From tom at opengridcomputing.com Thu Sep 7 07:56:22 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Thu, 07 Sep 2006 09:56:22 -0500
Subject: [openib-general] RDMA CMA and C++
Message-ID: <1157640982.20399.5.camel@trinity.ogc.int>

Sean:

The user-mode cm header files don't have the C++ stuff to identify all
the declarations as C. The verbs.h file has it and works fine if you
wanted to copy it, but all you really need is ...

#ifdef __cplusplus
extern "C" {
#endif

at the top and,

#ifdef __cplusplus
}
#endif

at the bottom of the file.

Thanks,
Tom

From dotanb at dev.mellanox.co.il Thu Sep 7 08:13:21 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 07 Sep 2006 18:13:21 +0300
Subject: [openib-general] RDMA CMA and C++
In-Reply-To: <1157640982.20399.5.camel@trinity.ogc.int>
References: <1157640982.20399.5.camel@trinity.ogc.int>
Message-ID: <45003711.3040108@dev.mellanox.co.il>

Tom Tucker wrote:
> Sean:
>
> The user-mode cm header files don't have the C++ stuff to identify all
> the declarations as C. The verbs.h file has it and works fine if you
> wanted to copy it, but all you really need is ...
>
> #ifdef __cplusplus
> extern "C" {
> #endif
>
> at the top and,
>
> #ifdef __cplusplus
> }
> #endif
>
> at the bottom of the file.
>
> Thanks,
> Tom
>

Sean, please add those definitions to the libibcm header as well.
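
For illustration only, here is a minimal sketch of the guard pattern
described above, as it might look around a single declaration. The name
cm_demo_add is hypothetical and is not part of libibcm or librdmacm; the
actual headers may end up looking different. The guards only matter when
the file is included from C++, where they stop the compiler from mangling
the declared names; a C compiler skips them entirely.

```c
/*
 * Sketch only: cm_demo_add is a made-up name used to show the guard
 * pattern. It is NOT a function from libibcm or librdmacm.
 */

#ifdef __cplusplus
extern "C" {	/* seen only by C++ compilers: give the names C linkage */
#endif

/* Declarations placed between the guards are callable from C and C++. */
int cm_demo_add(int a, int b);

#ifdef __cplusplus
}		/* closes the extern "C" block opened above */
#endif

/* Definition; in a real library this would live in a separate .c file. */
int cm_demo_add(int a, int b)
{
	return a + b;
}
```

Because __cplusplus is defined only by C++ compilers, the same header can
be shipped unchanged to both C and C++ users.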

Dotan

From thomas.bub at thomson.net Thu Sep 7 08:20:15 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Thu, 7 Sep 2006 17:20:15 +0200
Subject: [openib-general] libibcm can't connect/talk to libicm on other machine.
Message-ID:

Sean,
Finally I could compile the cmpost example.
The solution was:

1.) Use the OFED-1.1-rc3 instead of OFED-1.0.1
    This removed some missing DEFINES.
    As an "End-User" ;-) I'm not following the SVN tree but installing
    releases.
2.) Add "#include " to cmpost.c.

Now I can compile and use the example at least on one machine.
The issues with client and server on one machine, that I reported
yesterday, are not visible as well.
So I'm able now to debug my connection establishment and the initial
data exchange.
Next week I can debug cmpost on two different machines, my second
machine has been stolen by a developer colleague till mid of next week.
;-)

I would suggest that the cmpost.c example, including the missing include
from above, might be integrated into the next OFED-release?

Thanks
Thomas

-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Sean Hefty
Sent: Tuesday, September 05, 2006 8:15 PM
To: Bub Thomas
Cc: openib-general at openib.org
Subject: Re: [openib-general] libibcm can't connect/talk to libicm on
other machine.

Bub Thomas wrote:
> Dotan,
> the ibv_rc_pingpong example works for me so I can exclude the
> architecture.
> I never got the libibcm example compiled.
> Which is your example and which architecture x86 vs. x86_64 did you
> compile it for?
> Can you share your libibcm the example code? (if it is not the standard
> that I can't get compiled)
> Thomas

Did you try applying the following patch?

http://openib.org/pipermail/openib-general/2006-August/025005.html

I should also mention that I have a version of cmpost that works with
the new libibsa, but I am waiting for the review of the kernel sa_query
changes before committing.

- Sean

From thomas.bub at thomson.net Thu Sep 7 08:29:55 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Thu, 7 Sep 2006 17:29:55 +0200
Subject: [openib-general] ibv_poll_cq after ibv_post_send does not work
Message-ID:

Dotan,
Find my answers inline.
Since I could get the cmpost example from Sean compiled and running I
will try to compare cmpost.c with my code and find the bugs in my code
this way.
I will keep your connect_qp example for the case that I can't find the
problems the other way.

Thanks
Thomas

Which QP do you use (RC/UC/UD)?
[Bub] RC

do you get any completion in the connector side?
[Bub] Only the errors

If you are using RC QP: the reason for not getting any completion in the
CQ is that

Did you post any RR (Receive Request) at the listener side?
[Bub] yes

rnr_retry =7 means that in case of RNR retry there will be infinite
retries
[Bub] rnr_retry is at 4

if the timeout = 0 and the remote QP is not ready then there won't be
any retransmission.
[Bub] I have the timeout at 254

Dotan

From swise at opengridcomputing.com Thu Sep 7 08:50:45 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 07 Sep 2006 10:50:45 -0500
Subject: [openib-general] missing dtest program evdtest.c
Message-ID: <1157644245.28308.55.camel@stevo-desktop>

Is dapl/test/dtest missing evdtest.c? It's in the makefile...

Steve.

From toralf.foerster at gmx.de Thu Sep 7 10:02:56 2006
From: toralf.foerster at gmx.de (Toralf Förster)
Date: Thu, 7 Sep 2006 19:02:56 +0200
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
Message-ID: <200609071902.57379.toralf.foerster@gmx.de>

The compile test of the attached .config failed :

...
drivers/built-in.o: In function `iser_connect': drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' drivers/infiniband/ulp/iser/iser_verbs.c:525: undefined reference to `rdma_resolve_addr' drivers/built-in.o: In function `iscsi_transport_init': drivers/scsi/scsi_transport_iscsi.c:1636: undefined reference to `netlink_register_notifier' drivers/scsi/scsi_transport_iscsi.c:1640: undefined reference to `netlink_kernel_create' drivers/scsi/scsi_transport_iscsi.c:1652: undefined reference to `sock_release' drivers/scsi/scsi_transport_iscsi.c:1654: undefined reference to `netlink_unregister_notifier' drivers/built-in.o: In function `iscsi_transport_exit': drivers/scsi/scsi_transport_iscsi.c:1669: undefined reference to `sock_release' drivers/scsi/scsi_transport_iscsi.c:1670: undefined reference to `netlink_unregister_notifier' make: *** [.tmp_vmlinux1] Error 1 # # Automatically generated make config: don't edit # Linux kernel version: 2.6.18-rc6-git1 # Thu Sep 7 18:29:08 2006 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # Code maturity level options # # CONFIG_EXPERIMENTAL is not set CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 # # General setup # CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set # CONFIG_SWAP is not set CONFIG_SYSVIPC=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set CONFIG_SYSCTL=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_RELAY=y CONFIG_INITRAMFS_SOURCE="" CONFIG_UID16=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set CONFIG_KALLSYMS_EXTRA_PASS=y CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_RT_MUTEXES=y CONFIG_FUTEX=y 
CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_SLAB=y CONFIG_VM_EVENT_COUNTERS=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # CONFIG_SLOB is not set # # Loadable module support # # CONFIG_MODULES is not set # # Block layer # CONFIG_LBD=y CONFIG_BLK_DEV_IO_TRACE=y CONFIG_LSF=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="cfq" # # Processor type and features # # CONFIG_SMP is not set CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set CONFIG_MPENTIUMM=y # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MGEODEGX1 is not set # CONFIG_MGEODE_LX is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_X86_GENERIC is not set CONFIG_X86_CMPXCHG=y CONFIG_X86_XADD=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_CMPXCHG64=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_INTEL_USERCOPY=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_TSC=y CONFIG_HPET_TIMER=y # CONFIG_PREEMPT_NONE is not set CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set # CONFIG_X86_UP_APIC is not set CONFIG_X86_MCE=y # 
CONFIG_X86_MCE_NONFATAL is not set CONFIG_VM86=y # CONFIG_TOSHIBA is not set # CONFIG_I8K is not set CONFIG_X86_REBOOTFIXUPS=y # CONFIG_MICROCODE is not set CONFIG_X86_MSR=y CONFIG_X86_CPUID=y # # Firmware Drivers # # CONFIG_EDD is not set # CONFIG_EFI_VARS is not set # CONFIG_DELL_RBU is not set # CONFIG_DCDBAS is not set CONFIG_NOHIGHMEM=y # CONFIG_HIGHMEM4G is not set # CONFIG_HIGHMEM64G is not set CONFIG_PAGE_OFFSET=0xC0000000 CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_SPLIT_PTLOCK_CPUS=4 # CONFIG_RESOURCES_64BIT is not set CONFIG_MATH_EMULATION=y # CONFIG_MTRR is not set CONFIG_EFI=y CONFIG_BOOT_IOREMAP=y CONFIG_REGPARM=y CONFIG_SECCOMP=y # CONFIG_HZ_100 is not set CONFIG_HZ_250=y # CONFIG_HZ_1000 is not set CONFIG_HZ=250 CONFIG_PHYSICAL_START=0x100000 # CONFIG_COMPAT_VDSO is not set # # Power management options (ACPI, APM) # CONFIG_PM=y CONFIG_PM_LEGACY=y # CONFIG_PM_DEBUG is not set # # ACPI (Advanced Configuration and Power Interface) Support # CONFIG_ACPI=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_SLEEP_PROC_FS=y # CONFIG_ACPI_SLEEP_PROC_SLEEP is not set # CONFIG_ACPI_AC is not set CONFIG_ACPI_BATTERY=y CONFIG_ACPI_BUTTON=y CONFIG_ACPI_VIDEO=y CONFIG_ACPI_FAN=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_THERMAL=y # CONFIG_ACPI_ASUS is not set # CONFIG_ACPI_IBM is not set # CONFIG_ACPI_TOSHIBA is not set CONFIG_ACPI_BLACKLIST_YEAR=0 CONFIG_ACPI_DEBUG=y CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y CONFIG_ACPI_SYSTEM=y CONFIG_X86_PM_TIMER=y # # APM (Advanced Power Management) BIOS Support # # CONFIG_APM is not set # # CPU Frequency scaling # # CONFIG_CPU_FREQ is not set # # Bus options (PCI, PCMCIA, EISA, MCA, ISA) # CONFIG_PCI=y # CONFIG_PCI_GOBIOS is not set CONFIG_PCI_GOMMCONFIG=y # CONFIG_PCI_GODIRECT is not set # CONFIG_PCI_GOANY is not set CONFIG_PCI_MMCONFIG=y # CONFIG_PCIEPORTBUS is not set CONFIG_PCI_DEBUG=y CONFIG_ISA_DMA_API=y # CONFIG_ISA is not set # CONFIG_MCA is not set CONFIG_SCx200=y CONFIG_SCx200HR_TIMER=y # # PCCARD 
(PCMCIA/CardBus) support # CONFIG_PCCARD=y CONFIG_PCMCIA_DEBUG=y CONFIG_PCMCIA=y CONFIG_PCMCIA_IOCTL=y # CONFIG_CARDBUS is not set # # PC-card bridges # # CONFIG_YENTA is not set CONFIG_PD6729=y # CONFIG_I82092 is not set CONFIG_PCCARD_NONSTATIC=y # # PCI Hotplug Support # # # Executable file formats # # CONFIG_BINFMT_ELF is not set CONFIG_BINFMT_AOUT=y # CONFIG_BINFMT_MISC is not set # # Networking # # CONFIG_NET is not set # # Device Drivers # # # Generic Driver Options # CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y CONFIG_DEBUG_DRIVER=y # CONFIG_SYS_HYPERVISOR is not set # # Connector - unified userspace <-> kernelspace linker # # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # CONFIG_PARPORT=y CONFIG_PARPORT_PC=y CONFIG_PARPORT_SERIAL=y # CONFIG_PARPORT_PC_PCMCIA is not set CONFIG_PARPORT_NOT_PC=y # CONFIG_PARPORT_GSC is not set CONFIG_PARPORT_AX88796=y CONFIG_PARPORT_1284=y # # Plug and Play support # # CONFIG_PNP is not set # # Block devices # CONFIG_BLK_DEV_FD=y CONFIG_PARIDE=y CONFIG_PARIDE_PARPORT=y # # Parallel IDE high-level drivers # # CONFIG_PARIDE_PD is not set # CONFIG_PARIDE_PCD is not set # CONFIG_PARIDE_PF is not set # CONFIG_PARIDE_PT is not set CONFIG_PARIDE_PG=y # # Parallel IDE protocol modules # CONFIG_PARIDE_ATEN=y CONFIG_PARIDE_BPCK=y CONFIG_PARIDE_BPCK6=y CONFIG_PARIDE_COMM=y CONFIG_PARIDE_DSTR=y # CONFIG_PARIDE_FIT2 is not set # CONFIG_PARIDE_FIT3 is not set # CONFIG_PARIDE_EPAT is not set CONFIG_PARIDE_EPIA=y CONFIG_PARIDE_FRIQ=y CONFIG_PARIDE_FRPW=y # CONFIG_PARIDE_KBIC is not set CONFIG_PARIDE_KTTI=y CONFIG_PARIDE_ON20=y # CONFIG_PARIDE_ON26 is not set CONFIG_BLK_CPQ_DA=y # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_COW_COMMON is not set # CONFIG_BLK_DEV_LOOP is not set CONFIG_BLK_DEV_SX8=y # CONFIG_BLK_DEV_RAM is not set # CONFIG_BLK_DEV_INITRD is not set # CONFIG_CDROM_PKTCDVD is not set # # ATA/ATAPI/MFM/RLL support # 
CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_BLK_DEV_HD_IDE is not set CONFIG_BLK_DEV_IDEDISK=y # CONFIG_IDEDISK_MULTI_MODE is not set # CONFIG_BLK_DEV_IDECS is not set # CONFIG_BLK_DEV_IDECD is not set CONFIG_BLK_DEV_IDEFLOPPY=y # CONFIG_BLK_DEV_IDESCSI is not set CONFIG_IDE_TASK_IOCTL=y # # IDE chipset support/bugfixes # CONFIG_IDE_GENERIC=y # CONFIG_BLK_DEV_CMD640 is not set CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y CONFIG_BLK_DEV_OFFBOARD=y CONFIG_BLK_DEV_GENERIC=y CONFIG_BLK_DEV_RZ1000=y CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set CONFIG_BLK_DEV_AEC62XX=y # CONFIG_BLK_DEV_ALI15X3 is not set # CONFIG_BLK_DEV_AMD74XX is not set CONFIG_BLK_DEV_ATIIXP=y CONFIG_BLK_DEV_CMD64X=y # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_CS5535 is not set # CONFIG_BLK_DEV_HPT34X is not set CONFIG_BLK_DEV_HPT366=y CONFIG_BLK_DEV_SC1200=y CONFIG_BLK_DEV_PIIX=y # CONFIG_BLK_DEV_IT821X is not set # CONFIG_BLK_DEV_NS87415 is not set CONFIG_BLK_DEV_PDC202XX_OLD=y # CONFIG_PDC202XX_BURST is not set CONFIG_BLK_DEV_PDC202XX_NEW=y # CONFIG_BLK_DEV_SVWKS is not set CONFIG_BLK_DEV_SIIMAGE=y # CONFIG_BLK_DEV_SIS5513 is not set CONFIG_BLK_DEV_SLC90E66=y CONFIG_BLK_DEV_TRM290=y # CONFIG_BLK_DEV_VIA82CXXX is not set # CONFIG_IDE_ARM is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # SCSI device support # CONFIG_RAID_ATTRS=y CONFIG_SCSI=y CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=y CONFIG_CHR_DEV_ST=y # CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y CONFIG_CHR_DEV_SCH=y # # Some SCSI devices (e.g. 
CD jukebox) support multiple LUNs # CONFIG_SCSI_MULTI_LUN=y # CONFIG_SCSI_CONSTANTS is not set CONFIG_SCSI_LOGGING=y # # SCSI Transport Attributes # CONFIG_SCSI_SPI_ATTRS=y CONFIG_SCSI_FC_ATTRS=y CONFIG_SCSI_ISCSI_ATTRS=y CONFIG_SCSI_SAS_ATTRS=y # # SCSI low-level drivers # # CONFIG_BLK_DEV_3W_XXXX_RAID is not set CONFIG_SCSI_3W_9XXX=y # CONFIG_SCSI_ACARD is not set CONFIG_SCSI_AACRAID=y CONFIG_SCSI_AIC7XXX=y CONFIG_AIC7XXX_CMDS_PER_DEVICE=32 CONFIG_AIC7XXX_RESET_DELAY_MS=5000 # CONFIG_AIC7XXX_DEBUG_ENABLE is not set CONFIG_AIC7XXX_DEBUG_MASK=0 # CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set # CONFIG_SCSI_AIC7XXX_OLD is not set CONFIG_SCSI_AIC79XX=y CONFIG_AIC79XX_CMDS_PER_DEVICE=32 CONFIG_AIC79XX_RESET_DELAY_MS=5000 # CONFIG_AIC79XX_ENABLE_RD_STRM is not set CONFIG_AIC79XX_DEBUG_ENABLE=y CONFIG_AIC79XX_DEBUG_MASK=0 # CONFIG_AIC79XX_REG_PRETTY_PRINT is not set # CONFIG_SCSI_DPT_I2O is not set CONFIG_SCSI_ADVANSYS=y # CONFIG_MEGARAID_NEWGEN is not set CONFIG_MEGARAID_LEGACY=y CONFIG_MEGARAID_SAS=y # CONFIG_SCSI_SATA is not set CONFIG_SCSI_HPTIOP=y CONFIG_SCSI_BUSLOGIC=y # CONFIG_SCSI_OMIT_FLASHPOINT is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set CONFIG_SCSI_FUTURE_DOMAIN=y CONFIG_SCSI_GDTH=y # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set CONFIG_SCSI_PPA=y CONFIG_SCSI_IMM=y # CONFIG_SCSI_IZIP_EPP16 is not set CONFIG_SCSI_IZIP_SLOW_CTR=y # CONFIG_SCSI_SYM53C8XX_2 is not set # CONFIG_SCSI_IPR is not set CONFIG_SCSI_QLOGIC_1280=y CONFIG_SCSI_QLA_FC=y # CONFIG_SCSI_LPFC is not set CONFIG_SCSI_DC390T=y CONFIG_SCSI_NSP32=y # CONFIG_SCSI_DEBUG is not set # # Multi-device support (RAID and LVM) # # CONFIG_MD is not set # # Fusion MPT device support # # CONFIG_FUSION is not set # CONFIG_FUSION_SPI is not set # CONFIG_FUSION_FC is not set # CONFIG_FUSION_SAS is not set # # IEEE 1394 (FireWire) support # # CONFIG_IEEE1394 is not set # # I2O device support # CONFIG_I2O=y CONFIG_I2O_LCT_NOTIFY_ON_CHANGES=y 
CONFIG_I2O_EXT_ADAPTEC=y CONFIG_I2O_CONFIG=y CONFIG_I2O_CONFIG_OLD_IOCTL=y CONFIG_I2O_BUS=y CONFIG_I2O_BLOCK=y CONFIG_I2O_SCSI=y CONFIG_I2O_PROC=y # # ISDN subsystem # # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y # CONFIG_INPUT_MOUSEDEV_PSAUX is not set CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set CONFIG_INPUT_EVDEV=y CONFIG_INPUT_EVBUG=y # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set CONFIG_KEYBOARD_LKKBD=y CONFIG_KEYBOARD_XTKBD=y CONFIG_KEYBOARD_NEWTON=y # CONFIG_INPUT_MOUSE is not set CONFIG_INPUT_JOYSTICK=y CONFIG_JOYSTICK_ANALOG=y # CONFIG_JOYSTICK_A3D is not set # CONFIG_JOYSTICK_ADI is not set # CONFIG_JOYSTICK_COBRA is not set CONFIG_JOYSTICK_GF2K=y # CONFIG_JOYSTICK_GRIP is not set CONFIG_JOYSTICK_GRIP_MP=y CONFIG_JOYSTICK_GUILLEMOT=y CONFIG_JOYSTICK_INTERACT=y # CONFIG_JOYSTICK_SIDEWINDER is not set # CONFIG_JOYSTICK_TMDC is not set # CONFIG_JOYSTICK_IFORCE is not set # CONFIG_JOYSTICK_WARRIOR is not set # CONFIG_JOYSTICK_MAGELLAN is not set # CONFIG_JOYSTICK_SPACEORB is not set CONFIG_JOYSTICK_SPACEBALL=y # CONFIG_JOYSTICK_STINGER is not set CONFIG_JOYSTICK_TWIDJOY=y CONFIG_JOYSTICK_DB9=y CONFIG_JOYSTICK_GAMECON=y # CONFIG_JOYSTICK_TURBOGRAFX is not set CONFIG_JOYSTICK_JOYDUMP=y # CONFIG_INPUT_TOUCHSCREEN is not set CONFIG_INPUT_MISC=y # CONFIG_INPUT_PCSPKR is not set # CONFIG_INPUT_WISTRON_BTNS is not set # CONFIG_INPUT_UINPUT is not set # # Hardware I/O ports # CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set CONFIG_SERIO_PARKBD=y # CONFIG_SERIO_PCIPS2 is not set CONFIG_SERIO_LIBPS2=y # CONFIG_SERIO_RAW is not set CONFIG_GAMEPORT=y CONFIG_GAMEPORT_NS558=y # CONFIG_GAMEPORT_L4 is not set # CONFIG_GAMEPORT_EMU10K1 is not set CONFIG_GAMEPORT_FM801=y # # 
Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y CONFIG_VT_HW_CONSOLE_BINDING=y CONFIG_SERIAL_NONSTANDARD=y # CONFIG_COMPUTONE is not set # CONFIG_ROCKETPORT is not set CONFIG_CYCLADES=y # CONFIG_DIGIEPCA is not set # CONFIG_MOXA_INTELLIO is not set CONFIG_MOXA_SMARTIO=y CONFIG_ISI=y CONFIG_SYNCLINK=y CONFIG_SYNCLINKMP=y CONFIG_SYNCLINK_GT=y CONFIG_N_HDLC=y # CONFIG_RISCOM8 is not set # CONFIG_SPECIALIX is not set CONFIG_SX=y # CONFIG_RIO is not set CONFIG_STALDRV=y # CONFIG_STALLION is not set # CONFIG_ISTALLION is not set # # Serial drivers # CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_PCI=y # CONFIG_SERIAL_8250_CS is not set CONFIG_SERIAL_8250_NR_UARTS=4 CONFIG_SERIAL_8250_RUNTIME_UARTS=4 # CONFIG_SERIAL_8250_EXTENDED is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y # CONFIG_SERIAL_JSM is not set CONFIG_UNIX98_PTYS=y # CONFIG_LEGACY_PTYS is not set CONFIG_PRINTER=y # CONFIG_LP_CONSOLE is not set # CONFIG_PPDEV is not set # CONFIG_TIPAR is not set # # IPMI # CONFIG_IPMI_HANDLER=y # CONFIG_IPMI_PANIC_EVENT is not set CONFIG_IPMI_DEVICE_INTERFACE=y CONFIG_IPMI_SI=y CONFIG_IPMI_WATCHDOG=y # CONFIG_IPMI_POWEROFF is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set CONFIG_HW_RANDOM=y # CONFIG_HW_RANDOM_INTEL is not set CONFIG_HW_RANDOM_AMD=y CONFIG_HW_RANDOM_GEODE=y CONFIG_HW_RANDOM_VIA=y CONFIG_NVRAM=y # CONFIG_RTC is not set CONFIG_GEN_RTC=y # CONFIG_GEN_RTC_X is not set # CONFIG_DTLK is not set CONFIG_R3964=y # CONFIG_APPLICOM is not set # # Ftape, the floppy tape device driver # # CONFIG_FTAPE is not set # CONFIG_AGP is not set # CONFIG_DRM is not set # # PCMCIA character devices # # CONFIG_SYNCLINK_CS is not set # CONFIG_CARDMAN_4000 is not set # CONFIG_CARDMAN_4040 is not set CONFIG_MWAVE=y # CONFIG_SCx200_GPIO is not set # CONFIG_PC8736x_GPIO is not set # CONFIG_NSC_GPIO is not set CONFIG_CS5535_GPIO=y # CONFIG_RAW_DRIVER is not set CONFIG_HPET=y # 
CONFIG_HPET_RTC_IRQ is not set # CONFIG_HPET_MMAP is not set # CONFIG_HANGCHECK_TIMER is not set # # TPM devices # # # I2C support # CONFIG_I2C=y CONFIG_I2C_CHARDEV=y # # I2C Algorithms # CONFIG_I2C_ALGOBIT=y CONFIG_I2C_ALGOPCF=y # CONFIG_I2C_ALGOPCA is not set # # I2C Hardware Bus support # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set # CONFIG_I2C_AMD8111 is not set # CONFIG_I2C_I801 is not set CONFIG_I2C_I810=y CONFIG_I2C_PIIX4=y # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT is not set # CONFIG_I2C_PARPORT_LIGHT is not set CONFIG_I2C_PROSAVAGE=y # CONFIG_SCx200_ACB is not set # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set # CONFIG_I2C_PCA_ISA is not set # # Miscellaneous I2C Chip support # # CONFIG_I2C_DEBUG_CORE is not set CONFIG_I2C_DEBUG_ALGO=y # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # SPI support # # CONFIG_SPI is not set # CONFIG_SPI_MASTER is not set # # Dallas's 1-wire bus # # # Hardware Monitoring support # # CONFIG_HWMON is not set # CONFIG_HWMON_VID is not set # # Misc devices # # # Multimedia devices # # CONFIG_VIDEO_DEV is not set CONFIG_VIDEO_V4L2=y # # Digital Video Broadcasting Devices # # # Graphics support # # CONFIG_FIRMWARE_EDID is not set CONFIG_FB=y CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y # CONFIG_FB_MACMODES is not set # CONFIG_FB_BACKLIGHT is not set CONFIG_FB_MODE_HELPERS=y CONFIG_FB_TILEBLITTING=y # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set CONFIG_FB_CYBER2000=y CONFIG_FB_ARC=y CONFIG_FB_ASILIANT=y CONFIG_FB_IMSTT=y CONFIG_FB_VGA16=y # CONFIG_FB_VESA is not set # CONFIG_FB_IMAC is not set # CONFIG_FB_HGA is not set CONFIG_FB_S1D13XXX=y # CONFIG_FB_NVIDIA is not set CONFIG_FB_RIVA=y CONFIG_FB_RIVA_I2C=y CONFIG_FB_RIVA_DEBUG=y CONFIG_FB_MATROX=y # CONFIG_FB_MATROX_MILLENIUM is not set 
CONFIG_FB_MATROX_MYSTIQUE=y CONFIG_FB_MATROX_G=y CONFIG_FB_MATROX_I2C=y # CONFIG_FB_MATROX_MAVEN is not set CONFIG_FB_MATROX_MULTIHEAD=y CONFIG_FB_RADEON=y # CONFIG_FB_RADEON_I2C is not set CONFIG_FB_RADEON_DEBUG=y # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set # CONFIG_FB_SIS is not set CONFIG_FB_NEOMAGIC=y CONFIG_FB_KYRO=y CONFIG_FB_3DFX=y # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_CYBLA is not set CONFIG_FB_TRIDENT=y CONFIG_FB_VIRTUAL=y # # Console display driver support # CONFIG_VGA_CONSOLE=y CONFIG_VGACON_SOFT_SCROLLBACK=y CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64 CONFIG_VIDEO_SELECT=y CONFIG_DUMMY_CONSOLE=y # CONFIG_FRAMEBUFFER_CONSOLE is not set # # Logo configuration # # CONFIG_LOGO is not set # CONFIG_BACKLIGHT_LCD_SUPPORT is not set # # Sound # CONFIG_SOUND=y # # Advanced Linux Sound Architecture # CONFIG_SND=y CONFIG_SND_TIMER=y CONFIG_SND_PCM=y CONFIG_SND_HWDEP=y CONFIG_SND_RAWMIDI=y CONFIG_SND_SEQUENCER=y CONFIG_SND_SEQ_DUMMY=y CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=y CONFIG_SND_PCM_OSS=y # CONFIG_SND_PCM_OSS_PLUGINS is not set # CONFIG_SND_SEQUENCER_OSS is not set # CONFIG_SND_DYNAMIC_MINORS is not set CONFIG_SND_SUPPORT_OLD_API=y CONFIG_SND_VERBOSE_PROCFS=y # CONFIG_SND_VERBOSE_PRINTK is not set # CONFIG_SND_DEBUG is not set # # Generic devices # CONFIG_SND_MPU401_UART=y CONFIG_SND_OPL3_LIB=y CONFIG_SND_VX_LIB=y CONFIG_SND_AC97_CODEC=y CONFIG_SND_AC97_BUS=y # CONFIG_SND_DUMMY is not set CONFIG_SND_VIRMIDI=y # CONFIG_SND_MTPAV is not set # CONFIG_SND_SERIAL_U16550 is not set CONFIG_SND_MPU401=y # # PCI devices # # CONFIG_SND_AD1889 is not set CONFIG_SND_ALS300=y CONFIG_SND_ALS4000=y CONFIG_SND_ALI5451=y # CONFIG_SND_ATIIXP is not set CONFIG_SND_ATIIXP_MODEM=y CONFIG_SND_AU8810=y CONFIG_SND_AU8820=y CONFIG_SND_AU8830=y CONFIG_SND_BT87X=y # CONFIG_SND_BT87X_OVERCLOCK is not set # CONFIG_SND_CA0106 is not set CONFIG_SND_CMIPCI=y # CONFIG_SND_CS4281 is not set # CONFIG_SND_CS46XX is not set CONFIG_SND_CS5535AUDIO=y # CONFIG_SND_DARLA20 is not set 
# CONFIG_SND_GINA20 is not set # CONFIG_SND_LAYLA20 is not set CONFIG_SND_DARLA24=y # CONFIG_SND_GINA24 is not set CONFIG_SND_LAYLA24=y # CONFIG_SND_MONA is not set # CONFIG_SND_MIA is not set # CONFIG_SND_ECHO3G is not set CONFIG_SND_INDIGO=y # CONFIG_SND_INDIGOIO is not set CONFIG_SND_INDIGODJ=y # CONFIG_SND_EMU10K1 is not set CONFIG_SND_EMU10K1X=y # CONFIG_SND_ENS1370 is not set # CONFIG_SND_ENS1371 is not set CONFIG_SND_ES1938=y # CONFIG_SND_ES1968 is not set # CONFIG_SND_FM801 is not set # CONFIG_SND_HDA_INTEL is not set CONFIG_SND_HDSP=y CONFIG_SND_HDSPM=y # CONFIG_SND_ICE1712 is not set CONFIG_SND_ICE1724=y CONFIG_SND_INTEL8X0=y CONFIG_SND_INTEL8X0M=y CONFIG_SND_KORG1212=y CONFIG_SND_MAESTRO3=y # CONFIG_SND_MIXART is not set CONFIG_SND_NM256=y # CONFIG_SND_PCXHR is not set CONFIG_SND_RIPTIDE=y CONFIG_SND_RME32=y CONFIG_SND_RME96=y # CONFIG_SND_RME9652 is not set CONFIG_SND_SONICVIBES=y CONFIG_SND_TRIDENT=y # CONFIG_SND_VIA82XX is not set # CONFIG_SND_VIA82XX_MODEM is not set # CONFIG_SND_VX222 is not set CONFIG_SND_YMFPCI=y # # PCMCIA devices # CONFIG_SND_VXPOCKET=y # CONFIG_SND_PDAUDIOCF is not set # # Open Sound System # # CONFIG_SOUND_PRIME is not set # # USB support # CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y CONFIG_USB_ARCH_HAS_EHCI=y # CONFIG_USB is not set # # NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' # # # USB Gadget Support # CONFIG_USB_GADGET=y CONFIG_USB_GADGET_DEBUG_FILES=y CONFIG_USB_GADGET_SELECTED=y CONFIG_USB_GADGET_NET2280=y CONFIG_USB_NET2280=y # CONFIG_USB_GADGET_PXA2XX is not set # CONFIG_USB_GADGET_GOKU is not set # CONFIG_USB_GADGET_LH7A40X is not set # CONFIG_USB_GADGET_OMAP is not set # CONFIG_USB_GADGET_AT91 is not set # CONFIG_USB_GADGET_DUMMY_HCD is not set CONFIG_USB_GADGET_DUALSPEED=y # CONFIG_USB_ZERO is not set # CONFIG_USB_ETH is not set # CONFIG_USB_GADGETFS is not set CONFIG_USB_FILE_STORAGE=y CONFIG_USB_FILE_STORAGE_TEST=y # CONFIG_USB_G_SERIAL is not set # # MMC/SD Card support # CONFIG_MMC=y 
CONFIG_MMC_DEBUG=y # CONFIG_MMC_BLOCK is not set CONFIG_MMC_WBSD=y # # LED devices # # CONFIG_NEW_LEDS is not set # # LED drivers # # # LED Triggers # # # InfiniBand support # CONFIG_INFINIBAND=y CONFIG_INFINIBAND_USER_MAD=y CONFIG_INFINIBAND_USER_ACCESS=y CONFIG_INFINIBAND_MTHCA=y CONFIG_INFINIBAND_MTHCA_DEBUG=y CONFIG_INFINIBAND_SRP=y CONFIG_INFINIBAND_ISER=y # # EDAC - error detection and reporting (RAS) (EXPERIMENTAL) # # # Real Time Clock # # # DMA Engine support # CONFIG_DMA_ENGINE=y # # DMA Clients # # # DMA Devices # CONFIG_INTEL_IOATDMA=y # # File systems # # CONFIG_EXT2_FS is not set CONFIG_EXT3_FS=y # CONFIG_EXT3_FS_XATTR is not set CONFIG_JBD=y CONFIG_JBD_DEBUG=y CONFIG_REISERFS_FS=y # CONFIG_REISERFS_CHECK is not set CONFIG_REISERFS_PROC_INFO=y # CONFIG_REISERFS_FS_XATTR is not set CONFIG_JFS_FS=y CONFIG_JFS_POSIX_ACL=y # CONFIG_JFS_SECURITY is not set # CONFIG_JFS_DEBUG is not set # CONFIG_JFS_STATISTICS is not set CONFIG_FS_POSIX_ACL=y # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_INOTIFY is not set # CONFIG_QUOTA is not set CONFIG_DNOTIFY=y CONFIG_AUTOFS_FS=y CONFIG_AUTOFS4_FS=y CONFIG_FUSE_FS=y # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=y CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems # # CONFIG_MSDOS_FS is not set # CONFIG_VFAT_FS is not set # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y CONFIG_RAMFS=y # # Miscellaneous filesystems # # CONFIG_HFSPLUS_FS is not set # CONFIG_CRAMFS is not set # CONFIG_VXFS_FS is not set CONFIG_HPFS_FS=y # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set CONFIG_UFS_FS=y # CONFIG_UFS_DEBUG is not set # # Partition Types # CONFIG_PARTITION_ADVANCED=y CONFIG_ACORN_PARTITION=y # CONFIG_ACORN_PARTITION_CUMANA is not set # CONFIG_ACORN_PARTITION_EESOX is not set # CONFIG_ACORN_PARTITION_ICS 
is not set # CONFIG_ACORN_PARTITION_ADFS is not set # CONFIG_ACORN_PARTITION_POWERTEC is not set CONFIG_ACORN_PARTITION_RISCIX=y # CONFIG_OSF_PARTITION is not set # CONFIG_AMIGA_PARTITION is not set # CONFIG_ATARI_PARTITION is not set # CONFIG_MAC_PARTITION is not set CONFIG_MSDOS_PARTITION=y CONFIG_BSD_DISKLABEL=y # CONFIG_MINIX_SUBPARTITION is not set CONFIG_SOLARIS_X86_PARTITION=y # CONFIG_UNIXWARE_DISKLABEL is not set # CONFIG_LDM_PARTITION is not set CONFIG_SGI_PARTITION=y # CONFIG_ULTRIX_PARTITION is not set # CONFIG_SUN_PARTITION is not set # CONFIG_KARMA_PARTITION is not set CONFIG_EFI_PARTITION=y # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" # CONFIG_NLS_CODEPAGE_437 is not set CONFIG_NLS_CODEPAGE_737=y CONFIG_NLS_CODEPAGE_775=y # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set CONFIG_NLS_CODEPAGE_855=y CONFIG_NLS_CODEPAGE_857=y CONFIG_NLS_CODEPAGE_860=y CONFIG_NLS_CODEPAGE_861=y CONFIG_NLS_CODEPAGE_862=y # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set CONFIG_NLS_CODEPAGE_865=y # CONFIG_NLS_CODEPAGE_866 is not set CONFIG_NLS_CODEPAGE_869=y # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set CONFIG_NLS_CODEPAGE_932=y # CONFIG_NLS_CODEPAGE_949 is not set CONFIG_NLS_CODEPAGE_874=y CONFIG_NLS_ISO8859_8=y # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set # CONFIG_NLS_ASCII is not set # CONFIG_NLS_ISO8859_1 is not set CONFIG_NLS_ISO8859_2=y # CONFIG_NLS_ISO8859_3 is not set CONFIG_NLS_ISO8859_4=y CONFIG_NLS_ISO8859_5=y # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set CONFIG_NLS_ISO8859_13=y # CONFIG_NLS_ISO8859_14 is not set CONFIG_NLS_ISO8859_15=y # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set # CONFIG_NLS_UTF8 is not set # # Kernel hacking # CONFIG_TRACE_IRQFLAGS_SUPPORT=y # CONFIG_PRINTK_TIME is not set CONFIG_MAGIC_SYSRQ=y CONFIG_UNUSED_SYMBOLS=y CONFIG_DEBUG_KERNEL=y 
CONFIG_LOG_BUF_SHIFT=15 CONFIG_DETECT_SOFTLOCKUP=y CONFIG_SCHEDSTATS=y CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_SLAB_LEAK=y # CONFIG_DEBUG_RT_MUTEXES is not set # CONFIG_RT_MUTEX_TESTER is not set CONFIG_DEBUG_SPINLOCK=y # CONFIG_DEBUG_MUTEXES is not set # CONFIG_DEBUG_RWSEMS is not set # CONFIG_DEBUG_LOCK_ALLOC is not set # CONFIG_PROVE_LOCKING is not set # CONFIG_DEBUG_SPINLOCK_SLEEP is not set # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set CONFIG_DEBUG_KOBJECT=y CONFIG_DEBUG_BUGVERBOSE=y CONFIG_DEBUG_INFO=y CONFIG_DEBUG_FS=y CONFIG_DEBUG_VM=y CONFIG_FRAME_POINTER=y # CONFIG_UNWIND_INFO is not set # CONFIG_FORCED_INLINING is not set CONFIG_RCU_TORTURE_TEST=y CONFIG_EARLY_PRINTK=y CONFIG_DEBUG_STACKOVERFLOW=y # CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUG_RODATA is not set CONFIG_4KSTACKS=y CONFIG_DOUBLEFAULT=y # # Security options # CONFIG_KEYS=y # CONFIG_KEYS_DEBUG_PROC_KEYS is not set # CONFIG_SECURITY is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y # CONFIG_CRYPTO_NULL is not set CONFIG_CRYPTO_MD4=y CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=y # CONFIG_CRYPTO_SHA256 is not set CONFIG_CRYPTO_SHA512=y # CONFIG_CRYPTO_WP512 is not set # CONFIG_CRYPTO_TGR192 is not set # CONFIG_CRYPTO_DES is not set # CONFIG_CRYPTO_BLOWFISH is not set CONFIG_CRYPTO_TWOFISH=y CONFIG_CRYPTO_SERPENT=y CONFIG_CRYPTO_AES=y CONFIG_CRYPTO_AES_586=y CONFIG_CRYPTO_CAST5=y # CONFIG_CRYPTO_CAST6 is not set # CONFIG_CRYPTO_TEA is not set # CONFIG_CRYPTO_ARC4 is not set CONFIG_CRYPTO_KHAZAD=y # CONFIG_CRYPTO_ANUBIS is not set CONFIG_CRYPTO_DEFLATE=y CONFIG_CRYPTO_MICHAEL_MIC=y # CONFIG_CRYPTO_CRC32C is not set # # Hardware crypto devices # # CONFIG_CRYPTO_DEV_PADLOCK is not set # # Library routines # CONFIG_CRC_CCITT=y # CONFIG_CRC16 is not set CONFIG_CRC32=y # CONFIG_LIBCRC32C is not set CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=y CONFIG_PLIST=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_X86_BIOS_REBOOT=y CONFIG_KTIME_SCALAR=y 
------------------------------------------------------- -- MfG/Sincerely Toralf Förster -------------- next part -------------- The compile test of the attached .config failed : ... drivers/built-in.o: In function `iser_connect': drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' drivers/infiniband/ulp/iser/iser_verbs.c:525: undefined reference to `rdma_resolve_addr' drivers/built-in.o: In function `iscsi_transport_init': drivers/scsi/scsi_transport_iscsi.c:1636: undefined reference to `netlink_register_notifier' drivers/scsi/scsi_transport_iscsi.c:1640: undefined reference to `netlink_kernel_create' drivers/scsi/scsi_transport_iscsi.c:1652: undefined reference to `sock_release' drivers/scsi/scsi_transport_iscsi.c:1654: undefined reference to `netlink_unregister_notifier' drivers/built-in.o: In function `iscsi_transport_exit': drivers/scsi/scsi_transport_iscsi.c:1669: undefined reference to `sock_release' drivers/scsi/scsi_transport_iscsi.c:1670: undefined reference to `netlink_unregister_notifier' make: *** [.tmp_vmlinux1] Error 1 # # Automatically generated make config: don't edit # Linux kernel version: 2.6.18-rc6-git1 # Thu Sep 7 18:29:08 2006 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # Code maturity level options # # CONFIG_EXPERIMENTAL is not set CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 # # General setup # CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set # CONFIG_SWAP is not set CONFIG_SYSVIPC=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set CONFIG_SYSCTL=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_RELAY=y CONFIG_INITRAMFS_SOURCE="" CONFIG_UID16=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y # 
CONFIG_KALLSYMS_ALL is not set CONFIG_KALLSYMS_EXTRA_PASS=y CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_RT_MUTEXES=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_SLAB=y CONFIG_VM_EVENT_COUNTERS=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # CONFIG_SLOB is not set # # Loadable module support # # CONFIG_MODULES is not set # # Block layer # CONFIG_LBD=y CONFIG_BLK_DEV_IO_TRACE=y CONFIG_LSF=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="cfq" # # Processor type and features # # CONFIG_SMP is not set CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set CONFIG_MPENTIUMM=y # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MGEODEGX1 is not set # CONFIG_MGEODE_LX is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_X86_GENERIC is not set CONFIG_X86_CMPXCHG=y CONFIG_X86_XADD=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_CMPXCHG64=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_INTEL_USERCOPY=y 
CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_TSC=y CONFIG_HPET_TIMER=y # CONFIG_PREEMPT_NONE is not set CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set # CONFIG_X86_UP_APIC is not set CONFIG_X86_MCE=y # CONFIG_X86_MCE_NONFATAL is not set CONFIG_VM86=y # CONFIG_TOSHIBA is not set # CONFIG_I8K is not set CONFIG_X86_REBOOTFIXUPS=y # CONFIG_MICROCODE is not set CONFIG_X86_MSR=y CONFIG_X86_CPUID=y # # Firmware Drivers # # CONFIG_EDD is not set # CONFIG_EFI_VARS is not set # CONFIG_DELL_RBU is not set # CONFIG_DCDBAS is not set CONFIG_NOHIGHMEM=y # CONFIG_HIGHMEM4G is not set # CONFIG_HIGHMEM64G is not set CONFIG_PAGE_OFFSET=0xC0000000 CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_SPLIT_PTLOCK_CPUS=4 # CONFIG_RESOURCES_64BIT is not set CONFIG_MATH_EMULATION=y # CONFIG_MTRR is not set CONFIG_EFI=y CONFIG_BOOT_IOREMAP=y CONFIG_REGPARM=y CONFIG_SECCOMP=y # CONFIG_HZ_100 is not set CONFIG_HZ_250=y # CONFIG_HZ_1000 is not set CONFIG_HZ=250 CONFIG_PHYSICAL_START=0x100000 # CONFIG_COMPAT_VDSO is not set # # Power management options (ACPI, APM) # CONFIG_PM=y CONFIG_PM_LEGACY=y # CONFIG_PM_DEBUG is not set # # ACPI (Advanced Configuration and Power Interface) Support # CONFIG_ACPI=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_SLEEP_PROC_FS=y # CONFIG_ACPI_SLEEP_PROC_SLEEP is not set # CONFIG_ACPI_AC is not set CONFIG_ACPI_BATTERY=y CONFIG_ACPI_BUTTON=y CONFIG_ACPI_VIDEO=y CONFIG_ACPI_FAN=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_THERMAL=y # CONFIG_ACPI_ASUS is not set # CONFIG_ACPI_IBM is not set # CONFIG_ACPI_TOSHIBA is not set CONFIG_ACPI_BLACKLIST_YEAR=0 CONFIG_ACPI_DEBUG=y CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y CONFIG_ACPI_SYSTEM=y CONFIG_X86_PM_TIMER=y # # APM (Advanced Power Management) BIOS Support # # CONFIG_APM is not set # # CPU Frequency scaling # # CONFIG_CPU_FREQ is not set # # Bus options (PCI, PCMCIA, EISA, MCA, ISA) # CONFIG_PCI=y # CONFIG_PCI_GOBIOS is not set CONFIG_PCI_GOMMCONFIG=y # CONFIG_PCI_GODIRECT is not set # CONFIG_PCI_GOANY 
is not set CONFIG_PCI_MMCONFIG=y # CONFIG_PCIEPORTBUS is not set CONFIG_PCI_DEBUG=y CONFIG_ISA_DMA_API=y # CONFIG_ISA is not set # CONFIG_MCA is not set CONFIG_SCx200=y CONFIG_SCx200HR_TIMER=y # # PCCARD (PCMCIA/CardBus) support # CONFIG_PCCARD=y CONFIG_PCMCIA_DEBUG=y CONFIG_PCMCIA=y CONFIG_PCMCIA_IOCTL=y # CONFIG_CARDBUS is not set # # PC-card bridges # # CONFIG_YENTA is not set CONFIG_PD6729=y # CONFIG_I82092 is not set CONFIG_PCCARD_NONSTATIC=y # # PCI Hotplug Support # # # Executable file formats # # CONFIG_BINFMT_ELF is not set CONFIG_BINFMT_AOUT=y # CONFIG_BINFMT_MISC is not set # # Networking # # CONFIG_NET is not set # # Device Drivers # # # Generic Driver Options # CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y CONFIG_DEBUG_DRIVER=y # CONFIG_SYS_HYPERVISOR is not set # # Connector - unified userspace <-> kernelspace linker # # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # CONFIG_PARPORT=y CONFIG_PARPORT_PC=y CONFIG_PARPORT_SERIAL=y # CONFIG_PARPORT_PC_PCMCIA is not set CONFIG_PARPORT_NOT_PC=y # CONFIG_PARPORT_GSC is not set CONFIG_PARPORT_AX88796=y CONFIG_PARPORT_1284=y # # Plug and Play support # # CONFIG_PNP is not set # # Block devices # CONFIG_BLK_DEV_FD=y CONFIG_PARIDE=y CONFIG_PARIDE_PARPORT=y # # Parallel IDE high-level drivers # # CONFIG_PARIDE_PD is not set # CONFIG_PARIDE_PCD is not set # CONFIG_PARIDE_PF is not set # CONFIG_PARIDE_PT is not set CONFIG_PARIDE_PG=y # # Parallel IDE protocol modules # CONFIG_PARIDE_ATEN=y CONFIG_PARIDE_BPCK=y CONFIG_PARIDE_BPCK6=y CONFIG_PARIDE_COMM=y CONFIG_PARIDE_DSTR=y # CONFIG_PARIDE_FIT2 is not set # CONFIG_PARIDE_FIT3 is not set # CONFIG_PARIDE_EPAT is not set CONFIG_PARIDE_EPIA=y CONFIG_PARIDE_FRIQ=y CONFIG_PARIDE_FRPW=y # CONFIG_PARIDE_KBIC is not set CONFIG_PARIDE_KTTI=y CONFIG_PARIDE_ON20=y # CONFIG_PARIDE_ON26 is not set CONFIG_BLK_CPQ_DA=y # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_COW_COMMON 
is not set # CONFIG_BLK_DEV_LOOP is not set CONFIG_BLK_DEV_SX8=y # CONFIG_BLK_DEV_RAM is not set # CONFIG_BLK_DEV_INITRD is not set # CONFIG_CDROM_PKTCDVD is not set # # ATA/ATAPI/MFM/RLL support # CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_BLK_DEV_HD_IDE is not set CONFIG_BLK_DEV_IDEDISK=y # CONFIG_IDEDISK_MULTI_MODE is not set # CONFIG_BLK_DEV_IDECS is not set # CONFIG_BLK_DEV_IDECD is not set CONFIG_BLK_DEV_IDEFLOPPY=y # CONFIG_BLK_DEV_IDESCSI is not set CONFIG_IDE_TASK_IOCTL=y # # IDE chipset support/bugfixes # CONFIG_IDE_GENERIC=y # CONFIG_BLK_DEV_CMD640 is not set CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y CONFIG_BLK_DEV_OFFBOARD=y CONFIG_BLK_DEV_GENERIC=y CONFIG_BLK_DEV_RZ1000=y CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set CONFIG_BLK_DEV_AEC62XX=y # CONFIG_BLK_DEV_ALI15X3 is not set # CONFIG_BLK_DEV_AMD74XX is not set CONFIG_BLK_DEV_ATIIXP=y CONFIG_BLK_DEV_CMD64X=y # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_CS5535 is not set # CONFIG_BLK_DEV_HPT34X is not set CONFIG_BLK_DEV_HPT366=y CONFIG_BLK_DEV_SC1200=y CONFIG_BLK_DEV_PIIX=y # CONFIG_BLK_DEV_IT821X is not set # CONFIG_BLK_DEV_NS87415 is not set CONFIG_BLK_DEV_PDC202XX_OLD=y # CONFIG_PDC202XX_BURST is not set CONFIG_BLK_DEV_PDC202XX_NEW=y # CONFIG_BLK_DEV_SVWKS is not set CONFIG_BLK_DEV_SIIMAGE=y # CONFIG_BLK_DEV_SIS5513 is not set CONFIG_BLK_DEV_SLC90E66=y CONFIG_BLK_DEV_TRM290=y # CONFIG_BLK_DEV_VIA82CXXX is not set # CONFIG_IDE_ARM is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # SCSI device support # CONFIG_RAID_ATTRS=y CONFIG_SCSI=y CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=y CONFIG_CHR_DEV_ST=y # 
CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y CONFIG_CHR_DEV_SCH=y # # Some SCSI devices (e.g. CD jukebox) support multiple LUNs # CONFIG_SCSI_MULTI_LUN=y # CONFIG_SCSI_CONSTANTS is not set CONFIG_SCSI_LOGGING=y # # SCSI Transport Attributes # CONFIG_SCSI_SPI_ATTRS=y CONFIG_SCSI_FC_ATTRS=y CONFIG_SCSI_ISCSI_ATTRS=y CONFIG_SCSI_SAS_ATTRS=y # # SCSI low-level drivers # # CONFIG_BLK_DEV_3W_XXXX_RAID is not set CONFIG_SCSI_3W_9XXX=y # CONFIG_SCSI_ACARD is not set CONFIG_SCSI_AACRAID=y CONFIG_SCSI_AIC7XXX=y CONFIG_AIC7XXX_CMDS_PER_DEVICE=32 CONFIG_AIC7XXX_RESET_DELAY_MS=5000 # CONFIG_AIC7XXX_DEBUG_ENABLE is not set CONFIG_AIC7XXX_DEBUG_MASK=0 # CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set # CONFIG_SCSI_AIC7XXX_OLD is not set CONFIG_SCSI_AIC79XX=y CONFIG_AIC79XX_CMDS_PER_DEVICE=32 CONFIG_AIC79XX_RESET_DELAY_MS=5000 # CONFIG_AIC79XX_ENABLE_RD_STRM is not set CONFIG_AIC79XX_DEBUG_ENABLE=y CONFIG_AIC79XX_DEBUG_MASK=0 # CONFIG_AIC79XX_REG_PRETTY_PRINT is not set # CONFIG_SCSI_DPT_I2O is not set CONFIG_SCSI_ADVANSYS=y # CONFIG_MEGARAID_NEWGEN is not set CONFIG_MEGARAID_LEGACY=y CONFIG_MEGARAID_SAS=y # CONFIG_SCSI_SATA is not set CONFIG_SCSI_HPTIOP=y CONFIG_SCSI_BUSLOGIC=y # CONFIG_SCSI_OMIT_FLASHPOINT is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set CONFIG_SCSI_FUTURE_DOMAIN=y CONFIG_SCSI_GDTH=y # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set CONFIG_SCSI_PPA=y CONFIG_SCSI_IMM=y # CONFIG_SCSI_IZIP_EPP16 is not set CONFIG_SCSI_IZIP_SLOW_CTR=y # CONFIG_SCSI_SYM53C8XX_2 is not set # CONFIG_SCSI_IPR is not set CONFIG_SCSI_QLOGIC_1280=y CONFIG_SCSI_QLA_FC=y # CONFIG_SCSI_LPFC is not set CONFIG_SCSI_DC390T=y CONFIG_SCSI_NSP32=y # CONFIG_SCSI_DEBUG is not set # # Multi-device support (RAID and LVM) # # CONFIG_MD is not set # # Fusion MPT device support # # CONFIG_FUSION is not set # CONFIG_FUSION_SPI is not set # CONFIG_FUSION_FC is not set # CONFIG_FUSION_SAS is 
not set # # IEEE 1394 (FireWire) support # # CONFIG_IEEE1394 is not set # # I2O device support # CONFIG_I2O=y CONFIG_I2O_LCT_NOTIFY_ON_CHANGES=y CONFIG_I2O_EXT_ADAPTEC=y CONFIG_I2O_CONFIG=y CONFIG_I2O_CONFIG_OLD_IOCTL=y CONFIG_I2O_BUS=y CONFIG_I2O_BLOCK=y CONFIG_I2O_SCSI=y CONFIG_I2O_PROC=y # # ISDN subsystem # # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y # CONFIG_INPUT_MOUSEDEV_PSAUX is not set CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set CONFIG_INPUT_EVDEV=y CONFIG_INPUT_EVBUG=y # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set CONFIG_KEYBOARD_LKKBD=y CONFIG_KEYBOARD_XTKBD=y CONFIG_KEYBOARD_NEWTON=y # CONFIG_INPUT_MOUSE is not set CONFIG_INPUT_JOYSTICK=y CONFIG_JOYSTICK_ANALOG=y # CONFIG_JOYSTICK_A3D is not set # CONFIG_JOYSTICK_ADI is not set # CONFIG_JOYSTICK_COBRA is not set CONFIG_JOYSTICK_GF2K=y # CONFIG_JOYSTICK_GRIP is not set CONFIG_JOYSTICK_GRIP_MP=y CONFIG_JOYSTICK_GUILLEMOT=y CONFIG_JOYSTICK_INTERACT=y # CONFIG_JOYSTICK_SIDEWINDER is not set # CONFIG_JOYSTICK_TMDC is not set # CONFIG_JOYSTICK_IFORCE is not set # CONFIG_JOYSTICK_WARRIOR is not set # CONFIG_JOYSTICK_MAGELLAN is not set # CONFIG_JOYSTICK_SPACEORB is not set CONFIG_JOYSTICK_SPACEBALL=y # CONFIG_JOYSTICK_STINGER is not set CONFIG_JOYSTICK_TWIDJOY=y CONFIG_JOYSTICK_DB9=y CONFIG_JOYSTICK_GAMECON=y # CONFIG_JOYSTICK_TURBOGRAFX is not set CONFIG_JOYSTICK_JOYDUMP=y # CONFIG_INPUT_TOUCHSCREEN is not set CONFIG_INPUT_MISC=y # CONFIG_INPUT_PCSPKR is not set # CONFIG_INPUT_WISTRON_BTNS is not set # CONFIG_INPUT_UINPUT is not set # # Hardware I/O ports # CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set CONFIG_SERIO_PARKBD=y # CONFIG_SERIO_PCIPS2 is not set CONFIG_SERIO_LIBPS2=y # CONFIG_SERIO_RAW is 
not set CONFIG_GAMEPORT=y CONFIG_GAMEPORT_NS558=y # CONFIG_GAMEPORT_L4 is not set # CONFIG_GAMEPORT_EMU10K1 is not set CONFIG_GAMEPORT_FM801=y # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y CONFIG_VT_HW_CONSOLE_BINDING=y CONFIG_SERIAL_NONSTANDARD=y # CONFIG_COMPUTONE is not set # CONFIG_ROCKETPORT is not set CONFIG_CYCLADES=y # CONFIG_DIGIEPCA is not set # CONFIG_MOXA_INTELLIO is not set CONFIG_MOXA_SMARTIO=y CONFIG_ISI=y CONFIG_SYNCLINK=y CONFIG_SYNCLINKMP=y CONFIG_SYNCLINK_GT=y CONFIG_N_HDLC=y # CONFIG_RISCOM8 is not set # CONFIG_SPECIALIX is not set CONFIG_SX=y # CONFIG_RIO is not set CONFIG_STALDRV=y # CONFIG_STALLION is not set # CONFIG_ISTALLION is not set # # Serial drivers # CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_PCI=y # CONFIG_SERIAL_8250_CS is not set CONFIG_SERIAL_8250_NR_UARTS=4 CONFIG_SERIAL_8250_RUNTIME_UARTS=4 # CONFIG_SERIAL_8250_EXTENDED is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y # CONFIG_SERIAL_JSM is not set CONFIG_UNIX98_PTYS=y # CONFIG_LEGACY_PTYS is not set CONFIG_PRINTER=y # CONFIG_LP_CONSOLE is not set # CONFIG_PPDEV is not set # CONFIG_TIPAR is not set # # IPMI # CONFIG_IPMI_HANDLER=y # CONFIG_IPMI_PANIC_EVENT is not set CONFIG_IPMI_DEVICE_INTERFACE=y CONFIG_IPMI_SI=y CONFIG_IPMI_WATCHDOG=y # CONFIG_IPMI_POWEROFF is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set CONFIG_HW_RANDOM=y # CONFIG_HW_RANDOM_INTEL is not set CONFIG_HW_RANDOM_AMD=y CONFIG_HW_RANDOM_GEODE=y CONFIG_HW_RANDOM_VIA=y CONFIG_NVRAM=y # CONFIG_RTC is not set CONFIG_GEN_RTC=y # CONFIG_GEN_RTC_X is not set # CONFIG_DTLK is not set CONFIG_R3964=y # CONFIG_APPLICOM is not set # # Ftape, the floppy tape device driver # # CONFIG_FTAPE is not set # CONFIG_AGP is not set # CONFIG_DRM is not set # # PCMCIA character devices # # CONFIG_SYNCLINK_CS is not set # CONFIG_CARDMAN_4000 is not set # CONFIG_CARDMAN_4040 is not set CONFIG_MWAVE=y # CONFIG_SCx200_GPIO is 
not set # CONFIG_PC8736x_GPIO is not set # CONFIG_NSC_GPIO is not set CONFIG_CS5535_GPIO=y # CONFIG_RAW_DRIVER is not set CONFIG_HPET=y # CONFIG_HPET_RTC_IRQ is not set # CONFIG_HPET_MMAP is not set # CONFIG_HANGCHECK_TIMER is not set # # TPM devices # # # I2C support # CONFIG_I2C=y CONFIG_I2C_CHARDEV=y # # I2C Algorithms # CONFIG_I2C_ALGOBIT=y CONFIG_I2C_ALGOPCF=y # CONFIG_I2C_ALGOPCA is not set # # I2C Hardware Bus support # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set # CONFIG_I2C_AMD8111 is not set # CONFIG_I2C_I801 is not set CONFIG_I2C_I810=y CONFIG_I2C_PIIX4=y # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT is not set # CONFIG_I2C_PARPORT_LIGHT is not set CONFIG_I2C_PROSAVAGE=y # CONFIG_SCx200_ACB is not set # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set # CONFIG_I2C_PCA_ISA is not set # # Miscellaneous I2C Chip support # # CONFIG_I2C_DEBUG_CORE is not set CONFIG_I2C_DEBUG_ALGO=y # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # SPI support # # CONFIG_SPI is not set # CONFIG_SPI_MASTER is not set # # Dallas's 1-wire bus # # # Hardware Monitoring support # # CONFIG_HWMON is not set # CONFIG_HWMON_VID is not set # # Misc devices # # # Multimedia devices # # CONFIG_VIDEO_DEV is not set CONFIG_VIDEO_V4L2=y # # Digital Video Broadcasting Devices # # # Graphics support # # CONFIG_FIRMWARE_EDID is not set CONFIG_FB=y CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y # CONFIG_FB_MACMODES is not set # CONFIG_FB_BACKLIGHT is not set CONFIG_FB_MODE_HELPERS=y CONFIG_FB_TILEBLITTING=y # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set CONFIG_FB_CYBER2000=y CONFIG_FB_ARC=y CONFIG_FB_ASILIANT=y CONFIG_FB_IMSTT=y CONFIG_FB_VGA16=y # CONFIG_FB_VESA is not set # CONFIG_FB_IMAC is not set # CONFIG_FB_HGA is not set CONFIG_FB_S1D13XXX=y # CONFIG_FB_NVIDIA is not 
set CONFIG_FB_RIVA=y CONFIG_FB_RIVA_I2C=y CONFIG_FB_RIVA_DEBUG=y CONFIG_FB_MATROX=y # CONFIG_FB_MATROX_MILLENIUM is not set CONFIG_FB_MATROX_MYSTIQUE=y CONFIG_FB_MATROX_G=y CONFIG_FB_MATROX_I2C=y # CONFIG_FB_MATROX_MAVEN is not set CONFIG_FB_MATROX_MULTIHEAD=y CONFIG_FB_RADEON=y # CONFIG_FB_RADEON_I2C is not set CONFIG_FB_RADEON_DEBUG=y # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set # CONFIG_FB_SIS is not set CONFIG_FB_NEOMAGIC=y CONFIG_FB_KYRO=y CONFIG_FB_3DFX=y # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_CYBLA is not set CONFIG_FB_TRIDENT=y CONFIG_FB_VIRTUAL=y # # Console display driver support # CONFIG_VGA_CONSOLE=y CONFIG_VGACON_SOFT_SCROLLBACK=y CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64 CONFIG_VIDEO_SELECT=y CONFIG_DUMMY_CONSOLE=y # CONFIG_FRAMEBUFFER_CONSOLE is not set # # Logo configuration # # CONFIG_LOGO is not set # CONFIG_BACKLIGHT_LCD_SUPPORT is not set # # Sound # CONFIG_SOUND=y # # Advanced Linux Sound Architecture # CONFIG_SND=y CONFIG_SND_TIMER=y CONFIG_SND_PCM=y CONFIG_SND_HWDEP=y CONFIG_SND_RAWMIDI=y CONFIG_SND_SEQUENCER=y CONFIG_SND_SEQ_DUMMY=y CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=y CONFIG_SND_PCM_OSS=y # CONFIG_SND_PCM_OSS_PLUGINS is not set # CONFIG_SND_SEQUENCER_OSS is not set # CONFIG_SND_DYNAMIC_MINORS is not set CONFIG_SND_SUPPORT_OLD_API=y CONFIG_SND_VERBOSE_PROCFS=y # CONFIG_SND_VERBOSE_PRINTK is not set # CONFIG_SND_DEBUG is not set # # Generic devices # CONFIG_SND_MPU401_UART=y CONFIG_SND_OPL3_LIB=y CONFIG_SND_VX_LIB=y CONFIG_SND_AC97_CODEC=y CONFIG_SND_AC97_BUS=y # CONFIG_SND_DUMMY is not set CONFIG_SND_VIRMIDI=y # CONFIG_SND_MTPAV is not set # CONFIG_SND_SERIAL_U16550 is not set CONFIG_SND_MPU401=y # # PCI devices # # CONFIG_SND_AD1889 is not set CONFIG_SND_ALS300=y CONFIG_SND_ALS4000=y CONFIG_SND_ALI5451=y # CONFIG_SND_ATIIXP is not set CONFIG_SND_ATIIXP_MODEM=y CONFIG_SND_AU8810=y CONFIG_SND_AU8820=y CONFIG_SND_AU8830=y CONFIG_SND_BT87X=y # CONFIG_SND_BT87X_OVERCLOCK is not set # CONFIG_SND_CA0106 is not set 
CONFIG_SND_CMIPCI=y # CONFIG_SND_CS4281 is not set # CONFIG_SND_CS46XX is not set CONFIG_SND_CS5535AUDIO=y # CONFIG_SND_DARLA20 is not set # CONFIG_SND_GINA20 is not set # CONFIG_SND_LAYLA20 is not set CONFIG_SND_DARLA24=y # CONFIG_SND_GINA24 is not set CONFIG_SND_LAYLA24=y # CONFIG_SND_MONA is not set # CONFIG_SND_MIA is not set # CONFIG_SND_ECHO3G is not set CONFIG_SND_INDIGO=y # CONFIG_SND_INDIGOIO is not set CONFIG_SND_INDIGODJ=y # CONFIG_SND_EMU10K1 is not set CONFIG_SND_EMU10K1X=y # CONFIG_SND_ENS1370 is not set # CONFIG_SND_ENS1371 is not set CONFIG_SND_ES1938=y # CONFIG_SND_ES1968 is not set # CONFIG_SND_FM801 is not set # CONFIG_SND_HDA_INTEL is not set CONFIG_SND_HDSP=y CONFIG_SND_HDSPM=y # CONFIG_SND_ICE1712 is not set CONFIG_SND_ICE1724=y CONFIG_SND_INTEL8X0=y CONFIG_SND_INTEL8X0M=y CONFIG_SND_KORG1212=y CONFIG_SND_MAESTRO3=y # CONFIG_SND_MIXART is not set CONFIG_SND_NM256=y # CONFIG_SND_PCXHR is not set CONFIG_SND_RIPTIDE=y CONFIG_SND_RME32=y CONFIG_SND_RME96=y # CONFIG_SND_RME9652 is not set CONFIG_SND_SONICVIBES=y CONFIG_SND_TRIDENT=y # CONFIG_SND_VIA82XX is not set # CONFIG_SND_VIA82XX_MODEM is not set # CONFIG_SND_VX222 is not set CONFIG_SND_YMFPCI=y # # PCMCIA devices # CONFIG_SND_VXPOCKET=y # CONFIG_SND_PDAUDIOCF is not set # # Open Sound System # # CONFIG_SOUND_PRIME is not set # # USB support # CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y CONFIG_USB_ARCH_HAS_EHCI=y # CONFIG_USB is not set # # NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' # # # USB Gadget Support # CONFIG_USB_GADGET=y CONFIG_USB_GADGET_DEBUG_FILES=y CONFIG_USB_GADGET_SELECTED=y CONFIG_USB_GADGET_NET2280=y CONFIG_USB_NET2280=y # CONFIG_USB_GADGET_PXA2XX is not set # CONFIG_USB_GADGET_GOKU is not set # CONFIG_USB_GADGET_LH7A40X is not set # CONFIG_USB_GADGET_OMAP is not set # CONFIG_USB_GADGET_AT91 is not set # CONFIG_USB_GADGET_DUMMY_HCD is not set CONFIG_USB_GADGET_DUALSPEED=y # CONFIG_USB_ZERO is not set # CONFIG_USB_ETH is not set # CONFIG_USB_GADGETFS is not 
set CONFIG_USB_FILE_STORAGE=y CONFIG_USB_FILE_STORAGE_TEST=y # CONFIG_USB_G_SERIAL is not set # # MMC/SD Card support # CONFIG_MMC=y CONFIG_MMC_DEBUG=y # CONFIG_MMC_BLOCK is not set CONFIG_MMC_WBSD=y # # LED devices # # CONFIG_NEW_LEDS is not set # # LED drivers # # # LED Triggers # # # InfiniBand support # CONFIG_INFINIBAND=y CONFIG_INFINIBAND_USER_MAD=y CONFIG_INFINIBAND_USER_ACCESS=y CONFIG_INFINIBAND_MTHCA=y CONFIG_INFINIBAND_MTHCA_DEBUG=y CONFIG_INFINIBAND_SRP=y CONFIG_INFINIBAND_ISER=y # # EDAC - error detection and reporting (RAS) (EXPERIMENTAL) # # # Real Time Clock # # # DMA Engine support # CONFIG_DMA_ENGINE=y # # DMA Clients # # # DMA Devices # CONFIG_INTEL_IOATDMA=y # # File systems # # CONFIG_EXT2_FS is not set CONFIG_EXT3_FS=y # CONFIG_EXT3_FS_XATTR is not set CONFIG_JBD=y CONFIG_JBD_DEBUG=y CONFIG_REISERFS_FS=y # CONFIG_REISERFS_CHECK is not set CONFIG_REISERFS_PROC_INFO=y # CONFIG_REISERFS_FS_XATTR is not set CONFIG_JFS_FS=y CONFIG_JFS_POSIX_ACL=y # CONFIG_JFS_SECURITY is not set # CONFIG_JFS_DEBUG is not set # CONFIG_JFS_STATISTICS is not set CONFIG_FS_POSIX_ACL=y # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_INOTIFY is not set # CONFIG_QUOTA is not set CONFIG_DNOTIFY=y CONFIG_AUTOFS_FS=y CONFIG_AUTOFS4_FS=y CONFIG_FUSE_FS=y # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=y CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems # # CONFIG_MSDOS_FS is not set # CONFIG_VFAT_FS is not set # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y CONFIG_RAMFS=y # # Miscellaneous filesystems # # CONFIG_HFSPLUS_FS is not set # CONFIG_CRAMFS is not set # CONFIG_VXFS_FS is not set CONFIG_HPFS_FS=y # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set CONFIG_UFS_FS=y # CONFIG_UFS_DEBUG is not set # # Partition Types # CONFIG_PARTITION_ADVANCED=y 
CONFIG_ACORN_PARTITION=y # CONFIG_ACORN_PARTITION_CUMANA is not set # CONFIG_ACORN_PARTITION_EESOX is not set # CONFIG_ACORN_PARTITION_ICS is not set # CONFIG_ACORN_PARTITION_ADFS is not set # CONFIG_ACORN_PARTITION_POWERTEC is not set CONFIG_ACORN_PARTITION_RISCIX=y # CONFIG_OSF_PARTITION is not set # CONFIG_AMIGA_PARTITION is not set # CONFIG_ATARI_PARTITION is not set # CONFIG_MAC_PARTITION is not set CONFIG_MSDOS_PARTITION=y CONFIG_BSD_DISKLABEL=y # CONFIG_MINIX_SUBPARTITION is not set CONFIG_SOLARIS_X86_PARTITION=y # CONFIG_UNIXWARE_DISKLABEL is not set # CONFIG_LDM_PARTITION is not set CONFIG_SGI_PARTITION=y # CONFIG_ULTRIX_PARTITION is not set # CONFIG_SUN_PARTITION is not set # CONFIG_KARMA_PARTITION is not set CONFIG_EFI_PARTITION=y # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" # CONFIG_NLS_CODEPAGE_437 is not set CONFIG_NLS_CODEPAGE_737=y CONFIG_NLS_CODEPAGE_775=y # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set CONFIG_NLS_CODEPAGE_855=y CONFIG_NLS_CODEPAGE_857=y CONFIG_NLS_CODEPAGE_860=y CONFIG_NLS_CODEPAGE_861=y CONFIG_NLS_CODEPAGE_862=y # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set CONFIG_NLS_CODEPAGE_865=y # CONFIG_NLS_CODEPAGE_866 is not set CONFIG_NLS_CODEPAGE_869=y # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set CONFIG_NLS_CODEPAGE_932=y # CONFIG_NLS_CODEPAGE_949 is not set CONFIG_NLS_CODEPAGE_874=y CONFIG_NLS_ISO8859_8=y # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set # CONFIG_NLS_ASCII is not set # CONFIG_NLS_ISO8859_1 is not set CONFIG_NLS_ISO8859_2=y # CONFIG_NLS_ISO8859_3 is not set CONFIG_NLS_ISO8859_4=y CONFIG_NLS_ISO8859_5=y # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set CONFIG_NLS_ISO8859_13=y # CONFIG_NLS_ISO8859_14 is not set CONFIG_NLS_ISO8859_15=y # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set # CONFIG_NLS_UTF8 is not set # # Kernel 
hacking # CONFIG_TRACE_IRQFLAGS_SUPPORT=y # CONFIG_PRINTK_TIME is not set CONFIG_MAGIC_SYSRQ=y CONFIG_UNUSED_SYMBOLS=y CONFIG_DEBUG_KERNEL=y CONFIG_LOG_BUF_SHIFT=15 CONFIG_DETECT_SOFTLOCKUP=y CONFIG_SCHEDSTATS=y CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_SLAB_LEAK=y # CONFIG_DEBUG_RT_MUTEXES is not set # CONFIG_RT_MUTEX_TESTER is not set CONFIG_DEBUG_SPINLOCK=y # CONFIG_DEBUG_MUTEXES is not set # CONFIG_DEBUG_RWSEMS is not set # CONFIG_DEBUG_LOCK_ALLOC is not set # CONFIG_PROVE_LOCKING is not set # CONFIG_DEBUG_SPINLOCK_SLEEP is not set # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set CONFIG_DEBUG_KOBJECT=y CONFIG_DEBUG_BUGVERBOSE=y CONFIG_DEBUG_INFO=y CONFIG_DEBUG_FS=y CONFIG_DEBUG_VM=y CONFIG_FRAME_POINTER=y # CONFIG_UNWIND_INFO is not set # CONFIG_FORCED_INLINING is not set CONFIG_RCU_TORTURE_TEST=y CONFIG_EARLY_PRINTK=y CONFIG_DEBUG_STACKOVERFLOW=y # CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUG_RODATA is not set CONFIG_4KSTACKS=y CONFIG_DOUBLEFAULT=y # # Security options # CONFIG_KEYS=y # CONFIG_KEYS_DEBUG_PROC_KEYS is not set # CONFIG_SECURITY is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y # CONFIG_CRYPTO_NULL is not set CONFIG_CRYPTO_MD4=y CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=y # CONFIG_CRYPTO_SHA256 is not set CONFIG_CRYPTO_SHA512=y # CONFIG_CRYPTO_WP512 is not set # CONFIG_CRYPTO_TGR192 is not set # CONFIG_CRYPTO_DES is not set # CONFIG_CRYPTO_BLOWFISH is not set CONFIG_CRYPTO_TWOFISH=y CONFIG_CRYPTO_SERPENT=y CONFIG_CRYPTO_AES=y CONFIG_CRYPTO_AES_586=y CONFIG_CRYPTO_CAST5=y # CONFIG_CRYPTO_CAST6 is not set # CONFIG_CRYPTO_TEA is not set # CONFIG_CRYPTO_ARC4 is not set CONFIG_CRYPTO_KHAZAD=y # CONFIG_CRYPTO_ANUBIS is not set CONFIG_CRYPTO_DEFLATE=y CONFIG_CRYPTO_MICHAEL_MIC=y # CONFIG_CRYPTO_CRC32C is not set # # Hardware crypto devices # # CONFIG_CRYPTO_DEV_PADLOCK is not set # # Library routines # CONFIG_CRC_CCITT=y # CONFIG_CRC16 is not set CONFIG_CRC32=y # CONFIG_LIBCRC32C is not set CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=y 
CONFIG_PLIST=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_X86_BIOS_REBOOT=y CONFIG_KTIME_SCALAR=y From ardavis at ichips.intel.com Thu Sep 7 11:06:55 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Thu, 07 Sep 2006 11:06:55 -0700 Subject: [openib-general] missing dtest program evdtest.c In-Reply-To: <1157644245.28308.55.camel@stevo-desktop> References: <1157644245.28308.55.camel@stevo-desktop> Message-ID: <45005FBF.5080902@ichips.intel.com> Steve Wise wrote: >Is dapl/test/dtest missing evdtest.c? Its in the makefile... > >Steve. > > > It was inadvertently included with the last update when I was testing the fix for dat_evd_set_unwaitable. I will update the makefile. From or.gerlitz at gmail.com Thu Sep 7 12:52:54 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Thu, 7 Sep 2006 21:52:54 +0200 Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' In-Reply-To: <200609071902.57379.toralf.foerster@gmx.de> References: <200609071902.57379.toralf.foerster@gmx.de> Message-ID: <15ddcffd0609071252o477eeabfl31366719d0d3d9f0@mail.gmail.com> On 9/7/06, Toralf Förster wrote: > The compile test of the attached .config failed : > ... 
> > drivers/built-in.o: In function `iser_connect': > drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' > drivers/infiniband/ulp/iser/iser_verbs.c:525: undefined reference to `rdma_resolve_addr' > drivers/built-in.o: In function `iscsi_transport_init': > drivers/scsi/scsi_transport_iscsi.c:1636: undefined reference to `netlink_register_notifier' > drivers/scsi/scsi_transport_iscsi.c:1640: undefined reference to `netlink_kernel_create' > drivers/scsi/scsi_transport_iscsi.c:1652: undefined reference to `sock_release' > drivers/scsi/scsi_transport_iscsi.c:1654: undefined reference to `netlink_unregister_notifier' > drivers/built-in.o: In function `iscsi_transport_exit': > drivers/scsi/scsi_transport_iscsi.c:1669: undefined reference to `sock_release' > drivers/scsi/scsi_transport_iscsi.c:1670: undefined reference to `netlink_unregister_notifier' > make: *** [.tmp_vmlinux1] Error 1 You need to have CONFIG_INFINIBAND_ADDR_TRANS=m defined. Also, I think you are missing CONFIG_INET=m. Or. From tziporet at mellanox.co.il Thu Sep 7 13:01:30 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 7 Sep 2006 23:01:30 +0300 Subject: [openib-general] OFED 1.1 status Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7892@mtlexch01.mtl.com> Hi, OFED 1.1 RC4 will be published on Monday 11-Sep. We are currently working on several showstoppers: 1. 223: mthca.so not properly linked to libibverbs - Vlad & Jack 2. 221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - Roland 3. 219: OFED 1.1rc3 contains prerelease unstable libibverbs code - Vlad & Jack Thus the final release date will be delayed to the end of next week. Tziporet Koren Software Director Mellanox Technologies mailto: tziporet at mellanox.co.il Tel +972-4-9097200, ext 380 
From rsalmon at tulane.edu Thu Sep 7 13:06:40 2006 From: rsalmon at tulane.edu (Rene Salmon) Date: Thu, 07 Sep 2006 15:06:40 -0500 Subject: [openib-general] PXE + infiniband? In-Reply-To: <2376B63A5AF8564F8A2A2D76BC6DB033DC1296@CINMLVEM11.e2k.ad.ge.com> References: <2376B63A5AF8564F8A2A2D76BC6DB033DC1296@CINMLVEM11.e2k.ad.ge.com> Message-ID: <45007BD0.9040101@tulane.edu> Hi, We are also interested in either PXE or etherboot over IB. We also run LinuxBIOS. If anyone manages to get this working, could you post some notes, maybe a wiki or a howto? Thanks. Rene Cain, Brian (GE Healthcare) wrote: >> -----Original Message----- >> From: openib-general-bounces at openib.org >> [mailto:openib-general-bounces at openib.org] On Behalf Of Paul Baxter >> Sent: Thursday, September 07, 2006 2:29 AM >> To: openib-general at openib.org; Eli cohen >> Subject: Re: [openib-general] PXE + infiniband? >> >>>> There is an implementation of PXE for Mellanox's HCAs that >> can be found >>>> here: http://sourceforge.net/forum/forum.php?forum_id=494529 >>> Thanks for the tip >>> >>> I, too, am interested in this. >>> >>> Do you have a more direct link as I wandered around >> etherboot's project >>> site >>> and couldn't find anything IB-specific. >> >> I must have been having a 'special moment' before, because I >> couldn't find >> the mailing lists >> >> Here they are! >> >> http://sourceforge.net/search/?ml_name=etherboot-developers&ty >> pe_of_search=mlists&group_id=4233&words=infiniband > > I was able to follow the procedure outlined in Eli's README and I > achieved some mixed results. On one hand, lspci now shows "Expansion > ROM at ed700000 [disabled] [size=1M]" whereas it didn't indicate that > before ("disabled" means it's zeroed out, maybe?). The BIOS seems to > confirm the whole disabled thing since it doesn't list the HCA in the > boot priority list. > > After making this change, IPoIB seems to work via this HCA, but SRP > (initiation, anyways) no longer does. "ibsrpdm -c" no longer produces > any output, even though I can see the target via the ibnetdiscover. > Accessing the SRP target from another host on the fabric works fine. > > -Brian 
From rdreier at cisco.com Thu Sep 7 13:11:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 07 Sep 2006 13:11:34 -0700 Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' In-Reply-To: <15ddcffd0609071252o477eeabfl31366719d0d3d9f0@mail.gmail.com> (Or Gerlitz's message of "Thu, 7 Sep 2006 21:52:54 +0200") References: <200609071902.57379.toralf.foerster@gmx.de> <15ddcffd0609071252o477eeabfl31366719d0d3d9f0@mail.gmail.com> Message-ID: Or> you need to have CONFIG_INFINIBAND_ADDR_TRANS=m defined also i Or> think you are missing CONFIG_INET=m Seems like a bug in the iSER Kconfig -- it shouldn't be possible to select iSER without everything it needs to compile. From HNGUYEN at de.ibm.com Thu Sep 7 14:42:58 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 7 Sep 2006 23:42:58 +0200 Subject: [openib-general] [PATCH] ehca for OFED 1.1-rc4 In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7892@mtlexch01.mtl.com> Message-ID: Hello Tziporet! Below is a patch for ehca against the OFED git tree branch ehca-branch that upgrades it to the same code level as Roland's git tree branch for-2.6.19, which has been posted for a while. 
The main code changes are: - Replace the "huge" EDEB macro with a simpler wrapper based on dev_err/dbg - Remove superfluous variable initialization and argument checking - Replace struct ehca_module with static variables in the files where they are accessed - Rename the module to ib_ehca.ko Thanks! Nam Nguyen Signed-off-by: Hoang-Nam Nguyen --- Kconfig | 14 Makefile | 9 ehca_av.c | 128 ++---- ehca_classes.h | 27 - ehca_cq.c | 222 +++++------ ehca_eq.c | 71 --- ehca_hca.c | 103 +---- ehca_irq.c | 221 +++-------- ehca_main.c | 491 ++++++++---------------- ehca_mcast.c | 119 +---- ehca_mrmw.c | 1113 ++++++++++++++++++++++---------------------------------- ehca_mrmw.h | 3 ehca_pd.c | 60 +-- ehca_qp.c | 572 ++++++++++++---------------- ehca_reqs.c | 219 ++++------- ehca_sqp.c | 50 -- ehca_tools.h | 337 ++-------------- ehca_uverbs.c | 278 ++++++------- hcp_if.c | 834 ++++++++++++----------------------------- hcp_phyp.c | 26 - hcp_phyp.h | 10 hipz_fns_core.h | 44 -- ipz_pt_fn.c | 37 - ipz_pt_fn.h | 7 24 files changed, 1781 insertions(+), 3214 deletions(-) diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/Kconfig linux-2.6/drivers/infiniband/hw/ehca/Kconfig --- linux-2.6_orig/drivers/infiniband/hw/ehca/Kconfig 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/Kconfig 2006-08-30 20:00:16.000000000 +0200 @@ -1,12 +1,16 @@ config INFINIBAND_EHCA - tristate "eHCA support" - depends on IBMEBUS && INFINIBAND - ---help--- - This is a low level device driver for the IBM GX based Host channel - adapters (HCAs). + tristate "eHCA support" + depends on IBMEBUS && INFINIBAND + ---help--- + This driver supports the IBM pSeries eHCA InfiniBand adapter. + + To compile the driver as a module, choose M here. The module + will be called ib_ehca. 
config INFINIBAND_EHCA_SCALING bool "Scaling support (EXPERIMENTAL)" depends on IBMEBUS && INFINIBAND_EHCA && HOTPLUG_CPU && EXPERIMENTAL ---help--- eHCA scaling support schedules the CQ callbacks to different CPUs. + + To enable this feature choose Y here. diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/Makefile linux-2.6/drivers/infiniband/hw/ehca/Makefile --- linux-2.6_orig/drivers/infiniband/hw/ehca/Makefile 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/Makefile 2006-08-30 20:00:17.000000000 +0200 @@ -8,11 +8,10 @@ # # This source code is distributed under a dual license of GPL v2.0 and OpenIB BSD. -obj-$(CONFIG_INFINIBAND_EHCA) += hcad_mod.o +obj-$(CONFIG_INFINIBAND_EHCA) += ib_ehca.o -hcad_mod-objs = ehca_main.o ehca_hca.o ehca_mcast.o ehca_pd.o ehca_av.o ehca_eq.o \ - ehca_cq.o ehca_qp.o ehca_sqp.o ehca_mrmw.o ehca_reqs.o ehca_irq.o \ - ehca_uverbs.o ipz_pt_fn.o hcp_if.o hcp_phyp.o +ib_ehca-objs = ehca_main.o ehca_hca.o ehca_mcast.o ehca_pd.o ehca_av.o ehca_eq.o \ + ehca_cq.o ehca_qp.o ehca_sqp.o ehca_mrmw.o ehca_reqs.o ehca_irq.o \ + ehca_uverbs.o ipz_pt_fn.o hcp_if.o hcp_phyp.o -CFLAGS += -DEHCA_USE_HCALL -DEHCA_USE_HCALL_KERNEL diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_av.c linux-2.6/drivers/infiniband/hw/ehca/ehca_av.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_av.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_av.c 2006-08-30 20:00:16.000000000 +0200 @@ -42,34 +42,26 @@ */ -#define DEB_PREFIX "ehav" - #include #include "ehca_tools.h" #include "ehca_iverbs.h" #include "hcp_if.h" +static struct kmem_cache *av_cache; + struct ib_ah *ehca_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) { - extern struct ehca_module ehca_module; - extern int ehca_static_rate; - int ret = 0; - struct ehca_av *av = NULL; - struct ehca_shca *shca = NULL; - - EHCA_CHECK_PD_P(pd); - EHCA_CHECK_ADR_P(ah_attr); + int ret; + struct ehca_av *av; + struct ehca_shca *shca = 
container_of(pd->device, struct ehca_shca, + ib_device); - shca = container_of(pd->device, struct ehca_shca, ib_device); - - EDEB_EN(7, "pd=%p ah_attr=%p", pd, ah_attr); - - av = kmem_cache_alloc(ehca_module.cache_av, SLAB_KERNEL); + av = kmem_cache_alloc(av_cache, SLAB_KERNEL); if (!av) { - EDEB_ERR(4, "Out of memory pd=%p ah_attr=%p", pd, ah_attr); - ret = -ENOMEM; - goto create_ah_exit0; + ehca_err(pd->device, "Out of memory pd=%p ah_attr=%p", + pd, ah_attr); + return ERR_PTR(-ENOMEM); } av->av.sl = ah_attr->sl; @@ -89,10 +81,6 @@ struct ib_ah *ehca_create_ah(struct ib_p } else av->av.ipd = ehca_static_rate; - EDEB(7, "IPD av->av.ipd set =%x ah_attr->static_rate=%x " - "shca_ib_rate=%x ",av->av.ipd, ah_attr->static_rate, - shca->sport[ah_attr->port_num].rate); - av->av.lnh = ah_attr->ah_flags; av->av.grh.word_0 = EHCA_BMASK_SET(GRH_IPVERSION_MASK, 6); av->av.grh.word_0 |= EHCA_BMASK_SET(GRH_TCLASS_MASK, @@ -104,7 +92,7 @@ struct ib_ah *ehca_create_ah(struct ib_p av->av.grh.word_0 |= EHCA_BMASK_SET(GRH_NEXTHEADER_MASK, 0x1B); /* set sgid in grh.word_1 */ if (ah_attr->ah_flags & IB_AH_GRH) { - int rc = 0; + int rc; struct ib_port_attr port_attr; union ib_gid gid; memset(&port_attr, 0, sizeof(port_attr)); @@ -112,7 +100,7 @@ struct ib_ah *ehca_create_ah(struct ib_p &port_attr); if (rc) { /* invalid port number */ ret = -EINVAL; - EDEB_ERR(4, "Invalid port number " + ehca_err(pd->device, "Invalid port number " "ehca_query_port() returned %x " "pd=%p ah_attr=%p", rc, pd, ah_attr); goto create_ah_exit1; @@ -123,7 +111,7 @@ struct ib_ah *ehca_create_ah(struct ib_p ah_attr->grh.sgid_index, &gid); if (rc) { ret = -EINVAL; - EDEB_ERR(4, "Failed to retrieve sgid " + ehca_err(pd->device, "Failed to retrieve sgid " "ehca_query_gid() returned %x " "pd=%p ah_attr=%p", rc, pd, ah_attr); goto create_ah_exit1; @@ -137,37 +125,24 @@ struct ib_ah *ehca_create_ah(struct ib_p memcpy(&av->av.grh.word_3, &ah_attr->grh.dgid, sizeof(ah_attr->grh.dgid)); - EHCA_REGISTER_AV(device, pd); - 
- EDEB_EX(7, "pd=%p ah_attr=%p av=%p", pd, ah_attr, av); return &av->ib_ah; create_ah_exit1: - kmem_cache_free(ehca_module.cache_av, av); - -create_ah_exit0: - EDEB_EX(7, "ret=%x pd=%p ah_attr=%p", ret, pd, ah_attr); + kmem_cache_free(av_cache, av); return ERR_PTR(ret); } int ehca_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) { - struct ehca_av *av = NULL; + struct ehca_av *av; struct ehca_ud_av new_ehca_av; - struct ehca_pd *my_pd = NULL; + struct ehca_pd *my_pd = container_of(ah->pd, struct ehca_pd, ib_pd); u32 cur_pid = current->tgid; - int ret = 0; - - EHCA_CHECK_AV(ah); - EHCA_CHECK_ADR(ah_attr); - EDEB_EN(7, "ah=%p ah_attr=%p", ah, ah_attr); - - my_pd = container_of(ah->pd, struct ehca_pd, ib_pd); if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && my_pd->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(ah->device, "Invalid caller pid=%x ownpid=%x", cur_pid, my_pd->ownpid); return -EINVAL; } @@ -189,33 +164,31 @@ int ehca_modify_ah(struct ib_ah *ah, str /* set sgid in grh.word_1 */ if (ah_attr->ah_flags & IB_AH_GRH) { - int rc = 0; + int rc; struct ib_port_attr port_attr; union ib_gid gid; memset(&port_attr, 0, sizeof(port_attr)); rc = ehca_query_port(ah->device, ah_attr->port_num, &port_attr); if (rc) { /* invalid port number */ - ret = -EINVAL; - EDEB_ERR(4, "Invalid port number " + ehca_err(ah->device, "Invalid port number " "ehca_query_port() returned %x " "ah=%p ah_attr=%p port_num=%x", rc, ah, ah_attr, ah_attr->port_num); - goto modify_ah_exit1; + return -EINVAL; } memset(&gid, 0, sizeof(gid)); rc = ehca_query_gid(ah->device, ah_attr->port_num, ah_attr->grh.sgid_index, &gid); if (rc) { - ret = -EINVAL; - EDEB_ERR(4, "Failed to retrieve sgid " + ehca_err(ah->device, "Failed to retrieve sgid " "ehca_query_gid() returned %x " "ah=%p ah_attr=%p port_num=%x " "sgid_index=%x", rc, ah, ah_attr, ah_attr->port_num, ah_attr->grh.sgid_index); - goto modify_ah_exit1; + return -EINVAL; } 
memcpy(&new_ehca_av.grh.word_1, &gid, sizeof(gid)); } @@ -228,33 +201,22 @@ int ehca_modify_ah(struct ib_ah *ah, str av = container_of(ah, struct ehca_av, ib_ah); av->av = new_ehca_av; -modify_ah_exit1: - EDEB_EX(7, "ret=%x ah=%p ah_attr=%p", ret, ah, ah_attr); - - return ret; + return 0; } int ehca_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) { - int ret = 0; - struct ehca_av *av = NULL; - struct ehca_pd *my_pd = NULL; + struct ehca_av *av = container_of(ah, struct ehca_av, ib_ah); + struct ehca_pd *my_pd = container_of(ah->pd, struct ehca_pd, ib_pd); u32 cur_pid = current->tgid; - EHCA_CHECK_AV(ah); - EHCA_CHECK_ADR(ah_attr); - - EDEB_EN(7, "ah=%p ah_attr=%p", ah, ah_attr); - - my_pd = container_of(ah->pd, struct ehca_pd, ib_pd); if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && my_pd->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(ah->device, "Invalid caller pid=%x ownpid=%x", cur_pid, my_pd->ownpid); return -EINVAL; } - av = container_of(ah, struct ehca_av, ib_ah); memcpy(&ah_attr->grh.dgid, &av->av.grh.word_3, sizeof(ah_attr->grh.dgid)); ah_attr->sl = av->av.sl; @@ -271,33 +233,39 @@ int ehca_query_ah(struct ib_ah *ah, stru ah_attr->grh.flow_label = EHCA_BMASK_GET(GRH_FLOWLABEL_MASK, av->av.grh.word_0); - EDEB_EX(7, "ah=%p ah_attr=%p ret=%x", ah, ah_attr, ret); - return ret; + return 0; } int ehca_destroy_ah(struct ib_ah *ah) { - extern struct ehca_module ehca_module; - struct ehca_pd *my_pd = NULL; + struct ehca_pd *my_pd = container_of(ah->pd, struct ehca_pd, ib_pd); u32 cur_pid = current->tgid; - int ret = 0; - - EHCA_CHECK_AV(ah); - EHCA_DEREGISTER_AV(ah); - - EDEB_EN(7, "ah=%p", ah); - my_pd = container_of(ah->pd, struct ehca_pd, ib_pd); if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && my_pd->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(ah->device, "Invalid caller pid=%x ownpid=%x", cur_pid, my_pd->ownpid); return -EINVAL; } - 
kmem_cache_free(ehca_module.cache_av, - container_of(ah, struct ehca_av, ib_ah)); + kmem_cache_free(av_cache, container_of(ah, struct ehca_av, ib_ah)); - EDEB_EX(7, "ret=%x ah=%p", ret, ah); - return ret; + return 0; +} + +int ehca_init_av_cache(void) +{ + av_cache = kmem_cache_create("ehca_cache_av", + sizeof(struct ehca_av), 0, + SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!av_cache) + return -ENOMEM; + return 0; +} + +void ehca_cleanup_av_cache(void) +{ + if (av_cache) + kmem_cache_destroy(av_cache); } diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_classes.h linux-2.6/drivers/infiniband/hw/ehca/ehca_classes.h --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_classes.h 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_classes.h 2006-08-30 20:00:16.000000000 +0200 @@ -63,18 +63,6 @@ struct ehca_av; #include "ehca_irq.h" -struct ehca_module { - struct list_head shca_list; - spinlock_t shca_lock; - struct timer_list timer; - kmem_cache_t *cache_pd; - kmem_cache_t *cache_cq; - kmem_cache_t *cache_qp; - kmem_cache_t *cache_av; - kmem_cache_t *cache_mr; - kmem_cache_t *cache_mw; -}; - struct ehca_eq { u32 length; struct ipz_queue ipz_queue; @@ -274,11 +262,26 @@ int ehca_shca_delete(struct ehca_shca *m struct ehca_sport *ehca_sport_new(struct ehca_shca *anchor); +int ehca_init_pd_cache(void); +void ehca_cleanup_pd_cache(void); +int ehca_init_cq_cache(void); +void ehca_cleanup_cq_cache(void); +int ehca_init_qp_cache(void); +void ehca_cleanup_qp_cache(void); +int ehca_init_av_cache(void); +void ehca_cleanup_av_cache(void); +int ehca_init_mrmw_cache(void); +void ehca_cleanup_mrmw_cache(void); + extern spinlock_t ehca_qp_idr_lock; extern spinlock_t ehca_cq_idr_lock; extern struct idr ehca_qp_idr; extern struct idr ehca_cq_idr; +extern int ehca_static_rate; +extern int ehca_port_act_time; +extern int ehca_use_hp_mr; + struct ipzu_queue_resp { u64 queue; /* points to first queue entry */ u32 qe_size; /* queue entry size */ diff -Nurp 
linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_cq.c linux-2.6/drivers/infiniband/hw/ehca/ehca_cq.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_cq.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_cq.c 2006-08-30 20:00:17.000000000 +0200 @@ -43,8 +43,6 @@ * POSSIBILITY OF SUCH DAMAGE. */ -#define DEB_PREFIX "e_cq" - #include #include "ehca_iverbs.h" @@ -52,17 +50,20 @@ #include "ehca_irq.h" #include "hcp_if.h" +static struct kmem_cache *cq_cache; + int ehca_cq_assign_qp(struct ehca_cq *cq, struct ehca_qp *qp) { unsigned int qp_num = qp->real_qp_num; unsigned int key = qp_num & (QP_HASHTAB_LEN-1); - unsigned long spl_flags = 0; + unsigned long spl_flags; spin_lock_irqsave(&cq->spinlock, spl_flags); hlist_add_head(&qp->list_entries, &cq->qp_hashtab[key]); spin_unlock_irqrestore(&cq->spinlock, spl_flags); - EDEB(7, "cq_num=%x real_qp_num=%x", cq->cq_number, qp_num); + ehca_dbg(cq->ib_cq.device, "cq_num=%x real_qp_num=%x", + cq->cq_number, qp_num); return 0; } @@ -71,26 +72,27 @@ int ehca_cq_unassign_qp(struct ehca_cq * { int ret = -EINVAL; unsigned int key = real_qp_num & (QP_HASHTAB_LEN-1); - struct hlist_node *iter = NULL; - struct ehca_qp *qp = NULL; - unsigned long spl_flags = 0; + struct hlist_node *iter; + struct ehca_qp *qp; + unsigned long spl_flags; spin_lock_irqsave(&cq->spinlock, spl_flags); hlist_for_each(iter, &cq->qp_hashtab[key]) { qp = hlist_entry(iter, struct ehca_qp, list_entries); if (qp->real_qp_num == real_qp_num) { hlist_del(iter); - EDEB(7, "removed qp from cq .cq_num=%x real_qp_num=%x", - cq->cq_number, real_qp_num); + ehca_dbg(cq->ib_cq.device, + "removed qp from cq .cq_num=%x real_qp_num=%x", + cq->cq_number, real_qp_num); ret = 0; break; } } spin_unlock_irqrestore(&cq->spinlock, spl_flags); - if (ret) { - EDEB_ERR(4, "qp not found cq_num=%x real_qp_num=%x", + if (ret) + ehca_err(cq->ib_cq.device, + "qp not found cq_num=%x real_qp_num=%x", cq->cq_number, real_qp_num); - } return ret; } @@ -99,8 +101,8 
@@ struct ehca_qp* ehca_cq_get_qp(struct eh { struct ehca_qp *ret = NULL; unsigned int key = real_qp_num & (QP_HASHTAB_LEN-1); - struct hlist_node *iter = NULL; - struct ehca_qp *qp = NULL; + struct hlist_node *iter; + struct ehca_qp *qp; hlist_for_each(iter, &cq->qp_hashtab[key]) { qp = hlist_entry(iter, struct ehca_qp, list_entries); if (qp->real_qp_num == real_qp_num) { @@ -115,37 +117,28 @@ struct ib_cq *ehca_create_cq(struct ib_d struct ib_ucontext *context, struct ib_udata *udata) { - extern struct ehca_module ehca_module; - struct ib_cq *cq = NULL; - struct ehca_cq *my_cq = NULL; - struct ehca_shca *shca = NULL; + static const u32 additional_cqe = 20; + struct ib_cq *cq; + struct ehca_cq *my_cq; + struct ehca_shca *shca = + container_of(device, struct ehca_shca, ib_device); struct ipz_adapter_handle adapter_handle; - /* h_call's out parameters */ - struct ehca_alloc_cq_parms param; - u32 counter = 0; - void *vpage = NULL; - u64 rpage = 0; + struct ehca_alloc_cq_parms param; /* h_call's out parameters */ struct h_galpa gal; - u64 cqx_fec = 0; - u64 h_ret = 0; - int ipz_rc = 0; - int ret = 0; - const u32 additional_cqe=20; - int i= 0; + void *vpage; + u32 counter; + u64 rpage, cqx_fec, h_ret; + int ipz_rc, ret, i; unsigned long flags; - EHCA_CHECK_DEVICE_P(device); - EDEB_EN(7, "device=%p cqe=%x context=%p", device, cqe, context); - if (cqe >= 0xFFFFFFFF - 64 - additional_cqe) return ERR_PTR(-EINVAL); - my_cq = kmem_cache_alloc(ehca_module.cache_cq, SLAB_KERNEL); + my_cq = kmem_cache_alloc(cq_cache, SLAB_KERNEL); if (!my_cq) { - cq = ERR_PTR(-ENOMEM); - EDEB_ERR(4, "Out of memory for ehca_cq struct device=%p", + ehca_err(device, "Out of memory for ehca_cq struct device=%p", device); - goto create_cq_exit0; + return ERR_PTR(-ENOMEM); } memset(my_cq, 0, sizeof(struct ehca_cq)); @@ -158,17 +151,14 @@ struct ib_cq *ehca_create_cq(struct ib_d cq = &my_cq->ib_cq; - shca = container_of(device, struct ehca_shca, ib_device); adapter_handle = shca->ipz_hca_handle; 
param.eq_handle = shca->eq.ipz_eq_handle; - do { if (!idr_pre_get(&ehca_cq_idr, GFP_KERNEL)) { cq = ERR_PTR(-ENOMEM); - EDEB_ERR(4, - "Can't reserve idr resources. " - "device=%p", device); + ehca_err(device, "Can't reserve idr nr. device=%p", + device); goto create_cq_exit1; } @@ -180,9 +170,8 @@ struct ib_cq *ehca_create_cq(struct ib_d if (ret) { cq = ERR_PTR(-ENOMEM); - EDEB_ERR(4, - "Can't allocate new idr entry. " - "device=%p", device); + ehca_err(device, "Can't allocate new idr entry. device=%p", + device); goto create_cq_exit1; } @@ -194,7 +183,7 @@ struct ib_cq *ehca_create_cq(struct ib_d h_ret = hipz_h_alloc_resource_cq(adapter_handle, my_cq, &param); if (h_ret != H_SUCCESS) { - EDEB_ERR(4,"hipz_h_alloc_resource_cq() failed " + ehca_err(device, "hipz_h_alloc_resource_cq() failed " "h_ret=%lx device=%p", h_ret, device); cq = ERR_PTR(ehca2ib_return_code(h_ret)); goto create_cq_exit2; @@ -203,9 +192,8 @@ struct ib_cq *ehca_create_cq(struct ib_d ipz_rc = ipz_queue_ctor(&my_cq->ipz_queue, param.act_pages, EHCA_PAGESIZE, sizeof(struct ehca_cqe), 0); if (!ipz_rc) { - EDEB_ERR(4, - "ipz_queue_ctor() failed " - "ipz_rc=%x device=%p", ipz_rc, device); + ehca_err(device, "ipz_queue_ctor() failed ipz_rc=%x device=%p", + ipz_rc, device); cq = ERR_PTR(-EINVAL); goto create_cq_exit3; } @@ -213,7 +201,7 @@ struct ib_cq *ehca_create_cq(struct ib_d for (counter = 0; counter < param.act_pages; counter++) { vpage = ipz_qpageit_get_inc(&my_cq->ipz_queue); if (!vpage) { - EDEB_ERR(4, "ipz_qpageit_get_inc() " + ehca_err(device, "ipz_qpageit_get_inc() " "returns NULL device=%p", device); cq = ERR_PTR(-EAGAIN); goto create_cq_exit4; @@ -231,10 +219,9 @@ struct ib_cq *ehca_create_cq(struct ib_d kernel); if (h_ret < H_SUCCESS) { - EDEB_ERR(4, "hipz_h_register_rpage_cq() failed " - "ehca_cq=%p cq_num=%x h_ret=%lx " - "counter=%i act_pages=%i", - my_cq, my_cq->cq_number, + ehca_err(device, "hipz_h_register_rpage_cq() failed " + "ehca_cq=%p cq_num=%x h_ret=%lx counter=%i " + 
"act_pages=%i", my_cq, my_cq->cq_number, h_ret, counter, param.act_pages); cq = ERR_PTR(-EINVAL); goto create_cq_exit4; @@ -243,16 +230,16 @@ struct ib_cq *ehca_create_cq(struct ib_d if (counter == (param.act_pages - 1)) { vpage = ipz_qpageit_get_inc(&my_cq->ipz_queue); if ((h_ret != H_SUCCESS) || vpage) { - EDEB_ERR(4, "Registration of pages not " + ehca_err(device, "Registration of pages not " "complete ehca_cq=%p cq_num=%x " - "h_ret=%lx", - my_cq, my_cq->cq_number, h_ret); + "h_ret=%lx", my_cq, my_cq->cq_number, + h_ret); cq = ERR_PTR(-EAGAIN); goto create_cq_exit4; } } else { if (h_ret != H_PAGE_REGISTERED) { - EDEB_ERR(4, "Registration of page failed " + ehca_err(device, "Registration of page failed " "ehca_cq=%p cq_num=%x h_ret=%lx" "counter=%i act_pages=%i", my_cq, my_cq->cq_number, @@ -267,8 +254,8 @@ struct ib_cq *ehca_create_cq(struct ib_d gal = my_cq->galpas.kernel; cqx_fec = hipz_galpa_load(gal, CQTEMM_OFFSET(cqx_fec)); - EDEB(8, "ehca_cq=%p cq_num=%x CQX_FEC=%lx", - my_cq, my_cq->cq_number, cqx_fec); + ehca_dbg(device, "ehca_cq=%p cq_num=%x CQX_FEC=%lx", + my_cq, my_cq->cq_number, cqx_fec); my_cq->ib_cq.cqe = my_cq->nr_of_entries = param.act_nr_of_entries - additional_cqe; @@ -280,7 +267,7 @@ struct ib_cq *ehca_create_cq(struct ib_d if (context) { struct ipz_queue *ipz_queue = &my_cq->ipz_queue; struct ehca_create_cq_resp resp; - struct vm_area_struct *vma = NULL; + struct vm_area_struct *vma; memset(&resp, 0, sizeof(resp)); resp.cq_number = my_cq->cq_number; resp.token = my_cq->token; @@ -294,7 +281,7 @@ struct ib_cq *ehca_create_cq(struct ib_d (void**)&resp.ipz_queue.queue, &vma); if (ret) { - EDEB_ERR(4, "Could not mmap queue pages"); + ehca_err(device, "Could not mmap queue pages"); cq = ERR_PTR(ret); goto create_cq_exit4; } @@ -304,19 +291,17 @@ struct ib_cq *ehca_create_cq(struct ib_d (void**)&resp.galpas.kernel.fw_handle, &vma); if (ret) { - EDEB_ERR(4, "Could not mmap fw_handle"); + ehca_err(device, "Could not mmap fw_handle"); cq = 
ERR_PTR(ret); goto create_cq_exit5; } my_cq->uspace_fwh = (u64)resp.galpas.kernel.fw_handle; if (ib_copy_to_udata(udata, &resp, sizeof(resp))) { - EDEB_ERR(4, "Copy to udata failed."); + ehca_err(device, "Copy to udata failed."); goto create_cq_exit6; } } - EDEB_EX(7,"retcode=%p ehca_cq=%p cq_num=%x cq_size=%x", - cq, my_cq, my_cq->cq_number, param.act_nr_of_entries); return cq; create_cq_exit6: @@ -331,8 +316,8 @@ create_cq_exit4: create_cq_exit3: h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 1); if (h_ret != H_SUCCESS) - EDEB(4, "hipz_h_destroy_cq() failed ehca_cq=%p cq_num=%x " - "h_ret=%lx", my_cq, my_cq->cq_number, h_ret); + ehca_err(device, "hipz_h_destroy_cq() failed ehca_cq=%p " + "cq_num=%x h_ret=%lx", my_cq, my_cq->cq_number, h_ret); create_cq_exit2: spin_lock_irqsave(&ehca_cq_idr_lock, flags); @@ -340,36 +325,24 @@ create_cq_exit2: spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); create_cq_exit1: - kmem_cache_free(ehca_module.cache_cq, my_cq); + kmem_cache_free(cq_cache, my_cq); -create_cq_exit0: - EDEB_EX(4, "An error has occured retcode=%p", cq); return cq; } int ehca_destroy_cq(struct ib_cq *cq) { - extern struct ehca_module ehca_module; - u64 h_ret = 0; - int ret = 0; - struct ehca_cq *my_cq = NULL; - int cq_num = 0; - struct ib_device *device = NULL; - struct ehca_shca *shca = NULL; - struct ipz_adapter_handle adapter_handle; + u64 h_ret; + int ret; + struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); + int cq_num = my_cq->cq_number; + struct ib_device *device = cq->device; + struct ehca_shca *shca = container_of(device, struct ehca_shca, + ib_device); + struct ipz_adapter_handle adapter_handle = shca->ipz_hca_handle; u32 cur_pid = current->tgid; unsigned long flags; - EHCA_CHECK_CQ(cq); - my_cq = container_of(cq, struct ehca_cq, ib_cq); - cq_num = my_cq->cq_number; - device = cq->device; - EHCA_CHECK_DEVICE(device); - shca = container_of(device, struct ehca_shca, ib_device); - adapter_handle = shca->ipz_hca_handle; - EDEB_EN(7, 
"ehca_cq=%p cq_num=%x", - my_cq, my_cq->cq_number); - spin_lock_irqsave(&ehca_cq_idr_lock, flags); while (my_cq->nr_callbacks) yield(); @@ -378,7 +351,7 @@ int ehca_destroy_cq(struct ib_cq *cq) spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); if (my_cq->uspace_queue && my_cq->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(device, "Invalid caller pid=%x ownpid=%x", cur_pid, my_cq->ownpid); return -EINVAL; } @@ -386,64 +359,69 @@ int ehca_destroy_cq(struct ib_cq *cq) /* un-mmap if vma alloc */ if (my_cq->uspace_queue ) { ret = ehca_munmap(my_cq->uspace_queue, - my_cq->ipz_queue.queue_length); + my_cq->ipz_queue.queue_length); + if (ret) + ehca_err(device, "Could not munmap queue ehca_cq=%p " + "cq_num=%x", my_cq, cq_num); ret = ehca_munmap(my_cq->uspace_fwh, EHCA_PAGESIZE); + if (ret) + ehca_err(device, "Could not munmap fwh ehca_cq=%p " + "cq_num=%x", my_cq, cq_num); } h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 0); if (h_ret == H_R_STATE) { /* cq in err: read err data and destroy it forcibly */ - EDEB(4, "ehca_cq=%p cq_num=%x ressource=%lx in err state. " - "Try to delete it forcibly.", - my_cq, my_cq->cq_number, my_cq->ipz_cq_handle.handle); + ehca_dbg(device, "ehca_cq=%p cq_num=%x ressource=%lx in err " + "state. 
Try to delete it forcibly.", + my_cq, cq_num, my_cq->ipz_cq_handle.handle); ehca_error_data(shca, my_cq, my_cq->ipz_cq_handle.handle); h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 1); if (h_ret == H_SUCCESS) - EDEB(4, "ehca_cq=%p cq_num=%x deleted successfully.", - my_cq, my_cq->cq_number); + ehca_dbg(device, "cq_num=%x deleted successfully.", + cq_num); } if (h_ret != H_SUCCESS) { - EDEB_ERR(4,"hipz_h_destroy_cq() failed " - "h_ret=%lx ehca_cq=%p cq_num=%x", - h_ret, my_cq, my_cq->cq_number); - ret = ehca2ib_return_code(h_ret); - goto destroy_cq_exit0; + ehca_err(device, "hipz_h_destroy_cq() failed h_ret=%lx " + "ehca_cq=%p cq_num=%x", h_ret, my_cq, cq_num); + return ehca2ib_return_code(h_ret); } ipz_queue_dtor(&my_cq->ipz_queue); - kmem_cache_free(ehca_module.cache_cq, my_cq); + kmem_cache_free(cq_cache, my_cq); -destroy_cq_exit0: - EDEB_EX(7, "ehca_cq=%p cq_num=%x ret=%x ", - my_cq, cq_num, ret); - return ret; + return 0; } int ehca_resize_cq(struct ib_cq *cq, int cqe, struct ib_udata *udata) { - int ret = 0; - struct ehca_cq *my_cq = NULL; + struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); u32 cur_pid = current->tgid; - if (unlikely(!cq)) { - EDEB_ERR(4, "cq is NULL"); - return -EFAULT; - } - - my_cq = container_of(cq, struct ehca_cq, ib_cq); - EDEB_EN(7, "ehca_cq=%p cq_num=%x", - my_cq, my_cq->cq_number); - if (my_cq->uspace_queue && my_cq->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(cq->device, "Invalid caller pid=%x ownpid=%x", cur_pid, my_cq->ownpid); return -EINVAL; } /* TODO: proper resize needs to be done */ - ret = -EFAULT; - EDEB_ERR(4, "not implemented yet"); + ehca_err(cq->device, "not implemented yet"); - EDEB_EX(7, "ehca_cq=%p cq_num=%x", - my_cq, my_cq->cq_number); - return ret; + return -EFAULT; +} + +int ehca_init_cq_cache(void) +{ + cq_cache = kmem_cache_create("ehca_cache_cq", + sizeof(struct ehca_cq), 0, + SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!cq_cache) + return -ENOMEM; + return 0; 
+} + +void ehca_cleanup_cq_cache(void) +{ + if (cq_cache) + kmem_cache_destroy(cq_cache); } diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_eq.c linux-2.6/drivers/infiniband/hw/ehca/ehca_eq.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_eq.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_eq.c 2006-08-30 20:00:16.000000000 +0200 @@ -43,8 +43,6 @@ * POSSIBILITY OF SUCH DAMAGE. */ -#define DEB_PREFIX "e_eq" - #include "ehca_classes.h" #include "ehca_irq.h" #include "ehca_iverbs.h" @@ -56,24 +54,21 @@ int ehca_create_eq(struct ehca_shca *shc struct ehca_eq *eq, const enum ehca_eq_type type, const u32 length) { - u64 ret = H_SUCCESS; - u32 nr_pages = 0; + u64 ret; + u32 nr_pages; u32 i; - void *vpage = NULL; - - EDEB_EN(7, "shca=%p eq=%p length=%x", shca, eq, length); - EHCA_CHECK_ADR(shca); - EHCA_CHECK_ADR(eq); + void *vpage; + struct ib_device *ib_dev = &shca->ib_device; spin_lock_init(&eq->spinlock); eq->is_initialized = 0; if (type != EHCA_EQ && type != EHCA_NEQ) { - EDEB_ERR(4, "Invalid EQ type %x. eq=%p", type, eq); + ehca_err(ib_dev, "Invalid EQ type %x. eq=%p", type, eq); return -EINVAL; } - if (length == 0) { - EDEB_ERR(4, "EQ length must not be zero. eq=%p", eq); + if (!length) { + ehca_err(ib_dev, "EQ length must not be zero. eq=%p", eq); return -EINVAL; } @@ -86,14 +81,14 @@ int ehca_create_eq(struct ehca_shca *shc &nr_pages, &eq->ist); if (ret != H_SUCCESS) { - EDEB_ERR(4, "Can't allocate EQ / NEQ. eq=%p", eq); + ehca_err(ib_dev, "Can't allocate EQ/NEQ. eq=%p", eq); return -EINVAL; } ret = ipz_queue_ctor(&eq->ipz_queue, nr_pages, EHCA_PAGESIZE, sizeof(struct ehca_eqe), 0); if (!ret) { - EDEB_ERR(4, "Can't allocate EQ pages. 
eq=%p", eq); + ehca_err(ib_dev, "Can't allocate EQ pages eq=%p", eq); goto create_eq_exit1; } @@ -130,7 +125,7 @@ int ehca_create_eq(struct ehca_shca *shc SA_INTERRUPT, "ehca_eq", (void *)shca); if (ret < 0) - EDEB_ERR(4, "Can't map interrupt handler."); + ehca_err(ib_dev, "Can't map interrupt handler."); tasklet_init(&eq->interrupt_task, ehca_tasklet_eq, (long)shca); } else if (type == EHCA_NEQ) { @@ -138,15 +133,13 @@ int ehca_create_eq(struct ehca_shca *shc SA_INTERRUPT, "ehca_neq", (void *)shca); if (ret < 0) - EDEB_ERR(4, "Can't map interrupt handler."); + ehca_err(ib_dev, "Can't map interrupt handler."); tasklet_init(&eq->interrupt_task, ehca_tasklet_neq, (long)shca); } eq->is_initialized = 1; - EDEB_EX(7, "ret=%lx", ret); - return 0; create_eq_exit2: @@ -155,53 +148,25 @@ create_eq_exit2: create_eq_exit1: hipz_h_destroy_eq(shca->ipz_hca_handle, eq); - EDEB_EX(7, "ret=%lx", ret); - return -EINVAL; } void *ehca_poll_eq(struct ehca_shca *shca, struct ehca_eq *eq) { - unsigned long flags = 0; - void *eqe = NULL; - - EDEB_EN(7, "shca=%p eq=%p", shca, eq); - EHCA_CHECK_ADR_P(shca); - EHCA_CHECK_EQ_P(eq); + unsigned long flags; + void *eqe; spin_lock_irqsave(&eq->spinlock, flags); eqe = ipz_eqit_eq_get_inc_valid(&eq->ipz_queue); spin_unlock_irqrestore(&eq->spinlock, flags); - EDEB_EX(7, "eq=%p eqe=%p", eq, eqe); - return eqe; } -void ehca_poll_eqs(unsigned long data) -{ - struct ehca_shca *shca; - struct ehca_module *module = (struct ehca_module*)data; - - spin_lock(&module->shca_lock); - list_for_each_entry(shca, &module->shca_list, shca_list) { - if (shca->eq.is_initialized) - ehca_tasklet_eq((unsigned long)(void*)shca); - } - mod_timer(&module->timer, jiffies + HZ); - spin_unlock(&module->shca_lock); - - return; -} - int ehca_destroy_eq(struct ehca_shca *shca, struct ehca_eq *eq) { - unsigned long flags = 0; - u64 h_ret = H_SUCCESS; - - EDEB_EN(7, "shca=%p eq=%p", shca, eq); - EHCA_CHECK_ADR(shca); - EHCA_CHECK_EQ(eq); + unsigned long flags; + u64 h_ret; 
spin_lock_irqsave(&eq->spinlock, flags); ibmebus_free_irq(NULL, eq->ist, (void *)shca); @@ -211,12 +176,10 @@ int ehca_destroy_eq(struct ehca_shca *sh spin_unlock_irqrestore(&eq->spinlock, flags); if (h_ret != H_SUCCESS) { - EDEB_ERR(4, "Can't free EQ resources."); + ehca_err(&shca->ib_device, "Can't free EQ resources."); return -EINVAL; } ipz_queue_dtor(&eq->ipz_queue); - EDEB_EX(7, "h_ret=%lx", h_ret); - - return h_ret; + return 0; } diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_hca.c linux-2.6/drivers/infiniband/hw/ehca/ehca_hca.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_hca.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_hca.c 2006-08-30 20:00:16.000000000 +0200 @@ -39,36 +39,29 @@ * POSSIBILITY OF SUCH DAMAGE. */ -#undef DEB_PREFIX -#define DEB_PREFIX "shca" - #include "ehca_tools.h" - #include "hcp_if.h" int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props) { int ret = 0; - struct ehca_shca *shca; + struct ehca_shca *shca = container_of(ibdev, struct ehca_shca, + ib_device); struct hipz_query_hca *rblock; - EDEB_EN(7, ""); - - memset(props, 0, sizeof(struct ib_device_attr)); - shca = container_of(ibdev, struct ehca_shca, ib_device); - rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); if (!rblock) { - EDEB_ERR(4, "Can't allocate rblock memory."); - ret = -ENOMEM; - goto query_device0; + ehca_err(&shca->ib_device, "Can't allocate rblock memory."); + return -ENOMEM; } if (hipz_h_query_hca(shca->ipz_hca_handle, rblock) != H_SUCCESS) { - EDEB_ERR(4, "Can't query device properties"); + ehca_err(&shca->ib_device, "Can't query device properties"); ret = -EINVAL; goto query_device1; } + + memset(props, 0, sizeof(struct ib_device_attr)); props->fw_ver = rblock->hw_ver; props->max_mr_size = rblock->max_mr_size; props->vendor_id = rblock->vendor_id >> 8; @@ -105,9 +98,6 @@ int ehca_query_device(struct ib_device * query_device1: kfree(rblock); -query_device0: - EDEB_EX(7, "ret=%x", ret); - return 
ret; } @@ -115,27 +105,23 @@ int ehca_query_port(struct ib_device *ib u8 port, struct ib_port_attr *props) { int ret = 0; - struct ehca_shca *shca; + struct ehca_shca *shca = container_of(ibdev, struct ehca_shca, + ib_device); struct hipz_query_port *rblock; - EDEB_EN(7, "port=%x", port); - - memset(props, 0, sizeof(struct ib_port_attr)); - shca = container_of(ibdev, struct ehca_shca, ib_device); - rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); if (!rblock) { - EDEB_ERR(4, "Can't allocate rblock memory."); - ret = -ENOMEM; - goto query_port0; + ehca_err(&shca->ib_device, "Can't allocate rblock memory."); + return -ENOMEM; } if (hipz_h_query_port(shca->ipz_hca_handle, port, rblock) != H_SUCCESS) { - EDEB_ERR(4, "Can't query port properties"); + ehca_err(&shca->ib_device, "Can't query port properties"); ret = -EINVAL; goto query_port1; } + memset(props, 0, sizeof(struct ib_port_attr)); props->state = rblock->state; switch (rblock->max_mtu) { @@ -155,7 +141,9 @@ int ehca_query_port(struct ib_device *ib props->active_mtu = props->max_mtu = IB_MTU_4096; break; default: - EDEB_ERR(4, "Unknown MTU size: %x.", rblock->max_mtu); + ehca_err(&shca->ib_device, "Unknown MTU size: %x.", + rblock->max_mtu); + break; } props->gid_tbl_len = rblock->gid_tbl_len; @@ -176,37 +164,28 @@ int ehca_query_port(struct ib_device *ib query_port1: kfree(rblock); -query_port0: - EDEB_EX(7, "ret=%x", ret); - return ret; } int ehca_query_pkey(struct ib_device *ibdev, u8 port, u16 index, u16 *pkey) { int ret = 0; - struct ehca_shca *shca; + struct ehca_shca *shca = container_of(ibdev, struct ehca_shca, ib_device); struct hipz_query_port *rblock; - EDEB_EN(7, "port=%x index=%x", port, index); - if (index > 16) { - EDEB_ERR(4, "Invalid index: %x.", index); - ret = -EINVAL; - goto query_pkey0; + ehca_err(&shca->ib_device, "Invalid index: %x.", index); + return -EINVAL; } - shca = container_of(ibdev, struct ehca_shca, ib_device); - rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); if (!rblock) { - 
EDEB_ERR(4, "Can't allocate rblock memory."); - ret = -ENOMEM; - goto query_pkey0; + ehca_err(&shca->ib_device, "Can't allocate rblock memory."); + return -ENOMEM; } if (hipz_h_query_port(shca->ipz_hca_handle, port, rblock) != H_SUCCESS) { - EDEB_ERR(4, "Can't query port properties"); + ehca_err(&shca->ib_device, "Can't query port properties"); ret = -EINVAL; goto query_pkey1; } @@ -216,9 +195,6 @@ int ehca_query_pkey(struct ib_device *ib query_pkey1: kfree(rblock); -query_pkey0: - EDEB_EX(7, "ret=%x", ret); - return ret; } @@ -226,28 +202,23 @@ int ehca_query_gid(struct ib_device *ibd int index, union ib_gid *gid) { int ret = 0; - struct ehca_shca *shca; + struct ehca_shca *shca = container_of(ibdev, struct ehca_shca, + ib_device); struct hipz_query_port *rblock; - EDEB_EN(7, "port=%x index=%x", port, index); - if (index > 255) { - EDEB_ERR(4, "Invalid index: %x.", index); - ret = -EINVAL; - goto query_gid0; + ehca_err(&shca->ib_device, "Invalid index: %x.", index); + return -EINVAL; } - shca = container_of(ibdev, struct ehca_shca, ib_device); - rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); if (!rblock) { - EDEB_ERR(4, "Can't allocate rblock memory."); - ret = -ENOMEM; - goto query_gid0; + ehca_err(&shca->ib_device, "Can't allocate rblock memory."); + return -ENOMEM; } if (hipz_h_query_port(shca->ipz_hca_handle, port, rblock) != H_SUCCESS) { - EDEB_ERR(4, "Can't query port properties"); + ehca_err(&shca->ib_device, "Can't query port properties"); ret = -EINVAL; goto query_gid1; } @@ -258,11 +229,6 @@ int ehca_query_gid(struct ib_device *ibd query_gid1: kfree(rblock); -query_gid0: - EDEB_EX(7, "ret=%x GID=%lx%lx", ret, - *(u64 *) & gid->raw[0], - *(u64 *) & gid->raw[8]); - return ret; } @@ -270,13 +236,6 @@ int ehca_modify_port(struct ib_device *i u8 port, int port_modify_mask, struct ib_port_modify *props) { - int ret = 0; - - EDEB_EN(7, "port=%x", port); - - /* Not implemented yet. 
*/ - - EDEB_EX(7, "ret=%x", ret); - - return ret; + /* Not implemented yet */ + return -EFAULT; } diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_irq.c linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_irq.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c 2006-08-30 20:00:16.000000000 +0200 @@ -39,8 +39,6 @@ * POSSIBILITY OF SUCH DAMAGE. */ -#define DEB_PREFIX "eirq" - #include "ehca_classes.h" #include "ehca_irq.h" #include "ehca_iverbs.h" @@ -64,15 +62,17 @@ #define ERROR_DATA_LENGTH EHCA_BMASK_IBM(52,63) #define ERROR_DATA_TYPE EHCA_BMASK_IBM(0,7) +#ifdef CONFIG_INFINIBAND_EHCA_SCALING + static void queue_comp_task(struct ehca_cq *__cq); static struct ehca_comp_pool* pool; static struct notifier_block comp_pool_callback_nb; +#endif + static inline void comp_event_callback(struct ehca_cq *cq) { - EDEB_EN(7, "cq=%p", cq); - if (!cq->ib_cq.comp_handler) return; @@ -80,8 +80,6 @@ static inline void comp_event_callback(s cq->ib_cq.comp_handler(&cq->ib_cq, cq->ib_cq.cq_context); spin_unlock(&cq->cb_lock); - EDEB_EX(7, "cq=%p", cq); - return; } @@ -91,9 +89,6 @@ static void print_error_data(struct ehca u64 type = EHCA_BMASK_GET(ERROR_DATA_TYPE, rblock[2]); u64 resource = rblock[1]; - EDEB_EN(7, "shca=%p data=%p rblock=%p length=%x", - shca, data, rblock, length); - switch (type) { case 0x1: /* Queue Pair */ { @@ -103,7 +98,8 @@ static void print_error_data(struct ehca if (rblock[6] == 0) return; - EDEB_ERR(4, "QP 0x%x (resource=%lx) has errors.", + ehca_err(&shca->ib_device, + "QP 0x%x (resource=%lx) has errors.", qp->ib_qp.qp_num, resource); break; } @@ -111,25 +107,25 @@ static void print_error_data(struct ehca { struct ehca_cq *cq = (struct ehca_cq*)data; - EDEB_ERR(4, "CQ 0x%x (resource=%lx) has errors.", + ehca_err(&shca->ib_device, + "CQ 0x%x (resource=%lx) has errors.", cq->cq_number, resource); break; } default: - EDEB_ERR(4, "Unknown errror type: %lx on %s.", + 
ehca_err(&shca->ib_device, + "Unknown errror type: %lx on %s.", type, shca->ib_device.name); break; } - EDEB_ERR(4, "Error data is available: %lx.", resource); - EDEB_ERR(4, "EHCA ----- error data begin " + ehca_err(&shca->ib_device, "Error data is available: %lx.", resource); + ehca_err(&shca->ib_device, "EHCA ----- error data begin " "---------------------------------------------------"); - EDEB_DMP(4, rblock, length, "resource=%lx", resource); - EDEB_ERR(4, "EHCA ----- error data end " + ehca_dmp(rblock, length, "resource=%lx", resource); + ehca_err(&shca->ib_device, "EHCA ----- error data end " "----------------------------------------------------"); - EDEB_EX(7, ""); - return; } @@ -137,15 +133,13 @@ int ehca_error_data(struct ehca_shca *sh u64 resource) { - unsigned long ret = 0; + unsigned long ret; u64 *rblock; unsigned long block_count; - EDEB_EN(7, "shca=%p data=%p resource=%lx", shca, data, resource); - rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); if (!rblock) { - EDEB_ERR(4, "Cannot allocate rblock memory."); + ehca_err(&shca->ib_device, "Cannot allocate rblock memory."); ret = -ENOMEM; goto error_data1; } @@ -156,7 +150,8 @@ int ehca_error_data(struct ehca_shca *sh &block_count); if (ret == H_R_STATE) { - EDEB_ERR(4, "No error data is available: %lx.", resource); + ehca_err(&shca->ib_device, + "No error data is available: %lx.", resource); } else if (ret == H_SUCCESS) { int length; @@ -169,7 +164,8 @@ int ehca_error_data(struct ehca_shca *sh print_error_data(shca, data, rblock, length); } else { - EDEB_ERR(4, "Error data could not be fetched: %lx", resource); + ehca_err(&shca->ib_device, + "Error data could not be fetched: %lx", resource); } kfree(rblock); @@ -188,8 +184,6 @@ static void qp_event_callback(struct ehc unsigned long flags; u32 token = EHCA_BMASK_GET(EQE_QP_TOKEN, eqe); - EDEB_EN(7, "eqe=%lx", eqe); - spin_lock_irqsave(&ehca_qp_idr_lock, flags); qp = idr_find(&ehca_qp_idr, token); spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); @@ 
-209,8 +203,6 @@ static void qp_event_callback(struct ehc qp->ib_qp.event_handler(&event, qp->ib_qp.qp_context); - EDEB_EX(7, "qp=%p", qp); - return; } @@ -221,8 +213,6 @@ static void cq_event_callback(struct ehc unsigned long flags; u32 token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe); - EDEB_EN(7, "eqe=%lx", eqe); - spin_lock_irqsave(&ehca_cq_idr_lock, flags); cq = idr_find(&ehca_cq_idr, token); spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); @@ -232,8 +222,6 @@ static void cq_event_callback(struct ehc ehca_error_data(shca, cq, cq->ipz_cq_handle.handle); - EDEB_EX(7, "cq=%p", cq); - return; } @@ -241,8 +229,6 @@ static void parse_identifier(struct ehca { u8 identifier = EHCA_BMASK_GET(EQE_EE_IDENTIFIER, eqe); - EDEB_EN(7, "shca=%p eqe=%lx", shca, eqe); - switch (identifier) { case 0x02: /* path migrated */ qp_event_callback(shca, eqe, IB_EVENT_PATH_MIG); @@ -262,41 +248,39 @@ static void parse_identifier(struct ehca cq_event_callback(shca, eqe); break; case 0x09: /* MRMWPTE error */ - EDEB_ERR(4, "MRMWPTE error."); + ehca_err(&shca->ib_device, "MRMWPTE error."); break; case 0x0A: /* port event */ - EDEB_ERR(4, "Port event."); + ehca_err(&shca->ib_device, "Port event."); break; case 0x0B: /* MR access error */ - EDEB_ERR(4, "MR access error."); + ehca_err(&shca->ib_device, "MR access error."); break; case 0x0C: /* EQ error */ - EDEB_ERR(4, "EQ error."); + ehca_err(&shca->ib_device, "EQ error."); break; case 0x0D: /* P/Q_Key mismatch */ - EDEB_ERR(4, "P/Q_Key mismatch."); + ehca_err(&shca->ib_device, "P/Q_Key mismatch."); break; case 0x10: /* sampling complete */ - EDEB_ERR(4, "Sampling complete."); + ehca_err(&shca->ib_device, "Sampling complete."); break; case 0x11: /* unaffiliated access error */ - EDEB_ERR(4, "Unaffiliated access error."); + ehca_err(&shca->ib_device, "Unaffiliated access error."); break; case 0x12: /* path migrating error */ - EDEB_ERR(4, "Path migration error."); + ehca_err(&shca->ib_device, "Path migration error."); break; case 0x13: /* interface 
trace stopped */ - EDEB_ERR(4, "Interface trace stopped."); + ehca_err(&shca->ib_device, "Interface trace stopped."); break; case 0x14: /* first error capture info available */ default: - EDEB_ERR(4, "Unknown identifier: %x on %s.", + ehca_err(&shca->ib_device, "Unknown identifier: %x on %s.", identifier, shca->ib_device.name); break; } - EDEB_EX(7, "eqe=%lx identifier=%x", eqe, identifier); - return; } @@ -306,21 +290,19 @@ static void parse_ec(struct ehca_shca *s u8 ec = EHCA_BMASK_GET(NEQE_EVENT_CODE, eqe); u8 port = EHCA_BMASK_GET(NEQE_PORT_NUMBER, eqe); - EDEB_EN(7, "shca=%p eqe=%lx", shca, eqe); - switch (ec) { case 0x30: /* port availability change */ if (EHCA_BMASK_GET(NEQE_PORT_AVAILABILITY, eqe)) { - EDEB(4, "%s: port %x is active.", - shca->ib_device.name, port); + ehca_info(&shca->ib_device, + "port %x is active.", port); event.device = &shca->ib_device; event.event = IB_EVENT_PORT_ACTIVE; event.element.port_num = port; shca->sport[port - 1].port_state = IB_PORT_ACTIVE; ib_dispatch_event(&event); } else { - EDEB(4, "%s: port %x is inactive.", - shca->ib_device.name, port); + ehca_info(&shca->ib_device, + "port %x is inactive.", port); event.device = &shca->ib_device; event.event = IB_EVENT_PORT_ERR; event.element.port_num = port; @@ -333,19 +315,19 @@ static void parse_ec(struct ehca_shca *s * disruptive change is caused by * LID, PKEY or SM change */ - EDEB(4, "EHCA disruptive port %x " - "configuration change.", port); + ehca_warn(&shca->ib_device, + "disruptive port %x configuration change", port); - EDEB(4, "%s: port %x is inactive.", - shca->ib_device.name, port); + ehca_info(&shca->ib_device, + "port %x is inactive.", port); event.device = &shca->ib_device; event.event = IB_EVENT_PORT_ERR; event.element.port_num = port; shca->sport[port - 1].port_state = IB_PORT_DOWN; ib_dispatch_event(&event); - EDEB(4, "%s: port %x is active.", - shca->ib_device.name, port); + ehca_info(&shca->ib_device, + "port %x is active.", port); event.device = 
&shca->ib_device; event.event = IB_EVENT_PORT_ACTIVE; event.element.port_num = port; @@ -353,34 +335,27 @@ static void parse_ec(struct ehca_shca *s ib_dispatch_event(&event); break; case 0x32: /* adapter malfunction */ - EDEB_ERR(4, "Adapter malfunction."); + ehca_err(&shca->ib_device, "Adapter malfunction."); break; case 0x33: /* trace stopped */ - EDEB_ERR(4, "Traced stopped."); + ehca_err(&shca->ib_device, "Traced stopped."); break; default: - EDEB_ERR(4, "Unknown event code: %x on %s.", + ehca_err(&shca->ib_device, "Unknown event code: %x on %s.", ec, shca->ib_device.name); break; } - EDEB_EN(7, "eqe=%lx ec=%x", eqe, ec); - return; } static inline void reset_eq_pending(struct ehca_cq *cq) { - u64 CQx_EP = 0; + u64 CQx_EP; struct h_galpa gal = cq->galpas.kernel; - EDEB_EN(7, "cq=%p", cq); - hipz_galpa_store_cq(gal, cqx_ep, 0x0); CQx_EP = hipz_galpa_load(gal, CQTEMM_OFFSET(cqx_ep)); - EDEB(7, "CQx_EP=%lx", CQx_EP); - - EDEB_EX(7, "cq=%p", cq); return; } @@ -389,12 +364,8 @@ irqreturn_t ehca_interrupt_neq(int irq, { struct ehca_shca *shca = (struct ehca_shca*)dev_id; - EDEB_EN(7, "dev_id=%p", dev_id); - tasklet_hi_schedule(&shca->neq.interrupt_task); - EDEB_EX(7, ""); - return IRQ_HANDLED; } @@ -402,9 +373,7 @@ void ehca_tasklet_neq(unsigned long data { struct ehca_shca *shca = (struct ehca_shca*)data; struct ehca_eqe *eqe; - u64 ret = H_SUCCESS; - - EDEB_EN(7, "shca=%p", shca); + u64 ret; eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->neq); @@ -419,9 +388,7 @@ void ehca_tasklet_neq(unsigned long data shca->neq.ipz_eq_handle, 0xFFFFFFFFFFFFFFFFL); if (ret != H_SUCCESS) - EDEB_ERR(4, "Can't clear notification events."); - - EDEB_EX(7, "shca=%p", shca); + ehca_err(&shca->ib_device, "Can't clear notification events."); return; } @@ -430,12 +397,8 @@ irqreturn_t ehca_interrupt_eq(int irq, v { struct ehca_shca *shca = (struct ehca_shca*)dev_id; - EDEB_EN(7, "dev_id=%p", dev_id); - tasklet_hi_schedule(&shca->eq.interrupt_task); - EDEB_EX(7, ""); - return 
IRQ_HANDLED; } @@ -446,8 +409,6 @@ void ehca_tasklet_eq(unsigned long data) int int_state; int query_cnt = 0; - EDEB_EN(7, "shca=%p", shca); - do { eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); @@ -460,17 +421,18 @@ void ehca_tasklet_eq(unsigned long data) while (eqe) { u64 eqe_value = eqe->entry; - EDEB(7, "eqe_value=%lx", eqe_value); + ehca_dbg(&shca->ib_device, + "eqe_value=%lx", eqe_value); /* TODO: better structure */ if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) { - extern struct idr ehca_cq_idr; unsigned long flags; u32 token; struct ehca_cq *cq; - EDEB(6, "... completion event"); + ehca_dbg(&shca->ib_device, + "... completion event"); token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value); @@ -494,7 +456,8 @@ void ehca_tasklet_eq(unsigned long data) comp_event_callback(cq); #endif } else { - EDEB(6, "... non completion event"); + ehca_dbg(&shca->ib_device, + "... non completion event"); parse_identifier(shca, eqe_value); } eqe = @@ -518,29 +481,25 @@ void ehca_tasklet_eq(unsigned long data) } } while (int_state != 0); - EDEB_EX(7, "shca=%p", shca); - return; } +#ifdef CONFIG_INFINIBAND_EHCA_SCALING + static inline int find_next_online_cpu(struct ehca_comp_pool* pool) { unsigned long flags_last_cpu; - EDEB_DMP(7, &cpu_online_map, sizeof(cpumask_t), ""); + if (ehca_debug_level) + ehca_dmp(&cpu_online_map, sizeof(cpumask_t), ""); spin_lock_irqsave(&pool->last_cpu_lock, flags_last_cpu); pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map); - if (pool->last_cpu == NR_CPUS) - pool->last_cpu = 0; - if (!cpu_online(pool->last_cpu)) - pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map); - + pool->last_cpu = first_cpu(cpu_online_map); spin_unlock_irqrestore(&pool->last_cpu_lock, flags_last_cpu); - // return pool->last_cpu; - return 1; + return pool->last_cpu; } static void __queue_comp_task(struct ehca_cq *__cq, @@ -549,8 +508,6 @@ static void __queue_comp_task(struct ehc unsigned long flags_cct; unsigned long flags_cq; - EDEB_EN(7, "__cq=%p 
cct=%p", __cq, cct); - spin_lock_irqsave(&cct->task_lock, flags_cct); spin_lock_irqsave(&__cq->task_lock, flags_cq); @@ -565,10 +522,6 @@ static void __queue_comp_task(struct ehc spin_unlock_irqrestore(&__cq->task_lock, flags_cq); spin_unlock_irqrestore(&cct->task_lock, flags_cct); - - - EDEB_EX(7, ""); - } static void queue_comp_task(struct ehca_cq *__cq) @@ -580,10 +533,6 @@ static void queue_comp_task(struct ehca_ cpu = get_cpu(); cpu_id = find_next_online_cpu(pool); - EDEB_EN(7, "pool=%p cq=%p cq_nr=%x CPU=%x:%x:%x:%x", - pool, __cq, __cq->cq_number, - cpu, cpu_id, num_online_cpus(), num_possible_cpus()); - BUG_ON(!cpu_online(cpu_id)); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); @@ -597,20 +546,15 @@ static void queue_comp_task(struct ehca_ put_cpu(); - EDEB_EX(7, "cct=%p", cct); - return; } static void run_comp_task(struct ehca_cpu_comp_task* cct) { - struct ehca_cq *cq = NULL; + struct ehca_cq *cq; unsigned long flags_cct; unsigned long flags_cq; - - EDEB_EN(7, "cct=%p", cct); - spin_lock_irqsave(&cct->task_lock, flags_cct); while (!list_empty(&cct->cq_list)) { @@ -631,8 +575,6 @@ static void run_comp_task(struct ehca_cp spin_unlock_irqrestore(&cct->task_lock, flags_cct); - EDEB_EX(7, "cct=%p cq=%p", cct, cq); - return; } @@ -641,8 +583,6 @@ static int comp_task(void *__cct) struct ehca_cpu_comp_task* cct = __cct; DECLARE_WAITQUEUE(wait, current); - EDEB_EN(7, "cct=%p", cct); - set_current_state(TASK_INTERRUPTIBLE); while(!kthread_should_stop()) { add_wait_queue(&cct->wait_queue, &wait); @@ -661,8 +601,6 @@ static int comp_task(void *__cct) } __set_current_state(TASK_RUNNING); - EDEB_EX(7, ""); - return 0; } @@ -671,16 +609,12 @@ static struct task_struct *create_comp_t { struct ehca_cpu_comp_task *cct; - EDEB_EN(7, "cpu=%d:%d", cpu, NR_CPUS); - cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu); spin_lock_init(&cct->task_lock); INIT_LIST_HEAD(&cct->cq_list); init_waitqueue_head(&cct->wait_queue); cct->task = kthread_create(comp_task, cct, "ehca_comp/%d", 
cpu); - EDEB_EX(7, "cct/%d=%p", cpu, cct); - return cct->task; } @@ -691,8 +625,6 @@ static void destroy_comp_task(struct ehc struct task_struct *task; unsigned long flags_cct; - EDEB_EN(7, "pool=%p cpu=%d:%d", pool, cpu, NR_CPUS); - cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu); spin_lock_irqsave(&cct->task_lock, flags_cct); @@ -706,8 +638,6 @@ static void destroy_comp_task(struct ehc if (task) kthread_stop(task); - EDEB_EX(7, ""); - return; } @@ -719,8 +649,6 @@ static void take_over_work(struct ehca_c struct ehca_cq *cq; unsigned long flags_cct; - EDEB_EN(7, "cpu=%x", cpu); - spin_lock_irqsave(&cct->task_lock, flags_cct); list_splice_init(&cct->cq_list, &list); @@ -735,8 +663,6 @@ static void take_over_work(struct ehca_c spin_unlock_irqrestore(&cct->task_lock, flags_cct); - EDEB_EX(7, ""); - } static int comp_pool_callback(struct notifier_block *nfb, @@ -746,55 +672,50 @@ static int comp_pool_callback(struct not unsigned int cpu = (unsigned long)hcpu; struct ehca_cpu_comp_task *cct; - EDEB_EN(7, "CPU number changed (action=%lx)", action); - switch (action) { case CPU_UP_PREPARE: - EDEB(4, "CPU: %x (CPU_PREPARE)", cpu); + ehca_gen_dbg("CPU: %x (CPU_PREPARE)", cpu); if(!create_comp_task(pool, cpu)) { - EDEB_ERR(4, "Can't create comp_task for cpu: %x", cpu); + ehca_gen_err("Can't create comp_task for cpu: %x", cpu); return NOTIFY_BAD; } break; case CPU_UP_CANCELED: - EDEB(4, "CPU: %x (CPU_CANCELED)", cpu); + ehca_gen_dbg("CPU: %x (CPU_CANCELED)", cpu); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu); kthread_bind(cct->task, any_online_cpu(cpu_online_map)); destroy_comp_task(pool, cpu); break; case CPU_ONLINE: - EDEB(4, "CPU: %x (CPU_ONLINE)", cpu); + ehca_gen_dbg("CPU: %x (CPU_ONLINE)", cpu); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu); kthread_bind(cct->task, cpu); wake_up_process(cct->task); break; case CPU_DOWN_PREPARE: - EDEB(4, "CPU: %x (CPU_DOWN_PREPARE)", cpu); + ehca_gen_dbg("CPU: %x (CPU_DOWN_PREPARE)", cpu); break; case CPU_DOWN_FAILED: - EDEB(4, "CPU: %x 
(CPU_DOWN_FAILED)", cpu); + ehca_gen_dbg("CPU: %x (CPU_DOWN_FAILED)", cpu); break; case CPU_DEAD: - EDEB(4, "CPU: %x (CPU_DEAD)", cpu); + ehca_gen_dbg("CPU: %x (CPU_DEAD)", cpu); destroy_comp_task(pool, cpu); take_over_work(pool, cpu); break; } - EDEB_EX(7, "CPU number changed"); - return NOTIFY_OK; } +#endif + int ehca_create_comp_pool(void) { #ifdef CONFIG_INFINIBAND_EHCA_SCALING int cpu; struct task_struct *task; - EDEB_EN(7, ""); - - pool = kzalloc(sizeof(struct ehca_comp_pool), GFP_KERNEL); if (pool == NULL) return -ENOMEM; @@ -819,8 +740,6 @@ int ehca_create_comp_pool(void) comp_pool_callback_nb.notifier_call = comp_pool_callback; comp_pool_callback_nb.priority =0; register_cpu_notifier(&comp_pool_callback_nb); - - EDEB_EX(7, "pool=%p", pool); #endif return 0; @@ -831,16 +750,12 @@ void ehca_destroy_comp_pool(void) #ifdef CONFIG_INFINIBAND_EHCA_SCALING int i; - EDEB_EN(7, "pool=%p", pool); - unregister_cpu_notifier(&comp_pool_callback_nb); for (i = 0; i < NR_CPUS; i++) { if (cpu_online(i)) destroy_comp_task(pool, i); } - - EDEB_EN(7, ""); #endif return; diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_main.c linux-2.6/drivers/infiniband/hw/ehca/ehca_main.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_main.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_main.c 2006-08-30 20:00:17.000000000 +0200 @@ -4,6 +4,7 @@ * module start stop, hca detection * * Authors: Heiko J Schick + * Hoang-Nam Nguyen * * Copyright (c) 2005 IBM Corporation * @@ -38,8 +39,6 @@ * POSSIBILITY OF SUCH DAMAGE. 
*/ -#define DEB_PREFIX "shca" - #include "ehca_classes.h" #include "ehca_iverbs.h" #include "ehca_mrmw.h" @@ -49,10 +48,10 @@ MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver"); -MODULE_VERSION("SVNEHCA_0012"); +MODULE_VERSION("SVNEHCA_0015"); int ehca_open_aqp1 = 0; -int ehca_debug_level = -1; +int ehca_debug_level = 0; int ehca_hw_level = 0; int ehca_nr_ports = 2; int ehca_use_hp_mr = 0; @@ -73,7 +72,7 @@ MODULE_PARM_DESC(open_aqp1, "AQP1 on startup (0: no (default), 1: yes)"); MODULE_PARM_DESC(debug_level, "debug level" - " (0: node, 6: only errors (default), 9: all)"); + " (0: no debug traces (default), 1: with debug traces)"); MODULE_PARM_DESC(hw_level, "hardware level" " (0: autosensing (default), 1: v. 0.20, 2: v. 0.21)"); @@ -89,170 +88,74 @@ MODULE_PARM_DESC(poll_all_eqs, MODULE_PARM_DESC(static_rate, "set permanent static rate (default: disabled)"); -/* - * This external trace mask controls what will end up in the - * kernel ring buffer. Number 6 means, that everything between - * 0 and 5 will be stored. 
- */ -u8 ehca_edeb_mask[EHCA_EDEB_TRACE_MASK_SIZE]={6, 6, 6, 6, - 6, 6, 6, 6, - 6, 6, 6, 6, - 6, 6, 6, 6, - 6, 6, 6, 6, - 6, 6, 6, 6, - 6, 6, 6, 6, - 6, 6, 0, 0}; - spinlock_t ehca_qp_idr_lock; spinlock_t ehca_cq_idr_lock; DEFINE_IDR(ehca_qp_idr); DEFINE_IDR(ehca_cq_idr); -struct ehca_module ehca_module; - -void ehca_init_trace(void) -{ - EDEB_EN(7, ""); +static struct list_head shca_list; /* list of all registered ehcas */ +static spinlock_t shca_list_lock; - if (ehca_debug_level != -1) { - int i; - for (i = 0; i < EHCA_EDEB_TRACE_MASK_SIZE; i++) - ehca_edeb_mask[i] = ehca_debug_level; - } - - EDEB_EX(7, ""); -} +static struct timer_list poll_eqs_timer; -int ehca_create_slab_caches(struct ehca_module *ehca_module) +static int ehca_create_slab_caches(void) { - int ret = 0; - - EDEB_EN(7, ""); + int ret; - ehca_module->cache_pd = - kmem_cache_create("ehca_cache_pd", - sizeof(struct ehca_pd), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (!ehca_module->cache_pd) { - EDEB_ERR(4, "Cannot create PD SLAB cache."); - ret = -ENOMEM; - goto create_slab_caches1; + ret = ehca_init_pd_cache(); + if (ret) { + ehca_gen_err("Cannot create PD SLAB cache."); + return ret; } - ehca_module->cache_cq = - kmem_cache_create("ehca_cache_cq", - sizeof(struct ehca_cq), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (!ehca_module->cache_cq) { - EDEB_ERR(4, "Cannot create CQ SLAB cache."); - ret = -ENOMEM; + ret = ehca_init_cq_cache(); + if (ret) { + ehca_gen_err("Cannot create CQ SLAB cache."); goto create_slab_caches2; } - ehca_module->cache_qp = - kmem_cache_create("ehca_cache_qp", - sizeof(struct ehca_qp), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (!ehca_module->cache_qp) { - EDEB_ERR(4, "Cannot create QP SLAB cache."); - ret = -ENOMEM; + ret = ehca_init_qp_cache(); + if (ret) { + ehca_gen_err("Cannot create QP SLAB cache."); goto create_slab_caches3; } - ehca_module->cache_av = - kmem_cache_create("ehca_cache_av", - sizeof(struct ehca_av), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if 
(!ehca_module->cache_av) { - EDEB_ERR(4, "Cannot create AV SLAB cache."); - ret = -ENOMEM; + ret = ehca_init_av_cache(); + if (ret) { + ehca_gen_err("Cannot create AV SLAB cache."); goto create_slab_caches4; } - ehca_module->cache_mw = - kmem_cache_create("ehca_cache_mw", - sizeof(struct ehca_mw), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (!ehca_module->cache_mw) { - EDEB_ERR(4, "Cannot create MW SLAB cache."); - ret = -ENOMEM; + ret = ehca_init_mrmw_cache(); + if (ret) { + ehca_gen_err("Cannot create MR&MW SLAB cache."); goto create_slab_caches5; } - ehca_module->cache_mr = - kmem_cache_create("ehca_cache_mr", - sizeof(struct ehca_mr), - 0, SLAB_HWCACHE_ALIGN, - NULL, NULL); - if (!ehca_module->cache_mr) { - EDEB_ERR(4, "Cannot create MR SLAB cache."); - ret = -ENOMEM; - goto create_slab_caches6; - } - - EDEB_EX(7, "ret=%x", ret); - - return ret; - -create_slab_caches6: - kmem_cache_destroy(ehca_module->cache_mw); + return 0; create_slab_caches5: - kmem_cache_destroy(ehca_module->cache_av); + ehca_cleanup_av_cache(); create_slab_caches4: - kmem_cache_destroy(ehca_module->cache_qp); + ehca_cleanup_qp_cache(); create_slab_caches3: - kmem_cache_destroy(ehca_module->cache_cq); + ehca_cleanup_cq_cache(); create_slab_caches2: - kmem_cache_destroy(ehca_module->cache_pd); - -create_slab_caches1: - EDEB_EX(7, "ret=%x", ret); + ehca_cleanup_pd_cache(); return ret; } -int ehca_destroy_slab_caches(struct ehca_module *ehca_module) +static void ehca_destroy_slab_caches(void) { - int ret; - - EDEB_EN(7, ""); - - ret = kmem_cache_destroy(ehca_module->cache_pd); - if (ret) - EDEB_ERR(4, "Cannot destroy PD SLAB cache. ret=%x", ret); - - ret = kmem_cache_destroy(ehca_module->cache_cq); - if (ret) - EDEB_ERR(4, "Cannot destroy CQ SLAB cache. ret=%x", ret); - - ret = kmem_cache_destroy(ehca_module->cache_qp); - if (ret) - EDEB_ERR(4, "Cannot destroy QP SLAB cache. 
ret=%x", ret); - - ret = kmem_cache_destroy(ehca_module->cache_av); - if (ret) - EDEB_ERR(4, "Cannot destroy AV SLAB cache. ret=%x", ret); - - ret = kmem_cache_destroy(ehca_module->cache_mw); - if (ret) - EDEB_ERR(4, "Cannot destroy MW SLAB cache. ret=%x", ret); - - ret = kmem_cache_destroy(ehca_module->cache_mr); - if (ret) - EDEB_ERR(4, "Cannot destroy MR SLAB cache. ret=%x", ret); - - EDEB_EX(7, ""); - - return 0; + ehca_cleanup_mrmw_cache(); + ehca_cleanup_av_cache(); + ehca_cleanup_qp_cache(); + ehca_cleanup_cq_cache(); + ehca_cleanup_pd_cache(); } #define EHCA_HCAAVER EHCA_BMASK_IBM(32,39) @@ -260,22 +163,20 @@ int ehca_destroy_slab_caches(struct ehca int ehca_sense_attributes(struct ehca_shca *shca) { - int ret = -EINVAL; - u64 h_ret = H_SUCCESS; + int ret = 0; + u64 h_ret; struct hipz_query_hca *rblock; - EDEB_EN(7, "shca=%p", shca); - rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); if (!rblock) { - EDEB_ERR(4, "Cannot allocate rblock memory."); - ret = -ENOMEM; - goto num_ports0; + ehca_gen_err("Cannot allocate rblock memory."); + return -ENOMEM; } h_ret = hipz_h_query_hca(shca->ipz_hca_handle, rblock); if (h_ret != H_SUCCESS) { - EDEB_ERR(4, "Cannot query device properties. h_ret=%lx", h_ret); + ehca_gen_err("Cannot query device properties. h_ret=%lx", + h_ret); ret = -EPERM; goto num_ports1; } @@ -285,7 +186,7 @@ int ehca_sense_attributes(struct ehca_sh else shca->num_ports = (u8)rblock->num_ports; - EDEB(6, " ... found %x ports", rblock->num_ports); + ehca_gen_dbg(" ... found %x ports", rblock->num_ports); if (ehca_hw_level == 0) { u32 hcaaver; @@ -294,8 +195,7 @@ int ehca_sense_attributes(struct ehca_sh hcaaver = EHCA_BMASK_GET(EHCA_HCAAVER, rblock->hw_ver); revid = EHCA_BMASK_GET(EHCA_REVID, rblock->hw_ver); - EDEB(6, " ... hardware version=%x:%x", - hcaaver, revid); + ehca_gen_dbg(" ... 
hardware version=%x:%x", hcaaver, revid); if ((hcaaver == 1) && (revid == 0)) shca->hw_level = 0; @@ -304,58 +204,43 @@ int ehca_sense_attributes(struct ehca_sh else if ((hcaaver == 1) && (revid == 2)) shca->hw_level = 2; } - EDEB(6, " ... hardware level=%x", shca->hw_level); + ehca_gen_dbg(" ... hardware level=%x", shca->hw_level); shca->sport[0].rate = IB_RATE_30_GBPS; shca->sport[1].rate = IB_RATE_30_GBPS; - ret = 0; - num_ports1: kfree(rblock); - -num_ports0: - EDEB_EX(7, "ret=%x", ret); - return ret; } -static int init_node_guid(struct ehca_shca* shca) +static int init_node_guid(struct ehca_shca *shca) { int ret = 0; struct hipz_query_hca *rblock; - EDEB_EN(7, ""); - rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); if (!rblock) { - EDEB_ERR(4, "Can't allocate rblock memory."); - ret = -ENOMEM; - goto init_node_guid0; + ehca_err(&shca->ib_device, "Can't allocate rblock memory."); + return -ENOMEM; } if (hipz_h_query_hca(shca->ipz_hca_handle, rblock) != H_SUCCESS) { - EDEB_ERR(4, "Can't query device properties"); + ehca_err(&shca->ib_device, "Can't query device properties"); ret = -EINVAL; goto init_node_guid1; } - memcpy(&shca->ib_device.node_guid, &rblock->node_guid, (sizeof(u64))); + memcpy(&shca->ib_device.node_guid, &rblock->node_guid, sizeof(u64)); init_node_guid1: kfree(rblock); - -init_node_guid0: - EDEB_EX(7, "node_guid=%lx ret=%x", shca->ib_device.node_guid, ret); - return ret; } int ehca_register_device(struct ehca_shca *shca) { - int ret = 0; - - EDEB_EN(7, "shca=%p", shca); + int ret; ret = init_node_guid(shca); if (ret) @@ -383,7 +268,7 @@ int ehca_register_device(struct ehca_shc (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST) | (1ull << IB_USER_VERBS_CMD_DETACH_MCAST); - shca->ib_device.node_type = IB_NODE_CA; + shca->ib_device.node_type = RDMA_NODE_IB_CA; shca->ib_device.phys_port_cnt = shca->num_ports; shca->ib_device.dma_device = &shca->ibmebus_dev->ofdev.dev; shca->ib_device.query_device = ehca_query_device; @@ -432,38 +317,35 @@ int 
ehca_register_device(struct ehca_shc shca->ib_device.mmap = ehca_mmap; ret = ib_register_device(&shca->ib_device); - - EDEB_EX(7, "ret=%x", ret); + if (ret) + ehca_err(&shca->ib_device, + "ib_register_device() failed ret=%x", ret); return ret; } static int ehca_create_aqp1(struct ehca_shca *shca, u32 port) { - struct ehca_sport *sport; + struct ehca_sport *sport = &shca->sport[port - 1]; struct ib_cq *ibcq; struct ib_qp *ibqp; struct ib_qp_init_attr qp_init_attr; - int ret = 0; - - EDEB_EN(7, "shca=%p port=%x", shca, port); - - sport = &shca->sport[port - 1]; + int ret; if (sport->ibcq_aqp1) { - EDEB_ERR(4, "AQP1 CQ is already created."); + ehca_err(&shca->ib_device, "AQP1 CQ is already created."); return -EPERM; } ibcq = ib_create_cq(&shca->ib_device, NULL, NULL, (void*)(-1), 10); if (IS_ERR(ibcq)) { - EDEB_ERR(4, "Cannot create AQP1 CQ."); + ehca_err(&shca->ib_device, "Cannot create AQP1 CQ."); return PTR_ERR(ibcq); } sport->ibcq_aqp1 = ibcq; if (sport->ibqp_aqp1) { - EDEB_ERR(4, "AQP1 QP is already created."); + ehca_err(&shca->ib_device, "AQP1 QP is already created."); ret = -EPERM; goto create_aqp1; } @@ -484,84 +366,62 @@ static int ehca_create_aqp1(struct ehca_ ibqp = ib_create_qp(&shca->pd->ib_pd, &qp_init_attr); if (IS_ERR(ibqp)) { - EDEB_ERR(4, "Cannot create AQP1 QP."); + ehca_err(&shca->ib_device, "Cannot create AQP1 QP."); ret = PTR_ERR(ibqp); goto create_aqp1; } sport->ibqp_aqp1 = ibqp; - goto create_aqp0; + return 0; create_aqp1: ib_destroy_cq(sport->ibcq_aqp1); - -create_aqp0: - EDEB_EX(7, "ret=%x", ret); - return ret; } static int ehca_destroy_aqp1(struct ehca_sport *sport) { - int ret = 0; - - EDEB_EN(7, "sport=%p", sport); + int ret; ret = ib_destroy_qp(sport->ibqp_aqp1); if (ret) { - EDEB_ERR(4, "Cannot destroy AQP1 QP. ret=%x", ret); - goto destroy_aqp1; + ehca_gen_err("Cannot destroy AQP1 QP. ret=%x", ret); + return ret; } ret = ib_destroy_cq(sport->ibcq_aqp1); if (ret) - EDEB_ERR(4, "Cannot destroy AQP1 CQ. 
ret=%x", ret); - -destroy_aqp1: - EDEB_EX(7, "ret=%x", ret); + ehca_gen_err("Cannot destroy AQP1 CQ. ret=%x", ret); return ret; } -static ssize_t ehca_show_debug_mask(struct device_driver *ddp, char *buf) +static ssize_t ehca_show_debug_level(struct device_driver *ddp, char *buf) { - int i; - int total = 0; - total += snprintf(buf + total, PAGE_SIZE - total, "%d", - ehca_edeb_mask[0]); - for (i = 1; i < EHCA_EDEB_TRACE_MASK_SIZE; i++) { - total += snprintf(buf + total, PAGE_SIZE - total, "%d", - ehca_edeb_mask[i]); - } - - total += snprintf(buf + total, PAGE_SIZE - total, "\n"); - - return total; + return snprintf(buf, PAGE_SIZE, "%d\n", + ehca_debug_level); } -static ssize_t ehca_store_debug_mask(struct device_driver *ddp, - const char *buf, size_t count) +static ssize_t ehca_store_debug_level(struct device_driver *ddp, + const char *buf, size_t count) { - int i; - for (i = 0; i < EHCA_EDEB_TRACE_MASK_SIZE; i++) { - char value = buf[i] - '0'; - if ((value <= 9) && (count >= i)) { - ehca_edeb_mask[i] = value; - } - } - return count; + int value = (*buf) - '0'; + if (value >= 0 && value <= 9) + ehca_debug_level = value; + return 1; } -DRIVER_ATTR(debug_mask, S_IRUSR | S_IWUSR, - ehca_show_debug_mask, ehca_store_debug_mask); + +DRIVER_ATTR(debug_level, S_IRUSR | S_IWUSR, + ehca_show_debug_level, ehca_store_debug_level); void ehca_create_driver_sysfs(struct ibmebus_driver *drv) { - driver_create_file(&drv->driver, &driver_attr_debug_mask); + driver_create_file(&drv->driver, &driver_attr_debug_level); } void ehca_remove_driver_sysfs(struct ibmebus_driver *drv) { - driver_remove_file(&drv->driver, &driver_attr_debug_mask); + driver_remove_file(&drv->driver, &driver_attr_debug_level); } #define EHCA_RESOURCE_ATTR(name) \ @@ -577,14 +437,14 @@ static ssize_t ehca_show_##name(struct \ rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); \ if (!rblock) { \ - EDEB_ERR(4, "Can't allocate rblock memory."); \ + dev_err(dev, "Can't allocate rblock memory."); \ return 0; \ } \ \ if 
(hipz_h_query_hca(shca->ipz_hca_handle, rblock) != H_SUCCESS) { \ - EDEB_ERR(4, "Can't query device properties"); \ - kfree(rblock); \ - return 0; \ + dev_err(dev, "Can't query device properties"); \ + kfree(rblock); \ + return 0; \ } \ \ data = rblock->name; \ @@ -669,26 +529,24 @@ static int __devinit ehca_probe(struct i struct ehca_shca *shca; u64 *handle; struct ib_pd *ibpd; - int ret = 0; - - EDEB_EN(7, ""); + int ret; handle = (u64 *)get_property(dev->ofdev.node, "ibm,hca-handle", NULL); if (!handle) { - EDEB_ERR(4, "Cannot get eHCA handle for adapter: %s.", - dev->ofdev.node->full_name); + ehca_gen_err("Cannot get eHCA handle for adapter: %s.", + dev->ofdev.node->full_name); return -ENODEV; } if (!(*handle)) { - EDEB_ERR(4, "Wrong eHCA handle for adapter: %s.", - dev->ofdev.node->full_name); + ehca_gen_err("Wrong eHCA handle for adapter: %s.", + dev->ofdev.node->full_name); return -ENODEV; } shca = (struct ehca_shca *)ib_alloc_device(sizeof(*shca)); - if (shca == NULL) { - EDEB_ERR(4, "Cannot allocate shca memory."); + if (!shca) { + ehca_gen_err("Cannot allocate shca memory."); return -ENOMEM; } @@ -698,29 +556,35 @@ static int __devinit ehca_probe(struct i ret = ehca_sense_attributes(shca); if (ret < 0) { - EDEB_ERR(4, "Cannot sense eHCA attributes."); + ehca_gen_err("Cannot sense eHCA attributes."); + goto probe1; + } + + ret = ehca_register_device(shca); + if (ret) { + ehca_gen_err("Cannot register Infiniband device"); goto probe1; } /* create event queues */ ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, 2048); if (ret) { - EDEB_ERR(4, "Cannot create EQ."); - goto probe1; + ehca_err(&shca->ib_device, "Cannot create EQ."); + goto probe2; } ret = ehca_create_eq(shca, &shca->neq, EHCA_NEQ, 513); if (ret) { - EDEB_ERR(4, "Cannot create NEQ."); - goto probe2; + ehca_err(&shca->ib_device, "Cannot create NEQ."); + goto probe3; } /* create internal protection domain */ ibpd = ehca_alloc_pd(&shca->ib_device, (void*)(-1), NULL); if (IS_ERR(ibpd)) { - EDEB_ERR(4, 
"Cannot create internal PD."); + ehca_err(&shca->ib_device, "Cannot create internal PD."); ret = PTR_ERR(ibpd); - goto probe3; + goto probe4; } shca->pd = container_of(ibpd, struct ehca_pd, ib_pd); @@ -730,13 +594,8 @@ static int __devinit ehca_probe(struct i ret = ehca_reg_internal_maxmr(shca, shca->pd, &shca->maxmr); if (ret) { - EDEB_ERR(4, "Cannot create internal MR. ret=%x", ret); - goto probe4; - } - - ret = ehca_register_device(shca); - if (ret) { - EDEB_ERR(4, "Cannot register Infiniband device."); + ehca_err(&shca->ib_device, "Cannot create internal MR ret=%x", + ret); goto probe5; } @@ -745,7 +604,8 @@ static int __devinit ehca_probe(struct i shca->sport[0].port_state = IB_PORT_DOWN; ret = ehca_create_aqp1(shca, 1); if (ret) { - EDEB_ERR(4, "Cannot create AQP1 for port 1."); + ehca_err(&shca->ib_device, + "Cannot create AQP1 for port 1."); goto probe6; } } @@ -755,54 +615,56 @@ static int __devinit ehca_probe(struct i shca->sport[1].port_state = IB_PORT_DOWN; ret = ehca_create_aqp1(shca, 2); if (ret) { - EDEB_ERR(4, "Cannot create AQP1 for port 2."); + ehca_err(&shca->ib_device, + "Cannot create AQP1 for port 2."); goto probe7; } } ehca_create_device_sysfs(dev); - spin_lock(&ehca_module.shca_lock); - list_add(&shca->shca_list, &ehca_module.shca_list); - spin_unlock(&ehca_module.shca_lock); - - EDEB_EX(7, "ret=%x", ret); + spin_lock(&shca_list_lock); + list_add(&shca->shca_list, &shca_list); + spin_unlock(&shca_list_lock); return 0; probe7: ret = ehca_destroy_aqp1(&shca->sport[0]); if (ret) - EDEB_ERR(4, "Cannot destroy AQP1 for port 1. ret=%x", ret); + ehca_err(&shca->ib_device, + "Cannot destroy AQP1 for port 1. ret=%x", ret); probe6: - ib_unregister_device(&shca->ib_device); + ret = ehca_dereg_internal_maxmr(shca); + if (ret) + ehca_err(&shca->ib_device, + "Cannot destroy internal MR. ret=%x", ret); probe5: - ret = ehca_dereg_internal_maxmr(shca); + ret = ehca_dealloc_pd(&shca->pd->ib_pd); if (ret) - EDEB_ERR(4, "Cannot destroy internal MR. 
ret=%x", ret); + ehca_err(&shca->ib_device, + "Cannot destroy internal PD. ret=%x", ret); probe4: - ret = ehca_dealloc_pd(&shca->pd->ib_pd); - if (ret != 0) - EDEB_ERR(4, "Cannot destroy internal PD. ret=%x", ret); + ret = ehca_destroy_eq(shca, &shca->neq); + if (ret) + ehca_err(&shca->ib_device, + "Cannot destroy NEQ. ret=%x", ret); probe3: - ret = ehca_destroy_eq(shca, &shca->neq); - if (ret != 0) - EDEB_ERR(4, "Cannot destroy NEQ. ret=%x", ret); + ret = ehca_destroy_eq(shca, &shca->eq); + if (ret) + ehca_err(&shca->ib_device, + "Cannot destroy EQ. ret=%x", ret); probe2: - ret = ehca_destroy_eq(shca, &shca->eq); - if (ret != 0) - EDEB_ERR(4, "Cannot destroy EQ. ret=%x", ret); + ib_unregister_device(&shca->ib_device); probe1: ib_dealloc_device(&shca->ib_device); - EDEB_EX(4, "ret=%x", ret); - return -EINVAL; } @@ -811,18 +673,16 @@ static int __devexit ehca_remove(struct struct ehca_shca *shca = dev->ofdev.dev.driver_data; int ret; - EDEB_EN(7, "shca=%p", shca); - ehca_remove_device_sysfs(dev); if (ehca_open_aqp1 == 1) { int i; - for (i = 0; i < shca->num_ports; i++) { ret = ehca_destroy_aqp1(&shca->sport[i]); - if (ret != 0) - EDEB_ERR(4, "Cannot destroy AQP1 for port %x." - " ret=%x", ret, i); + if (ret) + ehca_err(&shca->ib_device, + "Cannot destroy AQP1 for port %x " + "ret=%x", ret, i); } } @@ -830,27 +690,27 @@ static int __devexit ehca_remove(struct ret = ehca_dereg_internal_maxmr(shca); if (ret) - EDEB_ERR(4, "Cannot destroy internal MR. ret=%x", ret); + ehca_err(&shca->ib_device, + "Cannot destroy internal MR. ret=%x", ret); ret = ehca_dealloc_pd(&shca->pd->ib_pd); if (ret) - EDEB_ERR(4, "Cannot destroy internal PD. ret=%x", ret); + ehca_err(&shca->ib_device, + "Cannot destroy internal PD. ret=%x", ret); ret = ehca_destroy_eq(shca, &shca->eq); if (ret) - EDEB_ERR(4, "Cannot destroy EQ. ret=%x", ret); + ehca_err(&shca->ib_device, "Cannot destroy EQ. ret=%x", ret); ret = ehca_destroy_eq(shca, &shca->neq); if (ret) - EDEB_ERR(4, "Canot destroy NEQ. 
ret=%x", ret); + ehca_err(&shca->ib_device, "Canot destroy NEQ. ret=%x", ret); ib_dealloc_device(&shca->ib_device); - spin_lock(&ehca_module.shca_lock); + spin_lock(&shca_list_lock); list_del(&shca->shca_list); - spin_unlock(&ehca_module.shca_lock); - - EDEB_EX(7, "ret=%x", ret); + spin_unlock(&shca_list_lock); return ret; } @@ -871,37 +731,46 @@ static struct ibmebus_driver ehca_driver .remove = ehca_remove, }; +void ehca_poll_eqs(unsigned long data) +{ + struct ehca_shca *shca; + + spin_lock(&shca_list_lock); + list_for_each_entry(shca, &shca_list, shca_list) { + if (shca->eq.is_initialized) + ehca_tasklet_eq((unsigned long)(void*)shca); + } + mod_timer(&poll_eqs_timer, jiffies + HZ); + spin_unlock(&shca_list_lock); +} + int __init ehca_module_init(void) { - int ret = 0; + int ret; printk(KERN_INFO "eHCA Infiniband Device Driver " - "(Rel.: SVNEHCA_0012)\n"); - EDEB_EN(7, ""); - + "(Rel.: SVNEHCA_0015)\n"); idr_init(&ehca_qp_idr); idr_init(&ehca_cq_idr); spin_lock_init(&ehca_qp_idr_lock); spin_lock_init(&ehca_cq_idr_lock); - INIT_LIST_HEAD(&ehca_module.shca_list); - spin_lock_init(&ehca_module.shca_lock); - - ehca_init_trace(); + INIT_LIST_HEAD(&shca_list); + spin_lock_init(&shca_list_lock); if ((ret = ehca_create_comp_pool())) { - EDEB_ERR(4, "Cannot create comp pool."); - goto module_init0; + ehca_gen_err("Cannot create comp pool."); + return ret; } - if ((ret = ehca_create_slab_caches(&ehca_module))) { - EDEB_ERR(4, "Cannot create SLAB caches"); + if ((ret = ehca_create_slab_caches())) { + ehca_gen_err("Cannot create SLAB caches"); ret = -ENOMEM; goto module_init1; } if ((ret = ibmebus_register_driver(&ehca_driver))) { - EDEB_ERR(4, "Cannot register eHCA device driver"); + ehca_gen_err("Cannot register eHCA device driver"); ret = -EINVAL; goto module_init2; } @@ -909,49 +778,39 @@ int __init ehca_module_init(void) ehca_create_driver_sysfs(&ehca_driver); if (ehca_poll_all_eqs != 1) { - EDEB_ERR(4, "WARNING!!!"); - EDEB_ERR(4, "It is possible to lose 
interrupts.");
+		ehca_gen_err("WARNING!!!");
+		ehca_gen_err("It is possible to lose interrupts.");
 	} else {
-		init_timer(&ehca_module.timer);
-		ehca_module.timer.function = ehca_poll_eqs;
-		ehca_module.timer.data = (unsigned long)&ehca_module;
-		ehca_module.timer.expires = jiffies + HZ;
-		add_timer(&ehca_module.timer);
+		init_timer(&poll_eqs_timer);
+		poll_eqs_timer.function = ehca_poll_eqs;
+		poll_eqs_timer.expires = jiffies + HZ;
+		add_timer(&poll_eqs_timer);
 	}
-	goto module_init0;
+	return 0;
 module_init2:
-	ehca_destroy_slab_caches(&ehca_module);
+	ehca_destroy_slab_caches();
 module_init1:
 	ehca_destroy_comp_pool();
-
-module_init0:
-	EDEB_EX(7, "ret=%x", ret);
-
 	return ret;
 };
 void __exit ehca_module_exit(void)
 {
-	EDEB_EN(7, "");
-
 	if (ehca_poll_all_eqs == 1)
-		del_timer_sync(&ehca_module.timer);
+		del_timer_sync(&poll_eqs_timer);
 	ehca_remove_driver_sysfs(&ehca_driver);
 	ibmebus_unregister_driver(&ehca_driver);
-	if (ehca_destroy_slab_caches(&ehca_module) != 0)
-		EDEB_ERR(4, "Cannot destroy SLAB caches");
+	ehca_destroy_slab_caches();
 	ehca_destroy_comp_pool();
 	idr_destroy(&ehca_cq_idr);
 	idr_destroy(&ehca_qp_idr);
-
-	EDEB_EX(7, "");
 };
 module_init(ehca_module_init);
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mcast.c linux-2.6/drivers/infiniband/hw/ehca/ehca_mcast.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mcast.c 2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_mcast.c 2006-08-30 20:00:16.000000000 +0200
@@ -42,54 +42,38 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
-#define DEB_PREFIX "mcas"
-
 #include
 #include
 #include "ehca_classes.h"
 #include "ehca_tools.h"
 #include "ehca_qes.h"
 #include "ehca_iverbs.h"
-
 #include "hcp_if.h"
 #define MAX_MC_LID 0xFFFE
 #define MIN_MC_LID 0xC000 /* Multicast limits */
 #define EHCA_VALID_MULTICAST_GID(gid) ((gid)[0] == 0xFF)
-#define EHCA_VALID_MULTICAST_LID(lid) (((lid) >= MIN_MC_LID) && ((lid) <= MAX_MC_LID))
+#define EHCA_VALID_MULTICAST_LID(lid) \
+	(((lid) >= MIN_MC_LID) && ((lid) <= MAX_MC_LID))
 int ehca_attach_mcast(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 {
-	struct ehca_qp *my_qp = NULL;
-	struct ehca_shca *shca = NULL;
+	struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
+	struct ehca_shca *shca = container_of(ibqp->device, struct ehca_shca,
+					      ib_device);
 	union ib_gid my_gid;
-	u64 subnet_prefix;
-	u64 interface_id;
-	u64 h_ret = H_SUCCESS;
-	int ret = 0;
-
-	EHCA_CHECK_ADR(ibqp);
-	EHCA_CHECK_ADR(gid);
-
-	my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
+	u64 subnet_prefix, interface_id, h_ret;
-	EHCA_CHECK_QP(my_qp);
 	if (ibqp->qp_type != IB_QPT_UD) {
-		EDEB_ERR(4, "invalid qp_type %x gid, ret=%x",
-			 ibqp->qp_type, EINVAL);
+		ehca_err(ibqp->device, "invalid qp_type=%x", ibqp->qp_type);
 		return -EINVAL;
 	}
-	shca = container_of(ibqp->pd->device, struct ehca_shca, ib_device);
-	EHCA_CHECK_ADR(shca);
-
 	if (!(EHCA_VALID_MULTICAST_GID(gid->raw))) {
-		EDEB_ERR(4, "gid is not valid mulitcast gid ret=%x",
-			 EINVAL);
+		ehca_err(ibqp->device, "invalid mulitcast gid");
 		return -EINVAL;
 	} else if ((lid < MIN_MC_LID) || (lid > MAX_MC_LID)) {
-		EDEB_ERR(4, "lid=%x is not valid mulitcast lid ret=%x",
-			 lid, EINVAL);
+		ehca_err(ibqp->device, "invalid mulitcast lid=%x", lid);
 		return -EINVAL;
 	}
@@ -101,100 +85,47 @@ int ehca_attach_mcast(struct ib_qp *ibqp
 				  my_qp->ipz_qp_handle,
 				  my_qp->galpas.kernel,
 				  lid, subnet_prefix, interface_id);
-	if (h_ret != H_SUCCESS) {
-		EDEB_ERR(4,
+	if (h_ret != H_SUCCESS)
+		ehca_err(ibqp->device,
 			 "ehca_qp=%p qp_num=%x hipz_h_attach_mcqp() failed "
 			 "h_ret=%lx", my_qp, ibqp->qp_num, h_ret);
-	}
-	ret = ehca2ib_return_code(h_ret);
-	EDEB_EX(7, "mcast attach ret=%x\n"
-		"ehca_qp=%p qp_num=%x lid=%x\n"
-		"my_gid= %x %x %x %x\n"
-		" %x %x %x %x\n"
-		" %x %x %x %x\n"
-		" %x %x %x %x\n",
-		ret, my_qp, ibqp->qp_num, lid,
-		my_gid.raw[0], my_gid.raw[1],
-		my_gid.raw[2], my_gid.raw[3],
-		my_gid.raw[4], my_gid.raw[5],
-		my_gid.raw[6], my_gid.raw[7],
-		my_gid.raw[8], my_gid.raw[9],
-		my_gid.raw[10], my_gid.raw[11],
-		my_gid.raw[12], my_gid.raw[13],
-		my_gid.raw[14], my_gid.raw[15]);
-
-	return ret;
+	return ehca2ib_return_code(h_ret);
 }
 int ehca_detach_mcast(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 {
-	struct ehca_qp *my_qp = NULL;
-	struct ehca_shca *shca = NULL;
+	struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
+	struct ehca_shca *shca = container_of(ibqp->pd->device,
+					      struct ehca_shca, ib_device);
 	union ib_gid my_gid;
-	u64 subnet_prefix;
-	u64 interface_id;
-	u64 h_ret = H_SUCCESS;
-	int ret = 0;
-
-	EHCA_CHECK_ADR(ibqp);
-	EHCA_CHECK_ADR(gid);
+	u64 subnet_prefix, interface_id, h_ret;
-	my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
-
-	EHCA_CHECK_QP(my_qp);
 	if (ibqp->qp_type != IB_QPT_UD) {
-		EDEB_ERR(4, "invalid qp_type %x gid, ret=%x",
-			 ibqp->qp_type, EINVAL);
+		ehca_err(ibqp->device, "invalid qp_type %x", ibqp->qp_type);
 		return -EINVAL;
 	}
-	shca = container_of(ibqp->pd->device, struct ehca_shca, ib_device);
-	EHCA_CHECK_ADR(shca);
-
 	if (!(EHCA_VALID_MULTICAST_GID(gid->raw))) {
-		EDEB_ERR(4, "gid is not valid mulitcast gid ret=%x",
-			 EINVAL);
+		ehca_err(ibqp->device, "invalid mulitcast gid");
 		return -EINVAL;
 	} else if ((lid < MIN_MC_LID) || (lid > MAX_MC_LID)) {
-		EDEB_ERR(4, "lid=%x is not valid mulitcast lid ret=%x",
-			 lid, EINVAL);
+		ehca_err(ibqp->device, "invalid mulitcast lid=%x", lid);
 		return -EINVAL;
 	}
-	EDEB_EN(7, "dgid=%p qp_numl=%x lid=%x",
-		gid, ibqp->qp_num, lid);
-
 	memcpy(&my_gid.raw, gid->raw, sizeof(union ib_gid));
 	subnet_prefix = be64_to_cpu(my_gid.global.subnet_prefix);
 	interface_id = be64_to_cpu(my_gid.global.interface_id);
 	h_ret = hipz_h_detach_mcqp(shca->ipz_hca_handle,
-				  my_qp->ipz_qp_handle,
-				  my_qp->galpas.kernel,
-				  lid, subnet_prefix, interface_id);
-	if (h_ret != H_SUCCESS) {
-		EDEB_ERR(4,
+				   my_qp->ipz_qp_handle,
+				   my_qp->galpas.kernel,
+				   lid, subnet_prefix, interface_id);
+	if (h_ret != H_SUCCESS)
+		ehca_err(ibqp->device,
 			 "ehca_qp=%p qp_num=%x hipz_h_detach_mcqp() failed "
 			 "h_ret=%lx", my_qp, ibqp->qp_num, h_ret);
-	}
-	ret = ehca2ib_return_code(h_ret);
-
-	EDEB_EX(7, "mcast detach ret=%x\n"
-		"ehca_qp=%p qp_num=%x lid=%x\n"
-		"my_gid= %x %x %x %x\n"
-		" %x %x %x %x\n"
-		" %x %x %x %x\n"
-		" %x %x %x %x\n",
-		ret, my_qp, ibqp->qp_num, lid,
-		my_gid.raw[0], my_gid.raw[1],
-		my_gid.raw[2], my_gid.raw[3],
-		my_gid.raw[4], my_gid.raw[5],
-		my_gid.raw[6], my_gid.raw[7],
-		my_gid.raw[8], my_gid.raw[9],
-		my_gid.raw[10], my_gid.raw[11],
-		my_gid.raw[12], my_gid.raw[13],
-		my_gid.raw[14], my_gid.raw[15]);
-	return ret;
+	return ehca2ib_return_code(h_ret);
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mrmw.c linux-2.6/drivers/infiniband/hw/ehca/ehca_mrmw.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mrmw.c 2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_mrmw.c 2006-08-30 20:00:16.000000000 +0200
@@ -39,9 +39,6 @@
  * POSSIBILITY OF SUCH DAMAGE.
*/ -#undef DEB_PREFIX -#define DEB_PREFIX "mrmw" - #include #include "ehca_iverbs.h" @@ -49,78 +46,62 @@ #include "hcp_if.h" #include "hipz_hw.h" -extern int ehca_use_hp_mr; +static struct kmem_cache *mr_cache; +static struct kmem_cache *mw_cache; static struct ehca_mr *ehca_mr_new(void) { - extern struct ehca_module ehca_module; struct ehca_mr *me; - me = kmem_cache_alloc(ehca_module.cache_mr, SLAB_KERNEL); + me = kmem_cache_alloc(mr_cache, SLAB_KERNEL); if (me) { memset(me, 0, sizeof(struct ehca_mr)); spin_lock_init(&me->mrlock); - EDEB_EX(7, "ehca_mr=%p sizeof(ehca_mr_t)=%x", me, - (u32) sizeof(struct ehca_mr)); - } else { - EDEB_ERR(3, "alloc failed"); - } + } else + ehca_gen_err("alloc failed"); return me; } static void ehca_mr_delete(struct ehca_mr *me) { - extern struct ehca_module ehca_module; - - kmem_cache_free(ehca_module.cache_mr, me); + kmem_cache_free(mr_cache, me); } static struct ehca_mw *ehca_mw_new(void) { - extern struct ehca_module ehca_module; struct ehca_mw *me; - me = kmem_cache_alloc(ehca_module.cache_mw, SLAB_KERNEL); + me = kmem_cache_alloc(mw_cache, SLAB_KERNEL); if (me) { memset(me, 0, sizeof(struct ehca_mw)); spin_lock_init(&me->mwlock); - EDEB_EX(7, "ehca_mw=%p sizeof(ehca_mw_t)=%x", me, - (u32) sizeof(struct ehca_mw)); - } else { - EDEB_ERR(3, "alloc failed"); - } + } else + ehca_gen_err("alloc failed"); return me; } static void ehca_mw_delete(struct ehca_mw *me) { - extern struct ehca_module ehca_module; - - kmem_cache_free(ehca_module.cache_mw, me); + kmem_cache_free(mw_cache, me); } /*----------------------------------------------------------------------*/ struct ib_mr *ehca_get_dma_mr(struct ib_pd *pd, int mr_access_flags) { - struct ib_mr *ib_mr = NULL; - int ret = 0; - struct ehca_mr *e_maxmr = NULL; - struct ehca_pd *e_pd = NULL; - struct ehca_shca *shca = NULL; - - EDEB_EN(7, "pd=%p mr_access_flags=%x", pd, mr_access_flags); - - EHCA_CHECK_PD_P(pd); - e_pd = container_of(pd, struct ehca_pd, ib_pd); - shca = 
container_of(pd->device, struct ehca_shca, ib_device); + struct ib_mr *ib_mr; + int ret; + struct ehca_mr *e_maxmr; + struct ehca_pd *e_pd = container_of(pd, struct ehca_pd, ib_pd); + struct ehca_shca *shca = + container_of(pd->device, struct ehca_shca, ib_device); if (shca->maxmr) { e_maxmr = ehca_mr_new(); if (!e_maxmr) { - EDEB_ERR(4, "out of memory"); + ehca_err(&shca->ib_device, "out of memory"); ib_mr = ERR_PTR(-ENOMEM); goto get_dma_mr_exit0; } @@ -135,18 +116,15 @@ struct ib_mr *ehca_get_dma_mr(struct ib_ } ib_mr = &e_maxmr->ib.ib_mr; } else { - EDEB_ERR(4, "no internal max-MR exist!"); + ehca_err(&shca->ib_device, "no internal max-MR exist!"); ib_mr = ERR_PTR(-EINVAL); goto get_dma_mr_exit0; } get_dma_mr_exit0: if (IS_ERR(ib_mr)) - EDEB_EX(4, "rc=%lx pd=%p mr_access_flags=%x ", - PTR_ERR(ib_mr), pd, mr_access_flags); - else - EDEB_EX(7, "ib_mr=%p lkey=%x rkey=%x", - ib_mr, ib_mr->lkey, ib_mr->rkey); + ehca_err(&shca->ib_device, "rc=%lx pd=%p mr_access_flags=%x ", + PTR_ERR(ib_mr), pd, mr_access_flags); return ib_mr; } /* end ehca_get_dma_mr() */ @@ -158,23 +136,20 @@ struct ib_mr *ehca_reg_phys_mr(struct ib int mr_access_flags, u64 *iova_start) { - struct ib_mr *ib_mr = NULL; - int ret = 0; - struct ehca_mr *e_mr = NULL; - struct ehca_shca *shca = NULL; - struct ehca_pd *e_pd = NULL; - u64 size = 0; + struct ib_mr *ib_mr; + int ret; + struct ehca_mr *e_mr; + struct ehca_shca *shca = + container_of(pd->device, struct ehca_shca, ib_device); + struct ehca_pd *e_pd = container_of(pd, struct ehca_pd, ib_pd); + + u64 size; struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; - u32 num_pages_mr = 0; - u32 num_pages_4k = 0; /* 4k portion "pages" */ + u32 num_pages_mr; + u32 num_pages_4k; /* 4k portion "pages" */ - EDEB_EN(7, "pd=%p phys_buf_array=%p num_phys_buf=%x " - "mr_access_flags=%x iova_start=%p", pd, phys_buf_array, - num_phys_buf, mr_access_flags, iova_start); - - EHCA_CHECK_PD_P(pd); - if ((num_phys_buf <= 0) || 
ehca_adr_bad(phys_buf_array)) { - EDEB_ERR(4, "bad input values: num_phys_buf=%x " + if ((num_phys_buf <= 0) || !phys_buf_array) { + ehca_err(pd->device, "bad input values: num_phys_buf=%x " "phys_buf_array=%p", num_phys_buf, phys_buf_array); ib_mr = ERR_PTR(-EINVAL); goto reg_phys_mr_exit0; @@ -187,7 +162,7 @@ struct ib_mr *ehca_reg_phys_mr(struct ib * Remote Write Access requires Local Write Access * Remote Atomic Access requires Local Write Access */ - EDEB_ERR(4, "bad input values: mr_access_flags=%x", + ehca_err(pd->device, "bad input values: mr_access_flags=%x", mr_access_flags); ib_mr = ERR_PTR(-EINVAL); goto reg_phys_mr_exit0; @@ -202,18 +177,15 @@ struct ib_mr *ehca_reg_phys_mr(struct ib } if ((size == 0) || (((u64)iova_start + size) < (u64)iova_start)) { - EDEB_ERR(4, "bad input values: size=%lx iova_start=%p", + ehca_err(pd->device, "bad input values: size=%lx iova_start=%p", size, iova_start); ib_mr = ERR_PTR(-EINVAL); goto reg_phys_mr_exit0; } - e_pd = container_of(pd, struct ehca_pd, ib_pd); - shca = container_of(pd->device, struct ehca_shca, ib_device); - e_mr = ehca_mr_new(); if (!e_mr) { - EDEB_ERR(4, "out of memory"); + ehca_err(pd->device, "out of memory"); ib_mr = ERR_PTR(-ENOMEM); goto reg_phys_mr_exit0; } @@ -253,20 +225,16 @@ struct ib_mr *ehca_reg_phys_mr(struct ib } /* successful registration of all pages */ - ib_mr = &e_mr->ib.ib_mr; - goto reg_phys_mr_exit0; + return &e_mr->ib.ib_mr; reg_phys_mr_exit1: ehca_mr_delete(e_mr); reg_phys_mr_exit0: if (IS_ERR(ib_mr)) - EDEB_EX(4, "rc=%lx pd=%p phys_buf_array=%p " - "num_phys_buf=%x mr_access_flags=%x iova_start=%p", - PTR_ERR(ib_mr), pd, phys_buf_array, - num_phys_buf, mr_access_flags, iova_start); - else - EDEB_EX(7, "ib_mr=%p lkey=%x rkey=%x", - ib_mr, ib_mr->lkey, ib_mr->rkey); + ehca_err(pd->device, "rc=%lx pd=%p phys_buf_array=%p " + "num_phys_buf=%x mr_access_flags=%x iova_start=%p", + PTR_ERR(ib_mr), pd, phys_buf_array, + num_phys_buf, mr_access_flags, iova_start); return ib_mr; } /* end 
ehca_reg_phys_mr() */ @@ -277,21 +245,22 @@ struct ib_mr *ehca_reg_user_mr(struct ib int mr_access_flags, struct ib_udata *udata) { - struct ib_mr *ib_mr = NULL; - struct ehca_mr *e_mr = NULL; - struct ehca_shca *shca = NULL; - struct ehca_pd *e_pd = NULL; + struct ib_mr *ib_mr; + struct ehca_mr *e_mr; + struct ehca_shca *shca = + container_of(pd->device, struct ehca_shca, ib_device); + struct ehca_pd *e_pd = container_of(pd, struct ehca_pd, ib_pd); struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; - int ret = 0; - u32 num_pages_mr = 0; - u32 num_pages_4k = 0; /* 4k portion "pages" */ - - EDEB_EN(7, "pd=%p region=%p mr_access_flags=%x udata=%p", - pd, region, mr_access_flags, udata); - - EHCA_CHECK_PD_P(pd); - if (ehca_adr_bad(region)) { - EDEB_ERR(4, "bad input values: region=%p", region); + int ret; + u32 num_pages_mr; + u32 num_pages_4k; /* 4k portion "pages" */ + + if (!pd) { + ehca_gen_err("bad pd=%p", pd); + return ERR_PTR(-EFAULT); + } + if (!region) { + ehca_err(pd->device, "bad input values: region=%p", region); ib_mr = ERR_PTR(-EINVAL); goto reg_user_mr_exit0; } @@ -303,36 +272,29 @@ struct ib_mr *ehca_reg_user_mr(struct ib * Remote Write Access requires Local Write Access * Remote Atomic Access requires Local Write Access */ - EDEB_ERR(4, "bad input values: mr_access_flags=%x", + ehca_err(pd->device, "bad input values: mr_access_flags=%x", mr_access_flags); ib_mr = ERR_PTR(-EINVAL); goto reg_user_mr_exit0; } - EDEB(7, "user_base=%lx virt_base=%lx length=%lx offset=%x page_size=%x " - "chunk_list.next=%p", - region->user_base, region->virt_base, region->length, - region->offset, region->page_size, region->chunk_list.next); if (region->page_size != PAGE_SIZE) { - EDEB_ERR(4, "page size not supported, region->page_size=%x", - region->page_size); + ehca_err(pd->device, "page size not supported, " + "region->page_size=%x", region->page_size); ib_mr = ERR_PTR(-EINVAL); goto reg_user_mr_exit0; } if ((region->length == 0) || 
((region->virt_base + region->length) < region->virt_base)) { - EDEB_ERR(4, "bad input values: length=%lx virt_base=%lx", - region->length, region->virt_base); + ehca_err(pd->device, "bad input values: length=%lx " + "virt_base=%lx", region->length, region->virt_base); ib_mr = ERR_PTR(-EINVAL); goto reg_user_mr_exit0; } - e_pd = container_of(pd, struct ehca_pd, ib_pd); - shca = container_of(pd->device, struct ehca_shca, ib_device); - e_mr = ehca_mr_new(); if (!e_mr) { - EDEB_ERR(4, "out of memory"); + ehca_err(pd->device, "out of memory"); ib_mr = ERR_PTR(-ENOMEM); goto reg_user_mr_exit0; } @@ -362,19 +324,15 @@ struct ib_mr *ehca_reg_user_mr(struct ib } /* successful registration of all pages */ - ib_mr = &e_mr->ib.ib_mr; - goto reg_user_mr_exit0; + return &e_mr->ib.ib_mr; reg_user_mr_exit1: ehca_mr_delete(e_mr); reg_user_mr_exit0: if (IS_ERR(ib_mr)) - EDEB_EX(4, "rc=%lx pd=%p region=%p mr_access_flags=%x " - "udata=%p", - PTR_ERR(ib_mr), pd, region, mr_access_flags, udata); - else - EDEB_EX(7, "ib_mr=%p lkey=%x rkey=%x", - ib_mr, ib_mr->lkey, ib_mr->rkey); + ehca_err(pd->device, "rc=%lx pd=%p region=%p mr_access_flags=%x" + " udata=%p", + PTR_ERR(ib_mr), pd, region, mr_access_flags, udata); return ib_mr; } /* end ehca_reg_user_mr() */ @@ -388,32 +346,26 @@ int ehca_rereg_phys_mr(struct ib_mr *mr, int mr_access_flags, u64 *iova_start) { - int ret = 0; - struct ehca_shca *shca = NULL; - struct ehca_mr *e_mr = NULL; - u64 new_size = 0; - u64 *new_start = NULL; - u32 new_acl = 0; - struct ehca_pd *new_pd = NULL; - u32 tmp_lkey = 0; - u32 tmp_rkey = 0; + int ret; + + struct ehca_shca *shca = + container_of(mr->device, struct ehca_shca, ib_device); + struct ehca_mr *e_mr = container_of(mr, struct ehca_mr, ib.ib_mr); + struct ehca_pd *my_pd = container_of(mr->pd, struct ehca_pd, ib_pd); + u64 new_size; + u64 *new_start; + u32 new_acl; + struct ehca_pd *new_pd; + u32 tmp_lkey, tmp_rkey; unsigned long sl_flags; u32 num_pages_mr = 0; u32 num_pages_4k = 0; /* 4k portion 
"pages" */ struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; - struct ehca_pd *my_pd = NULL; u32 cur_pid = current->tgid; - EDEB_EN(7, "mr=%p mr_rereg_mask=%x pd=%p phys_buf_array=%p " - "num_phys_buf=%x mr_access_flags=%x iova_start=%p", - mr, mr_rereg_mask, pd, phys_buf_array, num_phys_buf, - mr_access_flags, iova_start); - - EHCA_CHECK_MR(mr); - my_pd = container_of(mr->pd, struct ehca_pd, ib_pd); if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && (my_pd->ownpid != cur_pid)) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(mr->device, "Invalid caller pid=%x ownpid=%x", cur_pid, my_pd->ownpid); ret = -EINVAL; goto rereg_phys_mr_exit0; @@ -421,15 +373,19 @@ int ehca_rereg_phys_mr(struct ib_mr *mr, if (!(mr_rereg_mask & IB_MR_REREG_TRANS)) { /* TODO not supported, because PHYP rereg hCall needs pages */ - EDEB_ERR(4, "rereg without IB_MR_REREG_TRANS not supported yet," - " mr_rereg_mask=%x", mr_rereg_mask); + ehca_err(mr->device, "rereg without IB_MR_REREG_TRANS not " + "supported yet, mr_rereg_mask=%x", mr_rereg_mask); ret = -EINVAL; goto rereg_phys_mr_exit0; } - e_mr = container_of(mr, struct ehca_mr, ib.ib_mr); if (mr_rereg_mask & IB_MR_REREG_PD) { - EHCA_CHECK_PD(pd); + if (!pd) { + ehca_err(mr->device, "rereg with bad pd, pd=%p " + "mr_rereg_mask=%x", pd, mr_rereg_mask); + ret = -EINVAL; + goto rereg_phys_mr_exit0; + } } if ((mr_rereg_mask & @@ -439,12 +395,10 @@ int ehca_rereg_phys_mr(struct ib_mr *mr, goto rereg_phys_mr_exit0; } - shca = container_of(mr->device, struct ehca_shca, ib_device); - /* check other parameters */ if (e_mr == shca->maxmr) { /* should be impossible, however reject to be sure */ - EDEB_ERR(3, "rereg internal max-MR impossible, mr=%p " + ehca_err(mr->device, "rereg internal max-MR impossible, mr=%p " "shca->maxmr=%p mr->lkey=%x", mr, shca->maxmr, mr->lkey); ret = -EINVAL; @@ -452,14 +406,14 @@ int ehca_rereg_phys_mr(struct ib_mr *mr, } if (mr_rereg_mask & IB_MR_REREG_TRANS) { /* transl., i.e. 
addr/size */
 		if (e_mr->flags & EHCA_MR_FLAG_FMR) {
-			EDEB_ERR(4, "not supported for FMR, mr=%p flags=%x",
-				 mr, e_mr->flags);
+			ehca_err(mr->device, "not supported for FMR, mr=%p "
+				 "flags=%x", mr, e_mr->flags);
 			ret = -EINVAL;
 			goto rereg_phys_mr_exit0;
 		}
-		if (ehca_adr_bad(phys_buf_array) || num_phys_buf <= 0) {
-			EDEB_ERR(4, "bad input values: mr_rereg_mask=%x "
-				 "phys_buf_array=%p num_phys_buf=%x",
+		if (!phys_buf_array || num_phys_buf <= 0) {
+			ehca_err(mr->device, "bad input values: mr_rereg_mask=%x"
+				 " phys_buf_array=%p num_phys_buf=%x",
 				 mr_rereg_mask, phys_buf_array, num_phys_buf);
 			ret = -EINVAL;
 			goto rereg_phys_mr_exit0;
@@ -474,7 +428,7 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
 		 * Remote Write Access requires Local Write Access
 		 * Remote Atomic Access requires Local Write Access
 		 */
-		EDEB_ERR(4, "bad input values: mr_rereg_mask=%x "
+		ehca_err(mr->device, "bad input values: mr_rereg_mask=%x "
 			 "mr_access_flags=%x", mr_rereg_mask, mr_access_flags);
 		ret = -EINVAL;
 		goto rereg_phys_mr_exit0;
@@ -497,7 +451,7 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
 		goto rereg_phys_mr_exit1;
 	if ((new_size == 0) ||
 	    (((u64)iova_start + new_size) < (u64)iova_start)) {
-		EDEB_ERR(4, "bad input values: new_size=%lx "
+		ehca_err(mr->device, "bad input values: new_size=%lx "
 			 "iova_start=%p", new_size, iova_start);
 		ret = -EINVAL;
 		goto rereg_phys_mr_exit1;
@@ -519,10 +473,6 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
 	if (mr_rereg_mask & IB_MR_REREG_PD)
 		new_pd = container_of(pd, struct ehca_pd, ib_pd);
-	EDEB(7, "mr=%p new_start=%p new_size=%lx new_acl=%x new_pd=%p "
-	     "num_pages_mr=%x num_pages_4k=%x", e_mr, new_start, new_size,
-	     new_acl, new_pd, num_pages_mr, num_pages_4k);
-
 	ret = ehca_rereg_mr(shca, e_mr, new_start, new_size, new_acl,
 			    new_pd, &pginfo, &tmp_lkey, &tmp_rkey);
 	if (ret)
@@ -538,17 +488,11 @@ rereg_phys_mr_exit1:
 	spin_unlock_irqrestore(&e_mr->mrlock, sl_flags);
 rereg_phys_mr_exit0:
 	if (ret)
-		EDEB_EX(4, "ret=%x mr=%p mr_rereg_mask=%x pd=%p "
-			"phys_buf_array=%p
num_phys_buf=%x mr_access_flags=%x "
-			"iova_start=%p",
-			ret, mr, mr_rereg_mask, pd, phys_buf_array,
-			num_phys_buf, mr_access_flags, iova_start);
-	else
-		EDEB_EX(7, "mr=%p mr_rereg_mask=%x pd=%p phys_buf_array=%p "
-			"num_phys_buf=%x mr_access_flags=%x iova_start=%p",
-			mr, mr_rereg_mask, pd, phys_buf_array, num_phys_buf,
-			mr_access_flags, iova_start);
-
+		ehca_err(mr->device, "ret=%x mr=%p mr_rereg_mask=%x pd=%p "
+			 "phys_buf_array=%p num_phys_buf=%x mr_access_flags=%x "
+			 "iova_start=%p",
+			 ret, mr, mr_rereg_mask, pd, phys_buf_array,
+			 num_phys_buf, mr_access_flags, iova_start);
 	return ret;
 } /* end ehca_rereg_phys_mr() */
@@ -557,47 +501,36 @@ rereg_phys_mr_exit0:
 int ehca_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr)
 {
 	int ret = 0;
-	u64 h_ret = H_SUCCESS;
-	struct ehca_shca *shca = NULL;
-	struct ehca_mr *e_mr = NULL;
-	struct ehca_pd *my_pd = NULL;
+	u64 h_ret;
+	struct ehca_shca *shca =
+		container_of(mr->device, struct ehca_shca, ib_device);
+	struct ehca_mr *e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
+	struct ehca_pd *my_pd = container_of(mr->pd, struct ehca_pd, ib_pd);
 	u32 cur_pid = current->tgid;
 	unsigned long sl_flags;
 	struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0};
-	EDEB_EN(7, "mr=%p mr_attr=%p", mr, mr_attr);
-
-	EHCA_CHECK_MR(mr);
-
-	my_pd = container_of(mr->pd, struct ehca_pd, ib_pd);
 	if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
 	    (my_pd->ownpid != cur_pid)) {
-		EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+		ehca_err(mr->device, "Invalid caller pid=%x ownpid=%x",
 			 cur_pid, my_pd->ownpid);
 		ret = -EINVAL;
 		goto query_mr_exit0;
 	}
-	e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
-	if (ehca_adr_bad(mr_attr)) {
-		EDEB_ERR(4, "bad input values: mr_attr=%p", mr_attr);
-		ret = -EINVAL;
-		goto query_mr_exit0;
-	}
 	if ((e_mr->flags & EHCA_MR_FLAG_FMR)) {
-		EDEB_ERR(4, "not supported for FMR, mr=%p e_mr=%p "
+		ehca_err(mr->device, "not supported for FMR, mr=%p e_mr=%p "
 			 "e_mr->flags=%x", mr, e_mr, e_mr->flags);
 		ret = -EINVAL;
 		goto query_mr_exit0;
 	}
-	shca = container_of(mr->device, struct ehca_shca, ib_device);
 	memset(mr_attr, 0, sizeof(struct ib_mr_attr));
 	spin_lock_irqsave(&e_mr->mrlock, sl_flags);
 	h_ret = hipz_h_query_mr(shca->ipz_hca_handle, e_mr, &hipzout);
 	if (h_ret != H_SUCCESS) {
-		EDEB_ERR(4, "hipz_mr_query failed, h_ret=%lx mr=%p "
+		ehca_err(mr->device, "hipz_mr_query failed, h_ret=%lx mr=%p "
 			 "hca_hndl=%lx mr_hndl=%lx lkey=%x",
 			 h_ret, mr, shca->ipz_hca_handle.handle,
 			 e_mr->ipz_mr_handle.handle, mr->lkey);
@@ -615,13 +548,8 @@ query_mr_exit1:
 	spin_unlock_irqrestore(&e_mr->mrlock, sl_flags);
 query_mr_exit0:
 	if (ret)
-		EDEB_EX(4, "ret=%x mr=%p mr_attr=%p", ret, mr, mr_attr);
-	else
-		EDEB_EX(7, "pd=%p device_virt_addr=%lx size=%lx "
-			"mr_access_flags=%x lkey=%x rkey=%x",
-			mr_attr->pd, mr_attr->device_virt_addr,
-			mr_attr->size, mr_attr->mr_access_flags,
-			mr_attr->lkey, mr_attr->rkey);
+		ehca_err(mr->device, "ret=%x mr=%p mr_attr=%p",
+			 ret, mr, mr_attr);
 	return ret;
 } /* end ehca_query_mr() */
@@ -630,35 +558,29 @@ query_mr_exit0:
 int ehca_dereg_mr(struct ib_mr *mr)
 {
 	int ret = 0;
-	u64 h_ret = H_SUCCESS;
-	struct ehca_shca *shca = NULL;
-	struct ehca_mr *e_mr = NULL;
-	struct ehca_pd *my_pd = NULL;
+	u64 h_ret;
+	struct ehca_shca *shca =
+		container_of(mr->device, struct ehca_shca, ib_device);
+	struct ehca_mr *e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
+	struct ehca_pd *my_pd = container_of(mr->pd, struct ehca_pd, ib_pd);
 	u32 cur_pid = current->tgid;
-	EDEB_EN(7, "mr=%p", mr);
-
-	EHCA_CHECK_MR(mr);
-	my_pd = container_of(mr->pd, struct ehca_pd, ib_pd);
 	if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
 	    (my_pd->ownpid != cur_pid)) {
-		EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+		ehca_err(mr->device, "Invalid caller pid=%x ownpid=%x",
 			 cur_pid, my_pd->ownpid);
 		ret = -EINVAL;
 		goto dereg_mr_exit0;
 	}
-	e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
-	shca = container_of(mr->device, struct ehca_shca, ib_device);
-
 	if ((e_mr->flags &
EHCA_MR_FLAG_FMR)) {
-		EDEB_ERR(4, "not supported for FMR, mr=%p e_mr=%p "
+		ehca_err(mr->device, "not supported for FMR, mr=%p e_mr=%p "
 			 "e_mr->flags=%x", mr, e_mr, e_mr->flags);
 		ret = -EINVAL;
 		goto dereg_mr_exit0;
 	} else if (e_mr == shca->maxmr) {
 		/* should be impossible, however reject to be sure */
-		EDEB_ERR(3, "dereg internal max-MR impossible, mr=%p "
+		ehca_err(mr->device, "dereg internal max-MR impossible, mr=%p "
 			 "shca->maxmr=%p mr->lkey=%x",
 			 mr, shca->maxmr, mr->lkey);
 		ret = -EINVAL;
@@ -668,8 +590,8 @@ int ehca_dereg_mr(struct ib_mr *mr)
 	/* TODO: BUSY: MR still has bound window(s) */
 	h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_mr);
 	if (h_ret != H_SUCCESS) {
-		EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx shca=%p e_mr=%p"
-			 " hca_hndl=%lx mr_hndl=%lx mr->lkey=%x",
+		ehca_err(mr->device, "hipz_free_mr failed, h_ret=%lx shca=%p "
+			 "e_mr=%p hca_hndl=%lx mr_hndl=%lx mr->lkey=%x",
 			 h_ret, shca, e_mr, shca->ipz_hca_handle.handle,
 			 e_mr->ipz_mr_handle.handle, mr->lkey);
 		ret = ehca_mrmw_map_hrc_free_mr(h_ret);
@@ -681,9 +603,7 @@ int ehca_dereg_mr(struct ib_mr *mr)
 dereg_mr_exit0:
 	if (ret)
-		EDEB_EX(4, "ret=%x mr=%p", ret, mr);
-	else
-		EDEB_EX(7, "");
+		ehca_err(mr->device, "ret=%x mr=%p", ret, mr);
 	return ret;
 } /* end ehca_dereg_mr() */
@@ -691,19 +611,14 @@ dereg_mr_exit0:
 struct ib_mw *ehca_alloc_mw(struct ib_pd *pd)
 {
-	struct ib_mw *ib_mw = NULL;
-	u64 h_ret = H_SUCCESS;
-	struct ehca_shca *shca = NULL;
-	struct ehca_mw *e_mw = NULL;
-	struct ehca_pd *e_pd = NULL;
+	struct ib_mw *ib_mw;
+	u64 h_ret;
+	struct ehca_mw *e_mw;
+	struct ehca_pd *e_pd = container_of(pd, struct ehca_pd, ib_pd);
+	struct ehca_shca *shca =
+		container_of(pd->device, struct ehca_shca, ib_device);
 	struct ehca_mw_hipzout_parms hipzout = {{0},0};
-	EDEB_EN(7, "pd=%p", pd);
-
-	EHCA_CHECK_PD_P(pd);
-	e_pd = container_of(pd, struct ehca_pd, ib_pd);
-	shca = container_of(pd->device, struct ehca_shca, ib_device);
-
 	e_mw = ehca_mw_new();
 	if (!e_mw) {
 		ib_mw = ERR_PTR(-ENOMEM);
@@
-713,25 +628,22 @@ struct ib_mw *ehca_alloc_mw(struct ib_pd
 	h_ret = hipz_h_alloc_resource_mw(shca->ipz_hca_handle, e_mw,
 					 e_pd->fw_pd, &hipzout);
 	if (h_ret != H_SUCCESS) {
-		EDEB_ERR(4, "hipz_mw_allocate failed, h_ret=%lx shca=%p "
-			 "hca_hndl=%lx mw=%p", h_ret, shca,
-			 shca->ipz_hca_handle.handle, e_mw);
+		ehca_err(pd->device, "hipz_mw_allocate failed, h_ret=%lx "
+			 "shca=%p hca_hndl=%lx mw=%p",
+			 h_ret, shca, shca->ipz_hca_handle.handle, e_mw);
 		ib_mw = ERR_PTR(ehca_mrmw_map_hrc_alloc(h_ret));
 		goto alloc_mw_exit1;
 	}
 	/* successful MW allocation */
 	e_mw->ipz_mw_handle = hipzout.handle;
 	e_mw->ib_mw.rkey = hipzout.rkey;
-	ib_mw = &e_mw->ib_mw;
-	goto alloc_mw_exit0;
+	return &e_mw->ib_mw;
 alloc_mw_exit1:
 	ehca_mw_delete(e_mw);
 alloc_mw_exit0:
 	if (IS_ERR(ib_mw))
-		EDEB_EX(4, "rc=%lx pd=%p", PTR_ERR(ib_mw), pd);
-	else
-		EDEB_EX(7, "ib_mw=%p rkey=%x", ib_mw, ib_mw->rkey);
+		ehca_err(pd->device, "rc=%lx pd=%p", PTR_ERR(ib_mw), pd);
 	return ib_mw;
 } /* end ehca_alloc_mw() */
@@ -741,55 +653,32 @@ int ehca_bind_mw(struct ib_qp *qp,
 		 struct ib_mw *mw,
 		 struct ib_mw_bind *mw_bind)
 {
-	int ret = 0;
-
 	/* TODO: not supported up to now */
-	EDEB_ERR(4, "bind MW currently not supported by HCAD");
-	ret = -EPERM;
-	goto bind_mw_exit0;
+	ehca_gen_err("bind MW currently not supported by HCAD");
-bind_mw_exit0:
-	if (ret)
-		EDEB_EX(4, "ret=%x qp=%p mw=%p mw_bind=%p",
-			ret, qp, mw, mw_bind);
-	else
-		EDEB_EX(7, "qp=%p mw=%p mw_bind=%p", qp, mw, mw_bind);
-	return ret;
+	return -EPERM;
 } /* end ehca_bind_mw() */
 /*----------------------------------------------------------------------*/
 int ehca_dealloc_mw(struct ib_mw *mw)
 {
-	int ret = 0;
-	u64 h_ret = H_SUCCESS;
-	struct ehca_shca *shca = NULL;
-	struct ehca_mw *e_mw = NULL;
-
-	EDEB_EN(7, "mw=%p", mw);
-
-	EHCA_CHECK_MW(mw);
-	e_mw = container_of(mw, struct ehca_mw, ib_mw);
-	shca = container_of(mw->device, struct ehca_shca, ib_device);
+	u64 h_ret;
+	struct ehca_shca *shca =
+		container_of(mw->device, struct ehca_shca, ib_device);
+
	struct ehca_mw *e_mw = container_of(mw, struct ehca_mw, ib_mw);
 	h_ret = hipz_h_free_resource_mw(shca->ipz_hca_handle, e_mw);
 	if (h_ret != H_SUCCESS) {
-		EDEB_ERR(4, "hipz_free_mw failed, h_ret=%lx shca=%p mw=%p "
-			 "rkey=%x hca_hndl=%lx mw_hndl=%lx",
+		ehca_err(mw->device, "hipz_free_mw failed, h_ret=%lx shca=%p "
+			 "mw=%p rkey=%x hca_hndl=%lx mw_hndl=%lx",
 			 h_ret, shca, mw, mw->rkey, shca->ipz_hca_handle.handle,
 			 e_mw->ipz_mw_handle.handle);
-		ret = ehca_mrmw_map_hrc_free_mw(h_ret);
-		goto dealloc_mw_exit0;
+		return ehca_mrmw_map_hrc_free_mw(h_ret);
 	}
 	/* successful deallocation */
 	ehca_mw_delete(e_mw);
-
-dealloc_mw_exit0:
-	if (ret)
-		EDEB_EX(4, "ret=%x mw=%p", ret, mw);
-	else
-		EDEB_EX(7, "");
-	return ret;
+	return 0;
 } /* end ehca_dealloc_mw() */
 /*----------------------------------------------------------------------*/
@@ -798,28 +687,15 @@ struct ib_fmr *ehca_alloc_fmr(struct ib_
 			      int mr_access_flags,
 			      struct ib_fmr_attr *fmr_attr)
 {
-	struct ib_fmr *ib_fmr = NULL;
-	struct ehca_shca *shca = NULL;
-	struct ehca_mr *e_fmr = NULL;
-	int ret = 0;
-	struct ehca_pd *e_pd = NULL;
-	u32 tmp_lkey = 0;
-	u32 tmp_rkey = 0;
+	struct ib_fmr *ib_fmr;
+	struct ehca_shca *shca =
+		container_of(pd->device, struct ehca_shca, ib_device);
+	struct ehca_pd *e_pd = container_of(pd, struct ehca_pd, ib_pd);
+	struct ehca_mr *e_fmr;
+	int ret;
+	u32 tmp_lkey, tmp_rkey;
 	struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0};
-	EDEB_EN(7, "pd=%p mr_access_flags=%x fmr_attr=%p",
-		pd, mr_access_flags, fmr_attr);
-
-	EHCA_CHECK_PD_P(pd);
-	if (ehca_adr_bad(fmr_attr)) {
-		EDEB_ERR(4, "bad input values: fmr_attr=%p", fmr_attr);
-		ib_fmr = ERR_PTR(-EINVAL);
-		goto alloc_fmr_exit0;
-	}
-
-	EDEB(7, "max_pages=%x max_maps=%x page_shift=%x",
-	     fmr_attr->max_pages, fmr_attr->max_maps, fmr_attr->page_shift);
-
 	/* check other parameters */
 	if (((mr_access_flags & IB_ACCESS_REMOTE_WRITE) &&
 	     !(mr_access_flags & IB_ACCESS_LOCAL_WRITE)) ||
@@ -829,19 +705,19 @@ struct ib_fmr
*ehca_alloc_fmr(struct ib_
 		 * Remote Write Access requires Local Write Access
 		 * Remote Atomic Access requires Local Write Access
 		 */
-		EDEB_ERR(4, "bad input values: mr_access_flags=%x",
+		ehca_err(pd->device, "bad input values: mr_access_flags=%x",
 			 mr_access_flags);
 		ib_fmr = ERR_PTR(-EINVAL);
 		goto alloc_fmr_exit0;
 	}
 	if (mr_access_flags & IB_ACCESS_MW_BIND) {
-		EDEB_ERR(4, "bad input values: mr_access_flags=%x",
+		ehca_err(pd->device, "bad input values: mr_access_flags=%x",
 			 mr_access_flags);
 		ib_fmr = ERR_PTR(-EINVAL);
 		goto alloc_fmr_exit0;
 	}
 	if ((fmr_attr->max_pages == 0) || (fmr_attr->max_maps == 0)) {
-		EDEB_ERR(4, "bad input values: fmr_attr->max_pages=%x "
+		ehca_err(pd->device, "bad input values: fmr_attr->max_pages=%x "
 			 "fmr_attr->max_maps=%x fmr_attr->page_shift=%x",
 			 fmr_attr->max_pages, fmr_attr->max_maps,
 			 fmr_attr->page_shift);
@@ -850,15 +726,12 @@ struct ib_fmr *ehca_alloc_fmr(struct ib_
 	}
 	if (((1 << fmr_attr->page_shift) != EHCA_PAGESIZE) &&
 	    ((1 << fmr_attr->page_shift) != PAGE_SIZE)) {
-		EDEB_ERR(4, "unsupported fmr_attr->page_shift=%x",
+		ehca_err(pd->device, "unsupported fmr_attr->page_shift=%x",
 			 fmr_attr->page_shift);
 		ib_fmr = ERR_PTR(-EINVAL);
 		goto alloc_fmr_exit0;
 	}
-	e_pd = container_of(pd, struct ehca_pd, ib_pd);
-	shca = container_of(pd->device, struct ehca_shca, ib_device);
-
 	e_fmr = ehca_mr_new();
 	if (!e_fmr) {
 		ib_fmr = ERR_PTR(-ENOMEM);
@@ -881,19 +754,15 @@ struct ib_fmr *ehca_alloc_fmr(struct ib_
 	e_fmr->fmr_max_pages = fmr_attr->max_pages;
 	e_fmr->fmr_max_maps = fmr_attr->max_maps;
 	e_fmr->fmr_map_cnt = 0;
-	ib_fmr = &e_fmr->ib.ib_fmr;
-	goto alloc_fmr_exit0;
+	return &e_fmr->ib.ib_fmr;
 alloc_fmr_exit1:
 	ehca_mr_delete(e_fmr);
 alloc_fmr_exit0:
 	if (IS_ERR(ib_fmr))
-		EDEB_EX(4, "rc=%lx pd=%p mr_access_flags=%x "
-			"fmr_attr=%p", PTR_ERR(ib_fmr), pd,
-			mr_access_flags, fmr_attr);
-	else
-		EDEB_EX(7, "ib_fmr=%p tmp_lkey=%x tmp_rkey=%x",
-			ib_fmr, tmp_lkey, tmp_rkey);
+		ehca_err(pd->device, "rc=%lx pd=%p mr_access_flags=%x "
+			 "fmr_attr=%p",
PTR_ERR(ib_fmr), pd,
+			 mr_access_flags, fmr_attr);
 	return ib_fmr;
 } /* end ehca_alloc_fmr() */
@@ -904,24 +773,16 @@ int ehca_map_phys_fmr(struct ib_fmr *fmr
 		      int list_len,
 		      u64 iova)
 {
-	int ret = 0;
-	struct ehca_shca *shca = NULL;
-	struct ehca_mr *e_fmr = NULL;
-	struct ehca_pd *e_pd = NULL;
+	int ret;
+	struct ehca_shca *shca =
+		container_of(fmr->device, struct ehca_shca, ib_device);
+	struct ehca_mr *e_fmr = container_of(fmr, struct ehca_mr, ib.ib_fmr);
+	struct ehca_pd *e_pd = container_of(fmr->pd, struct ehca_pd, ib_pd);
 	struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0};
-	u32 tmp_lkey = 0;
-	u32 tmp_rkey = 0;
-
-	EDEB_EN(7, "fmr=%p page_list=%p list_len=%x iova=%lx",
-		fmr, page_list, list_len, iova);
-
-	EHCA_CHECK_FMR(fmr);
-	e_fmr = container_of(fmr, struct ehca_mr, ib.ib_fmr);
-	shca = container_of(fmr->device, struct ehca_shca, ib_device);
-	e_pd = container_of(fmr->pd, struct ehca_pd, ib_pd);
+	u32 tmp_lkey, tmp_rkey;
 	if (!(e_fmr->flags & EHCA_MR_FLAG_FMR)) {
-		EDEB_ERR(4, "not a FMR, e_fmr=%p e_fmr->flags=%x",
+		ehca_err(fmr->device, "not a FMR, e_fmr=%p e_fmr->flags=%x",
 			 e_fmr, e_fmr->flags);
 		ret = -EINVAL;
 		goto map_phys_fmr_exit0;
@@ -931,16 +792,16 @@ int ehca_map_phys_fmr(struct ib_fmr *fmr
 		goto map_phys_fmr_exit0;
 	if (iova % e_fmr->fmr_page_size) {
 		/* only whole-numbered pages */
-		EDEB_ERR(4, "bad iova, iova=%lx fmr_page_size=%x",
+		ehca_err(fmr->device, "bad iova, iova=%lx fmr_page_size=%x",
 			 iova, e_fmr->fmr_page_size);
 		ret = -EINVAL;
 		goto map_phys_fmr_exit0;
 	}
 	if (e_fmr->fmr_map_cnt >= e_fmr->fmr_max_maps) {
 		/* HCAD does not limit the maps, however trace this anyway */
-		EDEB(6, "map limit exceeded, fmr=%p e_fmr->fmr_map_cnt=%x "
-		     "e_fmr->fmr_max_maps=%x",
-		     fmr, e_fmr->fmr_map_cnt, e_fmr->fmr_max_maps);
+		ehca_info(fmr->device, "map limit exceeded, fmr=%p "
+			  "e_fmr->fmr_map_cnt=%x e_fmr->fmr_max_maps=%x",
+			  fmr, e_fmr->fmr_map_cnt, e_fmr->fmr_max_maps);
 	}
 	pginfo.type = EHCA_MR_PGI_FMR;
@@ -960,14 +821,13 @@ int
ehca_map_phys_fmr(struct ib_fmr *fmr
 	e_fmr->fmr_map_cnt++;
 	e_fmr->ib.ib_fmr.lkey = tmp_lkey;
 	e_fmr->ib.ib_fmr.rkey = tmp_rkey;
+	return 0;
 map_phys_fmr_exit0:
 	if (ret)
-		EDEB_EX(4, "ret=%x fmr=%p page_list=%p list_len=%x iova=%lx",
-			ret, fmr, page_list, list_len, iova);
-	else
-		EDEB_EX(7, "lkey=%x rkey=%x",
-			e_fmr->ib.ib_fmr.lkey, e_fmr->ib.ib_fmr.rkey);
+		ehca_err(fmr->device, "ret=%x fmr=%p page_list=%p list_len=%x "
+			 "iova=%lx",
+			 ret, fmr, page_list, list_len, iova);
 	return ret;
 } /* end ehca_map_phys_fmr() */
@@ -976,31 +836,34 @@ map_phys_fmr_exit0:
 int ehca_unmap_fmr(struct list_head *fmr_list)
 {
 	int ret = 0;
-	struct ib_fmr *ib_fmr = NULL;
+	struct ib_fmr *ib_fmr;
 	struct ehca_shca *shca = NULL;
-	struct ehca_shca *prev_shca = NULL;
-	struct ehca_mr *e_fmr = NULL;
+	struct ehca_shca *prev_shca;
+	struct ehca_mr *e_fmr;
 	u32 num_fmr = 0;
 	u32 unmap_fmr_cnt = 0;
-	EDEB_EN(7, "fmr_list=%p", fmr_list);
-
 	/* check all FMR belong to same SHCA, and check internal flag */
 	list_for_each_entry(ib_fmr, fmr_list, list) {
 		prev_shca = shca;
+		if (!ib_fmr) {
+			ehca_gen_err("bad fmr=%p in list", ib_fmr);
+			ret = -EINVAL;
+			goto unmap_fmr_exit0;
+		}
 		shca = container_of(ib_fmr->device, struct ehca_shca,
 				    ib_device);
-		EHCA_CHECK_FMR(ib_fmr);
 		e_fmr = container_of(ib_fmr, struct ehca_mr, ib.ib_fmr);
 		if ((shca != prev_shca) && prev_shca) {
-			EDEB_ERR(4, "SHCA mismatch, shca=%p prev_shca=%p "
-				 "e_fmr=%p", shca, prev_shca, e_fmr);
+			ehca_err(&shca->ib_device, "SHCA mismatch, shca=%p "
+				 "prev_shca=%p e_fmr=%p",
+				 shca, prev_shca, e_fmr);
 			ret = -EINVAL;
 			goto unmap_fmr_exit0;
 		}
 		if (!(e_fmr->flags & EHCA_MR_FLAG_FMR)) {
-			EDEB_ERR(4, "not a FMR, e_fmr=%p e_fmr->flags=%x",
-				 e_fmr, e_fmr->flags);
+			ehca_err(&shca->ib_device, "not a FMR, e_fmr=%p "
+				 "e_fmr->flags=%x", e_fmr, e_fmr->flags);
 			ret = -EINVAL;
 			goto unmap_fmr_exit0;
 		}
@@ -1016,20 +879,18 @@ int ehca_unmap_fmr(struct list_head *fmr
 		ret = ehca_unmap_one_fmr(shca, e_fmr);
 		if (ret) {
 			/* unmap failed, stop unmapping of rest of
FMRs */
-			EDEB_ERR(4, "unmap of one FMR failed, stop rest, "
-				 "e_fmr=%p num_fmr=%x unmap_fmr_cnt=%x lkey=%x",
-				 e_fmr, num_fmr, unmap_fmr_cnt,
-				 e_fmr->ib.ib_fmr.lkey);
+			ehca_err(&shca->ib_device, "unmap of one FMR failed, "
+				 "stop rest, e_fmr=%p num_fmr=%x "
+				 "unmap_fmr_cnt=%x lkey=%x", e_fmr, num_fmr,
+				 unmap_fmr_cnt, e_fmr->ib.ib_fmr.lkey);
 			goto unmap_fmr_exit0;
 		}
 	}
 unmap_fmr_exit0:
 	if (ret)
-		EDEB_EX(4, "ret=%x fmr_list=%p num_fmr=%x unmap_fmr_cnt=%x",
-			ret, fmr_list, num_fmr, unmap_fmr_cnt);
-	else
-		EDEB_EX(7, "num_fmr=%x", num_fmr);
+		ehca_gen_err("ret=%x fmr_list=%p num_fmr=%x unmap_fmr_cnt=%x",
+			     ret, fmr_list, num_fmr, unmap_fmr_cnt);
 	return ret;
 } /* end ehca_unmap_fmr() */
@@ -1037,19 +898,14 @@ unmap_fmr_exit0:
 int ehca_dealloc_fmr(struct ib_fmr *fmr)
 {
-	int ret = 0;
-	u64 h_ret = H_SUCCESS;
-	struct ehca_shca *shca = NULL;
-	struct ehca_mr *e_fmr = NULL;
-
-	EDEB_EN(7, "fmr=%p", fmr);
-
-	EHCA_CHECK_FMR(fmr);
-	e_fmr = container_of(fmr, struct ehca_mr, ib.ib_fmr);
-	shca = container_of(fmr->device, struct ehca_shca, ib_device);
+	int ret;
+	u64 h_ret;
+	struct ehca_shca *shca =
+		container_of(fmr->device, struct ehca_shca, ib_device);
+	struct ehca_mr *e_fmr = container_of(fmr, struct ehca_mr, ib.ib_fmr);
 	if (!(e_fmr->flags & EHCA_MR_FLAG_FMR)) {
-		EDEB_ERR(4, "not a FMR, e_fmr=%p e_fmr->flags=%x",
+		ehca_err(fmr->device, "not a FMR, e_fmr=%p e_fmr->flags=%x",
 			 e_fmr, e_fmr->flags);
 		ret = -EINVAL;
 		goto free_fmr_exit0;
@@ -1057,21 +913,20 @@ int ehca_dealloc_fmr(struct ib_fmr *fmr)
 	h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_fmr);
 	if (h_ret != H_SUCCESS) {
-		EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx e_fmr=%p "
+		ehca_err(fmr->device, "hipz_free_mr failed, h_ret=%lx e_fmr=%p "
 			 "hca_hndl=%lx fmr_hndl=%lx fmr->lkey=%x",
 			 h_ret, e_fmr, shca->ipz_hca_handle.handle,
 			 e_fmr->ipz_mr_handle.handle, fmr->lkey);
-		ehca_mrmw_map_hrc_free_mr(h_ret);
+		ret = ehca_mrmw_map_hrc_free_mr(h_ret);
 		goto free_fmr_exit0;
 	}
 	/* successful deregistration */
 	ehca_mr_delete(e_fmr);
+	return 0;
 free_fmr_exit0:
 	if (ret)
-		EDEB_EX(4, "ret=%x fmr=%p", ret, fmr);
-	else
-		EDEB_EX(7, "");
+		ehca_err(&shca->ib_device, "ret=%x fmr=%p", ret, fmr);
 	return ret;
 } /* end ehca_dealloc_fmr() */
@@ -1087,15 +942,11 @@ int ehca_reg_mr(struct ehca_shca *shca,
 		u32 *lkey, /*OUT*/
 		u32 *rkey) /*OUT*/
 {
-	int ret = 0;
-	u64 h_ret = H_SUCCESS;
-	u32 hipz_acl = 0;
+	int ret;
+	u64 h_ret;
+	u32 hipz_acl;
 	struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0};
-	EDEB_EN(7, "shca=%p e_mr=%p iova_start=%p size=%lx acl=%x e_pd=%p "
-		"pginfo=%p num_pages=%lx num_4k=%lx", shca, e_mr, iova_start,
-		size, acl, e_pd, pginfo, pginfo->num_pages, pginfo->num_4k);
-
 	ehca_mrmw_map_acl(acl, &hipz_acl);
 	ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl);
 	if (ehca_use_hp_mr == 1)
@@ -1105,8 +956,8 @@ int ehca_reg_mr(struct ehca_shca *shca,
 					 (u64)iova_start, size, hipz_acl,
 					 e_pd->fw_pd, &hipzout);
 	if (h_ret != H_SUCCESS) {
-		EDEB_ERR(4, "hipz_alloc_mr failed, h_ret=%lx hca_hndl=%lx",
-			 h_ret, shca->ipz_hca_handle.handle);
+		ehca_err(&shca->ib_device, "hipz_alloc_mr failed, h_ret=%lx "
+			 "hca_hndl=%lx", h_ret, shca->ipz_hca_handle.handle);
 		ret = ehca_mrmw_map_hrc_alloc(h_ret);
 		goto ehca_reg_mr_exit0;
 	}
@@ -1125,26 +976,27 @@ int ehca_reg_mr(struct ehca_shca *shca,
 	e_mr->acl = acl;
 	*lkey = hipzout.lkey;
 	*rkey = hipzout.rkey;
-	goto ehca_reg_mr_exit0;
+	return 0;
 ehca_reg_mr_exit1:
 	h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_mr);
 	if (h_ret != H_SUCCESS) {
-		EDEB_ERR(1, "h_ret=%lx shca=%p e_mr=%p iova_start=%p "
-			 "size=%lx acl=%x e_pd=%p lkey=%x pginfo=%p "
-			 "num_pages=%lx num_4k=%lx ret=%x", h_ret, shca, e_mr,
-			 iova_start, size, acl, e_pd, hipzout.lkey, pginfo,
-			 pginfo->num_pages, pginfo->num_4k, ret);
-		EDEB_ERR(1, "internal error in ehca_reg_mr, not recoverable");
+		ehca_err(&shca->ib_device, "h_ret=%lx shca=%p e_mr=%p "
+			 "iova_start=%p size=%lx acl=%x e_pd=%p lkey=%x "
+			 "pginfo=%p num_pages=%lx num_4k=%lx ret=%x",
+			 h_ret, shca, e_mr, iova_start, size,
acl, e_pd,
+			 hipzout.lkey, pginfo, pginfo->num_pages,
+			 pginfo->num_4k, ret);
+		ehca_err(&shca->ib_device, "internal error in ehca_reg_mr, "
+			 "not recoverable");
 	}
 ehca_reg_mr_exit0:
 	if (ret)
-		EDEB_EX(4, "ret=%x shca=%p e_mr=%p iova_start=%p size=%lx "
-			"acl=%x e_pd=%p pginfo=%p num_pages=%lx num_4k=%lx",
-			ret, shca, e_mr, iova_start, size, acl, e_pd, pginfo,
-			pginfo->num_pages, pginfo->num_4k);
-	else
-		EDEB_EX(7, "ret=%x lkey=%x rkey=%x", ret, *lkey, *rkey);
+		ehca_err(&shca->ib_device, "ret=%x shca=%p e_mr=%p "
+			 "iova_start=%p size=%lx acl=%x e_pd=%p pginfo=%p "
+			 "num_pages=%lx num_4k=%lx",
+			 ret, shca, e_mr, iova_start, size, acl, e_pd, pginfo,
+			 pginfo->num_pages, pginfo->num_4k);
 	return ret;
 } /* end ehca_reg_mr() */
@@ -1155,18 +1007,15 @@ int ehca_reg_mr_rpages(struct ehca_shca
 		       struct ehca_mr_pginfo *pginfo)
 {
 	int ret = 0;
-	u64 h_ret = H_SUCCESS;
-	u32 rnum = 0;
-	u64 rpage = 0;
+	u64 h_ret;
+	u32 rnum;
+	u64 rpage;
 	u32 i;
-	u64 *kpage = NULL;
-
-	EDEB_EN(7, "shca=%p e_mr=%p pginfo=%p num_pages=%lx num_4k=%lx",
-		shca, e_mr, pginfo, pginfo->num_pages, pginfo->num_4k);
+	u64 *kpage;
 	kpage = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
 	if (!kpage) {
-		EDEB_ERR(4, "kpage alloc failed");
+		ehca_err(&shca->ib_device, "kpage alloc failed");
 		ret = -ENOMEM;
 		goto ehca_reg_mr_rpages_exit0;
 	}
@@ -1184,29 +1033,29 @@ int ehca_reg_mr_rpages(struct ehca_shca
 		if (rnum > 1) {
 			ret = ehca_set_pagebuf(e_mr, pginfo, rnum, kpage);
 			if (ret) {
-				EDEB_ERR(4, "ehca_set_pagebuf bad rc, ret=%x "
-					 "rnum=%x kpage=%p", ret, rnum, kpage);
+				ehca_err(&shca->ib_device, "ehca_set_pagebuf "
+					 "bad rc, ret=%x rnum=%x kpage=%p",
+					 ret, rnum, kpage);
 				ret = -EFAULT;
 				goto ehca_reg_mr_rpages_exit1;
 			}
 			rpage = virt_to_abs(kpage);
 			if (!rpage) {
-				EDEB_ERR(4, "kpage=%p i=%x", kpage, i);
+				ehca_err(&shca->ib_device, "kpage=%p i=%x",
+					 kpage, i);
 				ret = -EFAULT;
 				goto ehca_reg_mr_rpages_exit1;
 			}
 		} else { /* rnum==1 */
 			ret = ehca_set_pagebuf_1(e_mr, pginfo, &rpage);
 			if (ret) {
-				EDEB_ERR(4, "ehca_set_pagebuf_1 bad rc, "
-					 "ret=%x i=%x", ret, i);
+				ehca_err(&shca->ib_device, "ehca_set_pagebuf_1 "
+					 "bad rc, ret=%x i=%x", ret, i);
 				ret = -EFAULT;
 				goto ehca_reg_mr_rpages_exit1;
 			}
 		}
-		EDEB(9, "i=%x rnum=%x rpage=%lx", i, rnum, rpage);
-
 		h_ret = hipz_h_register_rpage_mr(shca->ipz_hca_handle, e_mr,
 						 0, /* pagesize 4k */
 						 0, rpage, rnum);
@@ -1217,9 +1066,10 @@ int ehca_reg_mr_rpages(struct ehca_shca
 			 * and for 'page registered'==H_PAGE_REGISTERED
 			 */
 			if (h_ret != H_SUCCESS) {
-				EDEB_ERR(4, "last hipz_reg_rpage_mr failed, "
-					 "h_ret=%lx e_mr=%p i=%x hca_hndl=%lx "
-					 "mr_hndl=%lx lkey=%x", h_ret, e_mr, i,
+				ehca_err(&shca->ib_device, "last "
+					 "hipz_reg_rpage_mr failed, h_ret=%lx "
+					 "e_mr=%p i=%x hca_hndl=%lx mr_hndl=%lx"
+					 " lkey=%x", h_ret, e_mr, i,
 					 shca->ipz_hca_handle.handle,
 					 e_mr->ipz_mr_handle.handle,
 					 e_mr->ib.ib_mr.lkey);
@@ -1228,8 +1078,8 @@ int ehca_reg_mr_rpages(struct ehca_shca
 			} else
 				ret = 0;
 		} else if (h_ret != H_PAGE_REGISTERED) {
-			EDEB_ERR(4, "hipz_reg_rpage_mr failed, h_ret=%lx "
-				 "e_mr=%p i=%x lkey=%x hca_hndl=%lx "
+			ehca_err(&shca->ib_device, "hipz_reg_rpage_mr failed, "
+				 "h_ret=%lx e_mr=%p i=%x lkey=%x hca_hndl=%lx "
 				 "mr_hndl=%lx", h_ret, e_mr, i,
 				 e_mr->ib.ib_mr.lkey,
 				 shca->ipz_hca_handle.handle,
@@ -1245,11 +1095,9 @@ ehca_reg_mr_rpages_exit1:
 	kfree(kpage);
 ehca_reg_mr_rpages_exit0:
 	if (ret)
-		EDEB_EX(4, "ret=%x shca=%p e_mr=%p pginfo=%p num_pages=%lx "
-			"num_4k=%lx", ret, shca, e_mr, pginfo,
-			pginfo->num_pages, pginfo->num_4k);
-	else
-		EDEB_EX(7, "ret=%x", ret);
+		ehca_err(&shca->ib_device, "ret=%x shca=%p e_mr=%p pginfo=%p "
+			 "num_pages=%lx num_4k=%lx", ret, shca, e_mr, pginfo,
+			 pginfo->num_pages, pginfo->num_4k);
 	return ret;
 } /* end ehca_reg_mr_rpages() */
@@ -1265,25 +1113,20 @@ inline int ehca_rereg_mr_rereg1(struct e
 				u32 *lkey, /*OUT*/
 				u32 *rkey) /*OUT*/
 {
-	int ret = 0;
-	u64 h_ret = H_SUCCESS;
-	u32 hipz_acl = 0;
-	u64 *kpage = NULL;
-	u64 rpage = 0;
+	int ret;
+	u64 h_ret;
+	u32 hipz_acl;
+	u64 *kpage;
+	u64 rpage;
 	struct
ehca_mr_pginfo pginfo_save;
 	struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0};
-	EDEB_EN(7, "shca=%p e_mr=%p iova_start=%p size=%lx acl=%x "
-		"e_pd=%p pginfo=%p num_pages=%lx num_4k=%lx", shca, e_mr,
-		iova_start, size, acl, e_pd, pginfo, pginfo->num_pages,
-		pginfo->num_4k);
-
 	ehca_mrmw_map_acl(acl, &hipz_acl);
 	ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl);
 	kpage = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
 	if (!kpage) {
-		EDEB_ERR(4, "kpage alloc failed");
+		ehca_err(&shca->ib_device, "kpage alloc failed");
 		ret = -ENOMEM;
 		goto ehca_rereg_mr_rereg1_exit0;
 	}
@@ -1291,14 +1134,15 @@ inline int ehca_rereg_mr_rereg1(struct e
 	pginfo_save = *pginfo;
 	ret = ehca_set_pagebuf(e_mr, pginfo, pginfo->num_4k, kpage);
 	if (ret) {
-		EDEB_ERR(4, "set pagebuf failed, e_mr=%p pginfo=%p type=%x "
-			 "num_pages=%lx num_4k=%lx kpage=%p", e_mr, pginfo,
-			 pginfo->type, pginfo->num_pages, pginfo->num_4k,kpage);
+		ehca_err(&shca->ib_device, "set pagebuf failed, e_mr=%p "
+			 "pginfo=%p type=%x num_pages=%lx num_4k=%lx kpage=%p",
+			 e_mr, pginfo, pginfo->type, pginfo->num_pages,
+			 pginfo->num_4k,kpage);
 		goto ehca_rereg_mr_rereg1_exit1;
 	}
 	rpage = virt_to_abs(kpage);
 	if (!rpage) {
-		EDEB_ERR(4, "kpage=%p", kpage);
+		ehca_err(&shca->ib_device, "kpage=%p", kpage);
 		ret = -EFAULT;
 		goto ehca_rereg_mr_rereg1_exit1;
 	}
@@ -1311,13 +1155,13 @@ inline int ehca_rereg_mr_rereg1(struct e
 		 * e.g.
this is required in case H_MR_CONDITION
 		 * (MW bound or MR is shared)
 		 */
-		EDEB(6, "hipz_h_reregister_pmr failed (Rereg1), h_ret=%lx "
-		     "e_mr=%p", h_ret, e_mr);
+		ehca_warn(&shca->ib_device, "hipz_h_reregister_pmr failed "
+			  "(Rereg1), h_ret=%lx e_mr=%p", h_ret, e_mr);
 		*pginfo = pginfo_save;
 		ret = -EAGAIN;
 	} else if ((u64*)hipzout.vaddr != iova_start) {
-		EDEB_ERR(4, "PHYP changed iova_start in rereg_pmr, "
-			 "iova_start=%p iova_start_out=%lx e_mr=%p "
+		ehca_err(&shca->ib_device, "PHYP changed iova_start in "
+			 "rereg_pmr, iova_start=%p iova_start_out=%lx e_mr=%p "
 			 "mr_handle=%lx lkey=%x lkey_out=%x", iova_start,
 			 hipzout.vaddr, e_mr, e_mr->ipz_mr_handle.handle,
 			 e_mr->ib.ib_mr.lkey, hipzout.lkey);
@@ -1340,13 +1184,10 @@ ehca_rereg_mr_rereg1_exit1:
 	kfree(kpage);
 ehca_rereg_mr_rereg1_exit0:
 	if ( ret && (ret != -EAGAIN) )
-		EDEB_EX(4, "ret=%x h_ret=%lx lkey=%x rkey=%x pginfo=%p "
-			"num_pages=%lx num_4k=%lx", ret, h_ret, *lkey, *rkey,
-			pginfo, pginfo->num_pages, pginfo->num_4k);
-	else
-		EDEB_EX(7, "ret=%x h_ret=%lx lkey=%x rkey=%x pginfo=%p "
-			"num_pages=%lx num_4k=%lx", ret, h_ret, *lkey, *rkey,
-			pginfo, pginfo->num_pages, pginfo->num_4k);
+		ehca_err(&shca->ib_device, "ret=%x lkey=%x rkey=%x "
+			 "pginfo=%p num_pages=%lx num_4k=%lx",
+			 ret, *lkey, *rkey, pginfo, pginfo->num_pages,
+			 pginfo->num_4k);
 	return ret;
 } /* end ehca_rereg_mr_rereg1() */
@@ -1363,20 +1204,15 @@ int ehca_rereg_mr(struct ehca_shca *shca
 		  u32 *rkey)
 {
 	int ret = 0;
-	u64 h_ret = H_SUCCESS;
+	u64 h_ret;
 	int rereg_1_hcall = 1; /* 1: use hipz_h_reregister_pmr directly */
 	int rereg_3_hcall = 0; /* 1: use 3 hipz calls for reregistration */
-	EDEB_EN(7, "shca=%p e_mr=%p iova_start=%p size=%lx acl=%x "
-		"e_pd=%p pginfo=%p num_pages=%lx num_4k=%lx", shca, e_mr,
-		iova_start, size, acl, e_pd, pginfo, pginfo->num_pages,
-		pginfo->num_4k);
-
 	/* first determine reregistration hCall(s) */
 	if ((pginfo->num_4k > 512) || (e_mr->num_4k > 512) ||
 	    (pginfo->num_4k > e_mr->num_4k)) {
-		EDEB(7, "Rereg3 case,
pginfo->num_4k=%lx "
-		     "e_mr->num_4k=%x", pginfo->num_4k, e_mr->num_4k);
+		ehca_dbg(&shca->ib_device, "Rereg3 case, pginfo->num_4k=%lx "
+			 "e_mr->num_4k=%x", pginfo->num_4k, e_mr->num_4k);
 		rereg_1_hcall = 0;
 		rereg_3_hcall = 1;
 	}
@@ -1385,7 +1221,8 @@ int ehca_rereg_mr(struct ehca_shca *shca
 		rereg_1_hcall = 0;
 		rereg_3_hcall = 1;
 		e_mr->flags &= ~EHCA_MR_FLAG_MAXMR;
-		EDEB(4, "Rereg MR for max-MR! e_mr=%p", e_mr);
+		ehca_err(&shca->ib_device, "Rereg MR for max-MR! e_mr=%p",
+			 e_mr);
 	}
 	if (rereg_1_hcall) {
@@ -1405,8 +1242,9 @@ int ehca_rereg_mr(struct ehca_shca *shca
 		/* first deregister old MR */
 		h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_mr);
 		if (h_ret != H_SUCCESS) {
-			EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx e_mr=%p "
-				 "hca_hndl=%lx mr_hndl=%lx mr->lkey=%x",
+			ehca_err(&shca->ib_device, "hipz_free_mr failed, "
+				 "h_ret=%lx e_mr=%p hca_hndl=%lx mr_hndl=%lx "
+				 "mr->lkey=%x",
 				 h_ret, e_mr, shca->ipz_hca_handle.handle,
 				 e_mr->ipz_mr_handle.handle,
 				 e_mr->ib.ib_mr.lkey);
@@ -1436,18 +1274,12 @@ int ehca_rereg_mr(struct ehca_shca *shca
 ehca_rereg_mr_exit0:
 	if (ret)
-		EDEB_EX(4, "ret=%x shca=%p e_mr=%p iova_start=%p size=%lx "
-			"acl=%x e_pd=%p pginfo=%p num_pages=%lx lkey=%x rkey=%x"
-			" rereg_1_hcall=%x rereg_3_hcall=%x", ret, shca, e_mr,
-			iova_start, size, acl, e_pd, pginfo, pginfo->num_pages,
-			*lkey, *rkey, rereg_1_hcall, rereg_3_hcall);
-	else
-		EDEB_EX(7, "ret=%x shca=%p e_mr=%p iova_start=%p size=%lx "
-			"acl=%x e_pd=%p pginfo=%p num_pages=%lx lkey=%x rkey=%x"
-			" rereg_1_hcall=%x rereg_3_hcall=%x", ret, shca, e_mr,
-			iova_start, size, acl, e_pd, pginfo, pginfo->num_pages,
-			*lkey, *rkey, rereg_1_hcall, rereg_3_hcall);
-
+		ehca_err(&shca->ib_device, "ret=%x shca=%p e_mr=%p "
+			 "iova_start=%p size=%lx acl=%x e_pd=%p pginfo=%p "
+			 "num_pages=%lx lkey=%x rkey=%x rereg_1_hcall=%x "
+			 "rereg_3_hcall=%x", ret, shca, e_mr, iova_start, size,
+			 acl, e_pd, pginfo, pginfo->num_pages, *lkey, *rkey,
+			 rereg_1_hcall, rereg_3_hcall);
 	return ret;
 } /* end
ehca_rereg_mr() */
@@ -1457,26 +1289,22 @@ int ehca_unmap_one_fmr(struct ehca_shca
 		       struct ehca_mr *e_fmr)
 {
 	int ret = 0;
-	u64 h_ret = H_SUCCESS;
+	u64 h_ret;
 	int rereg_1_hcall = 1; /* 1: use hipz_mr_reregister directly */
 	int rereg_3_hcall = 0; /* 1: use 3 hipz calls for unmapping */
-	struct ehca_pd *e_pd = NULL;
+	struct ehca_pd *e_pd =
+		container_of(e_fmr->ib.ib_fmr.pd, struct ehca_pd, ib_pd);
 	struct ehca_mr save_fmr;
-	u32 tmp_lkey = 0;
-	u32 tmp_rkey = 0;
+	u32 tmp_lkey, tmp_rkey;
 	struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0};
 	struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0};
-	EDEB_EN(7, "shca=%p e_fmr=%p", shca, e_fmr);
-
 	/* first check if reregistration hCall can be used for unmap */
 	if (e_fmr->fmr_max_pages > 512) {
 		rereg_1_hcall = 0;
 		rereg_3_hcall = 1;
 	}
-	e_pd = container_of(e_fmr->ib.ib_fmr.pd, struct ehca_pd, ib_pd);
-
 	if (rereg_1_hcall) {
 		/*
 		 * note: after using rereg hcall with len=0,
@@ -1489,10 +1317,10 @@ int ehca_unmap_one_fmr(struct ehca_shca
 			 * should not happen, because length checked above,
 			 * FMRs are not shared and no MW bound to FMRs
 			 */
-			EDEB_ERR(4, "hipz_reregister_pmr failed (Rereg1), "
-				 "h_ret=%lx e_fmr=%p hca_hndl=%lx mr_hndl=%lx "
-				 "lkey=%x lkey_out=%x", h_ret, e_fmr,
-				 shca->ipz_hca_handle.handle,
+			ehca_err(&shca->ib_device, "hipz_reregister_pmr failed "
+				 "(Rereg1), h_ret=%lx e_fmr=%p hca_hndl=%lx "
+				 "mr_hndl=%lx lkey=%x lkey_out=%x",
+				 h_ret, e_fmr, shca->ipz_hca_handle.handle,
 				 e_fmr->ipz_mr_handle.handle,
 				 e_fmr->ib.ib_fmr.lkey, hipzout.lkey);
 			rereg_3_hcall = 1;
@@ -1511,9 +1339,10 @@ int ehca_unmap_one_fmr(struct ehca_shca
 		/* first free old FMR */
 		h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_fmr);
 		if (h_ret != H_SUCCESS) {
-			EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx e_fmr=%p "
-				 "hca_hndl=%lx mr_hndl=%lx lkey=%x", h_ret,
-				 e_fmr, shca->ipz_hca_handle.handle,
+			ehca_err(&shca->ib_device, "hipz_free_mr failed, "
+				 "h_ret=%lx e_fmr=%p hca_hndl=%lx mr_hndl=%lx "
+				 "lkey=%x",
+				 h_ret,
e_fmr, shca->ipz_hca_handle.handle, e_fmr->ipz_mr_handle.handle, e_fmr->ib.ib_fmr.lkey); ret = ehca_mrmw_map_hrc_free_mr(h_ret); @@ -1547,9 +1376,11 @@ int ehca_unmap_one_fmr(struct ehca_shca } ehca_unmap_one_fmr_exit0: - EDEB_EX(7, "ret=%x tmp_lkey=%x tmp_rkey=%x fmr_max_pages=%x " - "rereg_1_hcall=%x rereg_3_hcall=%x", ret, tmp_lkey, tmp_rkey, - e_fmr->fmr_max_pages, rereg_1_hcall, rereg_3_hcall); + if (ret) + ehca_err(&shca->ib_device, "ret=%x tmp_lkey=%x tmp_rkey=%x " + "fmr_max_pages=%x rereg_1_hcall=%x rereg_3_hcall=%x", + ret, tmp_lkey, tmp_rkey, e_fmr->fmr_max_pages, + rereg_1_hcall, rereg_3_hcall); return ret; } /* end ehca_unmap_one_fmr() */ @@ -1565,13 +1396,10 @@ int ehca_reg_smr(struct ehca_shca *shca, u32 *rkey) /*OUT*/ { int ret = 0; - u64 h_ret = H_SUCCESS; - u32 hipz_acl = 0; + u64 h_ret; + u32 hipz_acl; struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0}; - EDEB_EN(7,"shca=%p e_origmr=%p e_newmr=%p iova_start=%p acl=%x e_pd=%p", - shca, e_origmr, e_newmr, iova_start, acl, e_pd); - ehca_mrmw_map_acl(acl, &hipz_acl); ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl); @@ -1579,10 +1407,11 @@ int ehca_reg_smr(struct ehca_shca *shca, (u64)iova_start, hipz_acl, e_pd->fw_pd, &hipzout); if (h_ret != H_SUCCESS) { - EDEB_ERR(4, "hipz_reg_smr failed, h_ret=%lx shca=%p e_origmr=%p" - " e_newmr=%p iova_start=%p acl=%x e_pd=%p hca_hndl=%lx" - " mr_hndl=%lx lkey=%x", h_ret, shca, e_origmr, e_newmr, - iova_start, acl, e_pd, shca->ipz_hca_handle.handle, + ehca_err(&shca->ib_device, "hipz_reg_smr failed, h_ret=%lx " + "shca=%p e_origmr=%p e_newmr=%p iova_start=%p acl=%x " + "e_pd=%p hca_hndl=%lx mr_hndl=%lx lkey=%x", + h_ret, shca, e_origmr, e_newmr, iova_start, acl, e_pd, + shca->ipz_hca_handle.handle, e_origmr->ipz_mr_handle.handle, e_origmr->ib.ib_mr.lkey); ret = ehca_mrmw_map_hrc_reg_smr(h_ret); @@ -1597,15 +1426,13 @@ int ehca_reg_smr(struct ehca_shca *shca, e_newmr->ipz_mr_handle = hipzout.handle; *lkey = hipzout.lkey; *rkey = hipzout.rkey; - goto 
ehca_reg_smr_exit0; + return 0; ehca_reg_smr_exit0: if (ret) - EDEB_EX(4, "ret=%x shca=%p e_origmr=%p e_newmr=%p " - "iova_start=%p acl=%x e_pd=%p", - ret, shca, e_origmr, e_newmr, iova_start, acl, e_pd); - else - EDEB_EX(7, "ret=%x lkey=%x rkey=%x", ret, *lkey, *rkey); + ehca_err(&shca->ib_device, "ret=%x shca=%p e_origmr=%p " + "e_newmr=%p iova_start=%p acl=%x e_pd=%p", + ret, shca, e_origmr, e_newmr, iova_start, acl, e_pd); return ret; } /* end ehca_reg_smr() */ @@ -1617,27 +1444,18 @@ int ehca_reg_internal_maxmr( struct ehca_pd *e_pd, struct ehca_mr **e_maxmr) /*OUT*/ { - int ret = 0; - struct ehca_mr *e_mr = NULL; - u64 *iova_start = NULL; - u64 size_maxmr = 0; + int ret; + struct ehca_mr *e_mr; + u64 *iova_start; + u64 size_maxmr; struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; struct ib_phys_buf ib_pbuf; - u32 num_pages_mr = 0; - u32 num_pages_4k = 0; /* 4k portion "pages" */ - - EDEB_EN(7, "shca=%p e_pd=%p e_maxmr=%p", shca, e_pd, e_maxmr); - - if (ehca_adr_bad(shca) || ehca_adr_bad(e_pd) || ehca_adr_bad(e_maxmr)) { - EDEB_ERR(4, "bad input values: shca=%p e_pd=%p e_maxmr=%p", - shca, e_pd, e_maxmr); - ret = -EINVAL; - goto ehca_reg_internal_maxmr_exit0; - } + u32 num_pages_mr; + u32 num_pages_4k; /* 4k portion "pages" */ e_mr = ehca_mr_new(); if (!e_mr) { - EDEB_ERR(4, "out of memory"); + ehca_err(&shca->ib_device, "out of memory"); ret = -ENOMEM; goto ehca_reg_internal_maxmr_exit0; } @@ -1645,7 +1463,6 @@ int ehca_reg_internal_maxmr( /* register internal max-MR on HCA */ size_maxmr = (u64)high_memory - PAGE_OFFSET; - EDEB(7, "high_memory=%p PAGE_OFFSET=%lx", high_memory, PAGE_OFFSET); iova_start = (u64*)KERNELBASE; ib_pbuf.addr = 0; ib_pbuf.size = size_maxmr; @@ -1664,8 +1481,8 @@ int ehca_reg_internal_maxmr( &pginfo, &e_mr->ib.ib_mr.lkey, &e_mr->ib.ib_mr.rkey); if (ret) { - EDEB_ERR(4, "reg of internal max MR failed, e_mr=%p " - "iova_start=%p size_maxmr=%lx num_pages_mr=%x " + ehca_err(&shca->ib_device, "reg of internal max MR 
failed, " + "e_mr=%p iova_start=%p size_maxmr=%lx num_pages_mr=%x " "num_pages_4k=%x", e_mr, iova_start, size_maxmr, num_pages_mr, num_pages_4k); goto ehca_reg_internal_maxmr_exit1; @@ -1678,18 +1495,14 @@ int ehca_reg_internal_maxmr( atomic_inc(&(e_pd->ib_pd.usecnt)); atomic_set(&(e_mr->ib.ib_mr.usecnt), 0); *e_maxmr = e_mr; - goto ehca_reg_internal_maxmr_exit0; + return 0; ehca_reg_internal_maxmr_exit1: ehca_mr_delete(e_mr); ehca_reg_internal_maxmr_exit0: if (ret) - EDEB_EX(4, "ret=%x shca=%p e_pd=%p e_maxmr=%p", - ret, shca, e_pd, e_maxmr); - else - EDEB_EX(7, "*e_maxmr=%p lkey=%x rkey=%x", - *e_maxmr, (*e_maxmr)->ib.ib_mr.lkey, - (*e_maxmr)->ib.ib_mr.rkey); + ehca_err(&shca->ib_device, "ret=%x shca=%p e_pd=%p e_maxmr=%p", + ret, shca, e_pd, e_maxmr); return ret; } /* end ehca_reg_internal_maxmr() */ @@ -1703,15 +1516,11 @@ int ehca_reg_maxmr(struct ehca_shca *shc u32 *lkey, u32 *rkey) { - int ret = 0; - u64 h_ret = H_SUCCESS; + u64 h_ret; struct ehca_mr *e_origmr = shca->maxmr; - u32 hipz_acl = 0; + u32 hipz_acl; struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0}; - EDEB_EN(7,"shca=%p e_origmr=%p e_newmr=%p iova_start=%p acl=%x e_pd=%p", - shca, e_origmr, e_newmr, iova_start, acl, e_pd); - ehca_mrmw_map_acl(acl, &hipz_acl); ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl); @@ -1719,13 +1528,12 @@ int ehca_reg_maxmr(struct ehca_shca *shc (u64)iova_start, hipz_acl, e_pd->fw_pd, &hipzout); if (h_ret != H_SUCCESS) { - EDEB_ERR(4, "hipz_reg_smr failed, h_ret=%lx e_origmr=%p " - "hca_hndl=%lx mr_hndl=%lx lkey=%x", + ehca_err(&shca->ib_device, "hipz_reg_smr failed, h_ret=%lx " + "e_origmr=%p hca_hndl=%lx mr_hndl=%lx lkey=%x", h_ret, e_origmr, shca->ipz_hca_handle.handle, e_origmr->ipz_mr_handle.handle, e_origmr->ib.ib_mr.lkey); - ret = ehca_mrmw_map_hrc_reg_smr(h_ret); - goto ehca_reg_maxmr_exit0; + return ehca_mrmw_map_hrc_reg_smr(h_ret); } /* successful registration */ e_newmr->num_pages = e_origmr->num_pages; @@ -1736,24 +1544,19 @@ int ehca_reg_maxmr(struct ehca_shca 
*shc e_newmr->ipz_mr_handle = hipzout.handle; *lkey = hipzout.lkey; *rkey = hipzout.rkey; - -ehca_reg_maxmr_exit0: - EDEB_EX(7, "ret=%x lkey=%x rkey=%x", ret, *lkey, *rkey); - return ret; + return 0; } /* end ehca_reg_maxmr() */ /*----------------------------------------------------------------------*/ int ehca_dereg_internal_maxmr(struct ehca_shca *shca) { - int ret = 0; - struct ehca_mr *e_maxmr = NULL; - struct ib_pd *ib_pd = NULL; - - EDEB_EN(7, "shca=%p shca->maxmr=%p", shca, shca->maxmr); + int ret; + struct ehca_mr *e_maxmr; + struct ib_pd *ib_pd; if (!shca->maxmr) { - EDEB_ERR(4, "bad call, shca=%p", shca); + ehca_err(&shca->ib_device, "bad call, shca=%p", shca); ret = -EINVAL; goto ehca_dereg_internal_maxmr_exit0; } @@ -1764,7 +1567,7 @@ int ehca_dereg_internal_maxmr(struct ehc ret = ehca_dereg_mr(&e_maxmr->ib.ib_mr); if (ret) { - EDEB_ERR(3, "dereg internal max-MR failed, " + ehca_err(&shca->ib_device, "dereg internal max-MR failed, " "ret=%x e_maxmr=%p shca=%p lkey=%x", ret, e_maxmr, shca, e_maxmr->ib.ib_mr.lkey); shca->maxmr = e_maxmr; @@ -1775,10 +1578,8 @@ int ehca_dereg_internal_maxmr(struct ehc ehca_dereg_internal_maxmr_exit0: if (ret) - EDEB_EX(4, "ret=%x shca=%p shca->maxmr=%p", - ret, shca, shca->maxmr); - else - EDEB_EX(7, ""); + ehca_err(&shca->ib_device, "ret=%x shca=%p shca->maxmr=%p", + ret, shca, shca->maxmr); return ret; } /* end ehca_dereg_internal_maxmr() */ @@ -1798,34 +1599,35 @@ int ehca_mr_chk_buf_and_calc_size(struct u32 i; if (num_phys_buf == 0) { - EDEB_ERR(4, "bad phys buf array len, num_phys_buf=0"); + ehca_gen_err("bad phys buf array len, num_phys_buf=0"); return -EINVAL; } /* check first buffer */ if (((u64)iova_start & ~PAGE_MASK) != (pbuf->addr & ~PAGE_MASK)) { - EDEB_ERR(4, "iova_start/addr mismatch, iova_start=%p " - "pbuf->addr=%lx pbuf->size=%lx", - iova_start, pbuf->addr, pbuf->size); + ehca_gen_err("iova_start/addr mismatch, iova_start=%p " + "pbuf->addr=%lx pbuf->size=%lx", + iova_start, pbuf->addr, pbuf->size); 
return -EINVAL; } if (((pbuf->addr + pbuf->size) % PAGE_SIZE) && (num_phys_buf > 1)) { - EDEB_ERR(4, "addr/size mismatch in 1st buf, pbuf->addr=%lx " - "pbuf->size=%lx", pbuf->addr, pbuf->size); + ehca_gen_err("addr/size mismatch in 1st buf, pbuf->addr=%lx " + "pbuf->size=%lx", pbuf->addr, pbuf->size); return -EINVAL; } for (i = 0; i < num_phys_buf; i++) { if ((i > 0) && (pbuf->addr % PAGE_SIZE)) { - EDEB_ERR(4, "bad address, i=%x pbuf->addr=%lx " - "pbuf->size=%lx", i, pbuf->addr, pbuf->size); + ehca_gen_err("bad address, i=%x pbuf->addr=%lx " + "pbuf->size=%lx", + i, pbuf->addr, pbuf->size); return -EINVAL; } if (((i > 0) && /* not 1st */ (i < (num_phys_buf - 1)) && /* not last */ (pbuf->size % PAGE_SIZE)) || (pbuf->size == 0)) { - EDEB_ERR(4, "bad size, i=%x pbuf->size=%lx", - i, pbuf->size); + ehca_gen_err("bad size, i=%x pbuf->size=%lx", + i, pbuf->size); return -EINVAL; } size_count += pbuf->size; @@ -1844,17 +1646,12 @@ int ehca_fmr_check_page_list(struct ehca int list_len) { u32 i; - u64 *page = NULL; - - if (ehca_adr_bad(page_list)) { - EDEB_ERR(4, "bad page_list, page_list=%p fmr=%p", - page_list, e_fmr); - return -EINVAL; - } + u64 *page; if ((list_len == 0) || (list_len > e_fmr->fmr_max_pages)) { - EDEB_ERR(4, "bad list_len, list_len=%x e_fmr->fmr_max_pages=%x " - "fmr=%p", list_len, e_fmr->fmr_max_pages, e_fmr); + ehca_gen_err("bad list_len, list_len=%x " + "e_fmr->fmr_max_pages=%x fmr=%p", + list_len, e_fmr->fmr_max_pages, e_fmr); return -EINVAL; } @@ -1862,9 +1659,9 @@ int ehca_fmr_check_page_list(struct ehca page = page_list; for (i = 0; i < list_len; i++) { if (*page % e_fmr->fmr_page_size) { - EDEB_ERR(4, "bad page, i=%x *page=%lx page=%p " - "fmr=%p fmr_page_size=%x", - i, *page, page, e_fmr, e_fmr->fmr_page_size); + ehca_gen_err("bad page, i=%x *page=%lx page=%p fmr=%p " + "fmr_page_size=%x", i, *page, page, e_fmr, + e_fmr->fmr_page_size); return -EINVAL; } page++; @@ -1882,24 +1679,14 @@ int ehca_set_pagebuf(struct ehca_mr *e_m u64 *kpage) { 
int ret = 0; - struct ib_umem_chunk *prev_chunk = NULL; - struct ib_umem_chunk *chunk = NULL; - struct ib_phys_buf *pbuf = NULL; - u64 *fmrlist = NULL; - u64 num4k = 0; - u64 pgaddr = 0; - u64 offs4k = 0; + struct ib_umem_chunk *prev_chunk; + struct ib_umem_chunk *chunk; + struct ib_phys_buf *pbuf; + u64 *fmrlist; + u64 num4k, pgaddr, offs4k; u32 i = 0; u32 j = 0; - EDEB_EN(7, "pginfo=%p type=%x num_pages=%lx num_4k=%lx next_buf=%lx " - "next_4k=%lx number=%x kpage=%p page_cnt=%lx page_4k_cnt=%lx " - "next_listelem=%lx region=%p next_chunk=%p next_nmap=%lx", - pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k, - pginfo->next_buf, pginfo->next_4k, number, kpage, - pginfo->page_cnt, pginfo->page_4k_cnt, pginfo->next_listelem, - pginfo->region, pginfo->next_chunk, pginfo->next_nmap); - if (pginfo->type == EHCA_MR_PGI_PHYS) { /* loop over desired phys_buf_array entries */ while (i < number) { @@ -1911,23 +1698,27 @@ int ehca_set_pagebuf(struct ehca_mr *e_m /* sanity check */ if ((pginfo->page_cnt >= pginfo->num_pages) || (pginfo->page_4k_cnt >= pginfo->num_4k)) { - EDEB_ERR(4, "page_cnt >= num_pages, " - "page_cnt=%lx num_pages=%lx " - "page_4k_cnt=%lx num_4k=%lx " - "i=%x", pginfo->page_cnt, - pginfo->num_pages, - pginfo->page_4k_cnt, - pginfo->num_4k, i); + ehca_gen_err("page_cnt >= num_pages, " + "page_cnt=%lx " + "num_pages=%lx " + "page_4k_cnt=%lx " + "num_4k=%lx i=%x", + pginfo->page_cnt, + pginfo->num_pages, + pginfo->page_4k_cnt, + pginfo->num_4k, i); ret = -EFAULT; + goto ehca_set_pagebuf_exit0; } *kpage = phys_to_abs( (pbuf->addr & EHCA_PAGEMASK) + (pginfo->next_4k * EHCA_PAGESIZE)); if ( !(*kpage) && pbuf->addr ) { - EDEB_ERR(4, "pbuf->addr=%lx " - "pbuf->size=%lx next_4k=%lx", - pbuf->addr, pbuf->size, - pginfo->next_4k); + ehca_gen_err("pbuf->addr=%lx " + "pbuf->size=%lx " + "next_4k=%lx", pbuf->addr, + pbuf->size, + pginfo->next_4k); ret = -EFAULT; goto ehca_set_pagebuf_exit0; } @@ -1952,23 +1743,21 @@ int ehca_set_pagebuf(struct ehca_mr *e_m 
list_for_each_entry_continue(chunk, (&(pginfo->region->chunk_list)), list) { - EDEB(9, "chunk->page_list[0]=%lx", - (u64)sg_dma_address(&chunk->page_list[0])); for (i = pginfo->next_nmap; i < chunk->nmap; ) { pgaddr = ( page_to_pfn(chunk->page_list[i].page) << PAGE_SHIFT ); *kpage = phys_to_abs(pgaddr + (pginfo->next_4k * EHCA_PAGESIZE)); - EDEB(9,"pgaddr=%lx *kpage=%lx next_4k=%lx", - pgaddr, *kpage, pginfo->next_4k); if ( !(*kpage) ) { - EDEB_ERR(4, "pgaddr=%lx " - "chunk->page_list[i]=%lx i=%x " - "next_4k=%lx mr=%p", pgaddr, - (u64)sg_dma_address( - &chunk->page_list[i]), - i, pginfo->next_4k, e_mr); + ehca_gen_err("pgaddr=%lx " + "chunk->page_list[i]=%lx " + "i=%x next_4k=%lx mr=%p", + pgaddr, + (u64)sg_dma_address( + &chunk-> + page_list[i]), + i, pginfo->next_4k, e_mr); ret = -EFAULT; goto ehca_set_pagebuf_exit0; } @@ -2009,10 +1798,11 @@ int ehca_set_pagebuf(struct ehca_mr *e_m *kpage = phys_to_abs((*fmrlist & EHCA_PAGEMASK) + pginfo->next_4k * EHCA_PAGESIZE); if ( !(*kpage) ) { - EDEB_ERR(4, "*fmrlist=%lx fmrlist=%p " - "next_listelem=%lx next_4k=%lx", - *fmrlist, fmrlist, - pginfo->next_listelem,pginfo->next_4k); + ehca_gen_err("*fmrlist=%lx fmrlist=%p " + "next_listelem=%lx next_4k=%lx", + *fmrlist, fmrlist, + pginfo->next_listelem, + pginfo->next_4k); ret = -EFAULT; goto ehca_set_pagebuf_exit0; } @@ -2028,32 +1818,23 @@ int ehca_set_pagebuf(struct ehca_mr *e_m } } } else { - EDEB_ERR(4, "bad pginfo->type=%x", pginfo->type); + ehca_gen_err("bad pginfo->type=%x", pginfo->type); ret = -EFAULT; goto ehca_set_pagebuf_exit0; } ehca_set_pagebuf_exit0: if (ret) - EDEB_EX(4, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx " - "num_4k=%lx next_buf=%lx next_4k=%lx number=%x " - "kpage=%p page_cnt=%lx page_4k_cnt=%lx i=%x " - "next_listelem=%lx region=%p next_chunk=%p " - "next_nmap=%lx", ret, e_mr, pginfo, pginfo->type, - pginfo->num_pages, pginfo->num_4k, pginfo->next_buf, - pginfo->next_4k, number, kpage, pginfo->page_cnt, - pginfo->page_4k_cnt, i, 
pginfo->next_listelem, - pginfo->region, pginfo->next_chunk, pginfo->next_nmap); - else - EDEB_EX(7, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx " - "num_4k=%lx next_buf=%lx next_4k=%lx number=%x " - "kpage=%p page_cnt=%lx page_4k_cnt=%lx i=%x " - "next_listelem=%lx region=%p next_chunk=%p " - "next_nmap=%lx", ret, e_mr, pginfo, pginfo->type, - pginfo->num_pages, pginfo->num_4k, pginfo->next_buf, - pginfo->next_4k, number, kpage, pginfo->page_cnt, - pginfo->page_4k_cnt, i, pginfo->next_listelem, - pginfo->region, pginfo->next_chunk, pginfo->next_nmap); + ehca_gen_err("ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx " + "num_4k=%lx next_buf=%lx next_4k=%lx number=%x " + "kpage=%p page_cnt=%lx page_4k_cnt=%lx i=%x " + "next_listelem=%lx region=%p next_chunk=%p " + "next_nmap=%lx", ret, e_mr, pginfo, pginfo->type, + pginfo->num_pages, pginfo->num_4k, + pginfo->next_buf, pginfo->next_4k, number, kpage, + pginfo->page_cnt, pginfo->page_4k_cnt, i, + pginfo->next_listelem, pginfo->region, + pginfo->next_chunk, pginfo->next_nmap); return ret; } /* end ehca_set_pagebuf() */ @@ -2065,30 +1846,20 @@ int ehca_set_pagebuf_1(struct ehca_mr *e u64 *rpage) { int ret = 0; - struct ib_phys_buf *tmp_pbuf = NULL; - u64 *fmrlist = NULL; - struct ib_umem_chunk *chunk = NULL; - struct ib_umem_chunk *prev_chunk = NULL; - u64 pgaddr = 0; - u64 num4k = 0; - u64 offs4k = 0; - - EDEB_EN(7, "pginfo=%p type=%x num_pages=%lx num_4k=%lx next_buf=%lx " - "next_4k=%lx rpage=%p page_cnt=%lx page_4k_cnt=%lx " - "next_listelem=%lx region=%p next_chunk=%p next_nmap=%lx", - pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k, - pginfo->next_buf, pginfo->next_4k, rpage, pginfo->page_cnt, - pginfo->page_4k_cnt, pginfo->next_listelem, pginfo->region, - pginfo->next_chunk, pginfo->next_nmap); + struct ib_phys_buf *tmp_pbuf; + u64 *fmrlist; + struct ib_umem_chunk *chunk; + struct ib_umem_chunk *prev_chunk; + u64 pgaddr, num4k, offs4k; if (pginfo->type == EHCA_MR_PGI_PHYS) { /* sanity check */ if 
((pginfo->page_cnt >= pginfo->num_pages) || (pginfo->page_4k_cnt >= pginfo->num_4k)) { - EDEB_ERR(4, "page_cnt >= num_pages, page_cnt=%lx " - "num_pages=%lx page_4k_cnt=%lx num_4k=%lx", - pginfo->page_cnt, pginfo->num_pages, - pginfo->page_4k_cnt, pginfo->num_4k); + ehca_gen_err("page_cnt >= num_pages, page_cnt=%lx " + "num_pages=%lx page_4k_cnt=%lx num_4k=%lx", + pginfo->page_cnt, pginfo->num_pages, + pginfo->page_4k_cnt, pginfo->num_4k); ret = -EFAULT; goto ehca_set_pagebuf_1_exit0; } @@ -2099,10 +1870,10 @@ int ehca_set_pagebuf_1(struct ehca_mr *e *rpage = phys_to_abs((tmp_pbuf->addr & EHCA_PAGEMASK) + (pginfo->next_4k * EHCA_PAGESIZE)); if ( !(*rpage) && tmp_pbuf->addr ) { - EDEB_ERR(4, "tmp_pbuf->addr=%lx" - " tmp_pbuf->size=%lx next_4k=%lx", - tmp_pbuf->addr, tmp_pbuf->size, - pginfo->next_4k); + ehca_gen_err("tmp_pbuf->addr=%lx" + " tmp_pbuf->size=%lx next_4k=%lx", + tmp_pbuf->addr, tmp_pbuf->size, + pginfo->next_4k); ret = -EFAULT; goto ehca_set_pagebuf_1_exit0; } @@ -2125,16 +1896,15 @@ int ehca_set_pagebuf_1(struct ehca_mr *e << PAGE_SHIFT); *rpage = phys_to_abs(pgaddr + (pginfo->next_4k * EHCA_PAGESIZE)); - EDEB(9,"pgaddr=%lx *rpage=%lx next_4k=%lx", pgaddr, - *rpage, pginfo->next_4k); if ( !(*rpage) ) { - EDEB_ERR(4, "pgaddr=%lx chunk->page_list[]=%lx " - "next_nmap=%lx next_4k=%lx mr=%p", - pgaddr, (u64)sg_dma_address( - &chunk->page_list[ - pginfo->next_nmap]), - pginfo->next_nmap, pginfo->next_4k, - e_mr); + ehca_gen_err("pgaddr=%lx chunk->page_list[]=%lx" + " next_nmap=%lx next_4k=%lx mr=%p", + pgaddr, (u64)sg_dma_address( + &chunk->page_list[ + pginfo-> + next_nmap]), + pginfo->next_nmap, pginfo->next_4k, + e_mr); ret = -EFAULT; goto ehca_set_pagebuf_1_exit0; } @@ -2161,9 +1931,10 @@ int ehca_set_pagebuf_1(struct ehca_mr *e *rpage = phys_to_abs((*fmrlist & EHCA_PAGEMASK) + pginfo->next_4k * EHCA_PAGESIZE); if ( !(*rpage) ) { - EDEB_ERR(4, "*fmrlist=%lx fmrlist=%p next_listelem=%lx " - "next_4k=%lx", *fmrlist, fmrlist, - pginfo->next_listelem, 
pginfo->next_4k); + ehca_gen_err("*fmrlist=%lx fmrlist=%p " + "next_listelem=%lx next_4k=%lx", + *fmrlist, fmrlist, pginfo->next_listelem, + pginfo->next_4k); ret = -EFAULT; goto ehca_set_pagebuf_1_exit0; } @@ -2176,32 +1947,22 @@ int ehca_set_pagebuf_1(struct ehca_mr *e pginfo->next_4k = 0; } } else { - EDEB_ERR(4, "bad pginfo->type=%x", pginfo->type); + ehca_gen_err("bad pginfo->type=%x", pginfo->type); ret = -EFAULT; goto ehca_set_pagebuf_1_exit0; } ehca_set_pagebuf_1_exit0: if (ret) - EDEB_EX(4, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx " - "num_4k=%lx next_buf=%lx next_4k=%lx rpage=%p " - "page_cnt=%lx page_4k_cnt=%lx next_listelem=%lx " - "region=%p next_chunk=%p next_nmap=%lx", ret, e_mr, - pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k, - pginfo->next_buf, pginfo->next_4k, rpage, - pginfo->page_cnt, pginfo->page_4k_cnt, - pginfo->next_listelem, pginfo->region, - pginfo->next_chunk, pginfo->next_nmap); - else - EDEB_EX(7, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx " - "num_4k=%lx next_buf=%lx next_4k=%lx rpage=%p " - "page_cnt=%lx page_4k_cnt=%lx next_listelem=%lx " - "region=%p next_chunk=%p next_nmap=%lx", ret, e_mr, - pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k, - pginfo->next_buf, pginfo->next_4k, rpage, - pginfo->page_cnt, pginfo->page_4k_cnt, - pginfo->next_listelem, pginfo->region, - pginfo->next_chunk, pginfo->next_nmap); + ehca_gen_err("ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx " + "num_4k=%lx next_buf=%lx next_4k=%lx rpage=%p " + "page_cnt=%lx page_4k_cnt=%lx next_listelem=%lx " + "region=%p next_chunk=%p next_nmap=%lx", ret, e_mr, + pginfo, pginfo->type, pginfo->num_pages, + pginfo->num_4k, pginfo->next_buf, pginfo->next_4k, + rpage, pginfo->page_cnt, pginfo->page_4k_cnt, + pginfo->next_listelem, pginfo->region, + pginfo->next_chunk, pginfo->next_nmap); return ret; } /* end ehca_set_pagebuf_1() */ @@ -2217,7 +1978,7 @@ int ehca_mr_is_maxmr(u64 size, /* a MR is treated as max-MR only if it fits following: */ if 
((size == ((u64)high_memory - PAGE_OFFSET)) && (iova_start == (void*)KERNELBASE)) { - EDEB(6, "this is a max-MR"); + ehca_gen_dbg("this is a max-MR"); return 1; } else return 0; @@ -2470,3 +2231,31 @@ void ehca_mr_deletenew(struct ehca_mr *m mr->nr_of_pages = 0; mr->pagearray = NULL; } /* end ehca_mr_deletenew() */ + +int ehca_init_mrmw_cache(void) +{ + mr_cache = kmem_cache_create("ehca_cache_mr", + sizeof(struct ehca_mr), 0, + SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!mr_cache) + return -ENOMEM; + mw_cache = kmem_cache_create("ehca_cache_mw", + sizeof(struct ehca_mw), 0, + SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!mw_cache) { + kmem_cache_destroy(mr_cache); + mr_cache = NULL; + return -ENOMEM; + } + return 0; +} + +void ehca_cleanup_mrmw_cache(void) +{ + if (mr_cache) + kmem_cache_destroy(mr_cache); + if (mw_cache) + kmem_cache_destroy(mw_cache); +} diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mrmw.h linux-2.6/drivers/infiniband/hw/ehca/ehca_mrmw.h --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mrmw.h 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_mrmw.h 2006-08-30 20:00:16.000000000 +0200 @@ -42,9 +42,6 @@ #ifndef _EHCA_MRMW_H_ #define _EHCA_MRMW_H_ -#undef DEB_PREFIX -#define DEB_PREFIX "mrmw" - int ehca_reg_mr(struct ehca_shca *shca, struct ehca_mr *e_mr, u64 *iova_start, diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_pd.c linux-2.6/drivers/infiniband/hw/ehca/ehca_pd.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_pd.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_pd.c 2006-08-30 20:00:16.000000000 +0200 @@ -38,29 +38,22 @@ * POSSIBILITY OF SUCH DAMAGE. 
  */
 
-
-#define DEB_PREFIX "vpd "
-
 #include
 
 #include "ehca_tools.h"
 #include "ehca_iverbs.h"
 
+static struct kmem_cache *pd_cache;
+
 struct ib_pd *ehca_alloc_pd(struct ib_device *device,
 			    struct ib_ucontext *context, struct ib_udata *udata)
 {
-	extern struct ehca_module ehca_module;
-	struct ib_pd *mypd = NULL;
-	struct ehca_pd *pd = NULL;
-
-	EDEB_EN(7, "device=%p context=%p udata=%p", device, context, udata);
+	struct ehca_pd *pd;
 
-	EHCA_CHECK_DEVICE_P(device);
-
-	pd = kmem_cache_alloc(ehca_module.cache_pd, SLAB_KERNEL);
+	pd = kmem_cache_alloc(pd_cache, SLAB_KERNEL);
 	if (!pd) {
-		EDEB_ERR(4, "ERROR device=%p context=%p pd=%p"
-			 " out of memory", device, context, mypd);
+		ehca_err(device, "device=%p context=%p out of memory",
+			 device, context);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -82,39 +75,40 @@ struct ib_pd *ehca_alloc_pd(struct ib_de
 	} else
 		pd->fw_pd.value = (u64)pd;
 
-	mypd = &pd->ib_pd;
-
-	EHCA_REGISTER_PD(device, pd);
-
-	EDEB_EX(7, "device=%p context=%p pd=%p", device, context, mypd);
-
-	return mypd;
+	return &pd->ib_pd;
 }
 
 int ehca_dealloc_pd(struct ib_pd *pd)
 {
-	extern struct ehca_module ehca_module;
-	int ret = 0;
 	u32 cur_pid = current->tgid;
-	struct ehca_pd *my_pd = NULL;
+	struct ehca_pd *my_pd = container_of(pd, struct ehca_pd, ib_pd);
 
-	EDEB_EN(7, "pd=%p", pd);
-
-	EHCA_CHECK_PD(pd);
-	my_pd = container_of(pd, struct ehca_pd, ib_pd);
 	if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
 	    my_pd->ownpid != cur_pid) {
-		EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+		ehca_err(pd->device, "Invalid caller pid=%x ownpid=%x",
 			 cur_pid, my_pd->ownpid);
 		return -EINVAL;
 	}
 
-	EHCA_DEREGISTER_PD(pd);
-
-	kmem_cache_free(ehca_module.cache_pd,
+	kmem_cache_free(pd_cache,
 			container_of(pd, struct ehca_pd, ib_pd));
 
-	EDEB_EX(7, "pd=%p", pd);
+	return 0;
+}
 
-	return ret;
+int ehca_init_pd_cache(void)
+{
+	pd_cache = kmem_cache_create("ehca_cache_pd",
+				     sizeof(struct ehca_pd), 0,
+				     SLAB_HWCACHE_ALIGN,
+				     NULL, NULL);
+	if (!pd_cache)
+		return -ENOMEM;
+	return 0;
+}
+
+void ehca_cleanup_pd_cache(void)
+{
+	if (pd_cache)
+		kmem_cache_destroy(pd_cache);
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_qp.c linux-2.6/drivers/infiniband/hw/ehca/ehca_qp.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_qp.c	2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_qp.c	2006-08-30 20:00:16.000000000 +0200
@@ -42,8 +42,6 @@
  */
 
-
-#define DEB_PREFIX "e_qp"
-
 #include
 
 #include "ehca_classes.h"
@@ -53,6 +51,8 @@
 #include "hcp_if.h"
 #include "hipz_fns.h"
 
+static struct kmem_cache *qp_cache;
+
 /*
  * attributes not supported by query qp
  */
@@ -114,7 +114,7 @@ static inline enum ehca_qp_state ib2ehca
 	case IB_QPS_ERR:
 		return EHCA_QPS_ERR;
 	default:
-		EDEB_ERR(4, "invalid ib_qp_state=%x", ib_qp_state);
+		ehca_gen_err("invalid ib_qp_state=%x", ib_qp_state);
 		return -EINVAL;
 	}
 }
@@ -142,7 +142,7 @@ static inline enum ib_qp_state ehca2ib_q
 	case EHCA_QPS_ERR:
 		return IB_QPS_ERR;
 	default:
-		EDEB_ERR(4,"invalid ehca_qp_state=%x",ehca_qp_state);
+		ehca_gen_err("invalid ehca_qp_state=%x", ehca_qp_state);
 		return -EINVAL;
 	}
 }
@@ -176,7 +176,7 @@ static inline enum ehca_qp_type ib2ehcaq
 	case IB_QPT_UD:
 		return QPT_UD;
 	default:
-		EDEB_ERR(4,"Invalid ibqptype=%x", ibqptype);
+		ehca_gen_err("Invalid ibqptype=%x", ibqptype);
 		return -EINVAL;
 	}
 }
@@ -190,24 +190,34 @@ static inline enum ib_qp_statetrans get_
 		index = IB_QPST_ANY2RESET;
 		break;
 	case IB_QPS_INIT:
-		if (ib_fromstate == IB_QPS_RESET)
+		switch (ib_fromstate) {
+		case IB_QPS_RESET:
 			index = IB_QPST_RESET2INIT;
-		else if (ib_fromstate == IB_QPS_INIT)
+			break;
+		case IB_QPS_INIT:
 			index = IB_QPST_INIT2INIT;
+			break;
+		}
 		break;
 	case IB_QPS_RTR:
 		if (ib_fromstate == IB_QPS_INIT)
 			index = IB_QPST_INIT2RTR;
 		break;
 	case IB_QPS_RTS:
-		if (ib_fromstate == IB_QPS_RTR)
+		switch (ib_fromstate) {
+		case IB_QPS_RTR:
 			index = IB_QPST_RTR2RTS;
-		else if (ib_fromstate == IB_QPS_RTS)
+			break;
+		case IB_QPS_RTS:
 			index = IB_QPST_RTS2RTS;
-		else if (ib_fromstate == IB_QPS_SQD)
+			break;
+		case
IB_QPS_SQD: index = IB_QPST_SQD2RTS; - else if (ib_fromstate == IB_QPS_SQE) + break; + case IB_QPS_SQE: index = IB_QPST_SQE2RTS; + break; + } break; case IB_QPS_SQD: if (ib_fromstate == IB_QPS_RTS) @@ -252,7 +262,7 @@ static inline int ibqptype2servicetype(e case IB_QPT_RAW_ETY: return -EINVAL; default: - EDEB_ERR(4, "Invalid ibqptype=%x", ibqptype); + ehca_gen_err("Invalid ibqptype=%x", ibqptype); return -EINVAL; } } @@ -260,7 +270,7 @@ static inline int ibqptype2servicetype(e /* * init_qp_queues initializes/constructs r/squeue and registers queue pages. */ -static inline int init_qp_queues(struct ipz_adapter_handle ipz_hca_handle, +static inline int init_qp_queues(struct ehca_shca *shca, struct ehca_qp *my_qp, int nr_sq_pages, int nr_rq_pages, @@ -268,28 +278,26 @@ static inline int init_qp_queues(struct int rwqe_size, int nr_send_sges, int nr_receive_sges) { - int ret = -EINVAL; - int cnt = 0; - void *vpage = NULL; - u64 rpage = 0; - int ipz_rc = -1; - u64 h_ret = H_PARAMETER; + int ret, cnt, ipz_rc; + void *vpage; + u64 rpage, h_ret; + struct ib_device *ib_dev = &shca->ib_device; + struct ipz_adapter_handle ipz_hca_handle = shca->ipz_hca_handle; ipz_rc = ipz_queue_ctor(&my_qp->ipz_squeue, nr_sq_pages, EHCA_PAGESIZE, swqe_size, nr_send_sges); if (!ipz_rc) { - EDEB_ERR(4, "Cannot allocate page for squeue. ipz_rc=%x", + ehca_err(ib_dev,"Cannot allocate page for squeue. ipz_rc=%x", ipz_rc); - ret = -EBUSY; - return ret; + return -EBUSY; } ipz_rc = ipz_queue_ctor(&my_qp->ipz_rqueue, nr_rq_pages, EHCA_PAGESIZE, rwqe_size, nr_receive_sges); if (!ipz_rc) { - EDEB_ERR(4, "Cannot allocate page for rqueue. ipz_rc=%x", + ehca_err(ib_dev, "Cannot allocate page for rqueue. 
ipz_rc=%x", ipz_rc); ret = -EBUSY; goto init_qp_queues0; @@ -298,7 +306,7 @@ static inline int init_qp_queues(struct for (cnt = 0; cnt < nr_sq_pages; cnt++) { vpage = ipz_qpageit_get_inc(&my_qp->ipz_squeue); if (!vpage) { - EDEB_ERR(4, "SQ ipz_qpageit_get_inc() " + ehca_err(ib_dev, "SQ ipz_qpageit_get_inc() " "failed p_vpage= %p", vpage); ret = -EINVAL; goto init_qp_queues1; @@ -311,8 +319,8 @@ static inline int init_qp_queues(struct rpage, 1, my_qp->galpas.kernel); if (h_ret < H_SUCCESS) { - EDEB_ERR(4,"SQ hipz_qp_register_rpage() faield " - "rc=%lx", h_ret); + ehca_err(ib_dev, "SQ hipz_qp_register_rpage()" + " failed rc=%lx", h_ret); ret = ehca2ib_return_code(h_ret); goto init_qp_queues1; } @@ -324,9 +332,8 @@ static inline int init_qp_queues(struct for (cnt = 0; cnt < nr_rq_pages; cnt++) { vpage = ipz_qpageit_get_inc(&my_qp->ipz_rqueue); if (!vpage) { - EDEB_ERR(4,"RQ ipz_qpageit_get_inc() " + ehca_err(ib_dev, "RQ ipz_qpageit_get_inc() " "failed p_vpage = %p", vpage); - h_ret = H_RESOURCE; ret = -EINVAL; goto init_qp_queues1; } @@ -338,29 +345,28 @@ static inline int init_qp_queues(struct &my_qp->pf, 0, 1, rpage, 1,my_qp->galpas.kernel); if (h_ret < H_SUCCESS) { - EDEB_ERR(4, "RQ hipz_qp_register_rpage() failed " + ehca_err(ib_dev, "RQ hipz_qp_register_rpage() failed " "rc=%lx", h_ret); ret = ehca2ib_return_code(h_ret); goto init_qp_queues1; } if (cnt == (nr_rq_pages - 1)) { /* last page! 
*/ if (h_ret != H_SUCCESS) { - EDEB_ERR(4,"RQ hipz_qp_register_rpage() " + ehca_err(ib_dev, "RQ hipz_qp_register_rpage() " "h_ret= %lx ", h_ret); ret = ehca2ib_return_code(h_ret); goto init_qp_queues1; } vpage = ipz_qpageit_get_inc(&my_qp->ipz_rqueue); if (vpage) { - EDEB_ERR(4,"ipz_qpageit_get_inc() " - "should not succeed vpage=%p", - vpage); + ehca_err(ib_dev, "ipz_qpageit_get_inc() " + "should not succeed vpage=%p", vpage); ret = -EINVAL; goto init_qp_queues1; } } else { if (h_ret != H_PAGE_REGISTERED) { - EDEB_ERR(4,"RQ hipz_qp_register_rpage() " + ehca_err(ib_dev, "RQ hipz_qp_register_rpage() " "h_ret= %lx ", h_ret); ret = ehca2ib_return_code(h_ret); goto init_qp_queues1; @@ -379,37 +385,30 @@ init_qp_queues0: return ret; } - struct ib_qp *ehca_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *init_attr, struct ib_udata *udata) { - extern struct ehca_module ehca_module; - static int da_msg_size[]={ 128, 256, 512, 1024, 2048, 4096 }; - int ret = -EINVAL; - - struct ehca_qp *my_qp = NULL; - struct ehca_pd *my_pd = NULL; - struct ehca_shca *shca = NULL; + static int da_rc_msg_size[]={ 128, 256, 512, 1024, 2048, 4096 }; + static int da_ud_sq_msg_size[]={ 128, 384, 896, 1920, 3968 }; + struct ehca_qp *my_qp; + struct ehca_pd *my_pd = container_of(pd, struct ehca_pd, ib_pd); + struct ehca_shca *shca = container_of(pd->device, struct ehca_shca, + ib_device); struct ib_ucontext *context = NULL; - u64 h_ret = H_PARAMETER; - int max_send_sge; - int max_recv_sge; + u64 h_ret; + int max_send_sge, max_recv_sge, ret; /* h_call's out parameters */ struct ehca_alloc_qp_parms parms; - u32 qp_nr = 0, swqe_size = 0, rwqe_size = 0; + u32 swqe_size = 0, rwqe_size = 0; u8 daqp_completion, isdaqp; unsigned long flags; - EDEB_EN(7,"pd=%p init_attr=%p", pd, init_attr); - EHCA_CHECK_PD_P(pd); - EHCA_CHECK_ADR_P(init_attr); - if (init_attr->sq_sig_type != IB_SIGNAL_REQ_WR && init_attr->sq_sig_type != IB_SIGNAL_ALL_WR) { - EDEB_ERR(4, "init_attr->sg_sig_type=%x not allowed", - 
init_attr->sq_sig_type); + ehca_err(pd->device, "init_attr->sg_sig_type=%x not allowed", + init_attr->sq_sig_type); return ERR_PTR(-EINVAL); } @@ -424,20 +423,36 @@ struct ib_qp *ehca_create_qp(struct ib_p init_attr->qp_type != IB_QPT_GSI && init_attr->qp_type != IB_QPT_UC && init_attr->qp_type != IB_QPT_RC) { - EDEB_ERR(4,"wrong QP Type=%x",init_attr->qp_type); + ehca_err(pd->device, "wrong QP Type=%x", init_attr->qp_type); return ERR_PTR(-EINVAL); } - if (init_attr->qp_type != IB_QPT_RC && isdaqp != 0) { - EDEB_ERR(4,"unsupported LL QP Type=%x",init_attr->qp_type); + if ((init_attr->qp_type != IB_QPT_RC && init_attr->qp_type != IB_QPT_UD) + && isdaqp) { + ehca_err(pd->device, "unsupported LL QP Type=%x", + init_attr->qp_type); + return ERR_PTR(-EINVAL); + } else if (init_attr->qp_type == IB_QPT_RC && isdaqp && + (init_attr->cap.max_send_wr > 255 || + init_attr->cap.max_recv_wr > 255 )) { + ehca_err(pd->device, "Invalid Number of max_sq_wr =%x " + "or max_rq_wr=%x for QP Type=%x", + init_attr->cap.max_send_wr, + init_attr->cap.max_recv_wr,init_attr->qp_type); + return ERR_PTR(-EINVAL); + } else if (init_attr->qp_type == IB_QPT_UD && isdaqp && + init_attr->cap.max_send_wr > 255) { + ehca_err(pd->device, + "Invalid Number of max_send_wr=%x for UD QP_TYPE=%x", + init_attr->cap.max_send_wr, init_attr->qp_type); return ERR_PTR(-EINVAL); } if (pd->uobject && udata) context = pd->uobject->context; - my_qp = kmem_cache_alloc(ehca_module.cache_qp, SLAB_KERNEL); + my_qp = kmem_cache_alloc(qp_cache, SLAB_KERNEL); if (!my_qp) { - EDEB_ERR(4, "pd=%p not enough memory to alloc qp", pd); + ehca_err(pd->device, "pd=%p not enough memory to alloc qp", pd); return ERR_PTR(-ENOMEM); } @@ -446,9 +461,6 @@ struct ib_qp *ehca_create_qp(struct ib_p spin_lock_init(&my_qp->spinlock_s); spin_lock_init(&my_qp->spinlock_r); - my_pd = container_of(pd, struct ehca_pd, ib_pd); - - shca = container_of(pd->device, struct ehca_shca, ib_device); my_qp->recv_cq = container_of(init_attr->recv_cq, 
struct ehca_cq, ib_cq); my_qp->send_cq = @@ -459,7 +471,7 @@ struct ib_qp *ehca_create_qp(struct ib_p do { if (!idr_pre_get(&ehca_qp_idr, GFP_KERNEL)) { ret = -ENOMEM; - EDEB_ERR(4, "Can't reserve idr resources."); + ehca_err(pd->device, "Can't reserve idr resources."); goto create_qp_exit0; } @@ -471,14 +483,14 @@ struct ib_qp *ehca_create_qp(struct ib_p if (ret) { ret = -ENOMEM; - EDEB_ERR(4, "Can't allocate new idr entry."); + ehca_err(pd->device, "Can't allocate new idr entry."); goto create_qp_exit0; } parms.servicetype = ibqptype2servicetype(init_attr->qp_type); if (parms.servicetype < 0) { ret = -EINVAL; - EDEB_ERR(4, "Invalid qp_type=%x", init_attr->qp_type); + ehca_err(pd->device, "Invalid qp_type=%x", init_attr->qp_type); goto create_qp_exit0; } @@ -497,8 +509,6 @@ struct ib_qp *ehca_create_qp(struct ib_p max_recv_sge += 2; } - EDEB(7, "isdaqp=%x daqp_completion=%x", isdaqp, daqp_completion); - parms.ipz_eq_handle = shca->eq.ipz_eq_handle; parms.daqp_ctrl = isdaqp | daqp_completion; parms.pd = my_pd->fw_pd; @@ -508,7 +518,8 @@ struct ib_qp *ehca_create_qp(struct ib_p h_ret = hipz_h_alloc_resource_qp(shca->ipz_hca_handle, my_qp, &parms); if (h_ret != H_SUCCESS) { - EDEB_ERR(4, "h_alloc_resource_qp() failed h_ret=%lx", h_ret); + ehca_err(pd->device, "h_alloc_resource_qp() failed h_ret=%lx", + h_ret); ret = ehca2ib_return_code(h_ret); goto create_qp_exit1; } @@ -521,8 +532,8 @@ struct ib_qp *ehca_create_qp(struct ib_p rwqe_size = offsetof(struct ehca_wqe, u.nud.sg_list[ (parms.act_nr_recv_sges)]); } else { /* for daqp we need to use msg size, not wqe size */ - swqe_size = da_msg_size[max_send_sge]; - rwqe_size = da_msg_size[max_recv_sge]; + swqe_size = da_rc_msg_size[max_send_sge]; + rwqe_size = da_rc_msg_size[max_recv_sge]; parms.act_nr_send_sges = 1; parms.act_nr_recv_sges = 1; } @@ -540,10 +551,17 @@ struct ib_qp *ehca_create_qp(struct ib_p /* UD circumvention */ parms.act_nr_recv_sges -= 2; parms.act_nr_send_sges -= 2; - swqe_size = offsetof(struct 
ehca_wqe, - u.ud_av.sg_list[parms.act_nr_send_sges]); - rwqe_size = offsetof(struct ehca_wqe, - u.ud_av.sg_list[parms.act_nr_recv_sges]); + if (isdaqp) { + swqe_size = da_ud_sq_msg_size[max_send_sge]; + rwqe_size = da_rc_msg_size[max_recv_sge]; + parms.act_nr_send_sges = 1; + parms.act_nr_recv_sges = 1; + } else { + swqe_size = offsetof(struct ehca_wqe, + u.ud_av.sg_list[parms.act_nr_send_sges]); + rwqe_size = offsetof(struct ehca_wqe, + u.ud_av.sg_list[parms.act_nr_recv_sges]); + } if (IB_QPT_GSI == init_attr->qp_type || IB_QPT_SMI == init_attr->qp_type) { @@ -562,13 +580,13 @@ struct ib_qp *ehca_create_qp(struct ib_p } /* initializes r/squeue and registers queue pages */ - ret = init_qp_queues(shca->ipz_hca_handle, my_qp, + ret = init_qp_queues(shca, my_qp, parms.nr_sq_pages, parms.nr_rq_pages, swqe_size, rwqe_size, parms.act_nr_send_sges, parms.act_nr_recv_sges); if (ret) { - EDEB_ERR(4,"Couldn't initialize r/squeue and pages ret=%x", - ret); + ehca_err(pd->device, + "Couldn't initialize r/squeue and pages ret=%x", ret); goto create_qp_exit2; } @@ -597,7 +615,8 @@ struct ib_qp *ehca_create_qp(struct ib_p if (init_attr->qp_type == IB_QPT_GSI) { h_ret = ehca_define_sqp(shca, my_qp, init_attr); if (h_ret != H_SUCCESS) { - EDEB_ERR(4, "ehca_define_sqp() failed rc=%lx",h_ret); + ehca_err(pd->device, "ehca_define_sqp() failed rc=%lx", + h_ret); ret = ehca2ib_return_code(h_ret); goto create_qp_exit3; } @@ -607,7 +626,7 @@ struct ib_qp *ehca_create_qp(struct ib_p struct ehca_cq, ib_cq); ret = ehca_cq_assign_qp(cq, my_qp); if (ret) { - EDEB_ERR(4, "Couldn't assign qp to send_cq ret=%x", + ehca_err(pd->device, "Couldn't assign qp to send_cq ret=%x", ret); goto create_qp_exit3; } @@ -637,7 +656,7 @@ struct ib_qp *ehca_create_qp(struct ib_p (void**)&resp.ipz_rqueue.queue, &vma); if (ret) { - EDEB_ERR(4, "Could not mmap rqueue pages"); + ehca_err(pd->device, "Could not mmap rqueue pages"); goto create_qp_exit3; } my_qp->uspace_rqueue = resp.ipz_rqueue.queue; @@ -652,7 +671,7 
@@ struct ib_qp *ehca_create_qp(struct ib_p (void**)&resp.ipz_squeue.queue, &vma); if (ret) { - EDEB_ERR(4, "Could not mmap squeue pages"); + ehca_err(pd->device, "Could not mmap squeue pages"); goto create_qp_exit4; } my_qp->uspace_squeue = resp.ipz_squeue.queue; @@ -662,20 +681,18 @@ struct ib_qp *ehca_create_qp(struct ib_p (void**)&resp.galpas.kernel.fw_handle, &vma); if (ret) { - EDEB_ERR(4, "Could not mmap fw_handle"); + ehca_err(pd->device, "Could not mmap fw_handle"); goto create_qp_exit5; } my_qp->uspace_fwh = (u64)resp.galpas.kernel.fw_handle; if (ib_copy_to_udata(udata, &resp, sizeof resp)) { - EDEB_ERR(4, "Copy to udata failed"); + ehca_err(pd->device, "Copy to udata failed"); ret = -EINVAL; goto create_qp_exit6; } } - EDEB_EX(7, "ehca_qp=%p qp_num=%x, token=%x", - my_qp, qp_nr, my_qp->token); return &my_qp->ib_qp; create_qp_exit6: @@ -700,10 +717,8 @@ create_qp_exit1: spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); create_qp_exit0: - kmem_cache_free(ehca_module.cache_qp, my_qp); - EDEB_EX(4, "failed ret=%x", ret); + kmem_cache_free(qp_cache, my_qp); return ERR_PTR(ret); - } /* @@ -714,48 +729,45 @@ create_qp_exit0: static int prepare_sqe_rts(struct ehca_qp *my_qp, struct ehca_shca *shca, int *bad_wqe_cnt) { - int ret = 0; - u64 h_ret = H_SUCCESS; - struct ipz_queue *squeue = NULL; - void *bad_send_wqe_p = NULL; - void *bad_send_wqe_v = NULL; - void *squeue_start_p = NULL; - void *squeue_end_p = NULL; - void *squeue_start_v = NULL; - void *squeue_end_v = NULL; - struct ehca_wqe *wqe = NULL; + u64 h_ret; + struct ipz_queue *squeue; + void *bad_send_wqe_p, *bad_send_wqe_v; + void *squeue_start_p, *squeue_end_p; + void *squeue_start_v, *squeue_end_v; + struct ehca_wqe *wqe; int qp_num = my_qp->ib_qp.qp_num; - EDEB_EN(7, "ehca_qp=%p qp_num=%x ", my_qp, qp_num); - /* get send wqe pointer */ h_ret = hipz_h_disable_and_get_wqe(shca->ipz_hca_handle, my_qp->ipz_qp_handle, &my_qp->pf, &bad_send_wqe_p, NULL, 2); if (h_ret != H_SUCCESS) { - EDEB_ERR(4, 
"hipz_h_disable_and_get_wqe() failed " - "ehca_qp=%p qp_num=%x h_ret=%lx",my_qp, qp_num, h_ret); - ret = ehca2ib_return_code(h_ret); - goto prepare_sqe_rts_exit1; + ehca_err(&shca->ib_device, "hipz_h_disable_and_get_wqe() failed" + " ehca_qp=%p qp_num=%x h_ret=%lx", + my_qp, qp_num, h_ret); + return ehca2ib_return_code(h_ret); } bad_send_wqe_p = (void*)((u64)bad_send_wqe_p & (~(1L<<63))); - EDEB(7, "qp_num=%x bad_send_wqe_p=%p", qp_num, bad_send_wqe_p); + ehca_dbg(&shca->ib_device, "qp_num=%x bad_send_wqe_p=%p", + qp_num, bad_send_wqe_p); /* convert wqe pointer to vadr */ bad_send_wqe_v = abs_to_virt((u64)bad_send_wqe_p); - EDEB_DMP(6, bad_send_wqe_v, 32, "qp_num=%x bad_wqe", qp_num); + if (ehca_debug_level) + ehca_dmp(bad_send_wqe_v, 32, "qp_num=%x bad_wqe", qp_num); squeue = &my_qp->ipz_squeue; squeue_start_p = (void*)virt_to_abs(ipz_qeit_calc(squeue, 0L)); squeue_end_p = squeue_start_p+squeue->queue_length; squeue_start_v = abs_to_virt((u64)squeue_start_p); squeue_end_v = abs_to_virt((u64)squeue_end_p); - EDEB(6, "qp_num=%x squeue_start_v=%p squeue_end_v=%p", - qp_num, squeue_start_v, squeue_end_v); + ehca_dbg(&shca->ib_device, "qp_num=%x squeue_start_v=%p squeue_end_v=%p", + qp_num, squeue_start_v, squeue_end_v); /* loop sets wqe's purge bit */ wqe = (struct ehca_wqe*)bad_send_wqe_v; *bad_wqe_cnt = 0; while (wqe->optype != 0xff && wqe->wqef != 0xff) { - EDEB_DMP(6, wqe, 32, "qp_num=%x wqe", qp_num); + if (ehca_debug_level) + ehca_dmp(wqe, 32, "qp_num=%x wqe", qp_num); wqe->nr_of_data_seg = 0; /* suppress data access */ wqe->wqef = WQEF_PURGE; /* WQE to be purged */ wqe = (struct ehca_wqe*)((u8*)wqe+squeue->qe_size); @@ -768,13 +780,11 @@ static int prepare_sqe_rts(struct ehca_q * bad wqe will be reprocessed and ignored when pol_cq() is called, * i.e. 
nr of wqes with flush error status is one less */ - EDEB(6, "qp_num=%x flusherr_wqe_cnt=%x", qp_num, (*bad_wqe_cnt)-1); + ehca_dbg(&shca->ib_device, "qp_num=%x flusherr_wqe_cnt=%x", + qp_num, (*bad_wqe_cnt)-1); wqe->wqef = 0; -prepare_sqe_rts_exit1: - - EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x", my_qp, qp_num, ret); - return ret; + return 0; } /* @@ -787,34 +797,25 @@ static int internal_modify_qp(struct ib_ struct ib_qp_attr *attr, int attr_mask, int smi_reset2init) { - enum ib_qp_state qp_cur_state = 0, qp_new_state = 0; - int cnt = 0, qp_attr_idx = 0, ret = 0; - + enum ib_qp_state qp_cur_state, qp_new_state; + int cnt, qp_attr_idx, ret = 0; enum ib_qp_statetrans statetrans; - struct hcp_modify_qp_control_block *mqpcb = NULL; - struct ehca_qp *my_qp = NULL; - struct ehca_shca *shca = NULL; - u64 update_mask = 0; - u64 h_ret = H_SUCCESS; + struct hcp_modify_qp_control_block *mqpcb; + struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp); + struct ehca_shca *shca = + container_of(ibqp->pd->device, struct ehca_shca, ib_device); + u64 update_mask; + u64 h_ret; int bad_wqe_cnt = 0; int squeue_locked = 0; unsigned long spl_flags = 0; - my_qp = container_of(ibqp, struct ehca_qp, ib_qp); - shca = container_of(ibqp->pd->device, struct ehca_shca, ib_device); - - EDEB_EN(7, "ehca_qp=%p qp_num=%x ibqp_type=%x " - "new qp_state=%x attribute_mask=%x", - my_qp, ibqp->qp_num, ibqp->qp_type, - attr->qp_state, attr_mask); - /* do query_qp to obtain current attr values */ mqpcb = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); if (mqpcb == NULL) { - ret = -ENOMEM; - EDEB_ERR(4, "Could not get zeroed page for mqpcb " + ehca_err(ibqp->device, "Could not get zeroed page for mqpcb " "ehca_qp=%p qp_num=%x ", my_qp, ibqp->qp_num); - goto modify_qp_exit0; + return -ENOMEM; } h_ret = hipz_h_query_qp(shca->ipz_hca_handle, @@ -822,20 +823,18 @@ static int internal_modify_qp(struct ib_ &my_qp->pf, mqpcb, my_qp->galpas.kernel); if (h_ret != H_SUCCESS) { - EDEB_ERR(4, "hipz_h_query_qp() failed 
" + ehca_err(ibqp->device, "hipz_h_query_qp() failed " "ehca_qp=%p qp_num=%x h_ret=%lx", my_qp, ibqp->qp_num, h_ret); ret = ehca2ib_return_code(h_ret); goto modify_qp_exit1; } - EDEB(7, "ehca_qp=%p qp_num=%x ehca_qp_state=%x", - my_qp, ibqp->qp_num, mqpcb->qp_state); qp_cur_state = ehca2ib_qp_state(mqpcb->qp_state); if (qp_cur_state == -EINVAL) { /* invalid qp state */ ret = -EINVAL; - EDEB_ERR(4, "Invalid current ehca_qp_state=%x " + ehca_err(ibqp->device, "Invalid current ehca_qp_state=%x " "ehca_qp=%p qp_num=%x", mqpcb->qp_state, my_qp, ibqp->qp_num); goto modify_qp_exit1; @@ -860,37 +859,38 @@ static int internal_modify_qp(struct ib_ int smirc = internal_modify_qp( ibqp, &smiqp_attr, smiqp_attr_mask, 1); if (smirc) { - EDEB_ERR(4, "SMI RESET -> INIT failed. " + ehca_err(ibqp->device, "SMI RESET -> INIT failed. " "ehca_modify_qp() rc=%x", smirc); ret = H_PARAMETER; goto modify_qp_exit1; } qp_cur_state = IB_QPS_INIT; - EDEB(7, "SMI RESET -> INIT succeeded"); + ehca_dbg(ibqp->device, "SMI RESET -> INIT succeeded"); } /* is transmitted current state equal to "real" current state */ if ((attr_mask & IB_QP_CUR_STATE) && qp_cur_state != attr->cur_qp_state) { ret = -EINVAL; - EDEB_ERR(4, "Invalid IB_QP_CUR_STATE attr->curr_qp_state=%x <>" + ehca_err(ibqp->device, + "Invalid IB_QP_CUR_STATE attr->curr_qp_state=%x <>" " actual cur_qp_state=%x. ehca_qp=%p qp_num=%x", attr->cur_qp_state, qp_cur_state, my_qp, ibqp->qp_num); goto modify_qp_exit1; } - EDEB(7, "ehca_qp=%p qp_num=%x current qp_state=%x " - "new qp_state=%x attribute_mask=%x", - my_qp, ibqp->qp_num, qp_cur_state, attr->qp_state, attr_mask); + ehca_dbg(ibqp->device,"ehca_qp=%p qp_num=%x current qp_state=%x " + "new qp_state=%x attribute_mask=%x", + my_qp, ibqp->qp_num, qp_cur_state, attr->qp_state, attr_mask); qp_new_state = attr_mask & IB_QP_STATE ? 
attr->qp_state : qp_cur_state; if (!smi_reset2init && !ib_modify_qp_is_ok(qp_cur_state, qp_new_state, ibqp->qp_type, attr_mask)) { ret = -EINVAL; - EDEB_ERR(4, "Invalid qp transition new_state=%x cur_state=%x " - "ehca_qp=%p qp_num=%x attr_mask=%x", - qp_new_state, qp_cur_state, my_qp, ibqp->qp_num, - attr_mask); + ehca_err(ibqp->device, + "Invalid qp transition new_state=%x cur_state=%x " + "ehca_qp=%p qp_num=%x attr_mask=%x", qp_new_state, + qp_cur_state, my_qp, ibqp->qp_num, attr_mask); goto modify_qp_exit1; } @@ -898,7 +898,7 @@ static int internal_modify_qp(struct ib_ update_mask = EHCA_BMASK_SET(MQPCB_MASK_QP_STATE, 1); else { ret = -EINVAL; - EDEB_ERR(4, "Invalid new qp state=%x " + ehca_err(ibqp->device, "Invalid new qp state=%x " "ehca_qp=%p qp_num=%x", qp_new_state, my_qp, ibqp->qp_num); goto modify_qp_exit1; @@ -908,10 +908,9 @@ static int internal_modify_qp(struct ib_ statetrans = get_modqp_statetrans(qp_cur_state, qp_new_state); if (statetrans < 0) { ret = -EINVAL; - EDEB_ERR(4, " qp_cur_state=%x " - "new_qp_state=%x State_xsition=%x " - "ehca_qp=%p qp_num=%x", - qp_cur_state, qp_new_state, + ehca_err(ibqp->device, " qp_cur_state=%x " + "new_qp_state=%x State_xsition=%x ehca_qp=%p " + "qp_num=%x", qp_cur_state, qp_new_state, statetrans, my_qp, ibqp->qp_num); goto modify_qp_exit1; } @@ -920,13 +919,15 @@ static int internal_modify_qp(struct ib_ if (qp_attr_idx < 0) { ret = qp_attr_idx; - EDEB_ERR(4, "Invalid QP type=%x ehca_qp=%p qp_num=%x", + ehca_err(ibqp->device, + "Invalid QP type=%x ehca_qp=%p qp_num=%x", ibqp->qp_type, my_qp, ibqp->qp_num); goto modify_qp_exit1; } - EDEB(7, "ehca_qp=%p qp_num=%x qp_state_xsit=%x", - my_qp, ibqp->qp_num, statetrans); + ehca_dbg(ibqp->device, + "ehca_qp=%p qp_num=%x qp_state_xsit=%x", + my_qp, ibqp->qp_num, statetrans); /* sqe -> rts: set purge bit of bad wqe before actual trans */ if ((my_qp->qp_type == IB_QPT_UD || @@ -935,7 +936,7 @@ static int internal_modify_qp(struct ib_ statetrans == IB_QPST_SQE2RTS) { /* 
mark next free wqe if kernel */ if (my_qp->uspace_squeue == 0) { - struct ehca_wqe *wqe = NULL; + struct ehca_wqe *wqe; /* lock send queue */ spin_lock_irqsave(&my_qp->spinlock_s, spl_flags); squeue_locked = 1; @@ -943,12 +944,12 @@ static int internal_modify_qp(struct ib_ wqe = (struct ehca_wqe*) ipz_qeit_get(&my_qp->ipz_squeue); wqe->optype = wqe->wqef = 0xff; - EDEB(7, "qp_num=%x next_free_wqe=%p", - ibqp->qp_num, wqe); + ehca_dbg(ibqp->device, "qp_num=%x next_free_wqe=%p", + ibqp->qp_num, wqe); } ret = prepare_sqe_rts(my_qp, shca, &bad_wqe_cnt); if (ret) { - EDEB_ERR(4, "prepare_sqe_rts() failed " + ehca_err(ibqp->device, "prepare_sqe_rts() failed " "ehca_qp=%p qp_num=%x ret=%x", my_qp, ibqp->qp_num, ret); goto modify_qp_exit2; @@ -977,14 +978,11 @@ static int internal_modify_qp(struct ib_ if (attr_mask & IB_QP_PKEY_INDEX) { mqpcb->prim_p_key_idx = attr->pkey_index; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_PRIM_P_KEY_IDX, 1); - EDEB(7, "ehca_qp=%p qp_num=%x " - "IB_QP_PKEY_INDEX update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_PORT) { if (attr->port_num < 1 || attr->port_num > shca->num_ports) { ret = -EINVAL; - EDEB_ERR(4, "Invalid port=%x. " + ehca_err(ibqp->device, "Invalid port=%x. 
" "ehca_qp=%p qp_num=%x num_ports=%x", attr->port_num, my_qp, ibqp->qp_num, shca->num_ports); @@ -992,14 +990,10 @@ static int internal_modify_qp(struct ib_ } mqpcb->prim_phys_port = attr->port_num; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_PRIM_PHYS_PORT, 1); - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_PORT update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_QKEY) { mqpcb->qkey = attr->qkey; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_QKEY, 1); - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_QKEY update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_AV) { int ah_mult = ib_rate_to_mult(attr->ah_attr.static_rate); @@ -1013,18 +1007,12 @@ static int internal_modify_qp(struct ib_ mqpcb->service_level = attr->ah_attr.sl; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_SERVICE_LEVEL, 1); - if (ah_mult < ehca_mult) + if (ah_mult < ehca_mult) mqpcb->max_static_rate = (ah_mult > 0) ? ((ehca_mult - 1) / ah_mult) : 0; else mqpcb->max_static_rate = 0; - EDEB(7, " ipd=mqpcb->max_static_rate set %x " - " ah_mult=%x ehca_mult=%x " - " attr->ah_attr.static_rate=%x", - mqpcb->max_static_rate,ah_mult,ehca_mult, - attr->ah_attr.static_rate); - update_mask |= EHCA_BMASK_SET(MQPCB_MASK_MAX_STATIC_RATE, 1); /* @@ -1052,48 +1040,33 @@ static int internal_modify_qp(struct ib_ update_mask |= EHCA_BMASK_SET(MQPCB_MASK_TRAFFIC_CLASS, 1); } - - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_AV update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_PATH_MTU) { mqpcb->path_mtu = attr->path_mtu; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_PATH_MTU, 1); - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_PATH_MTU update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_TIMEOUT) { mqpcb->timeout = attr->timeout; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_TIMEOUT, 1); - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_TIMEOUT update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_RETRY_CNT) { mqpcb->retry_count = attr->retry_cnt; update_mask |= 
EHCA_BMASK_SET(MQPCB_MASK_RETRY_COUNT, 1); - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_RETRY_CNT update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_RNR_RETRY) { mqpcb->rnr_retry_count = attr->rnr_retry; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_RNR_RETRY_COUNT, 1); - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_RNR_RETRY update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_RQ_PSN) { mqpcb->receive_psn = attr->rq_psn; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_RECEIVE_PSN, 1); - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_RQ_PSN update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) { mqpcb->rdma_nr_atomic_resp_res = attr->max_dest_rd_atomic < 3 ? - attr->max_dest_rd_atomic : 2; /* max is 2 */ + attr->max_dest_rd_atomic : 2; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_RDMA_NR_ATOMIC_RESP_RES, 1); - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_MAX_DEST_RD_ATOMIC " - "update_mask=%lx", my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC) { mqpcb->rdma_atomic_outst_dest_qp = attr->max_rd_atomic < 3 ? 
@@ -1101,8 +1074,6 @@ static int internal_modify_qp(struct ib_ update_mask |= EHCA_BMASK_SET (MQPCB_MASK_RDMA_ATOMIC_OUTST_DEST_QP, 1); - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_MAX_QP_RD_ATOMIC " - "update_mask=%lx", my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_ALT_PATH) { int ah_mult = ib_rate_to_mult(attr->alt_ah_attr.static_rate); @@ -1123,10 +1094,6 @@ static int internal_modify_qp(struct ib_ else mqpcb->max_static_rate_al = 0; - EDEB(7, " ipd=mqpcb->max_static_rate set %x," - " ah_mult=%x ehca_mult=%x", - mqpcb->max_static_rate,ah_mult,ehca_mult); - update_mask |= EHCA_BMASK_SET(MQPCB_MASK_MAX_STATIC_RATE_AL, 1); /* @@ -1159,43 +1126,28 @@ static int internal_modify_qp(struct ib_ update_mask |= EHCA_BMASK_SET(MQPCB_MASK_TRAFFIC_CLASS_AL, 1); } - - EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_ALT_PATH update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_MIN_RNR_TIMER) { mqpcb->min_rnr_nak_timer_field = attr->min_rnr_timer; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_MIN_RNR_NAK_TIMER_FIELD, 1); - EDEB(7, "ehca_qp=%p qp_num=%x " - "IB_QP_MIN_RNR_TIMER update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_SQ_PSN) { mqpcb->send_psn = attr->sq_psn; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_SEND_PSN, 1); - EDEB(7, "ehca_qp=%p qp_num=%x " - "IB_QP_SQ_PSN update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_DEST_QPN) { mqpcb->dest_qp_nr = attr->dest_qp_num; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_DEST_QP_NR, 1); - EDEB(7, "ehca_qp=%p qp_num=%x " - "IB_QP_DEST_QPN update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_PATH_MIG_STATE) { mqpcb->path_migration_state = attr->path_mig_state; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_PATH_MIGRATION_STATE, 1); - EDEB(7, "ehca_qp=%p qp_num=%x " - "IB_QP_PATH_MIG_STATE update_mask=%lx", my_qp, - ibqp->qp_num, update_mask); } if (attr_mask & IB_QP_CAP) { @@ -1205,13 +1157,11 @@ static int internal_modify_qp(struct ib_ 
mqpcb->max_nr_outst_recv_wr = attr->cap.max_recv_wr+1; update_mask |= EHCA_BMASK_SET(MQPCB_MASK_MAX_NR_OUTST_RECV_WR, 1); - EDEB(7, "ehca_qp=%p qp_num=%x " - "IB_QP_CAP update_mask=%lx", - my_qp, ibqp->qp_num, update_mask); /* no support for max_send/recv_sge yet */ } - EDEB_DMP(7, mqpcb, 4*70, "ehca_qp=%p qp_num=%x", my_qp, ibqp->qp_num); + if (ehca_debug_level) + ehca_dmp(mqpcb, 4*70, "qp_num=%x", ibqp->qp_num); h_ret = hipz_h_modify_qp(shca->ipz_hca_handle, my_qp->ipz_qp_handle, @@ -1221,9 +1171,8 @@ static int internal_modify_qp(struct ib_ if (h_ret != H_SUCCESS) { ret = ehca2ib_return_code(h_ret); - EDEB_ERR(4, "hipz_h_modify_qp() failed rc=%lx " - "ehca_qp=%p qp_num=%x", - h_ret, my_qp, ibqp->qp_num); + ehca_err(ibqp->device, "hipz_h_modify_qp() failed rc=%lx " + "ehca_qp=%p qp_num=%x",h_ret, my_qp, ibqp->qp_num); goto modify_qp_exit2; } @@ -1234,7 +1183,7 @@ static int internal_modify_qp(struct ib_ /* doorbell to reprocessing wqes */ iosync(); /* serialize GAL register access */ hipz_update_sqa(my_qp, bad_wqe_cnt-1); - EDEB(6, "doorbell for %x wqes", bad_wqe_cnt); + ehca_gen_dbg("doorbell for %x wqes", bad_wqe_cnt); } if (statetrans == IB_QPST_RESET2INIT || @@ -1244,10 +1193,6 @@ static int internal_modify_qp(struct ib_ update_mask = 0; update_mask = EHCA_BMASK_SET(MQPCB_MASK_QP_ENABLE, 1); - EDEB(7, "ehca_qp=%p qp_num=%x " - "RESET_2_INIT needs an additional enable " - "-> update_mask=%lx", my_qp, ibqp->qp_num, update_mask); - h_ret = hipz_h_modify_qp(shca->ipz_hca_handle, my_qp->ipz_qp_handle, &my_qp->pf, @@ -1257,10 +1202,9 @@ static int internal_modify_qp(struct ib_ if (h_ret != H_SUCCESS) { ret = ehca2ib_return_code(h_ret); - EDEB_ERR(4, "ENABLE in context of " - "RESET_2_INIT failed! " - "Maybe you didn't get a LID" - "h_ret=%lx ehca_qp=%p qp_num=%x", + ehca_err(ibqp->device, "ENABLE in context of " + "RESET_2_INIT failed! 
Maybe you didn't get " + "a LID h_ret=%lx ehca_qp=%p qp_num=%x", h_ret, my_qp, ibqp->qp_num); goto modify_qp_exit2; } @@ -1283,91 +1227,60 @@ modify_qp_exit2: modify_qp_exit1: kfree(mqpcb); -modify_qp_exit0: - EDEB_EX(7, "ehca_qp=%p qp_num=%x ibqp_type=%x ret=%x", - my_qp, ibqp->qp_num, ibqp->qp_type, ret); return ret; } int ehca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask) { - int ret = 0; - struct ehca_qp *my_qp = NULL; - struct ehca_pd *my_pd = NULL; + struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp); + struct ehca_pd *my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, + ib_pd); u32 cur_pid = current->tgid; - EHCA_CHECK_ADR(ibqp); - EHCA_CHECK_ADR(attr); - EHCA_CHECK_ADR(ibqp->device); - - my_qp = container_of(ibqp, struct ehca_qp, ib_qp); - - EDEB_EN(7, "ehca_qp=%p qp_num=%x ibqp_type=%x attr_mask=%x", - my_qp, ibqp->qp_num, ibqp->qp_type, attr_mask); - - my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, ib_pd); if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && my_pd->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(ibqp->pd->device, "Invalid caller pid=%x ownpid=%x", cur_pid, my_pd->ownpid); - ret = -EINVAL; - } else - ret = internal_modify_qp(ibqp, attr, attr_mask, 0); + return -EINVAL; + } - EDEB_EX(7, "ehca_qp=%p qp_num=%x ibqp_type=%x ret=%x", - my_qp, ibqp->qp_num, ibqp->qp_type, ret); - return ret; + return internal_modify_qp(ibqp, attr, attr_mask, 0); } int ehca_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr) { - struct ehca_qp *my_qp = NULL; - struct ehca_shca *shca = NULL; - struct hcp_modify_qp_control_block *qpcb = NULL; - struct ipz_adapter_handle adapter_handle; - struct ehca_pd *my_pd = NULL; + struct ehca_qp *my_qp = container_of(qp, struct ehca_qp, ib_qp); + struct ehca_pd *my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, + ib_pd); + struct ehca_shca *shca = container_of(qp->device, 
struct ehca_shca, + ib_device); + struct ipz_adapter_handle adapter_handle = shca->ipz_hca_handle; + struct hcp_modify_qp_control_block *qpcb; u32 cur_pid = current->tgid; - int cnt = 0, ret = 0; - u64 h_ret = H_SUCCESS; + int cnt, ret = 0; + u64 h_ret; - EHCA_CHECK_ADR(qp); - EHCA_CHECK_ADR(qp_attr); - EHCA_CHECK_DEVICE(qp->device); - - my_qp = container_of(qp, struct ehca_qp, ib_qp); - - EDEB_EN(7, "ehca_qp=%p qp_num=%x " - "qp_attr=%p qp_attr_mask=%x qp_init_attr=%p", - my_qp, qp->qp_num, qp_attr, qp_attr_mask, qp_init_attr); - - my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, ib_pd); if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && my_pd->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(qp->device, "Invalid caller pid=%x ownpid=%x", cur_pid, my_pd->ownpid); - ret = -EINVAL; - goto query_qp_exit0; + return -EINVAL; } - shca = container_of(qp->device, struct ehca_shca, ib_device); - adapter_handle = shca->ipz_hca_handle; - if (qp_attr_mask & QP_ATTR_QUERY_NOT_SUPPORTED) { - ret = -EINVAL; - EDEB_ERR(4,"Invalid attribute mask " + ehca_err(qp->device,"Invalid attribute mask " "ehca_qp=%p qp_num=%x qp_attr_mask=%x ", my_qp, qp->qp_num, qp_attr_mask); - goto query_qp_exit0; + return -EINVAL; } qpcb = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL ); if (!qpcb) { - ret = -ENOMEM; - EDEB_ERR(4,"Out of memory for qpcb " + ehca_err(qp->device,"Out of memory for qpcb " "ehca_qp=%p qp_num=%x", my_qp, qp->qp_num); - goto query_qp_exit0; + return -ENOMEM; } h_ret = hipz_h_query_qp(adapter_handle, @@ -1377,7 +1290,7 @@ int ehca_query_qp(struct ib_qp *qp, if (h_ret != H_SUCCESS) { ret = ehca2ib_return_code(h_ret); - EDEB_ERR(4,"hipz_h_query_qp() failed " + ehca_err(qp->device,"hipz_h_query_qp() failed " "ehca_qp=%p qp_num=%x h_ret=%lx", my_qp, qp->qp_num, h_ret); goto query_qp_exit1; @@ -1385,9 +1298,10 @@ int ehca_query_qp(struct ib_qp *qp, qp_attr->cur_qp_state = ehca2ib_qp_state(qpcb->qp_state); qp_attr->qp_state = 
qp_attr->cur_qp_state; + if (qp_attr->cur_qp_state == -EINVAL) { ret = -EINVAL; - EDEB_ERR(4,"Got invalid ehca_qp_state=%x " + ehca_err(qp->device,"Got invalid ehca_qp_state=%x " "ehca_qp=%p qp_num=%x", qpcb->qp_state, my_qp, qp->qp_num); goto query_qp_exit1; @@ -1482,54 +1396,33 @@ int ehca_query_qp(struct ib_qp *qp, if (qp_init_attr) *qp_init_attr = my_qp->init_attr; - EDEB(7, "ehca_qp=%p qp_number=%x dest_qp_number=%x " - "dlid=%x path_mtu=%x dest_gid=%lx_%lx " - "service_level=%x qp_state=%x", - my_qp, qpcb->qp_number, qpcb->dest_qp_nr, - qpcb->dlid, qpcb->path_mtu, - qpcb->dest_gid.dw[0], qpcb->dest_gid.dw[1], - qpcb->service_level, qpcb->qp_state); - - EDEB_DMP(7, qpcb, 4*70, "ehca_qp=%p qp_num=%x", my_qp, qp->qp_num); + if (ehca_debug_level) + ehca_dmp(qpcb, 4*70, "qp_num=%x", qp->qp_num); query_qp_exit1: kfree(qpcb); -query_qp_exit0: - EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x", - my_qp, qp->qp_num, ret); return ret; } int ehca_destroy_qp(struct ib_qp *ibqp) { - extern struct ehca_module ehca_module; - struct ehca_qp *my_qp = NULL; - struct ehca_shca *shca = NULL; - struct ehca_pfqp *qp_pf = NULL; - struct ehca_pd *my_pd = NULL; + struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp); + struct ehca_shca *shca = container_of(ibqp->device, struct ehca_shca, + ib_device); + struct ehca_pd *my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, + ib_pd); u32 cur_pid = current->tgid; - u32 qp_num = 0; - int ret = 0; - u64 h_ret = H_SUCCESS; - u8 port_num = 0; + u32 qp_num = ibqp->qp_num; + int ret; + u64 h_ret; + u8 port_num; enum ib_qp_type qp_type; unsigned long flags; - EHCA_CHECK_ADR(ibqp); - - my_qp = container_of(ibqp, struct ehca_qp, ib_qp); - qp_num = ibqp->qp_num; - qp_pf = &my_qp->pf; - - shca = container_of(ibqp->device, struct ehca_shca, ib_device); - - EDEB_EN(7, "ehca_qp=%p qp_num=%x", my_qp, ibqp->qp_num); - - my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, ib_pd); if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && 
my_pd->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(ibqp->device, "Invalid caller pid=%x ownpid=%x", cur_pid, my_pd->ownpid); return -EINVAL; } @@ -1538,11 +1431,10 @@ int ehca_destroy_qp(struct ib_qp *ibqp) ret = ehca_cq_unassign_qp(my_qp->send_cq, my_qp->real_qp_num); if (ret) { - EDEB_ERR(4, "Couldn't unassign qp from send_cq " - "ret=%x qp_num=%x cq_num=%x", - ret, my_qp->ib_qp.qp_num, - my_qp->send_cq->cq_number); - goto destroy_qp_exit0; + ehca_err(ibqp->device, "Couldn't unassign qp from " + "send_cq ret=%x qp_num=%x cq_num=%x", ret, + my_qp->ib_qp.qp_num, my_qp->send_cq->cq_number); + return ret; } } @@ -1554,17 +1446,25 @@ int ehca_destroy_qp(struct ib_qp *ibqp) if (my_qp->uspace_rqueue) { ret = ehca_munmap(my_qp->uspace_rqueue, my_qp->ipz_rqueue.queue_length); + if (ret) + ehca_err(ibqp->device, "Could not munmap rqueue " + "qp_num=%x", qp_num); ret = ehca_munmap(my_qp->uspace_squeue, my_qp->ipz_squeue.queue_length); + if (ret) + ehca_err(ibqp->device, "Could not munmap squeue " + "qp_num=%x", qp_num); ret = ehca_munmap(my_qp->uspace_fwh, EHCA_PAGESIZE); + if (ret) + ehca_err(ibqp->device, "Could not munmap fwh qp_num=%x", + qp_num); } h_ret = hipz_h_destroy_qp(shca->ipz_hca_handle, my_qp); if (h_ret != H_SUCCESS) { - EDEB_ERR(4, "hipz_h_destroy_qp() failed " - "rc=%lx ehca_qp=%p qp_num=%x", - h_ret, qp_pf, qp_num); - goto destroy_qp_exit0; + ehca_err(ibqp->device, "hipz_h_destroy_qp() failed rc=%lx " + "ehca_qp=%p qp_num=%x", h_ret, my_qp, qp_num); + return ehca2ib_return_code(h_ret); } port_num = my_qp->init_attr.port_num; @@ -1573,9 +1473,8 @@ int ehca_destroy_qp(struct ib_qp *ibqp) /* no support for IB_QPT_SMI yet */ if (qp_type == IB_QPT_GSI) { struct ib_event event; - - EDEB(4, "device %s: port %x is inactive.", - shca->ib_device.name, port_num); + ehca_info(ibqp->device, "device %s: port %x is inactive.", + shca->ib_device.name, port_num); event.device = &shca->ib_device; event.event = IB_EVENT_PORT_ERR; 
event.element.port_num = port_num; @@ -1585,10 +1484,23 @@ int ehca_destroy_qp(struct ib_qp *ibqp) ipz_queue_dtor(&my_qp->ipz_rqueue); ipz_queue_dtor(&my_qp->ipz_squeue); - kmem_cache_free(ehca_module.cache_qp, my_qp); + kmem_cache_free(qp_cache, my_qp); + return 0; +} -destroy_qp_exit0: - ret = ehca2ib_return_code(h_ret); - EDEB_EX(7,"ret=%x", ret); - return ret; +int ehca_init_qp_cache(void) +{ + qp_cache = kmem_cache_create("ehca_cache_qp", + sizeof(struct ehca_qp), 0, + SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!qp_cache) + return -ENOMEM; + return 0; +} + +void ehca_cleanup_qp_cache(void) +{ + if (qp_cache) + kmem_cache_destroy(qp_cache); } diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_reqs.c linux-2.6/drivers/infiniband/hw/ehca/ehca_reqs.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_reqs.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_reqs.c 2006-08-30 20:00:16.000000000 +0200 @@ -41,8 +41,6 @@ */ -#define DEB_PREFIX "reqs" - #include #include "ehca_classes.h" #include "ehca_tools.h" @@ -58,7 +56,7 @@ static inline int ehca_write_rwqe(struct u8 cnt_ds; if (unlikely((recv_wr->num_sge < 0) || (recv_wr->num_sge > ipz_rqueue->act_nr_of_sg))) { - EDEB_ERR(4, "Invalid number of WQE SGE. " + ehca_gen_err("Invalid number of WQE SGE. 
" "num_sqe=%x max_nr_of_sg=%x", recv_wr->num_sge, ipz_rqueue->act_nr_of_sg); return -EINVAL; /* invalid SG list length */ @@ -79,9 +77,9 @@ static inline int ehca_write_rwqe(struct recv_wr->sg_list[cnt_ds].length; } - if (IS_EDEB_ON(7)) { - EDEB(7, "RECEIVE WQE written into ipz_rqueue=%p", ipz_rqueue); - EDEB_DMP(7, wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "recv wqe"); + if (ehca_debug_level) { + ehca_gen_dbg("RECEIVE WQE written into ipz_rqueue=%p", ipz_rqueue); + ehca_dmp( wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "recv wqe"); } return 0; @@ -94,31 +92,35 @@ static inline int ehca_write_rwqe(struct static void trace_send_wr_ud(const struct ib_send_wr *send_wr) { - int idx = 0; - int j = 0; + int idx; + int j; while (send_wr) { struct ib_mad_hdr *mad_hdr = send_wr->wr.ud.mad_hdr; struct ib_sge *sge = send_wr->sg_list; - EDEB(4, "send_wr#%x wr_id=%lx num_sge=%x " - "send_flags=%x opcode=%x",idx, send_wr->wr_id, - send_wr->num_sge, send_wr->send_flags, send_wr->opcode); + ehca_gen_dbg("send_wr#%x wr_id=%lx num_sge=%x " + "send_flags=%x opcode=%x",idx, send_wr->wr_id, + send_wr->num_sge, send_wr->send_flags, + send_wr->opcode); if (mad_hdr) { - EDEB(4, "send_wr#%x mad_hdr base_version=%x " - "mgmt_class=%x class_version=%x method=%x " - "status=%x class_specific=%x tid=%lx attr_id=%x " - "resv=%x attr_mod=%x", - idx, mad_hdr->base_version, mad_hdr->mgmt_class, - mad_hdr->class_version, mad_hdr->method, - mad_hdr->status, mad_hdr->class_specific, - mad_hdr->tid, mad_hdr->attr_id, mad_hdr->resv, - mad_hdr->attr_mod); + ehca_gen_dbg("send_wr#%x mad_hdr base_version=%x " + "mgmt_class=%x class_version=%x method=%x " + "status=%x class_specific=%x tid=%lx " + "attr_id=%x resv=%x attr_mod=%x", + idx, mad_hdr->base_version, + mad_hdr->mgmt_class, + mad_hdr->class_version, mad_hdr->method, + mad_hdr->status, mad_hdr->class_specific, + mad_hdr->tid, mad_hdr->attr_id, + mad_hdr->resv, + mad_hdr->attr_mod); } for (j = 0; j < send_wr->num_sge; j++) { u8 *data = (u8 *) 
abs_to_virt(sge->addr); - EDEB(4, "send_wr#%x sge#%x addr=%p length=%x lkey=%x", - idx, j, data, sge->length, sge->lkey); + ehca_gen_dbg("send_wr#%x sge#%x addr=%p length=%x " + "lkey=%x", + idx, j, data, sge->length, sge->lkey); /* assume length is n*16 */ - EDEB_DMP(4, data, sge->length, "send_wr#%x sge#%x", + ehca_dmp(data, sge->length, "send_wr#%x sge#%x", idx, j); sge++; } /* eof for j */ @@ -140,7 +142,7 @@ static inline int ehca_write_swqe(struct if (unlikely((send_wr->num_sge < 0) || (send_wr->num_sge > qp->ipz_squeue.act_nr_of_sg))) { - EDEB_ERR(4, "Invalid number of WQE SGE. " + ehca_gen_err("Invalid number of WQE SGE. " "num_sqe=%x max_nr_of_sg=%x", send_wr->num_sge, qp->ipz_squeue.act_nr_of_sg); return -EINVAL; /* invalid SG list length */ @@ -164,7 +166,7 @@ static inline int ehca_write_swqe(struct wqe_p->optype = WQE_OPTYPE_RDMAREAD; break; default: - EDEB_ERR(4, "Invalid opcode=%x", send_wr->opcode); + ehca_gen_err("Invalid opcode=%x", send_wr->opcode); return -EINVAL; /* invalid opcode */ } @@ -196,7 +198,7 @@ static inline int ehca_write_swqe(struct wqe_p->destination_qp_number = send_wr->wr.ud.remote_qpn << 8; wqe_p->local_ee_context_qkey = remote_qkey; if (!send_wr->wr.ud.ah) { - EDEB_ERR(4, "wr.ud.ah is NULL. qp=%p", qp); + ehca_gen_err("wr.ud.ah is NULL. 
qp=%p", qp); return -EINVAL; } my_av = container_of(send_wr->wr.ud.ah, struct ehca_av, ib_ah); @@ -254,13 +256,13 @@ static inline int ehca_write_swqe(struct break; default: - EDEB_ERR(4, "Invalid qptype=%x", qp->qp_type); + ehca_gen_err("Invalid qptype=%x", qp->qp_type); return -EINVAL; } - if (IS_EDEB_ON(7)) { - EDEB(7, "SEND WQE written into queue qp=%p ", qp); - EDEB_DMP(7, wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "send wqe"); + if (ehca_debug_level) { + ehca_gen_dbg("SEND WQE written into queue qp=%p ", qp); + ehca_dmp( wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "send wqe"); } return 0; } @@ -355,19 +357,12 @@ int ehca_post_send(struct ib_qp *qp, struct ib_send_wr *send_wr, struct ib_send_wr **bad_send_wr) { - struct ehca_qp *my_qp = NULL; - struct ib_send_wr *cur_send_wr = NULL; - struct ehca_wqe *wqe_p = NULL; + struct ehca_qp *my_qp = container_of(qp, struct ehca_qp, ib_qp); + struct ib_send_wr *cur_send_wr; + struct ehca_wqe *wqe_p; int wqe_cnt = 0; int ret = 0; - unsigned long spl_flags = 0; - - EHCA_CHECK_ADR(qp); - my_qp = container_of(qp, struct ehca_qp, ib_qp); - EHCA_CHECK_QP(my_qp); - EHCA_CHECK_ADR(send_wr); - EDEB_EN(7, "ehca_qp=%p qp_num=%x send_wr=%p bad_send_wr=%p", - my_qp, qp->qp_num, send_wr, bad_send_wr); + unsigned long spl_flags; /* LOCK the QUEUE */ spin_lock_irqsave(&my_qp->spinlock_s, spl_flags); @@ -384,8 +379,8 @@ int ehca_post_send(struct ib_qp *qp, *bad_send_wr = cur_send_wr; if (wqe_cnt == 0) { ret = -ENOMEM; - EDEB_ERR(4, "Too many posted WQEs qp_num=%x", - qp->qp_num); + ehca_err(qp->device, "Too many posted WQEs " + "qp_num=%x", qp->qp_num); } goto post_send_exit0; } @@ -400,14 +395,14 @@ int ehca_post_send(struct ib_qp *qp, *bad_send_wr = cur_send_wr; if (wqe_cnt == 0) { ret = -EINVAL; - EDEB_ERR(4, "Could not write WQE qp_num=%x", - qp->qp_num); + ehca_err(qp->device, "Could not write WQE " + "qp_num=%x", qp->qp_num); } goto post_send_exit0; } wqe_cnt++; - EDEB(7, "ehca_qp=%p qp_num=%x wqe_cnt=%d", - my_qp, qp->qp_num, wqe_cnt); + 
ehca_dbg(qp->device, "ehca_qp=%p qp_num=%x wqe_cnt=%d", + my_qp, qp->qp_num, wqe_cnt); } /* eof for cur_send_wr */ post_send_exit0: @@ -415,8 +410,6 @@ post_send_exit0: spin_unlock_irqrestore(&my_qp->spinlock_s, spl_flags); iosync(); /* serialize GAL register access */ hipz_update_sqa(my_qp, wqe_cnt); - EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x wqe_cnt=%d", - my_qp, qp->qp_num, ret, wqe_cnt); return ret; } @@ -424,19 +417,12 @@ int ehca_post_recv(struct ib_qp *qp, struct ib_recv_wr *recv_wr, struct ib_recv_wr **bad_recv_wr) { - struct ehca_qp *my_qp = NULL; - struct ib_recv_wr *cur_recv_wr = NULL; - struct ehca_wqe *wqe_p = NULL; + struct ehca_qp *my_qp = container_of(qp, struct ehca_qp, ib_qp); + struct ib_recv_wr *cur_recv_wr; + struct ehca_wqe *wqe_p; int wqe_cnt = 0; int ret = 0; - unsigned long spl_flags = 0; - - EHCA_CHECK_ADR(qp); - my_qp = container_of(qp, struct ehca_qp, ib_qp); - EHCA_CHECK_QP(my_qp); - EHCA_CHECK_ADR(recv_wr); - EDEB_EN(7, "ehca_qp=%p qp_num=%x recv_wr=%p bad_recv_wr=%p", - my_qp, qp->qp_num, recv_wr, bad_recv_wr); + unsigned long spl_flags; /* LOCK the QUEUE */ spin_lock_irqsave(&my_qp->spinlock_r, spl_flags); @@ -453,14 +439,13 @@ int ehca_post_recv(struct ib_qp *qp, *bad_recv_wr = cur_recv_wr; if (wqe_cnt == 0) { ret = -ENOMEM; - EDEB_ERR(4, "Too many posted WQEs qp_num=%x", - qp->qp_num); + ehca_err(qp->device, "Too many posted WQEs " + "qp_num=%x", qp->qp_num); } goto post_recv_exit0; } /* write a RECV WQE into the QUEUE */ - ret = ehca_write_rwqe(&my_qp->ipz_rqueue, wqe_p, - cur_recv_wr); + ret = ehca_write_rwqe(&my_qp->ipz_rqueue, wqe_p, cur_recv_wr); /* * if something failed, * reset the free entry pointer to the start value @@ -470,13 +455,13 @@ int ehca_post_recv(struct ib_qp *qp, *bad_recv_wr = cur_recv_wr; if (wqe_cnt == 0) { ret = -EINVAL; - EDEB_ERR(4, "Could not write WQE qp_num=%x", - qp->qp_num); + ehca_err(qp->device, "Could not write WQE " + "qp_num=%x", qp->qp_num); } goto post_recv_exit0; } wqe_cnt++; - EDEB(7, 
"ehca_qp=%p qp_num=%x wqe_cnt=%d", + ehca_gen_dbg("ehca_qp=%p qp_num=%x wqe_cnt=%d", my_qp, qp->qp_num, wqe_cnt); } /* eof for cur_recv_wr */ @@ -484,8 +469,6 @@ post_recv_exit0: spin_unlock_irqrestore(&my_qp->spinlock_r, spl_flags); iosync(); /* serialize GAL register access */ hipz_update_rqa(my_qp, wqe_cnt); - EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x wqe_cnt=%d", - my_qp, qp->qp_num, ret, wqe_cnt); return ret; } @@ -510,18 +493,16 @@ static inline int ehca_poll_cq_one(struc { int ret = 0; struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); - struct ehca_cqe *cqe = NULL; + struct ehca_cqe *cqe; int cqe_count = 0; - EDEB_EN(7, "ehca_cq=%p cq_num=%x wc=%p", my_cq, my_cq->cq_number, wc); - poll_cq_one_read_cqe: cqe = (struct ehca_cqe *) ipz_qeit_get_inc_valid(&my_cq->ipz_queue); if (!cqe) { ret = -EAGAIN; - EDEB(7, "Completion queue is empty ehca_cq=%p cq_num=%x " - "ret=%x", my_cq, my_cq->cq_number, ret); + ehca_dbg(cq->device, "Completion queue is empty ehca_cq=%p " + "cq_num=%x ret=%x", my_cq, my_cq->cq_number, ret); goto poll_cq_one_exit0; } @@ -531,13 +512,13 @@ poll_cq_one_read_cqe: cqe_count++; if (unlikely(cqe->status & WC_STATUS_PURGE_BIT)) { struct ehca_qp *qp=ehca_cq_get_qp(my_cq, cqe->local_qp_number); - int purgeflag = 0; - unsigned long spl_flags = 0; + int purgeflag; + unsigned long spl_flags; if (!qp) { - EDEB_ERR(4, "cq_num=%x qp_num=%x " + ehca_err(cq->device, "cq_num=%x qp_num=%x " "could not find qp -> ignore cqe", my_cq->cq_number, cqe->local_qp_number); - EDEB_DMP(4, cqe, 64, "cq_num=%x qp_num=%x", + ehca_dmp(cqe, 64, "cq_num=%x qp_num=%x", my_cq->cq_number, cqe->local_qp_number); /* ignore this purged cqe */ goto poll_cq_one_read_cqe; @@ -547,10 +528,13 @@ poll_cq_one_read_cqe: spin_unlock_irqrestore(&qp->spinlock_s, spl_flags); if (purgeflag) { - EDEB(6, "Got CQE with purged bit qp_num=%x src_qp=%x", - cqe->local_qp_number, cqe->remote_qp_number); - EDEB_DMP(6, cqe, 64, "qp_num=%x src_qp=%x", + ehca_dbg(cq->device, "Got CQE with 
purged bit qp_num=%x " + "src_qp=%x", cqe->local_qp_number, cqe->remote_qp_number); + if (ehca_debug_level) + ehca_dmp(cqe, 64, "qp_num=%x src_qp=%x", + cqe->local_qp_number, + cqe->remote_qp_number); /* * ignore this to avoid double cqes of bad wqe * that caused sqe and turn off purge flag @@ -561,13 +545,15 @@ poll_cq_one_read_cqe: } /* tracing cqe */ - if (IS_EDEB_ON(7)) { - EDEB(7, "Received COMPLETION ehca_cq=%p cq_num=%x -----", - my_cq, my_cq->cq_number); - EDEB_DMP(7, cqe, 64, "ehca_cq=%p cq_num=%x", + if (ehca_debug_level) { + ehca_dbg(cq->device, + "Received COMPLETION ehca_cq=%p cq_num=%x -----", + my_cq, my_cq->cq_number); + ehca_dmp(cqe, 64, "ehca_cq=%p cq_num=%x", + my_cq, my_cq->cq_number); + ehca_dbg(cq->device, + "ehca_cq=%p cq_num=%x -------------------------", my_cq, my_cq->cq_number); - EDEB(7, "ehca_cq=%p cq_num=%x -------------------------", - my_cq, my_cq->cq_number); } /* we got a completion! */ @@ -576,11 +562,11 @@ poll_cq_one_read_cqe: /* eval ib_wc_opcode */ wc->opcode = ib_wc_opcode[cqe->optype]-1; if (unlikely(wc->opcode == -1)) { - EDEB_ERR(4, "Invalid cqe->OPType=%x cqe->status=%x " + ehca_err(cq->device, "Invalid cqe->OPType=%x cqe->status=%x " "ehca_cq=%p cq_num=%x", cqe->optype, cqe->status, my_cq, my_cq->cq_number); /* dump cqe for other infos */ - EDEB_DMP(4, cqe, 64, "ehca_cq=%p cq_num=%x", + ehca_dmp(cqe, 64, "ehca_cq=%p cq_num=%x", my_cq, my_cq->cq_number); /* update also queue adder to throw away this entry!!! 
*/ goto poll_cq_one_exit0; @@ -604,49 +590,35 @@ poll_cq_one_read_cqe: wc->sl = cqe->service_level; if (wc->status != IB_WC_SUCCESS) - EDEB(6, "ehca_cq=%p cq_num=%x WARNING unsuccessful cqe " - "OPType=%x status=%x qp_num=%x src_qp=%x wr_id=%lx cqe=%p", - my_cq, my_cq->cq_number, cqe->optype, cqe->status, - cqe->local_qp_number, cqe->remote_qp_number, - cqe->work_request_id, cqe); + ehca_dbg(cq->device, + "ehca_cq=%p cq_num=%x WARNING unsuccessful cqe " + "OPType=%x status=%x qp_num=%x src_qp=%x wr_id=%lx " + "cqe=%p", my_cq, my_cq->cq_number, cqe->optype, + cqe->status, cqe->local_qp_number, + cqe->remote_qp_number, cqe->work_request_id, cqe); poll_cq_one_exit0: if (cqe_count > 0) hipz_update_feca(my_cq, cqe_count); - EDEB_EX(7, "ret=%x ehca_cq=%p cq_number=%x wc=%p " - "status=%x opcode=%x qp_num=%x byte_len=%x", - ret, my_cq, my_cq->cq_number, wc, wc->status, - wc->opcode, wc->qp_num, wc->byte_len); - return ret; } int ehca_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc) { - struct ehca_cq *my_cq = NULL; - int nr = 0; - struct ib_wc *current_wc = NULL; + struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); + int nr; + struct ib_wc *current_wc = wc; int ret = 0; - unsigned long spl_flags = 0; - - EHCA_CHECK_CQ(cq); - EHCA_CHECK_ADR(wc); - - my_cq = container_of(cq, struct ehca_cq, ib_cq); - EHCA_CHECK_CQ(my_cq); - - EDEB_EN(7, "ehca_cq=%p cq_num=%x num_entries=%d wc=%p", - my_cq, my_cq->cq_number, num_entries, wc); + unsigned long spl_flags; if (num_entries < 1) { - EDEB_ERR(4, "Invalid num_entries=%d ehca_cq=%p cq_num=%x", - num_entries, my_cq, my_cq->cq_number); + ehca_err(cq->device, "Invalid num_entries=%d ehca_cq=%p " + "cq_num=%x", num_entries, my_cq, my_cq->cq_number); ret = -EINVAL; goto poll_cq_exit0; } - current_wc = wc; spin_lock_irqsave(&my_cq->spinlock, spl_flags); for (nr = 0; nr < num_entries; nr++) { ret = ehca_poll_cq_one(cq, current_wc); @@ -659,22 +631,12 @@ int ehca_poll_cq(struct ib_cq *cq, int n ret = nr; 
poll_cq_exit0: - EDEB_EX(7, "ehca_cq=%p cq_num=%x ret=%x wc=%p nr_entries=%d", - my_cq, my_cq->cq_number, ret, wc, nr); - return ret; } int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify) { - struct ehca_cq *my_cq = NULL; - int ret = 0; - - EHCA_CHECK_CQ(cq); - my_cq = container_of(cq, struct ehca_cq, ib_cq); - EHCA_CHECK_CQ(my_cq); - EDEB_EN(7, "ehca_cq=%p cq_num=%x cq_notif=%x", - my_cq, my_cq->cq_number, cq_notify); + struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); switch (cq_notify) { case IB_CQ_SOLICITED: @@ -687,8 +649,5 @@ int ehca_req_notify_cq(struct ib_cq *cq, return -EINVAL; } - EDEB_EX(7, "ehca_cq=%p cq_num=%x ret=%x", - my_cq, my_cq->cq_number, ret); - - return ret; + return 0; } diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_sqp.c linux-2.6/drivers/infiniband/hw/ehca/ehca_sqp.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_sqp.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_sqp.c 2006-08-30 20:00:16.000000000 +0200 @@ -40,8 +40,6 @@ */ -#define DEB_PREFIX "e_qp" - #include #include #include "ehca_classes.h" @@ -51,11 +49,6 @@ #include "hcp_if.h" -extern int ehca_create_aqp1(struct ehca_shca *shca, struct ehca_sport *sport); -extern int ehca_destroy_aqp1(struct ehca_sport *sport); - -extern int ehca_port_act_time; - /** * ehca_define_sqp - Defines special queue pair 1 (GSI QP). When special queue * pair is created successfully, the corresponding port gets active. 
@@ -69,15 +62,10 @@ u64 ehca_define_sqp(struct ehca_shca *sh struct ehca_qp *ehca_qp, struct ib_qp_init_attr *qp_init_attr) { - - u32 pma_qp_nr = 0; - u32 bma_qp_nr = 0; - u64 ret = H_SUCCESS; + u32 pma_qp_nr, bma_qp_nr; + u64 ret; u8 port = qp_init_attr->port_num; - int counter = 0; - - EDEB_EN(7, "port=%x qp_type=%x", - port, qp_init_attr->qp_type); + int counter; shca->sport[port - 1].port_state = IB_PORT_DOWN; @@ -93,31 +81,31 @@ u64 ehca_define_sqp(struct ehca_shca *sh &pma_qp_nr, &bma_qp_nr); if (ret != H_SUCCESS) { - EDEB_ERR(4, "Can't define AQP1 for port %x. rc=%lx", - port, ret); - goto ehca_define_aqp1; + ehca_err(&shca->ib_device, + "Can't define AQP1 for port %x. rc=%lx", + port, ret); + return ret; } break; default: - ret = H_PARAMETER; - goto ehca_define_aqp1; + ehca_err(&shca->ib_device, "invalid qp_type=%x", + qp_init_attr->qp_type); + return H_PARAMETER; } - while ((shca->sport[port - 1].port_state != IB_PORT_ACTIVE) && - (counter < ehca_port_act_time)) { - EDEB(6, "... wait until port %x is active", - port); + for (counter = 0; + shca->sport[port - 1].port_state != IB_PORT_ACTIVE && + counter < ehca_port_act_time; + counter++) { + ehca_dbg(&shca->ib_device, "... 
wait until port %x is active", + port); msleep_interruptible(1000); - counter++; } if (counter == ehca_port_act_time) { - EDEB_ERR(4, "Port %x is not active.", port); - ret = H_HARDWARE; + ehca_err(&shca->ib_device, "Port %x is not active.", port); + return H_HARDWARE; } -ehca_define_aqp1: - EDEB_EX(7, "ret=%lx", ret); - - return ret; + return H_SUCCESS; } diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_tools.h linux-2.6/drivers/infiniband/hw/ehca/ehca_tools.h --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_tools.h 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_tools.h 2006-08-30 20:00:17.000000000 +0200 @@ -57,195 +57,70 @@ #include #include #include +#include #include #include #include #include -#define EHCA_EDEB_TRACE_MASK_SIZE 32 -extern u8 ehca_edeb_mask[EHCA_EDEB_TRACE_MASK_SIZE]; -#define EDEB_ID_TO_U32(str4) (str4[3] | (str4[2] << 8) | (str4[1] << 16) | \ - (str4[0] << 24)) +extern int ehca_debug_level; -static inline u64 ehca_edeb_filter(const u32 level, - const u32 id, const u32 line) -{ - u64 ret = 0; - u32 filenr = 0; - u32 filter_level = 9; - u32 dynamic_level = 0; - - /* - * This is code written for the gcc -O2 optimizer - * which should collapse to two single ints. - * Filter_level is the first level kicked out by - * compiler and means trace everything below 6. 
- */ - - if (id == EDEB_ID_TO_U32("ehav")) { - filenr = 0x01; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("clas")) { - filenr = 0x02; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("cqeq")) { - filenr = 0x03; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("shca")) { - filenr = 0x05; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("eirq")) { - filenr = 0x06; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("lMad")) { - filenr = 0x07; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("mcas")) { - filenr = 0x08; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("mrmw")) { - filenr = 0x09; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("vpd ")) { - filenr = 0x0a; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("e_qp")) { - filenr = 0x0b; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("uqes")) { - filenr = 0x0c; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("PHYP")) { - filenr = 0x0d; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("hcpi")) { - filenr = 0x0e; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("iptz")) { - filenr = 0x0f; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("spta")) { - filenr = 0x10; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("simp")) { - filenr = 0x11; - filter_level = 8; - } - if (id == EDEB_ID_TO_U32("reqs")) { - filenr = 0x12; - filter_level = 8; - } - - if ((filenr - 1) > sizeof(ehca_edeb_mask)) { - filenr = 0; - } - - if (filenr == 0) { - filter_level = 9; - } /* default */ - ret = filenr * 0x10000 + line; - if (filter_level <= level) { - return ret | 0x100000000L; /* this is the flag to not trace */ - } - dynamic_level = ehca_edeb_mask[filenr]; - if (likely(dynamic_level <= level)) { - ret = ret | 0x100000000L; - }; - return ret; -} - -#ifdef EHCA_USE_HCALL_KERNEL -#ifdef CONFIG_PPC_PSERIES - -#include +#define ehca_dbg(ib_dev, format, arg...) 
\ + do { \ + if (unlikely(ehca_debug_level)) \ + dev_printk(KERN_DEBUG, (ib_dev)->dma_device, \ + "PU%04x EHCA_DBG:%s " format "\n", \ + get_paca()->paca_index, __FUNCTION__, \ + ## arg); \ + } while (0) -/* - * IS_EDEB_ON - Checks if debug is on for the given level. - */ -#define IS_EDEB_ON(level) \ -((ehca_edeb_filter(level, EDEB_ID_TO_U32(DEB_PREFIX), __LINE__) & \ - 0x100000000L) == 0) - -#define EDEB_P_GENERIC(level,idstring,format,args...) \ -do { \ - u64 ehca_edeb_filterresult = \ - ehca_edeb_filter(level, EDEB_ID_TO_U32(DEB_PREFIX), __LINE__);\ - if ((ehca_edeb_filterresult & 0x100000000L) == 0) \ - printk("PU%04x %08x:%s " idstring " "format "\n", \ - get_paca()->paca_index, (u32)(ehca_edeb_filterresult), \ - __func__, ##args); \ -} while (1 == 0) - -#elif REAL_HCALL - -#define EDEB_P_GENERIC(level,idstring,format,args...) \ -do { \ - u64 ehca_edeb_filterresult = \ - ehca_edeb_filter(level, EDEB_ID_TO_U32(DEB_PREFIX), __LINE__); \ - if ((ehca_edeb_filterresult & 0x100000000L) == 0) \ - printk("%08x:%s " idstring " "format "\n", \ - (u32)(ehca_edeb_filterresult), \ - __func__, ##args); \ -} while (1 == 0) - -#endif -#else - -#define IS_EDEB_ON(level) (1) - -#define EDEB_P_GENERIC(level,idstring,format,args...) \ -do { \ - printk("%s " idstring " "format "\n", \ - __func__, ##args); \ -} while (1 == 0) +#define ehca_info(ib_dev, format, arg...) \ + dev_info((ib_dev)->dma_device, "PU%04x EHCA_INFO:%s " format "\n", \ + get_paca()->paca_index, __FUNCTION__, ## arg) + +#define ehca_warn(ib_dev, format, arg...) \ + dev_warn((ib_dev)->dma_device, "PU%04x EHCA_WARN:%s " format "\n", \ + get_paca()->paca_index, __FUNCTION__, ## arg) + +#define ehca_err(ib_dev, format, arg...) \ + dev_err((ib_dev)->dma_device, "PU%04x EHCA_ERR:%s " format "\n", \ + get_paca()->paca_index, __FUNCTION__, ## arg) + +/* use this one only if no ib_dev available */ +#define ehca_gen_dbg(format, arg...) 
\ + do { \ + if (unlikely(ehca_debug_level)) \ + printk(KERN_DEBUG "PU%04x EHCA_DBG:%s " format "\n",\ + get_paca()->paca_index, __FUNCTION__, ## arg); \ + } while (0) -#endif +#define ehca_gen_warn(format, arg...) \ + do { \ + if (unlikely(ehca_debug_level)) \ + printk(KERN_INFO "PU%04x EHCA_WARN:%s " format "\n",\ + get_paca()->paca_index, __FUNCTION__, ## arg); \ + } while (0) -/** - * EDEB - Trace output macro. - * @level: tracelevel - * @format: optional format string, use "" if not desired - * @args: printf like arguments for trace - */ -#define EDEB(level,format,args...) \ - EDEB_P_GENERIC(level,"",format,##args) -#define EDEB_ERR(level,format,args...) \ - EDEB_P_GENERIC(level,"HCAD_ERROR ",format,##args) -#define EDEB_EN(level,format,args...) \ - EDEB_P_GENERIC(level,">>>",format,##args) -#define EDEB_EX(level,format,args...) \ - EDEB_P_GENERIC(level,"<<<",format,##args) +#define ehca_gen_err(format, arg...) \ + printk(KERN_ERR "PU%04x EHCA_ERR:%s " format "\n", \ + get_paca()->paca_index, __FUNCTION__, ## arg) /** - * EDEB_DMP - macro to dump a memory block, whose length is n*8 bytes. + * ehca_dmp - printk a memory block, whose length is n*8 bytes. * Each line has the following layout: * adr=X ofs=Y <8 bytes hex> <8 bytes hex> */ -#define EDEB_DMP(level,adr,len,format,args...) \ +#define ehca_dmp(adr, len, format, args...) 
\ do { \ unsigned int x; \ unsigned int l = (unsigned int)(len); \ unsigned char *deb = (unsigned char*)(adr); \ for (x = 0; x < l; x += 16) { \ - EDEB(level, format " adr=%p ofs=%04x %016lx %016lx", \ - ##args, deb, x, \ - *((u64 *)&deb[0]), *((u64 *)&deb[8])); \ + printk("EHCA_DMP:%s" format \ + " adr=%p ofs=%04x %016lx %016lx\n", \ + __FUNCTION__, ##args, deb, x, \ + *((u64 *)&deb[0]), *((u64 *)&deb[8])); \ deb += 16; \ } \ } while (0) @@ -275,129 +150,8 @@ do { \ * EHCA_BMASK_GET - extract a parameter from value by mask */ #define EHCA_BMASK_GET(mask,value) \ - ( EHCA_BMASK_MASK(mask)& (((u64)(value))>>EHCA_BMASK_SHIFTPOS(mask))) - -#define PARANOIA_MODE -#ifdef PARANOIA_MODE + (EHCA_BMASK_MASK(mask)& (((u64)(value))>>EHCA_BMASK_SHIFTPOS(mask))) -#define EHCA_CHECK_ADR_P(adr) \ - if (unlikely(adr == 0)) { \ - EDEB_ERR(4, "adr=%p check failed line %i", adr, \ - __LINE__); \ - return ERR_PTR(-EFAULT); } - -#define EHCA_CHECK_ADR(adr) \ - if (unlikely(adr == 0)) { \ - EDEB_ERR(4, "adr=%p check failed line %i", adr, \ - __LINE__); \ - return -EFAULT; } - -#define EHCA_CHECK_DEVICE_P(device) \ - if (unlikely(device == 0)) { \ - EDEB_ERR(4, "device=%p check failed", device); \ - return ERR_PTR(-EFAULT); } - -#define EHCA_CHECK_DEVICE(device) \ - if (unlikely(device == 0)) { \ - EDEB_ERR(4, "device=%p check failed", device); \ - return -EFAULT; } - -#define EHCA_CHECK_PD(pd) \ - if (unlikely(pd == 0)) { \ - EDEB_ERR(4, "pd=%p check failed", pd); \ - return -EFAULT; } - -#define EHCA_CHECK_PD_P(pd) \ - if (unlikely(pd == 0)) { \ - EDEB_ERR(4, "pd=%p check failed", pd); \ - return ERR_PTR(-EFAULT); } - -#define EHCA_CHECK_AV(av) \ - if (unlikely(av == 0)) { \ - EDEB_ERR(4, "av=%p check failed", av); \ - return -EFAULT; } - -#define EHCA_CHECK_AV_P(av) \ - if (unlikely(av == 0)) { \ - EDEB_ERR(4, "av=%p check failed", av); \ - return ERR_PTR(-EFAULT); } - -#define EHCA_CHECK_CQ(cq) \ - if (unlikely(cq == 0)) { \ - EDEB_ERR(4, "cq=%p check failed", cq); \ - return 
-EFAULT; } - -#define EHCA_CHECK_CQ_P(cq) \ - if (unlikely(cq == 0)) { \ - EDEB_ERR(4, "cq=%p check failed", cq); \ - return ERR_PTR(-EFAULT); } - -#define EHCA_CHECK_EQ(eq) \ - if (unlikely(eq == 0)) { \ - EDEB_ERR(4, "eq=%p check failed", eq); \ - return -EFAULT; } - -#define EHCA_CHECK_EQ_P(eq) \ - if (unlikely(eq == 0)) { \ - EDEB_ERR(4, "eq=%p check failed", eq); \ - return ERR_PTR(-EFAULT); } - -#define EHCA_CHECK_QP(qp) \ - if (unlikely(qp == 0)) { \ - EDEB_ERR(4, "qp=%p check failed", qp); \ - return -EFAULT; } - -#define EHCA_CHECK_QP_P(qp) \ - if (unlikely(qp == 0)) { \ - EDEB_ERR(4, "qp=%p check failed", qp); \ - return ERR_PTR(-EFAULT); } - -#define EHCA_CHECK_MR(mr) \ - if (unlikely(mr == 0)) { \ - EDEB_ERR(4, "mr=%p check failed", mr); \ - return -EFAULT; } - -#define EHCA_CHECK_MR_P(mr) \ - if (unlikely(mr == 0)) { \ - EDEB_ERR(4, "mr=%p check failed", mr); \ - return ERR_PTR(-EFAULT); } - -#define EHCA_CHECK_MW(mw) \ - if (unlikely(mw == 0)) { \ - EDEB_ERR(4, "mw=%p check failed", mw); \ - return -EFAULT; } - -#define EHCA_CHECK_MW_P(mw) \ - if (unlikely(mw == 0)) { \ - EDEB_ERR(4, "mw=%p check failed", mw); \ - return ERR_PTR(-EFAULT); } - -#define EHCA_CHECK_FMR(fmr) \ - if (unlikely(fmr == 0)) { \ - EDEB_ERR(4, "fmr=%p check failed", fmr); \ - return -EFAULT; } - -#define EHCA_CHECK_FMR_P(fmr) \ - if (unlikely(fmr == 0)) { \ - EDEB_ERR(4, "fmr=%p check failed", fmr); \ - return ERR_PTR(-EFAULT); } - -#define EHCA_REGISTER_PD(device,pd) -#define EHCA_REGISTER_AV(pd,av) -#define EHCA_DEREGISTER_PD(PD) -#define EHCA_DEREGISTER_AV(av) -#else -#define EHCA_CHECK_DEVICE_P(device) - -#define EHCA_CHECK_PD(pd) -#define EHCA_REGISTER_PD(device,pd) -#define EHCA_DEREGISTER_PD(PD) -#endif - -static inline int ehca_adr_bad(void *adr) -{ - return !adr; -} /* Converts ehca to ib return code */ static inline int ehca2ib_return_code(u64 ehca_rc) @@ -414,4 +168,5 @@ static inline int ehca2ib_return_code(u6 } } + #endif /* EHCA_TOOLS_H */ diff -Nurp 
linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_uverbs.c linux-2.6/drivers/infiniband/hw/ehca/ehca_uverbs.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_uverbs.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ehca_uverbs.c 2006-08-30 20:00:16.000000000 +0200 @@ -40,9 +40,6 @@ * POSSIBILITY OF SUCH DAMAGE. */ -#undef DEB_PREFIX -#define DEB_PREFIX "uver" - #include #include "ehca_classes.h" @@ -54,30 +51,20 @@ struct ib_ucontext *ehca_alloc_ucontext(struct ib_device *device, struct ib_udata *udata) { - struct ehca_ucontext *my_context = NULL; - - EHCA_CHECK_ADR_P(device); - EDEB_EN(7, "device=%p name=%s", device, device->name); + struct ehca_ucontext *my_context; my_context = kzalloc(sizeof *my_context, GFP_KERNEL); if (!my_context) { - EDEB_ERR(4, "Out of memory device=%p", device); + ehca_err(device, "Out of memory device=%p", device); return ERR_PTR(-ENOMEM); } - EDEB_EX(7, "device=%p ucontext=%p", device, my_context); - return &my_context->ib_ucontext; } int ehca_dealloc_ucontext(struct ib_ucontext *context) { - struct ehca_ucontext *my_context = NULL; - EHCA_CHECK_ADR(context); - EDEB_EN(7, "ucontext=%p", context); - my_context = container_of(context, struct ehca_ucontext, ib_ucontext); - kfree(my_context); - EDEB_EN(7, "ucontext=%p", context); + kfree(container_of(context, struct ehca_ucontext, ib_ucontext)); return 0; } @@ -91,83 +78,88 @@ struct page *ehca_nopage(struct vm_area_ u32 rsrc_type = (fileoffset >> 24) & 0xF; /* sq,rq,cmnd_window */ u32 cur_pid = current->tgid; unsigned long flags; + struct ehca_cq *cq; + struct ehca_qp *qp; + struct ehca_pd *pd; + u64 offset; + void *vaddr; - EDEB_EN(7, "vm_start=%lx vm_end=%lx vm_page_prot=%lx vm_fileoff=%lx " - "address=%lx", - vma->vm_start, vma->vm_end, vma->vm_page_prot, fileoffset, - address); - - if (q_type == 1) { /* CQ */ - struct ehca_cq *cq = NULL; - u64 offset; - void *vaddr = NULL; - + switch (q_type) { + case 1: /* CQ */ spin_lock_irqsave(&ehca_cq_idr_lock, flags); 
cq = idr_find(&ehca_cq_idr, idr_handle); spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - if (cq->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", - cur_pid, cq->ownpid); + /* make sure this mmap really belongs to the authorized user */ + if (!cq) { + ehca_gen_err("cq is NULL ret=NOPAGE_SIGBUS"); return NOPAGE_SIGBUS; } - /* make sure this mmap really belongs to the authorized user */ - if (!cq) { - EDEB_ERR(4, "cq is NULL ret=NOPAGE_SIGBUS"); + if (cq->ownpid != cur_pid) { + ehca_err(cq->ib_cq.device, + "Invalid caller pid=%x ownpid=%x", + cur_pid, cq->ownpid); return NOPAGE_SIGBUS; } + if (rsrc_type == 2) { - EDEB(6, "cq=%p cq queuearea", cq); + ehca_dbg(cq->ib_cq.device, "cq=%p cq queuearea", cq); offset = address - vma->vm_start; vaddr = ipz_qeit_calc(&cq->ipz_queue, offset); - EDEB(6, "offset=%lx vaddr=%p", offset, vaddr); + ehca_dbg(cq->ib_cq.device, "offset=%lx vaddr=%p", + offset, vaddr); mypage = virt_to_page(vaddr); } - } else if (q_type == 2) { /* QP */ - struct ehca_qp *qp = NULL; - struct ehca_pd *pd = NULL; - u64 offset; - void *vaddr = NULL; + break; + case 2: /* QP */ spin_lock_irqsave(&ehca_qp_idr_lock, flags); qp = idr_find(&ehca_qp_idr, idr_handle); spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); + /* make sure this mmap really belongs to the authorized user */ + if (!qp) { + ehca_gen_err("qp is NULL ret=NOPAGE_SIGBUS"); + return NOPAGE_SIGBUS; + } pd = container_of(qp->ib_qp.pd, struct ehca_pd, ib_pd); if (pd->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(qp->ib_qp.device, + "Invalid caller pid=%x ownpid=%x", cur_pid, pd->ownpid); return NOPAGE_SIGBUS; } - /* make sure this mmap really belongs to the authorized user */ - if (!qp) { - EDEB_ERR(4, "qp is NULL ret=NOPAGE_SIGBUS"); - return NOPAGE_SIGBUS; - } if (rsrc_type == 2) { /* rqueue */ - EDEB(6, "qp=%p qp rqueuearea", qp); + ehca_dbg(qp->ib_qp.device, "qp=%p qp rqueuearea", qp); offset = address - vma->vm_start; vaddr = 
ipz_qeit_calc(&qp->ipz_rqueue, offset); - EDEB(6, "offset=%lx vaddr=%p", offset, vaddr); + ehca_dbg(qp->ib_qp.device, "offset=%lx vaddr=%p", + offset, vaddr); mypage = virt_to_page(vaddr); } else if (rsrc_type == 3) { /* squeue */ - EDEB(6, "qp=%p qp squeuearea", qp); + ehca_dbg(qp->ib_qp.device, "qp=%p qp squeuearea", qp); offset = address - vma->vm_start; vaddr = ipz_qeit_calc(&qp->ipz_squeue, offset); - EDEB(6, "offset=%lx vaddr=%p", offset, vaddr); + ehca_dbg(qp->ib_qp.device, "offset=%lx vaddr=%p", + offset, vaddr); mypage = virt_to_page(vaddr); } + break; + + default: + ehca_gen_err("bad queue type %x", q_type); + return NOPAGE_SIGBUS; } if (!mypage) { - EDEB_ERR(4, "Invalid page adr==NULL ret=NOPAGE_SIGBUS"); + ehca_gen_err("Invalid page adr==NULL ret=NOPAGE_SIGBUS"); return NOPAGE_SIGBUS; } get_page(mypage); - EDEB_EX(7, "page adr=%p", mypage); + return mypage; } @@ -181,159 +173,161 @@ int ehca_mmap(struct ib_ucontext *contex u32 idr_handle = fileoffset >> 32; u32 q_type = (fileoffset >> 28) & 0xF; /* CQ, QP,... 
*/ u32 rsrc_type = (fileoffset >> 24) & 0xF; /* sq,rq,cmnd_window */ - u32 ret = -EFAULT; /* assume the worst */ - u64 vsize = 0; /* must be calculated/set below */ - u64 physical = 0; /* must be calculated/set below */ u32 cur_pid = current->tgid; + u32 ret; + u64 vsize, physical; unsigned long flags; + struct ehca_cq *cq; + struct ehca_qp *qp; + struct ehca_pd *pd; - EDEB_EN(7, "vm_start=%lx vm_end=%lx vm_page_prot=%lx vm_fileoff=%lx", - vma->vm_start, vma->vm_end, vma->vm_page_prot, fileoffset); - - if (q_type == 1) { /* CQ */ - struct ehca_cq *cq; - + switch (q_type) { + case 1: /* CQ */ spin_lock_irqsave(&ehca_cq_idr_lock, flags); cq = idr_find(&ehca_cq_idr, idr_handle); spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + /* make sure this mmap really belongs to the authorized user */ + if (!cq) + return -EINVAL; + if (cq->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(cq->ib_cq.device, + "Invalid caller pid=%x ownpid=%x", cur_pid, cq->ownpid); return -ENOMEM; } - /* make sure this mmap really belongs to the authorized user */ - if (!cq) - return -EINVAL; - if (!cq->ib_cq.uobject) - return -EINVAL; - if (cq->ib_cq.uobject->context != context) + if (!cq->ib_cq.uobject || cq->ib_cq.uobject->context != context) return -EINVAL; - if (rsrc_type == 1) { /* galpa fw handle */ - EDEB(6, "cq=%p cq triggerarea", cq); + + switch (rsrc_type) { + case 1: /* galpa fw handle */ + ehca_dbg(cq->ib_cq.device, "cq=%p cq triggerarea", cq); vma->vm_flags |= VM_RESERVED; vsize = vma->vm_end - vma->vm_start; if (vsize != EHCA_PAGESIZE) { - EDEB_ERR(4, "invalid vsize=%lx", + ehca_err(cq->ib_cq.device, "invalid vsize=%lx", vma->vm_end - vma->vm_start); - ret = -EINVAL; - goto mmap_exit0; + return -EINVAL; } physical = cq->galpas.user.fw_handle; vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_flags |= VM_IO | VM_RESERVED; - EDEB(6, "vsize=%lx physical=%lx", vsize, physical); + ehca_dbg(cq->ib_cq.device, + "vsize=%lx physical=%lx", 
vsize, physical); ret = remap_pfn_range(vma, vma->vm_start, physical >> PAGE_SHIFT, vsize, vma->vm_page_prot); if (ret) { - EDEB_ERR(4, "remap_pfn_range() failed ret=%x", + ehca_err(cq->ib_cq.device, + "remap_pfn_range() failed ret=%x", ret); - ret = -ENOMEM; + return -ENOMEM; } - goto mmap_exit0; - } else if (rsrc_type == 2) { /* cq queue_addr */ - EDEB(6, "cq=%p cq q_addr", cq); + break; + + case 2: /* cq queue_addr */ + ehca_dbg(cq->ib_cq.device, "cq=%p cq q_addr", cq); vma->vm_flags |= VM_RESERVED; vma->vm_ops = &ehcau_vm_ops; - ret = 0; - goto mmap_exit0; - } else { - EDEB_ERR(6, "bad resource type %x", rsrc_type); - ret = -EINVAL; - goto mmap_exit0; + break; + + default: + ehca_err(cq->ib_cq.device, "bad resource type %x", + rsrc_type); + return -EINVAL; } - } else if (q_type == 2) { /* QP */ - struct ehca_qp *qp = NULL; - struct ehca_pd *pd = NULL; + break; + case 2: /* QP */ spin_lock_irqsave(&ehca_qp_idr_lock, flags); qp = idr_find(&ehca_qp_idr, idr_handle); spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); + /* make sure this mmap really belongs to the authorized user */ + if (!qp) + return -EINVAL; + pd = container_of(qp->ib_qp.pd, struct ehca_pd, ib_pd); if (pd->ownpid != cur_pid) { - EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + ehca_err(qp->ib_qp.device, + "Invalid caller pid=%x ownpid=%x", cur_pid, pd->ownpid); return -ENOMEM; } - /* make sure this mmap really belongs to the authorized user */ - if (!qp || !qp->ib_qp.uobject || - qp->ib_qp.uobject->context != context) { - EDEB(6, "qp=%p, uobject=%p, context=%p", - qp, qp->ib_qp.uobject, qp->ib_qp.uobject->context); - ret = -EINVAL; - goto mmap_exit0; - } - if (rsrc_type == 1) { /* galpa fw handle */ - EDEB(6, "qp=%p qp triggerarea", qp); + if (!qp->ib_qp.uobject || qp->ib_qp.uobject->context != context) + return -EINVAL; + + switch (rsrc_type) { + case 1: /* galpa fw handle */ + ehca_dbg(qp->ib_qp.device, "qp=%p qp triggerarea", qp); vma->vm_flags |= VM_RESERVED; vsize = vma->vm_end - 
vma->vm_start; if (vsize != EHCA_PAGESIZE) { - EDEB_ERR(4, "invalid vsize=%lx", + ehca_err(qp->ib_qp.device, "invalid vsize=%lx", vma->vm_end - vma->vm_start); - ret = -EINVAL; - goto mmap_exit0; + return -EINVAL; } physical = qp->galpas.user.fw_handle; vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_flags |= VM_IO | VM_RESERVED; - EDEB(6, "vsize=%lx physical=%lx", vsize, physical); + ehca_dbg(qp->ib_qp.device, "vsize=%lx physical=%lx", + vsize, physical); ret = remap_pfn_range(vma, vma->vm_start, physical >> PAGE_SHIFT, vsize, vma->vm_page_prot); if (ret) { - EDEB_ERR(4, "remap_pfn_range() failed ret=%x", + ehca_err(qp->ib_qp.device, + "remap_pfn_range() failed ret=%x", ret); - ret = -ENOMEM; + return -ENOMEM; } - goto mmap_exit0; - } else if (rsrc_type == 2) { /* qp rqueue_addr */ - EDEB(6, "qp=%p qp rqueue_addr", qp); + break; + + case 2: /* qp rqueue_addr */ + ehca_dbg(qp->ib_qp.device, "qp=%p qp rqueue_addr", qp); vma->vm_flags |= VM_RESERVED; vma->vm_ops = &ehcau_vm_ops; - ret = 0; - goto mmap_exit0; - } else if (rsrc_type == 3) { /* qp squeue_addr */ - EDEB(6, "qp=%p qp squeue_addr", qp); + break; + + case 3: /* qp squeue_addr */ + ehca_dbg(qp->ib_qp.device, "qp=%p qp squeue_addr", qp); vma->vm_flags |= VM_RESERVED; vma->vm_ops = &ehcau_vm_ops; - ret = 0; - goto mmap_exit0; - } else { - EDEB_ERR(4, "bad resource type %x", rsrc_type); - ret = -EINVAL; - goto mmap_exit0; + break; + + default: + ehca_err(qp->ib_qp.device, "bad resource type %x", + rsrc_type); + return -EINVAL; } - } else { - EDEB_ERR(4, "bad queue type %x", q_type); - ret = -EINVAL; - goto mmap_exit0; + break; + + default: + ehca_gen_err("bad queue type %x", q_type); + return -EINVAL; } -mmap_exit0: - EDEB_EX(7, "ret=%x", ret); - return ret; + return 0; } -int ehca_mmap_nopage(u64 foffset, u64 length, void ** mapped, - struct vm_area_struct ** vma) +int ehca_mmap_nopage(u64 foffset, u64 length, void **mapped, + struct vm_area_struct **vma) { - EDEB_EN(7, "foffset=%lx 
length=%lx", foffset, length); down_write(&current->mm->mmap_sem); *mapped = (void*)do_mmap(NULL,0, length, PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, foffset); up_write(&current->mm->mmap_sem); if (!(*mapped)) { - EDEB_ERR(4, "couldn't mmap foffset=%lx length=%lx", - foffset, length); + ehca_gen_err("couldn't mmap foffset=%lx length=%lx", + foffset, length); return -EINVAL; } @@ -342,49 +336,47 @@ int ehca_mmap_nopage(u64 foffset, u64 le down_write(&current->mm->mmap_sem); do_munmap(current->mm, 0, length); up_write(&current->mm->mmap_sem); - EDEB_ERR(4, "couldn't find vma queue=%p", *mapped); + ehca_gen_err("couldn't find vma queue=%p", *mapped); return -EINVAL; } (*vma)->vm_flags |= VM_RESERVED; (*vma)->vm_ops = &ehcau_vm_ops; - EDEB_EX(7, "mapped=%p", *mapped); return 0; } -int ehca_mmap_register(u64 physical, void ** mapped, - struct vm_area_struct ** vma) +int ehca_mmap_register(u64 physical, void **mapped, + struct vm_area_struct **vma) { - int ret = 0; + int ret; unsigned long vsize; /* ehca hw supports only 4k page */ ret = ehca_mmap_nopage(0, EHCA_PAGESIZE, mapped, vma); if (ret) { - EDEB(4, "could'nt mmap physical=%lx", physical); + ehca_gen_err("could'nt mmap physical=%lx", physical); return ret; } (*vma)->vm_flags |= VM_RESERVED; vsize = (*vma)->vm_end - (*vma)->vm_start; if (vsize != EHCA_PAGESIZE) { - EDEB_ERR(4, "invalid vsize=%lx", - (*vma)->vm_end - (*vma)->vm_start); - ret = -EINVAL; - return ret; + ehca_gen_err("invalid vsize=%lx", + (*vma)->vm_end - (*vma)->vm_start); + return -EINVAL; } (*vma)->vm_page_prot = pgprot_noncached((*vma)->vm_page_prot); (*vma)->vm_flags |= VM_IO | VM_RESERVED; - EDEB(6, "vsize=%lx physical=%lx", vsize, physical); ret = remap_pfn_range((*vma), (*vma)->vm_start, physical >> PAGE_SHIFT, vsize, (*vma)->vm_page_prot); if (ret) { - EDEB_ERR(4, "remap_pfn_range() failed ret=%x", ret); - ret = -ENOMEM; + ehca_gen_err("remap_pfn_range() failed ret=%x", ret); + return -ENOMEM; } - return ret; + + return 0; } diff -Nurp
linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_if.c linux-2.6/drivers/infiniband/hw/ehca/hcp_if.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_if.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/hcp_if.c 2006-08-30 20:00:17.000000000 +0200 @@ -41,13 +41,12 @@ * POSSIBILITY OF SUCH DAMAGE. */ -#define DEB_PREFIX "hcpi" - #include #include "ehca_tools.h" #include "hcp_if.h" #include "hcp_phyp.h" #include "hipz_fns.h" +#include "ipz_pt_fn.h" #define H_ALL_RES_QP_ENHANCED_OPS EHCA_BMASK_IBM(9,11) #define H_ALL_RES_QP_PTE_PIN EHCA_BMASK_IBM(12,12) @@ -112,12 +111,12 @@ static long ehca_hcall_7arg_7ret(unsigne unsigned long *out6, unsigned long *out7) { - long ret = H_SUCCESS; + long ret; int i, sleep_msecs; - EDEB_EN(7, "opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx" - " arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5, - arg6, arg7); + ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx " + "arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5, + arg6, arg7); for (i = 0; i < 5; i++) { ret = plpar_hcall_7arg_7ret(opcode, @@ -133,26 +132,24 @@ static long ehca_hcall_7arg_7ret(unsigne } if (ret < H_SUCCESS) - EDEB_ERR(4, "opcode=%lx ret=%lx" - " arg1=%lx arg2=%lx arg3=%lx arg4=%lx" - " arg5=%lx arg6=%lx arg7=%lx" - " out1=%lx out2=%lx out3=%lx out4=%lx" - " out5=%lx out6=%lx out7=%lx", - opcode, ret, - arg1, arg2, arg3, arg4, - arg5, arg6, arg7, - *out1, *out2, *out3, *out4, - *out5, *out6, *out7); - - EDEB_EX(7, "opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx " - "out4=%lx out5=%lx out6=%lx out7=%lx", - opcode, ret, *out1, *out2, *out3, *out4, *out5, - *out6, *out7); + ehca_gen_err("opcode=%lx ret=%lx" + " arg1=%lx arg2=%lx arg3=%lx arg4=%lx" + " arg5=%lx arg6=%lx arg7=%lx" + " out1=%lx out2=%lx out3=%lx out4=%lx" + " out5=%lx out6=%lx out7=%lx", + opcode, ret, + arg1, arg2, arg3, arg4, + arg5, arg6, arg7, + *out1, *out2, *out3, *out4, + *out5, *out6, *out7); + + ehca_gen_dbg("opcode=%lx ret=%lx 
out1=%lx out2=%lx out3=%lx " + "out4=%lx out5=%lx out6=%lx out7=%lx", + opcode, ret, *out1, *out2, *out3, *out4, *out5, + *out6, *out7); return ret; } - EDEB_EX(7, "opcode=%lx ret=H_BUSY", opcode); - return H_BUSY; } @@ -176,14 +173,13 @@ static long ehca_hcall_9arg_9ret(unsigne unsigned long *out8, unsigned long *out9) { - long ret = H_SUCCESS; + long ret; int i, sleep_msecs; - EDEB_EN(7, "opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx " - "arg5=%lx arg6=%lx arg7=%lx arg8=%lx arg9=%lx", - opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7, - arg8, arg9); - + ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx " + "arg5=%lx arg6=%lx arg7=%lx arg8=%lx arg9=%lx", + opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7, + arg8, arg9); for (i = 0; i < 5; i++) { ret = plpar_hcall_9arg_9ret(opcode, @@ -201,32 +197,32 @@ static long ehca_hcall_9arg_9ret(unsigne } if (ret < H_SUCCESS) - EDEB_ERR(4, "opcode=%lx ret=%lx" - " arg1=%lx arg2=%lx arg3=%lx arg4=%lx" - " arg5=%lx arg6=%lx arg7=%lx arg8=%lx" - " arg9=%lx" - " out1=%lx out2=%lx out3=%lx out4=%lx" - " out5=%lx out6=%lx out7=%lx out8=%lx" - " out9=%lx", - opcode, ret, - arg1, arg2, arg3, arg4, - arg5, arg6, arg7, arg8, - arg9, - *out1, *out2, *out3, *out4, - *out5, *out6, *out7, *out8, - *out9); - - EDEB_EX(7, "opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx " - "out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx out9=%lx", - opcode, ret,*out1, *out2, *out3, *out4, *out5, *out6, - *out7, *out8, *out9); + ehca_gen_err("opcode=%lx ret=%lx" + " arg1=%lx arg2=%lx arg3=%lx arg4=%lx" + " arg5=%lx arg6=%lx arg7=%lx arg8=%lx" + " arg9=%lx" + " out1=%lx out2=%lx out3=%lx out4=%lx" + " out5=%lx out6=%lx out7=%lx out8=%lx" + " out9=%lx", + opcode, ret, + arg1, arg2, arg3, arg4, + arg5, arg6, arg7, arg8, + arg9, + *out1, *out2, *out3, *out4, + *out5, *out6, *out7, *out8, + *out9); + + ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx " + "out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx " + "out9=%lx", opcode, ret,*out1, *out2, 
*out3, *out4, + *out5, *out6, *out7, *out8, *out9); return ret; } - EDEB_EX(7, "opcode=%lx ret=H_BUSY", opcode); return H_BUSY; } + u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle, struct ehca_pfeq *pfeq, const u32 neq_control, @@ -236,18 +232,10 @@ u64 hipz_h_alloc_resource_eq(const struc u32 * act_pages, u32 * eq_ist) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - u64 act_nr_of_entries_out = 0; - u64 act_pages_out = 0; - u64 eq_ist_out = 0; - u64 allocate_controls = 0; - u32 x = (u64)(&x); - - EDEB_EN(7, "pfeq=%p adapter_handle=%lx new_control=%x" - " number_of_entries=%x", - pfeq, adapter_handle.handle, neq_control, - number_of_entries); + u64 allocate_controls; + u64 act_nr_of_entries_out, act_pages_out, eq_ist_out; /* resource type */ allocate_controls = 3ULL; @@ -276,10 +264,7 @@ u64 hipz_h_alloc_resource_eq(const struc *eq_ist = (u32)eq_ist_out; if (ret == H_NOT_ENOUGH_RESOURCES) - EDEB_ERR(4, "Not enough resource - ret=%lx ", ret); - - EDEB_EX(7, "act_nr_of_entries=%x act_pages=%x eq_ist=%x", - *act_nr_of_entries, *act_pages, *eq_ist); + ehca_gen_err("Not enough resource - ret=%lx ", ret); return ret; } @@ -288,45 +273,30 @@ u64 hipz_h_reset_event(const struct ipz_ struct ipz_eq_handle eq_handle, const u64 event_mask) { - u64 ret = H_SUCCESS; u64 dummy; - EDEB_EN(7, "eq_handle=%lx, adapter_handle=%lx event_mask=%lx", - eq_handle.handle, adapter_handle.handle, event_mask); - - ret = ehca_hcall_7arg_7ret(H_RESET_EVENTS, - adapter_handle.handle, /* r4 */ - eq_handle.handle, /* r5 */ - event_mask, /* r6 */ - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); - - EDEB(7, "ret=%lx", ret); - - return ret; + return ehca_hcall_7arg_7ret(H_RESET_EVENTS, + adapter_handle.handle, /* r4 */ + eq_handle.handle, /* r5 */ + event_mask, /* r6 */ + 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); } u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle, struct 
ehca_cq *cq, struct ehca_alloc_cq_parms *param) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - u64 act_nr_of_entries_out; - u64 act_pages_out; - u64 g_la_privileged_out; - u64 g_la_user_out; - - EDEB_EN(7, "Adapter_handle=%lx eq_handle=%lx cq_token=%x" - " cq_number_of_entries=%x", - adapter_handle.handle, param->eq_handle.handle, - cq->token, param->nr_cqe); + u64 act_nr_of_entries_out, act_pages_out; + u64 g_la_privileged_out, g_la_user_out; ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, adapter_handle.handle, /* r4 */ @@ -350,10 +320,7 @@ u64 hipz_h_alloc_resource_cq(const struc hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out); if (ret == H_NOT_ENOUGH_RESOURCES) - EDEB_ERR(4, "Not enough resources. ret=%lx", ret); - - EDEB_EX(7, "cq_handle=%lx act_nr_of_entries=%x act_pages=%x", - cq->ipz_cq_handle.handle, param->act_nr_of_entries, param->act_pages); + ehca_gen_err("Not enough resources. ret=%lx", ret); return ret; } @@ -362,32 +329,13 @@ u64 hipz_h_alloc_resource_qp(const struc struct ehca_qp *qp, struct ehca_alloc_qp_parms *parms) { - u64 ret = H_SUCCESS; - u64 allocate_controls; - u64 max_r10_reg; - u64 dummy = 0; - u64 qp_nr_out = 0; - u64 r6_out = 0; - u64 r7_out = 0; - u64 r8_out = 0; - u64 g_la_user_out = 0; - u64 r11_out = 0; + u64 ret; + u64 dummy, allocate_controls, max_r10_reg; + u64 qp_nr_out, r6_out, r7_out, r8_out, g_la_user_out, r11_out; u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1; u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1; int daqp_ctrl = parms->daqp_ctrl; - EDEB_EN(7, "Adapter_handle=%lx servicetype=%x signalingtype=%x" - " ud_av_l_key=%x send_cq_handle=%lx receive_cq_handle=%lx" - " async_eq_handle=%lx qp_token=%x pd=%x max_nr_send_wqes=%x" - " max_nr_receive_wqes=%x max_nr_send_sges=%x" - " max_nr_receive_sges=%x ud_av_l_key=%x galpa.pid=%x", - adapter_handle.handle, parms->servicetype, parms->sigtype, - parms->ud_av_l_key_ctl, qp->send_cq->ipz_cq_handle.handle, - 
qp->recv_cq->ipz_cq_handle.handle, parms->ipz_eq_handle.handle, - qp->token, parms->pd.value, max_nr_send_wqes, - max_nr_receive_wqes, parms->max_send_sge, parms->max_recv_sge, - parms->ud_av_l_key_ctl, qp->galpas.pid); - allocate_controls = EHCA_BMASK_SET(H_ALL_RES_QP_ENHANCED_OPS, (daqp_ctrl & DAQP_CTRL_ENABLE) ? 1 : 0) @@ -453,17 +401,7 @@ u64 hipz_h_alloc_resource_qp(const struc hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out); if (ret == H_NOT_ENOUGH_RESOURCES) - EDEB_ERR(4, "Not enough resources. ret=%lx",ret); - - EDEB_EX(7, "qp_nr=%x act_nr_send_wqes=%x" - " act_nr_receive_wqes=%x act_nr_send_sges=%x" - " act_nr_receive_sges=%x nr_sq_pages=%x" - " nr_rq_pages=%x galpa.user=%lx galpa.kernel=%lx", - qp->real_qp_num, parms->act_nr_send_wqes, - parms->act_nr_recv_wqes, parms->act_nr_send_sges, - parms->act_nr_recv_sges, parms->nr_sq_pages, - parms->nr_rq_pages, qp->galpas.user.fw_handle, - qp->galpas.kernel.fw_handle); + ehca_gen_err("Not enough resources. ret=%lx",ret); return ret; } @@ -472,20 +410,15 @@ u64 hipz_h_query_port(const struct ipz_a const u8 port_id, struct hipz_query_port *query_port_response_block) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - u64 r_cb; - - EDEB_EN(7, "adapter_handle=%lx port_id %x", - adapter_handle.handle, port_id); + u64 r_cb = virt_to_abs(query_port_response_block); - if (((u64)query_port_response_block) & 0xfff) { - EDEB_ERR(4, "response block not page aligned"); + if (r_cb & (EHCA_PAGESIZE-1)) { + ehca_gen_err("response block not page aligned"); return H_PARAMETER; } - r_cb = virt_to_abs(query_port_response_block); - ret = ehca_hcall_7arg_7ret(H_QUERY_PORT, adapter_handle.handle, /* r4 */ port_id, /* r5 */ @@ -499,19 +432,8 @@ u64 hipz_h_query_port(const struct ipz_a &dummy, &dummy); - EDEB_DMP(7, query_port_response_block, 64, "query_port_response_block"); - EDEB(7, "offset31=%x offset35=%x offset36=%x", - ((u32*)query_port_response_block)[32], - ((u32*)query_port_response_block)[36], - 
((u32*)query_port_response_block)[37]); - EDEB(7, "offset200=%x offset201=%x offset202=%x " - "offset203=%x", - ((u32*)query_port_response_block)[0x200], - ((u32*)query_port_response_block)[0x201], - ((u32*)query_port_response_block)[0x202], - ((u32*)query_port_response_block)[0x203]); - - EDEB_EX(7, "ret=%lx", ret); + if (ehca_debug_level) + ehca_dmp(query_port_response_block, 64, "response_block"); return ret; } @@ -519,62 +441,26 @@ u64 hipz_h_query_port(const struct ipz_a u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle, struct hipz_query_hca *query_hca_rblock) { - u64 ret = H_SUCCESS; u64 dummy; - u64 r_cb; - EDEB_EN(7, "adapter_handle=%lx", adapter_handle.handle); + u64 r_cb = virt_to_abs(query_hca_rblock); - if (((u64)query_hca_rblock) & 0xfff) { - EDEB_ERR(4, "response_block=%p not page aligned", - query_hca_rblock); + if (r_cb & (EHCA_PAGESIZE-1)) { + ehca_gen_err("response_block=%p not page aligned", + query_hca_rblock); return H_PARAMETER; } - r_cb = virt_to_abs(query_hca_rblock); - - ret = ehca_hcall_7arg_7ret(H_QUERY_HCA, - adapter_handle.handle, /* r4 */ - r_cb, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); - - EDEB(7, "offset0=%x offset1=%x offset2=%x offset3=%x", - ((u32*)query_hca_rblock)[0], - ((u32*)query_hca_rblock)[1], - ((u32*)query_hca_rblock)[2], ((u32*)query_hca_rblock)[3]); - EDEB(7, "offset4=%x offset5=%x offset6=%x offset7=%x", - ((u32*)query_hca_rblock)[4], - ((u32*)query_hca_rblock)[5], - ((u32*)query_hca_rblock)[6], ((u32*)query_hca_rblock)[7]); - EDEB(7, "offset8=%x offset9=%x offseta=%x offsetb=%x", - ((u32*)query_hca_rblock)[8], - ((u32*)query_hca_rblock)[9], - ((u32*)query_hca_rblock)[10], ((u32*)query_hca_rblock)[11]); - EDEB(7, "offsetc=%x offsetd=%x offsete=%x offsetf=%x", - ((u32*)query_hca_rblock)[12], - ((u32*)query_hca_rblock)[13], - ((u32*)query_hca_rblock)[14], ((u32*)query_hca_rblock)[15]); - EDEB(7, "offset136=%x offset192=%x offset204=%x", - 
((u32*)query_hca_rblock)[32], - ((u32*)query_hca_rblock)[48], ((u32*)query_hca_rblock)[51]); - EDEB(7, "offset231=%x offset235=%x", - ((u32*)query_hca_rblock)[57], ((u32*)query_hca_rblock)[58]); - EDEB(7, "offset200=%x offset201=%x offset202=%x offset203=%x", - ((u32*)query_hca_rblock)[0x201], - ((u32*)query_hca_rblock)[0x202], - ((u32*)query_hca_rblock)[0x203], - ((u32*)query_hca_rblock)[0x204]); - - EDEB_EX(7, "ret=%lx adapter_handle=%lx", - ret, adapter_handle.handle); - - return ret; + return ehca_hcall_7arg_7ret(H_QUERY_HCA, + adapter_handle.handle, /* r4 */ + r_cb, /* r5 */ + 0, 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); } u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle, @@ -584,32 +470,22 @@ u64 hipz_h_register_rpage(const struct i const u64 logical_address_of_page, u64 count) { - u64 ret = H_SUCCESS; u64 dummy; - EDEB_EN(7, "adapter_handle=%lx pagesize=%x queue_type=%x" - " resource_handle=%lx logical_address_of_page=%lx count=%lx", - adapter_handle.handle, pagesize, queue_type, - resource_handle, logical_address_of_page, count); - - ret = ehca_hcall_7arg_7ret(H_REGISTER_RPAGES, - adapter_handle.handle, /* r4 */ - queue_type | pagesize << 8, /* r5 */ - resource_handle, /* r6 */ - logical_address_of_page, /* r7 */ - count, /* r8 */ - 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); - - EDEB_EX(7, "ret=%lx", ret); - - return ret; + return ehca_hcall_7arg_7ret(H_REGISTER_RPAGES, + adapter_handle.handle, /* r4 */ + queue_type | pagesize << 8, /* r5 */ + resource_handle, /* r6 */ + logical_address_of_page, /* r7 */ + count, /* r8 */ + 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); } u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle, @@ -620,34 +496,22 @@ u64 hipz_h_register_rpage_eq(const struc const u64 logical_address_of_page, const u64 count) { - u64 ret = H_SUCCESS; - - EDEB_EN(7, "pfeq=%p adapter_handle=%lx 
eq_handle=%lx pagesize=%x" - " queue_type=%x logical_address_of_page=%lx count=%lx", - pfeq, adapter_handle.handle, eq_handle.handle, pagesize, - queue_type,logical_address_of_page, count); - if (count != 1) { - EDEB_ERR(4, "Ppage counter=%lx", count); + ehca_gen_err("Ppage counter=%lx", count); return H_PARAMETER; } - ret = hipz_h_register_rpage(adapter_handle, - pagesize, - queue_type, - eq_handle.handle, - logical_address_of_page, count); - EDEB_EX(7, "ret=%lx", ret); - - return ret; + return hipz_h_register_rpage(adapter_handle, + pagesize, + queue_type, + eq_handle.handle, + logical_address_of_page, count); } u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle, u32 ist) { - u32 ret = H_SUCCESS; - u64 dummy = 0; - - EDEB_EN(7, "ist=%x", ist); + u32 ret; + u64 dummy; ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE, adapter_handle.handle, /* r4 */ @@ -662,9 +526,7 @@ u32 hipz_h_query_int_state(const struct &dummy); if (ret != H_SUCCESS && ret != H_BUSY) - EDEB_ERR(4, "Could not query interrupt state."); - - EDEB_EX(7, "interrupt state: %x", ret); + ehca_gen_err("Could not query interrupt state."); return ret; } @@ -678,24 +540,14 @@ u64 hipz_h_register_rpage_cq(const struc const u64 count, const struct h_galpa gal) { - u64 ret = H_SUCCESS; - - EDEB_EN(7, "pfcq=%p adapter_handle=%lx cq_handle=%lx pagesize=%x" - " queue_type=%x logical_address_of_page=%lx count=%lx", - pfcq, adapter_handle.handle, cq_handle.handle, pagesize, - queue_type, logical_address_of_page, count); - if (count != 1) { - EDEB_ERR(4, "Page counter=%lx", count); + ehca_gen_err("Page counter=%lx", count); return H_PARAMETER; } - ret = hipz_h_register_rpage(adapter_handle, pagesize, queue_type, - cq_handle.handle, logical_address_of_page, - count); - EDEB_EX(7, "ret=%lx", ret); - - return ret; + return hipz_h_register_rpage(adapter_handle, pagesize, queue_type, + cq_handle.handle, logical_address_of_page, + count); } u64 hipz_h_register_rpage_qp(const struct ipz_adapter_handle 
adapter_handle, @@ -707,24 +559,14 @@ u64 hipz_h_register_rpage_qp(const struc const u64 count, const struct h_galpa galpa) { - u64 ret = H_SUCCESS; - - EDEB_EN(7, "pfqp=%p adapter_handle=%lx qp_handle=%lx pagesize=%x" - " queue_type=%x logical_address_of_page=%lx count=%lx", - pfqp, adapter_handle.handle, qp_handle.handle, pagesize, - queue_type, logical_address_of_page, count); - if (count != 1) { - EDEB_ERR(4, "Page counter=%lx", count); + ehca_gen_err("Page counter=%lx", count); return H_PARAMETER; } - ret = hipz_h_register_rpage(adapter_handle,pagesize,queue_type, - qp_handle.handle,logical_address_of_page, - count); - EDEB_EX(7, "ret=%lx", ret); - - return ret; + return hipz_h_register_rpage(adapter_handle,pagesize,queue_type, + qp_handle.handle,logical_address_of_page, + count); } u64 hipz_h_disable_and_get_wqe(const struct ipz_adapter_handle adapter_handle, @@ -734,36 +576,25 @@ u64 hipz_h_disable_and_get_wqe(const str void **log_addr_next_rq_wqe2processed, int dis_and_get_function_code) { - u64 ret = H_SUCCESS; - u8 function_code = 1; u64 dummy, dummy1, dummy2; - EDEB_EN(7, "pfqp=%p adapter_handle=%lx function=%x qp_handle=%lx", - pfqp, adapter_handle.handle, function_code, qp_handle.handle); - if (!log_addr_next_sq_wqe2processed) log_addr_next_sq_wqe2processed = (void**)&dummy1; if (!log_addr_next_rq_wqe2processed) log_addr_next_rq_wqe2processed = (void**)&dummy2; - ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC, - adapter_handle.handle, /* r4 */ - dis_and_get_function_code, /* r5 */ - qp_handle.handle, /* r6 */ - 0, 0, 0, 0, - (void*)log_addr_next_sq_wqe2processed, - (void*)log_addr_next_rq_wqe2processed, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); - EDEB_EX(7, "ret=%lx ladr_next_rq_wqe_out=%p" - " ladr_next_sq_wqe_out=%p", ret, - *log_addr_next_sq_wqe2processed, - *log_addr_next_rq_wqe2processed); - - return ret; + return ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC, + adapter_handle.handle, /* r4 */ + dis_and_get_function_code, /* r5 */ + 
qp_handle.handle, /* r6 */ + 0, 0, 0, 0, + (void*)log_addr_next_sq_wqe2processed, + (void*)log_addr_next_rq_wqe2processed, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); } u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle, @@ -773,22 +604,15 @@ u64 hipz_h_modify_qp(const struct ipz_ad struct hcp_modify_qp_control_block *mqpcb, struct h_galpa gal) { - u64 ret = H_SUCCESS; - u64 invalid_attribute_identifier = 0; - u64 rc_attrib_mask = 0; - u64 dummy; - u64 r_cb; - EDEB_EN(7, "pfqp=%p adapter_handle=%lx qp_handle=%lx" - " update_mask=%lx qp_state=%x mqpcb=%p", - pfqp, adapter_handle.handle, qp_handle.handle, - update_mask, mqpcb->qp_state, mqpcb); + u64 ret; + u64 dummy; + u64 invalid_attribute_identifier, rc_attrib_mask; - r_cb = virt_to_abs(mqpcb); ret = ehca_hcall_7arg_7ret(H_MODIFY_QP, adapter_handle.handle, /* r4 */ qp_handle.handle, /* r5 */ update_mask, /* r6 */ - r_cb, /* r7 */ + virt_to_abs(mqpcb), /* r7 */ 0, 0, 0, &invalid_attribute_identifier, /* r4 */ &dummy, /* r5 */ @@ -797,12 +621,9 @@ u64 hipz_h_modify_qp(const struct ipz_ad &dummy, /* r8 */ &rc_attrib_mask, /* r9 */ &dummy); - if (ret == H_NOT_ENOUGH_RESOURCES) - EDEB_ERR(4, "Insufficient resources ret=%lx", ret); - EDEB_EX(7, "ret=%lx invalid_attribute_identifier=%lx" - " invalid_attribute_MASK=%lx", ret, - invalid_attribute_identifier, rc_attrib_mask); + if (ret == H_NOT_ENOUGH_RESOURCES) + ehca_gen_err("Insufficient resources ret=%lx", ret); return ret; } @@ -813,47 +634,32 @@ u64 hipz_h_query_qp(const struct ipz_ada struct hcp_modify_qp_control_block *qqpcb, struct h_galpa gal) { - u64 ret = H_SUCCESS; u64 dummy; - u64 r_cb; - EDEB_EN(7, "adapter_handle=%lx qp_handle=%lx", - adapter_handle.handle, qp_handle.handle); - - r_cb = virt_to_abs(qqpcb); - EDEB(7, "r_cb=%lx", r_cb); - - ret = ehca_hcall_7arg_7ret(H_QUERY_QP, - adapter_handle.handle, /* r4 */ - qp_handle.handle, /* r5 */ - r_cb, /* r6 */ - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - 
&dummy); - - EDEB_EX(7, "ret=%lx", ret); - return ret; + return ehca_hcall_7arg_7ret(H_QUERY_QP, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + virt_to_abs(qqpcb), /* r6 */ + 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); } u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle, struct ehca_qp *qp) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - u64 ladr_next_sq_wqe_out; - u64 ladr_next_rq_wqe_out; - - EDEB_EN(7, "qp=%p ipz_qp_handle=%lx adapter_handle=%lx", - qp, qp->ipz_qp_handle.handle, adapter_handle.handle); + u64 ladr_next_sq_wqe_out, ladr_next_rq_wqe_out; ret = hcp_galpas_dtor(&qp->galpas); if (ret) { - EDEB_ERR(4, "Could not destruct qp->galpas"); + ehca_gen_err("Could not destruct qp->galpas"); return H_RESOURCE; } ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC, @@ -870,7 +676,7 @@ u64 hipz_h_destroy_qp(const struct ipz_a &dummy, &dummy); if (ret == H_HARDWARE) - EDEB_ERR(4, "HCA not operational. ret=%lx", ret); + ehca_gen_err("HCA not operational. ret=%lx", ret); ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE, adapter_handle.handle, /* r4 */ @@ -885,9 +691,7 @@ u64 hipz_h_destroy_qp(const struct ipz_a &dummy); if (ret == H_RESOURCE) - EDEB_ERR(4, "Resource still in use. ret=%lx", ret); - - EDEB_EX(7, "ret=%lx", ret); + ehca_gen_err("Resource still in use. 
ret=%lx", ret); return ret; } @@ -897,28 +701,20 @@ u64 hipz_h_define_aqp0(const struct ipz_ struct h_galpa gal, u32 port) { - u64 ret = H_SUCCESS; u64 dummy; - EDEB_EN(7, "port=%x ipz_qp_handle=%lx adapter_handle=%lx", - port, qp_handle.handle, adapter_handle.handle); - - ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP0, - adapter_handle.handle, /* r4 */ - qp_handle.handle, /* r5 */ - port, /* r6 */ - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); - - EDEB_EX(7, "ret=%lx", ret); - - return ret; + return ehca_hcall_7arg_7ret(H_DEFINE_AQP0, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + port, /* r6 */ + 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); } u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle, @@ -927,13 +723,9 @@ u64 hipz_h_define_aqp1(const struct ipz_ u32 port, u32 * pma_qp_nr, u32 * bma_qp_nr) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - u64 pma_qp_nr_out; - u64 bma_qp_nr_out; - - EDEB_EN(7, "port=%x qp_handle=%lx adapter_handle=%lx", - port, qp_handle.handle, adapter_handle.handle); + u64 pma_qp_nr_out, bma_qp_nr_out; ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1, adapter_handle.handle, /* r4 */ @@ -952,10 +744,7 @@ u64 hipz_h_define_aqp1(const struct ipz_ *bma_qp_nr = (u32)bma_qp_nr_out; if (ret == H_ALIAS_EXIST) - EDEB_ERR(4, "AQP1 already exists. ret=%lx", ret); - - EDEB_EX(7, "ret=%lx pma_qp_nr=%i bma_qp_nr=%i", - ret, (int)*pma_qp_nr, (int)*bma_qp_nr); + ehca_gen_err("AQP1 already exists. ret=%lx", ret); return ret; } @@ -966,23 +755,8 @@ u64 hipz_h_attach_mcqp(const struct ipz_ u16 mcg_dlid, u64 subnet_prefix, u64 interface_id) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - u8 *dgid_sp = (u8*)&subnet_prefix; - u8 *dgid_ii = (u8*)&interface_id; - - EDEB_EN(7, "qp_handle=%lx adapter_handle=%lx\nMCG_DGID =" - " %d.%d.%d.%d.%d.%d.%d.%d." 
- " %d.%d.%d.%d.%d.%d.%d.%d", - qp_handle.handle, adapter_handle.handle, - dgid_sp[0], dgid_sp[1], - dgid_sp[2], dgid_sp[3], - dgid_sp[4], dgid_sp[5], - dgid_sp[6], dgid_sp[7], - dgid_ii[0], dgid_ii[1], - dgid_ii[2], dgid_ii[3], - dgid_ii[4], dgid_ii[5], - dgid_ii[6], dgid_ii[7]); ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP, adapter_handle.handle, /* r4 */ @@ -1000,9 +774,7 @@ u64 hipz_h_attach_mcqp(const struct ipz_ &dummy); if (ret == H_NOT_ENOUGH_RESOURCES) - EDEB_ERR(4, "Not enough resources. ret=%lx", ret); - - EDEB_EX(7, "ret=%lx", ret); + ehca_gen_err("Not enough resources. ret=%lx", ret); return ret; } @@ -1013,56 +785,34 @@ u64 hipz_h_detach_mcqp(const struct ipz_ u16 mcg_dlid, u64 subnet_prefix, u64 interface_id) { - u64 ret = H_SUCCESS; u64 dummy; - u8 *dgid_sp = (u8*)&subnet_prefix; - u8 *dgid_ii = (u8*)&interface_id; - EDEB_EN(7, "qp_handle=%lx adapter_handle=%lx\nMCG_DGID =" - " %d.%d.%d.%d.%d.%d.%d.%d." - " %d.%d.%d.%d.%d.%d.%d.%d", - qp_handle.handle, adapter_handle.handle, - dgid_sp[0], dgid_sp[1], - dgid_sp[2], dgid_sp[3], - dgid_sp[4], dgid_sp[5], - dgid_sp[6], dgid_sp[7], - dgid_ii[0], dgid_ii[1], - dgid_ii[2], dgid_ii[3], - dgid_ii[4], dgid_ii[5], - dgid_ii[6], dgid_ii[7]); - ret = ehca_hcall_7arg_7ret(H_DETACH_MCQP, - adapter_handle.handle, /* r4 */ - qp_handle.handle, /* r5 */ - mcg_dlid, /* r6 */ - interface_id, /* r7 */ - subnet_prefix, /* r8 */ - 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); - - EDEB(7, "ret=%lx", ret); - - return ret; + return ehca_hcall_7arg_7ret(H_DETACH_MCQP, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + mcg_dlid, /* r6 */ + interface_id, /* r7 */ + subnet_prefix, /* r8 */ + 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); } u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle, struct ehca_cq *cq, u8 force_flag) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - EDEB_EN(7, "cq->pf=%p cq=.%p ipz_cq_handle=%lx adapter_handle=%lx", - 
&cq->pf, cq, cq->ipz_cq_handle.handle, adapter_handle.handle); - ret = hcp_galpas_dtor(&cq->galpas); if (ret) { - EDEB_ERR(4, "Could not destruct cp->galpas"); + ehca_gen_err("Could not destruct cp->galpas"); return H_RESOURCE; } @@ -1080,9 +830,7 @@ u64 hipz_h_destroy_cq(const struct ipz_a &dummy); if (ret == H_RESOURCE) - EDEB(4, "ret=%lx ", ret); - - EDEB_EX(7, "ret=%lx", ret); + ehca_gen_err("H_FREE_RESOURCE failed ret=%lx ", ret); return ret; } @@ -1090,16 +838,12 @@ u64 hipz_h_destroy_cq(const struct ipz_a u64 hipz_h_destroy_eq(const struct ipz_adapter_handle adapter_handle, struct ehca_eq *eq) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - EDEB_EN(7, "eq->pf=%p eq=%p ipz_eq_handle=%lx adapter_handle=%lx", - &eq->pf, eq, eq->ipz_eq_handle.handle, - adapter_handle.handle); - ret = hcp_galpas_dtor(&eq->galpas); if (ret) { - EDEB_ERR(4, "Could not destruct eq->galpas"); + ehca_gen_err("Could not destruct eq->galpas"); return H_RESOURCE; } @@ -1117,9 +861,7 @@ u64 hipz_h_destroy_eq(const struct ipz_a if (ret == H_RESOURCE) - EDEB_ERR(4, "Resource in use. ret=%lx ", ret); - - EDEB_EX(7, "ret=%lx", ret); + ehca_gen_err("Resource in use. 
ret=%lx ", ret); return ret; } @@ -1132,16 +874,11 @@ u64 hipz_h_alloc_resource_mr(const struc const struct ipz_pd pd, struct ehca_mr_hipzout_parms *outparms) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; u64 lkey_out; u64 rkey_out; - EDEB_EN(7, "adapter_handle=%lx mr=%p vaddr=%lx length=%lx" - " access_ctrl=%x pd=%x", - adapter_handle.handle, mr, vaddr, length, access_ctrl, - pd.value); - ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, adapter_handle.handle, /* r4 */ 5, /* r5 */ @@ -1160,9 +897,6 @@ u64 hipz_h_alloc_resource_mr(const struc outparms->lkey = (u32)lkey_out; outparms->rkey = (u32)rkey_out; - EDEB_EX(7, "ret=%lx mr_handle=%lx lkey=%x rkey=%x", - ret, outparms->handle.handle, outparms->lkey, outparms->rkey); - return ret; } @@ -1173,27 +907,22 @@ u64 hipz_h_register_rpage_mr(const struc const u64 logical_address_of_page, const u64 count) { - u64 ret = H_SUCCESS; + u64 ret; - EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx pagesize=%x" - " queue_type=%x logical_address_of_page=%lx count=%lx", - adapter_handle.handle, mr, mr->ipz_mr_handle.handle, pagesize, - queue_type, logical_address_of_page, count); - - if ((count > 1) && (logical_address_of_page & 0xfff)) { - EDEB_ERR(4, "logical_address_of_page not on a 4k boundary " - "adapter_handle=%lx mr=%p mr_handle=%lx " - "pagesize=%x queue_type=%x logical_address_of_page=%lx" - " count=%lx", - adapter_handle.handle, mr, mr->ipz_mr_handle.handle, - pagesize, queue_type, logical_address_of_page, count); + if ((count > 1) && (logical_address_of_page & (EHCA_PAGESIZE-1))) { + ehca_gen_err("logical_address_of_page not on a 4k boundary " + "adapter_handle=%lx mr=%p mr_handle=%lx " + "pagesize=%x queue_type=%x " + "logical_address_of_page=%lx count=%lx", + adapter_handle.handle, mr, + mr->ipz_mr_handle.handle, pagesize, queue_type, + logical_address_of_page, count); ret = H_PARAMETER; } else ret = hipz_h_register_rpage(adapter_handle, pagesize, queue_type, mr->ipz_mr_handle.handle, logical_address_of_page, count); 
- EDEB_EX(7, "ret=%lx", ret); return ret; } @@ -1202,15 +931,9 @@ u64 hipz_h_query_mr(const struct ipz_ada const struct ehca_mr *mr, struct ehca_mr_hipzout_parms *outparms) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - u64 remote_len_out; - u64 remote_vaddr_out; - u64 acc_ctrl_pd_out; - u64 r9_out; - - EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx", - adapter_handle.handle, mr, mr->ipz_mr_handle.handle); + u64 remote_len_out, remote_vaddr_out, acc_ctrl_pd_out, r9_out; ret = ehca_hcall_7arg_7ret(H_QUERY_MR, adapter_handle.handle, /* r4 */ @@ -1228,38 +951,25 @@ u64 hipz_h_query_mr(const struct ipz_ada outparms->lkey = (u32)(r9_out >> 32); outparms->rkey = (u32)(r9_out & (0xffffffff)); - EDEB_EX(7, "ret=%lx mr_local_length=%lx mr_local_vaddr=%lx " - "mr_remote_length=%lx mr_remote_vaddr=%lx access_ctrl=%x " - "pd=%x lkey=%x rkey=%x", ret, outparms->len, - outparms->vaddr, remote_len_out, remote_vaddr_out, - outparms->acl, outparms->acl, outparms->lkey, outparms->rkey); - return ret; } u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle, const struct ehca_mr *mr) { - u64 ret = H_SUCCESS; u64 dummy; - EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx", - adapter_handle.handle, mr, mr->ipz_mr_handle.handle); - - ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE, - adapter_handle.handle, /* r4 */ - mr->ipz_mr_handle.handle, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); - EDEB_EX(7, "ret=%lx", ret); - - return ret; + return ehca_hcall_7arg_7ret(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + mr->ipz_mr_handle.handle, /* r5 */ + 0, 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); } u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle, @@ -1271,15 +981,9 @@ u64 hipz_h_reregister_pmr(const struct i const u64 mr_addr_cb, struct ehca_mr_hipzout_parms *outparms) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - u64 lkey_out; - u64 rkey_out; - - 
EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx vaddr_in=%lx " - "length=%lx access_ctrl=%x pd=%x mr_addr_cb=%lx", - adapter_handle.handle, mr, mr->ipz_mr_handle.handle, vaddr_in, - length, access_ctrl, pd.value, mr_addr_cb); + u64 lkey_out, rkey_out; ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR, adapter_handle.handle, /* r4 */ @@ -1301,8 +1005,6 @@ u64 hipz_h_reregister_pmr(const struct i outparms->lkey = (u32)lkey_out; outparms->rkey = (u32)rkey_out; - EDEB_EX(7, "ret=%lx vaddr=%lx lkey=%x rkey=%x", - ret, outparms->vaddr, outparms->lkey, outparms->rkey); return ret; } @@ -1314,16 +1016,9 @@ u64 hipz_h_register_smr(const struct ipz const struct ipz_pd pd, struct ehca_mr_hipzout_parms *outparms) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - u64 lkey_out; - u64 rkey_out; - - EDEB_EN(7, "adapter_handle=%lx orig_mr=%p orig_mr_handle=%lx " - "vaddr_in=%lx access_ctrl=%x pd=%x", adapter_handle.handle, - orig_mr, orig_mr->ipz_mr_handle.handle, vaddr_in, access_ctrl, - pd.value); - + u64 lkey_out, rkey_out; ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR, adapter_handle.handle, /* r4 */ @@ -1342,9 +1037,6 @@ u64 hipz_h_register_smr(const struct ipz outparms->lkey = (u32)lkey_out; outparms->rkey = (u32)rkey_out; - EDEB_EX(7, "ret=%lx mr_handle=%lx lkey=%x rkey=%x", - ret, outparms->handle.handle, outparms->lkey, outparms->rkey); - return ret; } @@ -1353,13 +1045,10 @@ u64 hipz_h_alloc_resource_mw(const struc const struct ipz_pd pd, struct ehca_mw_hipzout_parms *outparms) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; u64 rkey_out; - EDEB_EN(7, "adapter_handle=%lx mw=%p pd=%x", - adapter_handle.handle, mw, pd.value); - ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, adapter_handle.handle, /* r4 */ 6, /* r5 */ @@ -1375,8 +1064,6 @@ u64 hipz_h_alloc_resource_mw(const struc outparms->rkey = (u32)rkey_out; - EDEB_EX(7, "ret=%lx mw_handle=%lx rkey=%x", - ret, outparms->handle.handle, outparms->rkey); return ret; } @@ -1384,13 +1071,9 @@ u64 hipz_h_query_mw(const struct ipz_ada 
const struct ehca_mw *mw, struct ehca_mw_hipzout_parms *outparms) { - u64 ret = H_SUCCESS; + u64 ret; u64 dummy; - u64 pd_out; - u64 rkey_out; - - EDEB_EN(7, "adapter_handle=%lx mw=%p mw_handle=%lx", - adapter_handle.handle, mw, mw->ipz_mw_handle.handle); + u64 pd_out, rkey_out; ret = ehca_hcall_7arg_7ret(H_QUERY_MW, adapter_handle.handle, /* r4 */ @@ -1405,34 +1088,25 @@ u64 hipz_h_query_mw(const struct ipz_ada &dummy); outparms->rkey = (u32)rkey_out; - EDEB_EX(7, "ret=%lx rkey=%x pd=%lx", ret, outparms->rkey, pd_out); - return ret; } u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle, const struct ehca_mw *mw) { - u64 ret = H_SUCCESS; u64 dummy; - EDEB_EN(7, "adapter_handle=%lx mw=%p mw_handle=%lx", - adapter_handle.handle, mw, mw->ipz_mw_handle.handle); - - ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE, - adapter_handle.handle, /* r4 */ - mw->ipz_mw_handle.handle, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); - EDEB_EX(7, "ret=%lx", ret); - - return ret; + return ehca_hcall_7arg_7ret(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + mw->ipz_mw_handle.handle, /* r5 */ + 0, 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); } u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle, @@ -1440,34 +1114,24 @@ u64 hipz_h_error_data(const struct ipz_a void *rblock, unsigned long *byte_count) { - u64 ret = H_SUCCESS; u64 dummy; - u64 r_cb; - - EDEB_EN(7, "adapter_handle=%lx ressource_handle=%lx rblock=%p", - adapter_handle.handle, ressource_handle, rblock); + u64 r_cb = virt_to_abs(rblock); - if (((u64)rblock) & 0xfff) { - EDEB_ERR(4, "rblock not page aligned."); + if (r_cb & (EHCA_PAGESIZE-1)) { + ehca_gen_err("rblock not page aligned."); return H_PARAMETER; } - r_cb = virt_to_abs(rblock); - - ret = ehca_hcall_7arg_7ret(H_ERROR_DATA, - adapter_handle.handle, - ressource_handle, - r_cb, - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - 
&dummy, - &dummy); - - EDEB_EX(7, "ret=%lx", ret); - - return ret; + return ehca_hcall_7arg_7ret(H_ERROR_DATA, + adapter_handle.handle, + ressource_handle, + r_cb, + 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); } diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_phyp.c linux-2.6/drivers/infiniband/hw/ehca/hcp_phyp.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_phyp.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/hcp_phyp.c 2006-08-30 20:00:16.000000000 +0200 @@ -39,22 +39,17 @@ * POSSIBILITY OF SUCH DAMAGE. */ -#define DEB_PREFIX "PHYP" - #include "ehca_classes.h" #include "hipz_hw.h" int hcall_map_page(u64 physaddr, u64 *mapaddr) { *mapaddr = (u64)(ioremap(physaddr, EHCA_PAGESIZE)); - - EDEB(7, "ioremap physaddr=%lx mapaddr=%lx", physaddr, *mapaddr); return 0; } int hcall_unmap_page(u64 mapaddr) { - EDEB(7, "mapaddr=%lx", mapaddr); iounmap((volatile void __iomem*)mapaddr); return 0; } @@ -68,25 +63,18 @@ int hcp_galpas_ctor(struct h_galpas *gal galpas->user.fw_handle = paddr_user; - EDEB(7, "paddr_kernel=%lx paddr_user=%lx galpas->kernel=%lx" - " galpas->user=%lx", - paddr_kernel, paddr_user, galpas->kernel.fw_handle, - galpas->user.fw_handle); - - return ret; + return 0; } int hcp_galpas_dtor(struct h_galpas *galpas) { - int ret = 0; - - if (galpas->kernel.fw_handle) - ret = hcall_unmap_page(galpas->kernel.fw_handle); - - if (ret) - return ret; + if (galpas->kernel.fw_handle) { + int ret = hcall_unmap_page(galpas->kernel.fw_handle); + if (ret) + return ret; + } galpas->user.fw_handle = galpas->kernel.fw_handle = 0; - return ret; + return 0; } diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_phyp.h linux-2.6/drivers/infiniband/hw/ehca/hcp_phyp.h --- linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_phyp.h 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/hcp_phyp.h 2006-08-30 20:00:16.000000000 +0200 @@ -69,19 +69,13 @@ struct h_galpas { static inline 
u64 hipz_galpa_load(struct h_galpa galpa, u32 offset) { u64 addr = galpa.fw_handle + offset; - u64 out; - EDEB_EN(7, "addr=%lx offset=%x ", addr, offset); - out = *(u64 *) addr; - EDEB_EX(7, "addr=%lx value=%lx", addr, out); - return out; + return *(volatile u64 __force *)addr; } static inline void hipz_galpa_store(struct h_galpa galpa, u32 offset, u64 value) { u64 addr = galpa.fw_handle + offset; - EDEB(7, "addr=%lx offset=%x value=%lx", addr, - offset, value); - *(u64 *) addr = value; + *(volatile u64 __force *)addr = value; } int hcp_galpas_ctor(struct h_galpas *galpas, diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/hipz_fns_core.h linux-2.6/drivers/infiniband/hw/ehca/hipz_fns_core.h --- linux-2.6_orig/drivers/infiniband/hw/ehca/hipz_fns_core.h 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/hipz_fns_core.h 2006-08-30 20:00:16.000000000 +0200 @@ -60,63 +60,41 @@ static inline void hipz_update_sqa(struct ehca_qp *qp, u16 nr_wqes) { - struct h_galpa gal; - - EDEB_EN(7, "qp=%p", qp); - gal = qp->galpas.kernel; /* ringing doorbell :-) */ - hipz_galpa_store_qp(gal, qpx_sqa, EHCA_BMASK_SET(QPX_SQADDER, nr_wqes)); - EDEB_EX(7, "qp=%p QPx_SQA = %i", qp, nr_wqes); + hipz_galpa_store_qp(qp->galpas.kernel, qpx_sqa, + EHCA_BMASK_SET(QPX_SQADDER, nr_wqes)); } static inline void hipz_update_rqa(struct ehca_qp *qp, u16 nr_wqes) { - struct h_galpa gal; - - EDEB_EN(7, "qp=%p", qp); - gal = qp->galpas.kernel; /* ringing doorbell :-) */ - hipz_galpa_store_qp(gal, qpx_rqa, EHCA_BMASK_SET(QPX_RQADDER, nr_wqes)); - EDEB_EX(7, "qp=%p QPx_RQA = %i", qp, nr_wqes); + hipz_galpa_store_qp(qp->galpas.kernel, qpx_rqa, + EHCA_BMASK_SET(QPX_RQADDER, nr_wqes)); } static inline void hipz_update_feca(struct ehca_cq *cq, u32 nr_cqes) { - struct h_galpa gal; - - EDEB_EN(7, "cq=%p", cq); - gal = cq->galpas.kernel; - hipz_galpa_store_cq(gal, cqx_feca, + hipz_galpa_store_cq(cq->galpas.kernel, cqx_feca, EHCA_BMASK_SET(CQX_FECADDER, nr_cqes)); - EDEB_EX(7, "cq=%p 
CQx_FECA = %i", cq, nr_cqes); } static inline void hipz_set_cqx_n0(struct ehca_cq *cq, u32 value) { - struct h_galpa gal; - u64 CQx_N0_reg = 0; + u64 cqx_n0_reg; - EDEB_EN(7, "cq=%p event on solicited completion -- write CQx_N0", cq); - gal = cq->galpas.kernel; - hipz_galpa_store_cq(gal, cqx_n0, + hipz_galpa_store_cq(cq->galpas.kernel, cqx_n0, EHCA_BMASK_SET(CQX_N0_GENERATE_SOLICITED_COMP_EVENT, value)); - CQx_N0_reg = hipz_galpa_load_cq(gal, cqx_n0); - EDEB_EX(7, "cq=%p loaded CQx_N0=%lx", cq, (unsigned long)CQx_N0_reg); + cqx_n0_reg = hipz_galpa_load_cq(cq->galpas.kernel, cqx_n0); } static inline void hipz_set_cqx_n1(struct ehca_cq *cq, u32 value) { - struct h_galpa gal; - u64 CQx_N1_reg = 0; + u64 cqx_n1_reg; - EDEB_EN(7, "cq=%p event on completion -- write CQx_N1", - cq); - gal = cq->galpas.kernel; - hipz_galpa_store_cq(gal, cqx_n1, + hipz_galpa_store_cq(cq->galpas.kernel, cqx_n1, EHCA_BMASK_SET(CQX_N1_GENERATE_COMP_EVENT, value)); - CQx_N1_reg = hipz_galpa_load_cq(gal, cqx_n1); - EDEB_EX(7, "cq=%p loaded CQx_N1=%lx", cq, (unsigned long)CQx_N1_reg); + cqx_n1_reg = hipz_galpa_load_cq(cq->galpas.kernel, cqx_n1); } #endif /* __HIPZ_FNC_CORE_H__ */ diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.c linux-2.6/drivers/infiniband/hw/ehca/ipz_pt_fn.c --- linux-2.6_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.c 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ipz_pt_fn.c 2006-08-30 20:00:16.000000000 +0200 @@ -38,13 +38,9 @@ * POSSIBILITY OF SUCH DAMAGE. */ -#define DEB_PREFIX "iptz" - #include "ehca_tools.h" #include "ipz_pt_fn.h" -extern int ehca_hwlevel; - void *ipz_qpageit_get_inc(struct ipz_queue *queue) { void *ret = ipz_qeit_get(queue); @@ -54,10 +50,9 @@ void *ipz_qpageit_get_inc(struct ipz_que ret = NULL; } if (((u64)ret) % EHCA_PAGESIZE) { - EDEB(4, "ERROR!! not at PAGE-Boundary"); + ehca_gen_err("ERROR!! 
not at PAGE-Boundary"); return NULL; } - EDEB(7, "queue=%p ret=%p", queue, ret); return ret; } @@ -65,15 +60,13 @@ void *ipz_qeit_eq_get_inc(struct ipz_que { void *ret = ipz_qeit_get(queue); u64 last_entry_in_q = queue->queue_length - queue->qe_size; + queue->current_q_offset += queue->qe_size; if (queue->current_q_offset > last_entry_in_q) { queue->current_q_offset = 0; queue->toggle_state = (~queue->toggle_state) & 1; } - EDEB(7, "queue=%p ret=%p new current_q_offset=%lx qe_size=%x", - queue, ret, queue->current_q_offset, queue->qe_size); - return ret; } @@ -84,22 +77,20 @@ int ipz_queue_ctor(struct ipz_queue *que int pages_per_kpage = PAGE_SIZE >> EHCA_PAGESHIFT; int f; - EDEB_EN(7, "nr_of_pages=%x pagesize=%x qe_size=%x pages_per_kpage=%x", - nr_of_pages, pagesize, qe_size, pages_per_kpage); if (pagesize > PAGE_SIZE) { - EDEB_ERR(4, "FATAL ERROR: pagesize=%x is greater than " - "kernel page size", pagesize); + ehca_gen_err("FATAL ERROR: pagesize=%x is greater " + "than kernel page size", pagesize); return 0; } if (!pages_per_kpage) { - EDEB_ERR(4, "FATAL ERROR: invalid kernel page size. " - "pages_per_kpage=%x", pages_per_kpage); + ehca_gen_err("FATAL ERROR: invalid kernel page size. " + "pages_per_kpage=%x", pages_per_kpage); return 0; } queue->queue_length = nr_of_pages * pagesize; queue->queue_pages = vmalloc(nr_of_pages * sizeof(void *)); if (!queue->queue_pages) { - EDEB(4, "ERROR!! didn't get the memory"); + ehca_gen_err("ERROR!! 
didn't get the memory"); return 0; } memset(queue->queue_pages, 0, nr_of_pages * sizeof(void *)); @@ -126,14 +117,11 @@ int ipz_queue_ctor(struct ipz_queue *que queue->act_nr_of_sg = nr_of_sg; queue->pagesize = pagesize; queue->toggle_state = 1; - EDEB_EX(7, "queue_length=%x queue_pages=%p qe_size=%x" - " act_nr_of_sg=%x", queue->queue_length, queue->queue_pages, - queue->qe_size, queue->act_nr_of_sg); return 1; ipz_queue_ctor_exit0: - EDEB_ERR(4, "Couldn't get alloc pages queue=%p f=%x nr_of_pages=%x", - queue, f, nr_of_pages); + ehca_gen_err("Couldn't get alloc pages queue=%p f=%x nr_of_pages=%x", + queue, f, nr_of_pages); for (f = 0; f < nr_of_pages; f += pages_per_kpage) { if (!(queue->queue_pages)[f]) break; @@ -148,19 +136,14 @@ int ipz_queue_dtor(struct ipz_queue *que int g; int nr_pages; - EDEB_EN(7, "ipz_queue pointer=%p", queue); if (!queue || !queue->queue_pages) { - EDEB_ERR(4, "queue or queue_pages is NULL"); + ehca_gen_dbg("queue or queue_pages is NULL"); return 0; } - EDEB(7, "destructing a queue with the following " - "properties:\n nr_of_pages=%x pagesize=%x qe_size=%x", - queue->act_nr_of_sg, queue->pagesize, queue->qe_size); nr_pages = queue->queue_length / queue->pagesize; for (g = 0; g < nr_pages; g += pages_per_kpage) free_page((unsigned long)(queue->queue_pages)[g]); vfree(queue->queue_pages); - EDEB_EX(7, "queue freed!"); return 1; } diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.h linux-2.6/drivers/infiniband/hw/ehca/ipz_pt_fn.h --- linux-2.6_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.h 2006-09-08 00:16:13.000000000 +0200 +++ linux-2.6/drivers/infiniband/hw/ehca/ipz_pt_fn.h 2006-08-30 20:00:17.000000000 +0200 @@ -43,7 +43,6 @@ #ifndef __IPZ_PT_FN_H__ #define __IPZ_PT_FN_H__ -#include "ehca_qes.h" #define EHCA_PAGESHIFT 12 #define EHCA_PAGESIZE 4096UL #define EHCA_PAGEMASK (~(EHCA_PAGESIZE-1)) @@ -76,7 +75,7 @@ struct ipz_queue { */ static inline void *ipz_qeit_calc(struct ipz_queue *queue, u64 q_offset) { - struct ipz_page 
*current_page = NULL; + struct ipz_page *current_page; if (q_offset >= queue->queue_length) return NULL; current_page = (queue->queue_pages)[q_offset >> EHCA_PAGESHIFT]; @@ -118,9 +117,6 @@ static inline void *ipz_qeit_get_inc(str queue->toggle_state = (~queue->toggle_state) & 1; } - EDEB(7, "queue=%p ret=%p new current_q_addr=%lx qe_size=%x", - queue, ret, queue->current_q_offset, queue->qe_size); - return ret; } @@ -230,7 +226,6 @@ static inline void *ipz_eqit_eq_get_inc_ { void *ret = ipz_qeit_get(queue); u32 qe = *(u8 *) ret; - EDEB(7, "ipz_QEit_EQ_get_inc_valid qe=%x", qe); if ((qe >> 7) == (queue->toggle_state & 1)) ipz_qeit_eq_get_inc(queue); /* this is a good one */ else -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5203 bytes Desc: S/MIME Cryptographic Signature URL: From mst at mellanox.co.il Thu Sep 7 14:45:24 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 8 Sep 2006 00:45:24 +0300 Subject: [openib-general] [PATCH] IB/cma: add rdma_establish Message-ID: <20060907214524.GA14791@mellanox.co.il> OK, we are hitting the lost RTU case quite a lot in OFED. So the following patch will ship with OFED. Sean, did we decide what to do for upstream yet? I would say we need something like the below for 2.6.19 too (probably just need to update node type check). And, I like it that this approach leaves all matters of policy to users (such as whether move QP to RTS after asynchronous event or after completion event). As a side note, reasons for frequent loss of RTU must be investigated. --- IB/cma: add rdma_establish Make it possible for ULPs to handle RTU loss by calling rdma_establish. Signed-off-by: Sean Hefty Signed-off-by: Michael S. 
Tsirkin Index: a/include/rdma/rdma_cm.h =================================================================== --- a/include/rdma/rdma_cm.h (revision 8822) +++ a/include/rdma/rdma_cm.h (working copy) @@ -256,6 +256,16 @@ int rdma_listen(struct rdma_cm_id *id, i int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); /** + * rdma_establish - Forces a connection state to established. + * @id: Connection identifier to transition to established. + * + * This routine should be invoked by users who receive messages on a + * QP before being notified that the connection has been established by the + * RDMA CM. + */ +int rdma_establish(struct rdma_cm_id *id); + +/** * rdma_reject - Called to reject a connection request or response. */ int rdma_reject(struct rdma_cm_id *id, const void *private_data, Index: a/drivers/infiniband/core/cm.c =================================================================== --- a/drivers/infiniband/core/cm.c (revision 8823) +++ a/drivers/infiniband/core/cm.c (working copy) @@ -3207,6 +3207,10 @@ static int cm_init_qp_rts_attr(struct cm spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->id.state) { + /* Allow transition to RTS before sending REP */ + case IB_CM_REQ_RCVD: + case IB_CM_MRA_REQ_SENT: + case IB_CM_REP_RCVD: case IB_CM_MRA_REP_SENT: case IB_CM_REP_SENT: Index: a/drivers/infiniband/core/cma.c =================================================================== --- a/drivers/infiniband/core/cma.c (revision 8822) +++ a/drivers/infiniband/core/cma.c (working copy) @@ -840,22 +840,6 @@ static int cma_verify_rep(struct rdma_id return 0; } -static int cma_rtu_recv(struct rdma_id_private *id_priv) -{ - int ret; - - ret = cma_modify_qp_rts(&id_priv->id); - if (ret) - goto reject; - - return 0; -reject: - cma_modify_qp_err(&id_priv->id); - ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, - NULL, 0, NULL, 0); - return ret; -} - static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event 
*ib_event) { struct rdma_id_private *id_priv = cm_id->context; @@ -886,9 +870,8 @@ static int cma_ib_handler(struct ib_cm_i private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; break; case IB_CM_RTU_RECEIVED: - status = cma_rtu_recv(id_priv); - event = status ? RDMA_CM_EVENT_CONNECT_ERROR : - RDMA_CM_EVENT_ESTABLISHED; + case IB_CM_USER_ESTABLISHED: + event = RDMA_CM_EVENT_ESTABLISHED; break; case IB_CM_DREQ_ERROR: status = -ETIMEDOUT; /* fall through */ @@ -1981,11 +1964,25 @@ static int cma_accept_ib(struct rdma_id_ struct rdma_conn_param *conn_param) { struct ib_cm_rep_param rep; - int ret; + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; - ret = cma_modify_qp_rtr(&id_priv->id); - if (ret) - return ret; + if (id_priv->id.qp) { + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) + goto out; + + qp_attr.qp_state = IB_QPS_RTS; + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, + &qp_attr_mask); + if (ret) + goto out; + + qp_attr.max_rd_atomic = conn_param->initiator_depth; + ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); + if (ret) + goto out; + } memset(&rep, 0, sizeof rep); rep.qp_num = id_priv->qp_num; @@ -2000,7 +1997,9 @@ static int cma_accept_ib(struct rdma_id_ rep.rnr_retry_count = conn_param->rnr_retry_count; rep.srq = id_priv->srq ? 
1 : 0;

-	return ib_send_cm_rep(id_priv->cm_id.ib, &rep);
+	ret = ib_send_cm_rep(id_priv->cm_id.ib, &rep);
+out:
+	return ret;
 }

 static int cma_send_sidr_rep(struct rdma_id_private *id_priv,
@@ -2058,6 +2057,27 @@ reject:
 }
 EXPORT_SYMBOL(rdma_accept);

+int rdma_establish(struct rdma_cm_id *id)
+{
+	struct rdma_id_private *id_priv;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (!cma_comp(id_priv, CMA_CONNECT))
+		return -EINVAL;
+
+	switch (id->device->node_type) {
+	case IB_NODE_CA:
+		ret = ib_cm_establish(id_priv->cm_id.ib);
+		break;
+	default:
+		ret = 0;
+		break;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(rdma_establish);
+
 int rdma_reject(struct rdma_cm_id *id, const void *private_data,
 		u8 private_data_len)
 {

-- 
MST

From mst at mellanox.co.il Thu Sep 7 14:46:15 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 8 Sep 2006 00:46:15 +0300
Subject: [openib-general] [PATCH for-2.6.19] IB/ipoib: Fix flush/start xmit race take 2 (from code review)
Message-ID: <20060907214614.GB14791@mellanox.co.il>

Hello, Roland!
The following patch in the for-2.6.19 series:
IB/ipoib: Fix flush/start xmit race (from code review)
introduces a sleep-under-spinlock condition: we don't drop tx_lock
while scanning remove_list (look at ipoib_flush_paths, I think it'll
be obvious).

Here's a fixed version, please queue it in for-2.6.19.

--

ipoib race reported after code review by Eitan Rabin:
http://openib.org/pipermail/openib-general/2006-June/022916.html

Prevent the flush task from freeing the ipoib_neigh pointer while
ipoib_start_xmit is accessing the ipoib_neigh through the pointer it
has loaded from the hardware address.

Signed-off-by: Michael S.
Tsirkin

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index cf71d2a..31c4b05 100644
Index: ofed_1_1/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- ofed_1_1.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2006-08-01 15:29:48.000000000 +0300
+++ ofed_1_1/drivers/infiniband/ulp/ipoib/ipoib_main.c	2006-09-05 11:47:07.000000000 +0300
@@ -336,7 +336,8 @@ void ipoib_flush_paths(struct net_device
 	struct ipoib_path *path, *tp;
 	LIST_HEAD(remove_list);

-	spin_lock_irq(&priv->lock);
+	spin_lock_irq(&priv->tx_lock);
+	spin_lock(&priv->lock);

 	list_splice(&priv->path_list, &remove_list);
 	INIT_LIST_HEAD(&priv->path_list);
@@ -347,12 +348,15 @@ void ipoib_flush_paths(struct net_device
 	list_for_each_entry_safe(path, tp, &remove_list, list) {
 		if (path->query)
 			ib_sa_cancel_query(path->query_id, path->query);
-		spin_unlock_irq(&priv->lock);
+		spin_unlock(&priv->lock);
+		spin_unlock_irq(&priv->tx_lock);
 		wait_for_completion(&path->done);
 		path_free(dev, path);
-		spin_lock_irq(&priv->lock);
+		spin_lock_irq(&priv->tx_lock);
+		spin_lock(&priv->lock);
 	}
-	spin_unlock_irq(&priv->lock);
+	spin_unlock(&priv->lock);
+	spin_unlock_irq(&priv->tx_lock);
 }

 static void path_rec_completion(int status,

-- 
MST

From rdreier at cisco.com Thu Sep 7 14:49:51 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 07 Sep 2006 14:49:51 -0700
Subject: [openib-general] [PATCH for-2.6.19] IB/ipoib: Fix flush/start xmit race take 2 (from code review)
In-Reply-To: <20060907214614.GB14791@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 8 Sep 2006 00:46:15 +0300")
References: <20060907214614.GB14791@mellanox.co.il>
Message-ID:

Thanks, I replaced the patch in my tree.
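
The locking rule behind the ipoib_flush_paths fix above (take tx_lock first, then lock, and drop both before anything that can sleep) can be modelled in plain userspace C. This is only a sketch under stated assumptions: it is not driver code, the "locks" are flags that record whether they are held, and every toy_* name is invented for the illustration.

```c
#include <assert.h>

/* Toy stand-in for a spinlock: it only records whether it is held. */
struct toy_lock {
	int held;
};

/* Hypothetical miniature of the driver's private data. */
struct toy_priv {
	struct toy_lock tx_lock;	/* outer lock, always taken first */
	struct toy_lock lock;		/* inner lock, always taken second */
	int paths_freed;
};

/* Models spin_lock_irq(&priv->tx_lock); spin_lock(&priv->lock); */
static void toy_lock_both(struct toy_priv *p)
{
	assert(!p->tx_lock.held && !p->lock.held);
	p->tx_lock.held = 1;
	p->lock.held = 1;
}

/* Releases in the reverse order of acquisition. */
static void toy_unlock_both(struct toy_priv *p)
{
	assert(p->tx_lock.held && p->lock.held);
	p->lock.held = 0;
	p->tx_lock.held = 0;
}

/* Models wait_for_completion(): only legal when no lock is held. */
static void toy_blocking_wait(struct toy_priv *p)
{
	assert(!p->tx_lock.held && !p->lock.held);
}

/* Walks n pretend path entries; returns how many were freed in total. */
int toy_flush_paths(struct toy_priv *p, int n)
{
	int i;

	toy_lock_both(p);
	/* the real code splices the path list onto a private list here */
	for (i = 0; i < n; i++) {
		toy_unlock_both(p);	/* drop both locks before sleeping */
		toy_blocking_wait(p);
		p->paths_freed++;	/* stands in for path_free() */
		toy_lock_both(p);	/* retake in the same order */
	}
	toy_unlock_both(p);
	return p->paths_freed;
}
```

If the loop called toy_blocking_wait() while either flag was still set, the assertion would fire, which is the toy equivalent of the sleep-under-spinlock condition described in the patch mail.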
From ralphc at pathscale.com Thu Sep 7 16:03:33 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Thu, 07 Sep 2006 16:03:33 -0700
Subject: [openib-general] [PATCH] IB/ipath Fix RPM build for libipathverbs
Message-ID: <1157670213.8759.117.camel@brick.pathscale.com>

A minor change to fix RPM builds for libipathverbs.

Signed-off-by: Ralph Campbell

Index: src/userspace/libipathverbs/Makefile.am
===================================================================
--- src/userspace/libipathverbs/Makefile.am	(revision 9347)
+++ src/userspace/libipathverbs/Makefile.am	(working copy)
@@ -49,6 +49,7 @@ src_ipathverbs_la_LDFLAGS = -avoid-versi
 	$(ipathverbs_version_script)

 EXTRA_DIST = src/ipathverbs.h \
+	src/ipath-abi.h \
 	src/ipathverbs.map \
 	libipathverbs.spec.in

From rjwalsh at pathscale.com Thu Sep 7 17:04:15 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Thu, 07 Sep 2006 17:04:15 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To:
References:
Message-ID: <4500B37F.3080705@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Woodruff, Robert J wrote:
> Robert Walsh wrote,
>> I'll give it a spin this afternoon: it looks quite a bit more
>> comprehensive than the small patch I did.
>
> I also just tried running the ib_rdma_bw test and it seems to
> be flaky if you stress it. If you just run the defaults, it seems to
> work, but if you crank up the iterations and the message size,
> it sometimes fails with.....
> > [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 > 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | > iters=10000 | duplex=0 | cma=0 | > 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 > VAddr 0x00002a95dd3480 > 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 > VAddr 0x00002a95c85480 > 4730:main: Completion with error at client: > 4730:main: Failed status 9: wr_id 3 > 4730:main: scnt=7584, ccnt=6584 > [woody at rkl-13 bin]$ Hi Woody, When RC4 is available, there should be a patch in there that will fix this. Can you let us know if you continue to see problems? Regards, Robert. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRQCzfvzvnpzTd9fxAQLfoAf+JWrBo/pPf/tAvTRFckCqjOn3dpH59mJK n1KuN/M9lsP0UobIOEAMAR3KLvTfFe2czEb7ThMxcKjYgJHiikxuiSomB3pbsRK5 W0qTEqMmS5QYFXfpPlvVof4xxdvWZDDUzzkxG0bve4zBVjeJMUnu/8jVTTBmGbqd nmqfLrIP+N8n876x1RZade3DTz0NEDDYRT5d25asbUVuoiF7ldVtbX5RmK6rRdFZ 1ym6fIyHT+fTZ5wnVoTJRdjV8icrR9JpPj/BFL6OoxDQvgMksplDnJaTGc4XinFl WdwZV2NfImYvwSB4QUgqe4Me/BS1xl4gj+OpaviE2TzP7U6tqQVaHQ== =OLHZ -----END PGP SIGNATURE----- From dledford at redhat.com Thu Sep 7 17:44:06 2006 From: dledford at redhat.com (Doug Ledford) Date: Thu, 07 Sep 2006 20:44:06 -0400 Subject: [openib-general] OpenSM/osm_log API: Use symbol versionsrather than polluting namespace In-Reply-To: <20060907062243.GH6928@mellanox.co.il> References: <1157602561.4652.53.camel@fc6.xsintricity.com> <20060907062243.GH6928@mellanox.co.il> Message-ID: <1157676246.15761.182.camel@fc6.xsintricity.com> On Thu, 2006-09-07 at 09:22 +0300, Michael S. Tsirkin wrote: > Quoting r. Doug Ledford : > > Subject: Re: [openib-general] OpenSM/osm_log API: Use symbol versionsrather than polluting namespace > > > > On Wed, 2006-09-06 at 18:16 +0300, Michael S. Tsirkin wrote: > > > Quoting r. 
Hal Rosenstock :
> > > > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > > >
> > > > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > > > > Quoting r. Hal Rosenstock :
> > > > > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > > > > >
> > > > > > OpenSM/osm_log API: Rather than polluting the namespace with needless
> > > > > > symbols, use symbol versions and have a versioned osm_log_init rather
> > > > > > than adding osm_log_init_v2 as an additional API
> > > > > >
> > > > > > This patch is intended to be applied to both trunk and 1.1 versions.
> > > > > >
> > > > > > Signed-off-by: Doug Ledford
> > > > > > Signed-off-by: Hal Rosenstock
> > > > > >
> > > > > This preserves the ABI, but would this not break the API?
> > > >
> > > > Yes, this patch changes the API (in a most trivial way).
> > >
> > > So all users need to change code or they won't compile against the new
> > > library?
> >
> > Yes, and that is the correct way to handle this change.
>
> I disagree.
>
> In my opinion, asking all users to add a parameter they don't care about is
> worse than having multiple functions with a convenient set of options.

Dude, you can't do that. Ulrich Drepper, long time Cygnus/Red Hat
employee, upstream maintainer of glibc, and probably the most all
around authoritative person I know when it comes to open source
library management, keeps most of the papers he has delivered to
different conventions on his web site:

http://people.redhat.com/drepper/

Amongst those papers are several best practices papers on shared
library design and maintenance.
Two things in particular jump out as reasons to keep the list of
exported symbols in a DSO to the absolute bare minimum: 1) every
symbol in the DSO is part of the global symbol table for the app,
which has to be searched through during run time symbol resolution,
so the more symbols you have, the more dynamic linking slows down the
application (ever wondered why it takes from 20s to 1min to start
OpenOffice? It's because of the proliferation of DSO symbol exports
and symbol table relocations and lookups in the OO libs) and 2) if you
keep the exported symbols at the bare minimum needed to implement the
API, it helps to free up the library to change and evolve behind the
scenes without requiring as many changes to linked applications,
whereas the more you expose to the applications, the more likely you
are to have to change that exposed interface at some point in time.

What you just argued for is the opposite of both of those accepted and
commonly used practices. Not only that, but since all the old code
*could* be made to work with the new API using nothing more than a
macro in a header file, to argue for an extra symbol export is
*really* the wrong thing to do. It's violating those two best
practices above when you could achieve the same goal without violating
either.

> And if
> there is a low cost way to help apps compile without code change, I don't see
> why it makes sense to create work.

Dude, you can't do that either. You have to keep the API clean. If
you try to push the maintenance burden into the libraries instead of
making the apps carry their share of the maintenance, then the library
just ends up imploding under the impossible complexity of keeping all
those different API code bases working. This particular issue
certainly wouldn't have been the end of the world, but when enough
things like this creep in over time, you'll eventually need to make a
gen3 stack because this one is unusable. You have to put your foot
down and just say no on stuff like this.
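
The remark above that old callers could be kept compiling with "nothing more than a macro in a header file" can be sketched roughly as follows. This is a hypothetical illustration, not the OpenSM code: demo_log, demo_log_init and DEMO_LOG_COMPAT are made-up names, and the real osm_log_init signature is not reproduced here. The library keeps a single exported function with the new extra parameter, and an unconverted caller opts in to a same-named macro that supplies a default for it.

```c
#include <string.h>

/* An unconverted caller would define this before including the header. */
#define DEMO_LOG_COMPAT

/* Hypothetical logger state; not the real osm_log_t. */
struct demo_log {
	char path[64];
	int accumulate;	/* 1 = append to an existing file, 0 = truncate */
};

/* New API: still one exported symbol, now with the extra parameter. */
int demo_log_init(struct demo_log *log, const char *path, int accumulate)
{
	if (!log || !path || strlen(path) >= sizeof(log->path))
		return -1;
	strcpy(log->path, path);
	log->accumulate = accumulate ? 1 : 0;
	return 0;
}

#ifdef DEMO_LOG_COMPAT
/*
 * Source-compatibility shim: maps the old two-argument call onto the
 * new function. The preprocessor does not re-expand a macro inside its
 * own expansion, so the inner name refers to the real function. Being
 * a macro, it adds nothing to the library's exported symbol table.
 */
#define demo_log_init(log, path) demo_log_init((log), (path), 0)
#endif

/* Old-style caller: compiles unchanged against the new function. */
int old_caller(struct demo_log *log)
{
	return demo_log_init(log, "/tmp/demo.log");
}
```

A converted caller in the same file can still reach the three-argument function by writing (demo_log_init)(log, path, 1), because a parenthesised name is not treated as a function-like macro invocation. The design point is that the compatibility cost lives in a header the application opts into, not in a second exported entry point.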
> > APIs change.
>
> APIs should not change with every release.

With a mature product, no. This is hardly a mature product.

> > Any app you can build can compensate.
>
> Sure it seems simple if you are RedHat and rebuild the whole OS.

OFED == whole stack == same thing. If you are allowing some out of
stack software configuration issue to cause you as OFED maintainer to
put hacks into OFED, then you need to put your OFED interests in front
of your Mellanox interests when making OFED decisions.

> We are past code freeze. I agree with Hal that it might be hard to
> draw a line
> between a critical and a non-critical bugfix. However, an API change
> that
> 1. is purely cosmetical
> 2. requires code changes in dependent applications
> 3. is not uncontroversial
> is, for me, obviously beyond that line.

While I agree with the past code freeze and would support yanking the
entire log file truncation change for that reason, the preservation of
a clean API, proper library symbol versions, and DSO best practices
are far from "cosmetical".

-- 
Doug Ledford
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband

From maheshbarve at gmail.com Thu Sep 7 21:23:38 2006
From: maheshbarve at gmail.com (Mahesh Barve)
Date: Fri, 8 Sep 2006 09:53:38 +0530
Subject: [openib-general] Multicast: help needed
Message-ID: <507df10d0609072123y7348a115q558bcdb83d3347d6@mail.gmail.com>

Hi,
I am trying to perform multicast over Infiniband. Can someone let me
know where I can get some sample code for it?
Awaiting your reply,
-Mahesh Barve
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From krkumar2 at in.ibm.com Thu Sep 7 22:13:01 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Fri, 08 Sep 2006 10:43:01 +0530 Subject: [openib-general] [PATCH] Modify callers of cma_get_net_info for better error handling. Message-ID: <20060908051301.5221.63041.sendpatchset@K50wks273895wss.in.ibm.com> Re-organize code relating to cma_get_net_info() and rdma_create_id() to optimize error case handling (no need to alloc memory/etc as part of rdma_create_id() if input parameters are wrong). Signed-off-by: Krishna Kumar diff -ruNp org/core/cma.c new/core/cma.c --- org/core/cma.c 2006-09-08 09:51:40.000000000 +0530 +++ new/core/cma.c 2006-09-08 09:52:05.000000000 +0530 @@ -939,23 +939,24 @@ static struct rdma_id_private* cma_new_c __u16 port; u8 ip_ver; + if (cma_get_net_info(ib_event->private_data, listen_id->ps, + &ip_ver, &port, &src, &dst)) + goto out; id = rdma_create_id(listen_id->event_handler, listen_id->context, listen_id->ps); if (IS_ERR(id)) - return NULL; + goto out; + + cma_save_net_info(&id->route.addr, &listen_id->route.addr, + ip_ver, port, src, dst); rt = &id->route; rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 
2 : 1; - rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL); + rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, + GFP_KERNEL); if (!rt->path_rec) - goto err; + goto destroy_id; - if (cma_get_net_info(ib_event->private_data, listen_id->ps, - &ip_ver, &port, &src, &dst)) - goto err; - - cma_save_net_info(&id->route.addr, &listen_id->route.addr, - ip_ver, port, src, dst); rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path; if (rt->num_paths == 2) rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path; @@ -968,8 +969,10 @@ static struct rdma_id_private* cma_new_c id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = CMA_CONNECT; return id_priv; -err: + +destroy_id: rdma_destroy_id(id); +out : return NULL; } @@ -981,31 +984,30 @@ static struct rdma_id_private* cma_new_u union cma_ip_addr *src, *dst; __u16 port; u8 ip_ver; - int ret; + + if (cma_get_net_info(ib_event->private_data, listen_id->ps, + &ip_ver, &port, &src, &dst)) + goto out; id = rdma_create_id(listen_id->event_handler, listen_id->context, listen_id->ps); if (IS_ERR(id)) - return NULL; - - - if (cma_get_net_info(ib_event->private_data, listen_id->ps, - &ip_ver, &port, &src, &dst)) - goto err; + goto out; cma_save_net_info(&id->route.addr, &listen_id->route.addr, ip_ver, port, src, dst); - ret = rdma_translate_ip(&id->route.addr.src_addr, - &id->route.addr.dev_addr); - if (ret) - goto err; + if (rdma_translate_ip(&id->route.addr.src_addr, + &id->route.addr.dev_addr)) + goto destroy_id; id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = CMA_CONNECT; return id_priv; -err: + +destroy_id: rdma_destroy_id(id); +out: return NULL; } From krkumar2 at in.ibm.com Thu Sep 7 22:13:13 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Fri, 08 Sep 2006 10:43:13 +0530 Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure cases. 
Message-ID: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com> cma_connect_ib leaks an struct ib_cm_id* in failure cases. Signed-off-by: Krishna Kumar diff -ruNp org/core/cma.c new/core/cma.c --- org/core/cma.c 2006-09-08 09:52:05.000000000 +0530 +++ new/core/cma.c 2006-09-08 09:52:30.000000000 +0530 @@ -1954,6 +1954,11 @@ static int cma_connect_ib(struct rdma_id ret = ib_send_cm_req(id_priv->cm_id.ib, &req); out: + if (ret && !IS_ERR(id_priv->cm_id.ib)) { + ib_destroy_cm_id(id_priv->cm_id.ib); + id_priv->cm_id.ib = NULL; + } + kfree(private_data); return ret; } From krkumar2 at in.ibm.com Thu Sep 7 22:14:39 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Fri, 08 Sep 2006 10:44:39 +0530 Subject: [openib-general] [PATCH] Optimize cma_process_remove() Message-ID: <20060908051439.5229.71544.sendpatchset@K50wks273895wss.in.ibm.com> Optimize cma_process_remove() by using the remove_list. Signed-off-by: Krishna Kumar diff -ruNp org/core/cma.c new/core/cma.c --- org/core/cma.c 2006-09-08 09:52:30.000000000 +0530 +++ new/core/cma.c 2006-09-08 09:57:03.000000000 +0530 @@ -2332,7 +2332,7 @@ static int cma_remove_id_dev(struct rdma static void cma_process_remove(struct cma_device *cma_dev) { struct list_head remove_list; - struct rdma_id_private *id_priv; + struct rdma_id_private *id_priv, *tmp; int ret; INIT_LIST_HEAD(&remove_list); @@ -2344,22 +2344,20 @@ static void cma_process_remove(struct cm if (cma_internal_listen(id_priv)) { cma_destroy_listen(id_priv); - continue; + } else { + list_del(&id_priv->list); + list_add_tail(&id_priv->list, &remove_list); } + } + mutex_unlock(&lock); - list_del(&id_priv->list); - list_add_tail(&id_priv->list, &remove_list); + list_for_each_entry_safe(id_priv, tmp, &remove_list, list) { atomic_inc(&id_priv->refcount); - mutex_unlock(&lock); - ret = cma_remove_id_dev(id_priv); cma_deref_id(id_priv); if (ret) rdma_destroy_id(&id_priv->id); - - mutex_lock(&lock); } - mutex_unlock(&lock); cma_deref_dev(cma_dev); 
wait_for_completion(&cma_dev->comp);

From or.gerlitz at gmail.com Thu Sep 7 22:21:36 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Fri, 8 Sep 2006 07:21:36 +0200
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
In-Reply-To:
References: <200609071902.57379.toralf.foerster@gmx.de> <15ddcffd0609071252o477eeabfl31366719d0d3d9f0@mail.gmail.com>
Message-ID: <15ddcffd0609072221y1151d8dey48fdffc287660fbc@mail.gmail.com>

On 9/7/06, Roland Dreier wrote:
> Or> you need to have CONFIG_INFINIBAND_ADDR_TRANS=m defined also i
> Or> think you are missing CONFIG_INET=m
>
> Seems like a bug in the iSER Kconfig -- it shouldn't be possible to
> select iSER without everything it needs to compile.

OK, this makes sense, we will look into that and send patch early next week.

Or.

From mst at mellanox.co.il Fri Sep 8 02:29:39 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 8 Sep 2006 12:29:39 +0300
Subject: [openib-general] OpenSM/osm_log API: Use symbol versionsratherthan polluting namespace
In-Reply-To: <1157676246.15761.182.camel@fc6.xsintricity.com>
References: <1157676246.15761.182.camel@fc6.xsintricity.com>
Message-ID: <20060908092939.GA10741@mellanox.co.il>

Quoting r. Doug Ledford :
> What you just argued for is the opposite of both of those accepted and
> commonly used practices. Not only that, but since all the old code
> *could* be made to work with the new API using nothing more than a macro
> in a header file, to argue for an extra symbol export is *really* the
> wrong thing to do.

I didn't argue about exporting symbols at all. Macro in a header file to make existing code work would be fine, but it is not present in the patch that was posted.
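
For illustration only, the kind of header macro discussed above could look like the C sketch below. It is not taken from the posted patch or from the OpenSM headers: struct demo_log, demo_log_init_v2() and the demo_log_init() macro are hypothetical names, and the assumption that old callers should get "truncate" behaviour is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical log handle; this is not the real OpenSM log structure. */
struct demo_log {
	const char *path;
	int accumulate;		/* 0 = truncate on open, 1 = append */
};

/*
 * New-style initializer with the extra argument. In a library this
 * would be the only initializer symbol that has to be exported.
 */
int demo_log_init_v2(struct demo_log *log, const char *path, int accumulate)
{
	if (!log || !path)
		return -1;
	log->path = path;
	log->accumulate = accumulate ? 1 : 0;
	return 0;
}

/*
 * Compatibility macro: existing two-argument callers keep compiling
 * and are mapped onto the new call with an assumed default of 0.
 * No additional symbol is exported for the old name.
 */
#define demo_log_init(log, path) demo_log_init_v2((log), (path), 0)
```

An old call such as demo_log_init(&log, "/tmp/demo.log") then expands to the three-argument form at compile time, so the source-level interface is preserved without widening the library's symbol table.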
-- 
MST

From johnt1johnt2 at gmail.com Fri Sep 8 03:19:57 2006
From: johnt1johnt2 at gmail.com (john t)
Date: Fri, 8 Sep 2006 15:49:57 +0530
Subject: [openib-general] HCAs with and without memory
Message-ID:

Hi OpenIB group,

What is the difference between HCAs with memory and without memory? How is the on-board memory used by HCAs? Is it that data is first copied into this memory and then into physical memory?

Regards,
John T.

From thomas.bub at thomson.net Fri Sep 8 04:07:47 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Fri, 8 Sep 2006 13:07:47 +0200
Subject: [openib-general] Wrong byte order in lid of struct ibv_port_attr reported by ibv_query port!?
Message-ID:

Sean,

with the help of your modified cmpost.c example I found out that the byte order of the lid that your query_for_path in cmpost.c puts into the ib_sa_path_rec is the opposite of the one reported by ibv_query_port. Since I'm doing the connection establishment based on lid, GUID and subnetID, this explains why I can't connect a client to another machine.

Can you tell me which one is wrong or am I doing something wrong here?

Thomas

From devesh28 at gmail.com Fri Sep 8 05:01:01 2006
From: devesh28 at gmail.com (Devesh Sharma)
Date: Fri, 8 Sep 2006 17:31:01 +0530
Subject: [openib-general] mthca_modify_qp : acquiring Send/Receive Q locks while modifying qp
Message-ID: <309a667c0609080501h23b2b54em716d94a025676c69@mail.gmail.com>

Hello all,

In the mthca_modify_qp function, both the send and receive queues are locked in order to read the current qp state. Why is locking both WQs required? Is there any dependency on other qp operations?
	if (attr_mask & IB_QP_CUR_STATE) {
		cur_state = attr->cur_qp_state;
	} else {
		spin_lock_irq(&qp->sq.lock);
		spin_lock(&qp->rq.lock);
		cur_state = qp->state;
		spin_unlock(&qp->rq.lock);
		spin_unlock_irq(&qp->sq.lock);
	}

Devesh

From thomas.bub at thomson.net Fri Sep 8 06:57:39 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Fri, 8 Sep 2006 15:57:39 +0200
Subject: [openib-general] ibv_poll_cq after ibv_post_send does not work
Message-ID:

Dotan Barak wrote:
If you are using RC QP: the reason for not getting any completion in the CQ is that
Did you post any RR (Receive Request) at the listener side?

Dotan,

with the cmpost.c example I now get a cm connection even with another machine. However I don't get the cq event, on the sender side, when the IBV_WR_SEND is done. Is this correct? Is this what you are saying below? If it is correct this is different from gen1 drivers where I got a VAPI_SUCCESS cq event. Is there a way to get this back? On the receiver side I get a cq event for the receive request.

Thanks
Thomas

From swise at opengridcomputing.com Fri Sep 8 07:14:31 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 08 Sep 2006 09:14:31 -0500
Subject: [openib-general] Multicast: help needed
In-Reply-To: <507df10d0609072123y7348a115q558bcdb83d3347d6@mail.gmail.com>
References: <507df10d0609072123y7348a115q558bcdb83d3347d6@mail.gmail.com>
Message-ID: <1157724871.31760.18.camel@stevo-desktop>

There's a simple test case at:

gen2/trunk/src/userspace/librdmacm/examples/mckey.c

On Fri, 2006-09-08 at 09:53 +0530, Mahesh Barve wrote:
> Hi,
>
> I am trying to perform multicast over Infiniband. Can someone let me
> know where I can get some sample code for it?
> > Awaiting your reply, > -Mahesh Barve > > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From HNGUYEN at de.ibm.com Fri Sep 8 07:28:41 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Fri, 8 Sep 2006 16:28:41 +0200 Subject: [openib-general] OFED 1.1 status In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7892@mtlexch01.mtl.com> Message-ID: Hello Tziporet! First sorry for this late response regarding ehca build test in OFED 1.1 rc3. 1) The userspace lib dir for libehca contains only a few c-files, but no header files. On svn dir branches/1.1/src/userspace/libehca/src/ I saw all files needed. Please correct this for rc4! Will you pick new version of libehca from that dir? 2) When I used the install.sh script to install the software packages or compile them on ppc64, kernel 2.6.18-rc5/6 I got the following error messages: gcc -m64 -Wp,-MD,/var/tmp/OFEDRPM/BUILD/openib-1.1 /drivers/infiniband/core/.ib_addr.mod.o.d -nos M/BUILD/openib-1.1/include -I/var/tmp/OFEDRPM/BUILD/openib-1.1 /drivers/infiniband/include -Iinclu oft-float -pipe -mminimal-toc -mtraceback=none -mcall-aixdesc -mtune=power4 -mno-altivec -funit-at lude -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include -I/var/tmp/OFEDRPM/BUILD/openi g -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(ib_addr.mod)" -D"KBUILD_MODNAME=KBUILD_STR( o /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/ib_addr.mod.c In file included from include/asm/system.h:9, from include/linux/spinlock.h:56, from include/linux/capability.h:45, from include/linux/sched.h:44, from include/linux/module.h:9, from /var/tmp/OFEDRPM/BUILD/openib-1.1 /drivers/infiniband/core/ib_addr.mod.c:1: include/asm/hw_irq.h: In function `local_irq_disable': include/asm/hw_irq.h:51: warning: implicit declaration of 
function `__mtmsrd' In file included from include/asm/current.h:15, from include/linux/capability.h:46, from include/linux/sched.h:44, from include/linux/module.h:9, from /var/tmp/OFEDRPM/BUILD/openib-1.1 /drivers/infiniband/core/ib_addr.mod.c:1: include/asm/paca.h: At top level: include/asm/paca.h:84: error: `SLB_CACHE_ENTRIES' undeclared here (not in a function) In file included from include/linux/sched.h:49, from include/linux/module.h:9, from /var/tmp/OFEDRPM/BUILD/openib-1.1 /drivers/infiniband/core/ib_addr.mod.c:1: include/linux/jiffies.h:18:5: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:20:7: warning: "CONFIG_HZ" is not defined If I use the kernel Makefile in /usr/src/linux-2.6.18-rc5 to compile e.g. make -C /usr/src/linux-2.6.18-rc5 SUBDIRS=/var/tmp/OFEDRPM/BUILD/openib-1.1 /drivers/infiniband/core then it works fine. We found out that the top-level kernel Makefile does the following settings LINUXINCLUDE := -Iinclude \ $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \ -include include/linux/autoconf.h CPPFLAGS := -D__KERNEL__ $(LINUXINCLUDE) that include autoconf.h with all configured kernel configs like CONFIG_PPC64 etc. And obviously those config defines are lost if one uses /usr/src/linux-2.6.18-rc5/scripts/Makefile.build as OFED install.sh does. I'm wondering if anyone else also sees this problem on other architectures? Is there any reasons not to use the top-level kernel Makefile? Thanks! Nam Nguyen openib-general-bounces at openib.org wrote on 07.09.2006 22:01:30: > Hi, > OFED 1.1 RC4 will be published on Monday 11-Sep. > We currently work on several showstoppers: > 1. 223: mthca.so not properly linked to libibverbs – Vlad & Jack > 2. 221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - Roland > 3. 
219: OFED 1.1rc3 contains prerelease unstable libibverbs code – Vlad & Jack > > Thus final release date will be delayed to end of next week > > > Tziporet Koren > Software Director > Mellanox Technologies > mailto: tziporet at mellanox.co.il > Tel +972-4-9097200, ext 380 > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rolandd at cisco.com Fri Sep 8 14:55:41 2006 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 8 Sep 2006 14:55:41 -0700 Subject: [openib-general] [PATCH 1/2] RDMA: iWARP connection manager In-Reply-To: <2006981455.F7Cau4RN2pBSAVMu@cisco.com> Message-ID: <2006981455.AsEvtu6ZdAKrdkcn@cisco.com> From: Tom Tucker Add an iWARP Connection Manager (CM), which abstracts connection management for iWARP devices (RNICs). It is a logical instance of the xx_cm where xx is the transport type (ib or iw). The symbols exported are used by the transport independent rdma_cm module, and are available also for transport dependent ULPs. Signed-off-by: Tom Tucker Signed-off-by: Steve Wise Signed-off-by: Roland Dreier --- drivers/infiniband/core/iwcm.c | 1019 ++++++++++++++++++++++++++++++++++++++++ drivers/infiniband/core/iwcm.h | 62 ++ include/rdma/iw_cm.h | 258 ++++++++++ 3 files changed, 1339 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c new file mode 100644 index 0000000..c3fb304 --- /dev/null +++ b/drivers/infiniband/core/iwcm.c @@ -0,0 +1,1019 @@ +/* + * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2004 Topspin Corporation. All rights reserved. + * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. + * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. 
+ * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "iwcm.h" + +MODULE_AUTHOR("Tom Tucker"); +MODULE_DESCRIPTION("iWARP CM"); +MODULE_LICENSE("Dual BSD/GPL"); + +static struct workqueue_struct *iwcm_wq; +struct iwcm_work { + struct work_struct work; + struct iwcm_id_private *cm_id; + struct list_head list; + struct iw_cm_event event; + struct list_head free_list; +}; + +/* + * The following services provide a mechanism for pre-allocating iwcm_work + * elements. 
The design pre-allocates them based on the cm_id type: + * LISTENING IDS: Get enough elements preallocated to handle the + * listen backlog. + * ACTIVE IDS: 4: CONNECT_REPLY, ESTABLISHED, DISCONNECT, CLOSE + * PASSIVE IDS: 3: ESTABLISHED, DISCONNECT, CLOSE + * + * Allocating them in connect and listen avoids having to deal + * with allocation failures on the event upcall from the provider (which + * is called in the interrupt context). + * + * One exception is when creating the cm_id for incoming connection requests. + * There are two cases: + * 1) in the event upcall, cm_event_handler(), for a listening cm_id. If + * the backlog is exceeded, then no more connection request events will + * be processed. cm_event_handler() returns -ENOMEM in this case. Its up + * to the provider to reject the connectino request. + * 2) in the connection request workqueue handler, cm_conn_req_handler(). + * If work elements cannot be allocated for the new connect request cm_id, + * then IWCM will call the provider reject method. This is ok since + * cm_conn_req_handler() runs in the workqueue thread context. 
+ */ + +static struct iwcm_work *get_work(struct iwcm_id_private *cm_id_priv) +{ + struct iwcm_work *work; + + if (list_empty(&cm_id_priv->work_free_list)) + return NULL; + work = list_entry(cm_id_priv->work_free_list.next, struct iwcm_work, + free_list); + list_del_init(&work->free_list); + return work; +} + +static void put_work(struct iwcm_work *work) +{ + list_add(&work->free_list, &work->cm_id->work_free_list); +} + +static void dealloc_work_entries(struct iwcm_id_private *cm_id_priv) +{ + struct list_head *e, *tmp; + + list_for_each_safe(e, tmp, &cm_id_priv->work_free_list) + kfree(list_entry(e, struct iwcm_work, free_list)); +} + +static int alloc_work_entries(struct iwcm_id_private *cm_id_priv, int count) +{ + struct iwcm_work *work; + + BUG_ON(!list_empty(&cm_id_priv->work_free_list)); + while (count--) { + work = kmalloc(sizeof(struct iwcm_work), GFP_KERNEL); + if (!work) { + dealloc_work_entries(cm_id_priv); + return -ENOMEM; + } + work->cm_id = cm_id_priv; + INIT_LIST_HEAD(&work->list); + put_work(work); + } + return 0; +} + +/* + * Save private data from incoming connection requests in the + * cm_id_priv so the low level driver doesn't have to. Adjust + * the event ptr to point to the local copy. + */ +static int copy_private_data(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *event) +{ + void *p; + + p = kmalloc(event->private_data_len, GFP_ATOMIC); + if (!p) + return -ENOMEM; + memcpy(p, event->private_data, event->private_data_len); + event->private_data = p; + return 0; +} + +/* + * Release a reference on cm_id. If the last reference is being removed + * and iw_destroy_cm_id is waiting, wake up the waiting thread. 
+ */ +static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv) +{ + int ret = 0; + + BUG_ON(atomic_read(&cm_id_priv->refcount)==0); + if (atomic_dec_and_test(&cm_id_priv->refcount)) { + BUG_ON(!list_empty(&cm_id_priv->work_list)); + if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) { + BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING); + BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, + &cm_id_priv->flags)); + ret = 1; + } + complete(&cm_id_priv->destroy_comp); + } + + return ret; +} + +static void add_ref(struct iw_cm_id *cm_id) +{ + struct iwcm_id_private *cm_id_priv; + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + atomic_inc(&cm_id_priv->refcount); +} + +static void rem_ref(struct iw_cm_id *cm_id) +{ + struct iwcm_id_private *cm_id_priv; + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + iwcm_deref_id(cm_id_priv); +} + +static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event); + +struct iw_cm_id *iw_create_cm_id(struct ib_device *device, + iw_cm_handler cm_handler, + void *context) +{ + struct iwcm_id_private *cm_id_priv; + + cm_id_priv = kzalloc(sizeof(*cm_id_priv), GFP_KERNEL); + if (!cm_id_priv) + return ERR_PTR(-ENOMEM); + + cm_id_priv->state = IW_CM_STATE_IDLE; + cm_id_priv->id.device = device; + cm_id_priv->id.cm_handler = cm_handler; + cm_id_priv->id.context = context; + cm_id_priv->id.event_handler = cm_event_handler; + cm_id_priv->id.add_ref = add_ref; + cm_id_priv->id.rem_ref = rem_ref; + spin_lock_init(&cm_id_priv->lock); + atomic_set(&cm_id_priv->refcount, 1); + init_waitqueue_head(&cm_id_priv->connect_wait); + init_completion(&cm_id_priv->destroy_comp); + INIT_LIST_HEAD(&cm_id_priv->work_list); + INIT_LIST_HEAD(&cm_id_priv->work_free_list); + + return &cm_id_priv->id; +} +EXPORT_SYMBOL(iw_create_cm_id); + + +static int iwcm_modify_qp_err(struct ib_qp *qp) +{ + struct ib_qp_attr qp_attr; + + if (!qp) + return -EINVAL; + + qp_attr.qp_state = IB_QPS_ERR; + return ib_modify_qp(qp, &qp_attr, 
IB_QP_STATE); +} + +/* + * This is really the RDMAC CLOSING state. It is most similar to the + * IB SQD QP state. + */ +static int iwcm_modify_qp_sqd(struct ib_qp *qp) +{ + struct ib_qp_attr qp_attr; + + BUG_ON(qp == NULL); + qp_attr.qp_state = IB_QPS_SQD; + return ib_modify_qp(qp, &qp_attr, IB_QP_STATE); +} + +/* + * CM_ID <-- CLOSING + * + * Block if a passive or active connection is currenlty being processed. Then + * process the event as follows: + * - If we are ESTABLISHED, move to CLOSING and modify the QP state + * based on the abrupt flag + * - If the connection is already in the CLOSING or IDLE state, the peer is + * disconnecting concurrently with us and we've already seen the + * DISCONNECT event -- ignore the request and return 0 + * - Disconnect on a listening endpoint returns -EINVAL + */ +int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt) +{ + struct iwcm_id_private *cm_id_priv; + unsigned long flags; + int ret = 0; + struct ib_qp *qp = NULL; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + /* Wait if we're currently in a connect or accept downcall */ + wait_event(cm_id_priv->connect_wait, + !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags)); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_ESTABLISHED: + cm_id_priv->state = IW_CM_STATE_CLOSING; + + /* QP could be for user-mode client */ + if (cm_id_priv->qp) + qp = cm_id_priv->qp; + else + ret = -EINVAL; + break; + case IW_CM_STATE_LISTEN: + ret = -EINVAL; + break; + case IW_CM_STATE_CLOSING: + /* remote peer closed first */ + case IW_CM_STATE_IDLE: + /* accept or connect returned !0 */ + break; + case IW_CM_STATE_CONN_RECV: + /* + * App called disconnect before/without calling accept after + * connect_request event delivered. 
+ */ + break; + case IW_CM_STATE_CONN_SENT: + /* Can only get here if wait above fails */ + default: + BUG(); + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + if (qp) { + if (abrupt) + ret = iwcm_modify_qp_err(qp); + else + ret = iwcm_modify_qp_sqd(qp); + + /* + * If both sides are disconnecting the QP could + * already be in ERR or SQD states + */ + ret = 0; + } + + return ret; +} +EXPORT_SYMBOL(iw_cm_disconnect); + +/* + * CM_ID <-- DESTROYING + * + * Clean up all resources associated with the connection and release + * the initial reference taken by iw_create_cm_id. + */ +static void destroy_cm_id(struct iw_cm_id *cm_id) +{ + struct iwcm_id_private *cm_id_priv; + unsigned long flags; + int ret; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + /* + * Wait if we're currently in a connect or accept downcall. A + * listening endpoint should never block here. + */ + wait_event(cm_id_priv->connect_wait, + !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags)); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_LISTEN: + cm_id_priv->state = IW_CM_STATE_DESTROYING; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + /* destroy the listening endpoint */ + ret = cm_id->device->iwcm->destroy_listen(cm_id); + spin_lock_irqsave(&cm_id_priv->lock, flags); + break; + case IW_CM_STATE_ESTABLISHED: + cm_id_priv->state = IW_CM_STATE_DESTROYING; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + /* Abrupt close of the connection */ + (void)iwcm_modify_qp_err(cm_id_priv->qp); + spin_lock_irqsave(&cm_id_priv->lock, flags); + break; + case IW_CM_STATE_IDLE: + case IW_CM_STATE_CLOSING: + cm_id_priv->state = IW_CM_STATE_DESTROYING; + break; + case IW_CM_STATE_CONN_RECV: + /* + * App called destroy before/without calling accept after + * receiving connection request event notification. 
+ */ + cm_id_priv->state = IW_CM_STATE_DESTROYING; + break; + case IW_CM_STATE_CONN_SENT: + case IW_CM_STATE_DESTROYING: + default: + BUG(); + break; + } + if (cm_id_priv->qp) { + cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp); + cm_id_priv->qp = NULL; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + (void)iwcm_deref_id(cm_id_priv); +} + +/* + * This function is only called by the application thread and cannot + * be called by the event thread. The function will wait for all + * references to be released on the cm_id and then kfree the cm_id + * object. + */ +void iw_destroy_cm_id(struct iw_cm_id *cm_id) +{ + struct iwcm_id_private *cm_id_priv; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)); + + destroy_cm_id(cm_id); + + wait_for_completion(&cm_id_priv->destroy_comp); + + dealloc_work_entries(cm_id_priv); + + kfree(cm_id_priv); +} +EXPORT_SYMBOL(iw_destroy_cm_id); + +/* + * CM_ID <-- LISTEN + * + * Start listening for connect requests. Generates one CONNECT_REQUEST + * event for each inbound connect request. + */ +int iw_cm_listen(struct iw_cm_id *cm_id, int backlog) +{ + struct iwcm_id_private *cm_id_priv; + unsigned long flags; + int ret = 0; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + + ret = alloc_work_entries(cm_id_priv, backlog); + if (ret) + return ret; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_IDLE: + cm_id_priv->state = IW_CM_STATE_LISTEN; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = cm_id->device->iwcm->create_listen(cm_id, backlog); + if (ret) + cm_id_priv->state = IW_CM_STATE_IDLE; + spin_lock_irqsave(&cm_id_priv->lock, flags); + break; + default: + ret = -EINVAL; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + return ret; +} +EXPORT_SYMBOL(iw_cm_listen); + +/* + * CM_ID <-- IDLE + * + * Rejects an inbound connection request. 
No events are generated. + */ +int iw_cm_reject(struct iw_cm_id *cm_id, + const void *private_data, + u8 private_data_len) +{ + struct iwcm_id_private *cm_id_priv; + unsigned long flags; + int ret; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + return -EINVAL; + } + cm_id_priv->state = IW_CM_STATE_IDLE; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + ret = cm_id->device->iwcm->reject(cm_id, private_data, + private_data_len); + + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + + return ret; +} +EXPORT_SYMBOL(iw_cm_reject); + +/* + * CM_ID <-- ESTABLISHED + * + * Accepts an inbound connection request and generates an ESTABLISHED + * event. Callers of iw_cm_disconnect and iw_destroy_cm_id will block + * until the ESTABLISHED event is received from the provider. 
+ */ +int iw_cm_accept(struct iw_cm_id *cm_id, + struct iw_cm_conn_param *iw_param) +{ + struct iwcm_id_private *cm_id_priv; + struct ib_qp *qp; + unsigned long flags; + int ret; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + return -EINVAL; + } + /* Get the ib_qp given the QPN */ + qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn); + if (!qp) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + return -EINVAL; + } + cm_id->device->iwcm->add_ref(qp); + cm_id_priv->qp = qp; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + ret = cm_id->device->iwcm->accept(cm_id, iw_param); + if (ret) { + /* An error on accept precludes provider events */ + BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV); + cm_id_priv->state = IW_CM_STATE_IDLE; + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->qp) { + cm_id->device->iwcm->rem_ref(qp); + cm_id_priv->qp = NULL; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + } + + return ret; +} +EXPORT_SYMBOL(iw_cm_accept); + +/* + * Active Side: CM_ID <-- CONN_SENT + * + * If successful, results in the generation of a CONNECT_REPLY + * event. iw_cm_disconnect and iw_cm_destroy will block until the + * CONNECT_REPLY event is received from the provider. 
+ */ +int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param) +{ + struct iwcm_id_private *cm_id_priv; + int ret = 0; + unsigned long flags; + struct ib_qp *qp; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + + ret = alloc_work_entries(cm_id_priv, 4); + if (ret) + return ret; + + set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + spin_lock_irqsave(&cm_id_priv->lock, flags); + + if (cm_id_priv->state != IW_CM_STATE_IDLE) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + return -EINVAL; + } + + /* Get the ib_qp given the QPN */ + qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn); + if (!qp) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + return -EINVAL; + } + cm_id->device->iwcm->add_ref(qp); + cm_id_priv->qp = qp; + cm_id_priv->state = IW_CM_STATE_CONN_SENT; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + ret = cm_id->device->iwcm->connect(cm_id, iw_param); + if (ret) { + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->qp) { + cm_id->device->iwcm->rem_ref(qp); + cm_id_priv->qp = NULL; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT); + cm_id_priv->state = IW_CM_STATE_IDLE; + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + } + + return ret; +} +EXPORT_SYMBOL(iw_cm_connect); + +/* + * Passive Side: new CM_ID <-- CONN_RECV + * + * Handles an inbound connect request. The function creates a new + * iw_cm_id to represent the new connection and inherits the client + * callback function and other attributes from the listening parent. + * + * The work item contains a pointer to the listen_cm_id and the event. The + * listen_cm_id contains the client cm_handler, context and + * device. These are copied when the device is cloned. The event + * contains the new four tuple. 
+ * + * An error on the child should not affect the parent, so this + * function does not return a value. + */ +static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv, + struct iw_cm_event *iw_event) +{ + unsigned long flags; + struct iw_cm_id *cm_id; + struct iwcm_id_private *cm_id_priv; + int ret; + + /* + * The provider should never generate a connection request + * event with a bad status. + */ + BUG_ON(iw_event->status); + + /* + * We could be destroying the listening id. If so, ignore this + * upcall. + */ + spin_lock_irqsave(&listen_id_priv->lock, flags); + if (listen_id_priv->state != IW_CM_STATE_LISTEN) { + spin_unlock_irqrestore(&listen_id_priv->lock, flags); + return; + } + spin_unlock_irqrestore(&listen_id_priv->lock, flags); + + cm_id = iw_create_cm_id(listen_id_priv->id.device, + listen_id_priv->id.cm_handler, + listen_id_priv->id.context); + /* If the cm_id could not be created, ignore the request */ + if (IS_ERR(cm_id)) + return; + + cm_id->provider_data = iw_event->provider_data; + cm_id->local_addr = iw_event->local_addr; + cm_id->remote_addr = iw_event->remote_addr; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + cm_id_priv->state = IW_CM_STATE_CONN_RECV; + + ret = alloc_work_entries(cm_id_priv, 3); + if (ret) { + iw_cm_reject(cm_id, NULL, 0); + iw_destroy_cm_id(cm_id); + return; + } + + /* Call the client CM handler */ + ret = cm_id->cm_handler(cm_id, iw_event); + if (ret) { + set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); + destroy_cm_id(cm_id); + if (atomic_read(&cm_id_priv->refcount)==0) + kfree(cm_id); + } + + if (iw_event->private_data_len) + kfree(iw_event->private_data); +} + +/* + * Passive Side: CM_ID <-- ESTABLISHED + * + * The provider generated an ESTABLISHED event which means that + * the MPA negotiation has completed successfully and we are now in MPA + * FPDU mode. + * + * This event can only be received in the CONN_RECV state.
If the + * remote peer closed, the ESTABLISHED event would be received followed + * by the CLOSE event. If the app closes, it will block until we wake + * it up after processing this event. + */ +static int cm_conn_est_handler(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event) +{ + unsigned long flags; + int ret = 0; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + + /* + * We clear the CONNECT_WAIT bit here to allow the callback + * function to call iw_cm_disconnect. Calling iw_destroy_cm_id + * from a callback handler is not allowed. + */ + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV); + cm_id_priv->state = IW_CM_STATE_ESTABLISHED; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event); + wake_up_all(&cm_id_priv->connect_wait); + + return ret; +} + +/* + * Active Side: CM_ID <-- ESTABLISHED + * + * The app has called connect and is waiting for the established event to + * post its requests to the server. This event will wake up anyone + * blocked in iw_cm_disconnect or iw_destroy_cm_id.
+ */ +static int cm_conn_rep_handler(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event) +{ + unsigned long flags; + int ret = 0; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + /* + * Clear the connect wait bit so a callback function calling + * iw_cm_disconnect will not wait and deadlock this thread + */ + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT); + if (iw_event->status == IW_CM_EVENT_STATUS_ACCEPTED) { + cm_id_priv->id.local_addr = iw_event->local_addr; + cm_id_priv->id.remote_addr = iw_event->remote_addr; + cm_id_priv->state = IW_CM_STATE_ESTABLISHED; + } else { + /* REJECTED or RESET */ + cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp); + cm_id_priv->qp = NULL; + cm_id_priv->state = IW_CM_STATE_IDLE; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event); + + if (iw_event->private_data_len) + kfree(iw_event->private_data); + + /* Wake up waiters on connect complete */ + wake_up_all(&cm_id_priv->connect_wait); + + return ret; +} + +/* + * CM_ID <-- CLOSING + * + * If in the ESTABLISHED state, move to CLOSING. + */ +static void cm_disconnect_handler(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event) +{ + unsigned long flags; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->state == IW_CM_STATE_ESTABLISHED) + cm_id_priv->state = IW_CM_STATE_CLOSING; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); +} + +/* + * CM_ID <-- IDLE + * + * If in the ESTABLISHED or CLOSING states, the QP will have been + * moved by the provider to the ERR state. Disassociate the CM_ID from + * the QP, move to IDLE, and remove the 'connected' reference. + * + * If in some other state, the cm_id was destroyed asynchronously. + * This is the last reference that will result in waking up + * the app thread blocked in iw_destroy_cm_id.
+ */ +static int cm_close_handler(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event) +{ + unsigned long flags; + int ret = 0; + spin_lock_irqsave(&cm_id_priv->lock, flags); + + if (cm_id_priv->qp) { + cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp); + cm_id_priv->qp = NULL; + } + switch (cm_id_priv->state) { + case IW_CM_STATE_ESTABLISHED: + case IW_CM_STATE_CLOSING: + cm_id_priv->state = IW_CM_STATE_IDLE; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event); + spin_lock_irqsave(&cm_id_priv->lock, flags); + break; + case IW_CM_STATE_DESTROYING: + break; + default: + BUG(); + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + return ret; +} + +static int process_event(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event) +{ + int ret = 0; + + switch (iw_event->event) { + case IW_CM_EVENT_CONNECT_REQUEST: + cm_conn_req_handler(cm_id_priv, iw_event); + break; + case IW_CM_EVENT_CONNECT_REPLY: + ret = cm_conn_rep_handler(cm_id_priv, iw_event); + break; + case IW_CM_EVENT_ESTABLISHED: + ret = cm_conn_est_handler(cm_id_priv, iw_event); + break; + case IW_CM_EVENT_DISCONNECT: + cm_disconnect_handler(cm_id_priv, iw_event); + break; + case IW_CM_EVENT_CLOSE: + ret = cm_close_handler(cm_id_priv, iw_event); + break; + default: + BUG(); + } + + return ret; +} + +/* + * Process events on the work_list for the cm_id. If the callback + * function requests that the cm_id be deleted, a flag is set in the + * cm_id flags to indicate that when the last reference is + * removed, the cm_id is to be destroyed. This is necessary to + * distinguish between an object that will be destroyed by the app + * thread asleep on the destroy_comp list vs. an object destroyed + * here synchronously when the last reference is removed. 
+ */ +static void cm_work_handler(void *arg) +{ + struct iwcm_work *work = arg, lwork; + struct iwcm_id_private *cm_id_priv = work->cm_id; + unsigned long flags; + int empty; + int ret = 0; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + empty = list_empty(&cm_id_priv->work_list); + while (!empty) { + work = list_entry(cm_id_priv->work_list.next, + struct iwcm_work, list); + list_del_init(&work->list); + empty = list_empty(&cm_id_priv->work_list); + lwork = *work; + put_work(work); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + ret = process_event(cm_id_priv, &lwork.event); + if (ret) { + set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); + destroy_cm_id(&cm_id_priv->id); + } + BUG_ON(atomic_read(&cm_id_priv->refcount)==0); + if (iwcm_deref_id(cm_id_priv)) + return; + + if (atomic_read(&cm_id_priv->refcount)==0 && + test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { + dealloc_work_entries(cm_id_priv); + kfree(cm_id_priv); + return; + } + spin_lock_irqsave(&cm_id_priv->lock, flags); + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); +} + +/* + * This function is called in interrupt context. Schedule events on + * the iwcm_wq thread to allow callback functions to downcall into + * the CM and/or block. Events are queued to a per-CM_ID + * work_list. If this is the first event on the work_list, the work + * element is also queued on the iwcm_wq thread. + * + * Each event holds a reference on the cm_id. Until the last posted + * event has been delivered and processed, the cm_id cannot be + * deleted. + * + * Returns: + * 0 - the event was handled. + * -ENOMEM - the event was not handled due to lack of resources.
+ */ +static int cm_event_handler(struct iw_cm_id *cm_id, + struct iw_cm_event *iw_event) +{ + struct iwcm_work *work; + struct iwcm_id_private *cm_id_priv; + unsigned long flags; + int ret = 0; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + work = get_work(cm_id_priv); + if (!work) { + ret = -ENOMEM; + goto out; + } + + INIT_WORK(&work->work, cm_work_handler, work); + work->cm_id = cm_id_priv; + work->event = *iw_event; + + if ((work->event.event == IW_CM_EVENT_CONNECT_REQUEST || + work->event.event == IW_CM_EVENT_CONNECT_REPLY) && + work->event.private_data_len) { + ret = copy_private_data(cm_id_priv, &work->event); + if (ret) { + put_work(work); + goto out; + } + } + + atomic_inc(&cm_id_priv->refcount); + if (list_empty(&cm_id_priv->work_list)) { + list_add_tail(&work->list, &cm_id_priv->work_list); + queue_work(iwcm_wq, &work->work); + } else + list_add_tail(&work->list, &cm_id_priv->work_list); +out: + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + return ret; +} + +static int iwcm_init_qp_init_attr(struct iwcm_id_private *cm_id_priv, + struct ib_qp_attr *qp_attr, + int *qp_attr_mask) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_IDLE: + case IW_CM_STATE_CONN_SENT: + case IW_CM_STATE_CONN_RECV: + case IW_CM_STATE_ESTABLISHED: + *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS; + qp_attr->qp_access_flags = IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_WRITE| + IB_ACCESS_REMOTE_READ; + ret = 0; + break; + default: + ret = -EINVAL; + break; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + return ret; +} + +static int iwcm_init_qp_rts_attr(struct iwcm_id_private *cm_id_priv, + struct ib_qp_attr *qp_attr, + int *qp_attr_mask) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_IDLE: + case IW_CM_STATE_CONN_SENT: + 
case IW_CM_STATE_CONN_RECV: + case IW_CM_STATE_ESTABLISHED: + *qp_attr_mask = 0; + ret = 0; + break; + default: + ret = -EINVAL; + break; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + return ret; +} + +int iw_cm_init_qp_attr(struct iw_cm_id *cm_id, + struct ib_qp_attr *qp_attr, + int *qp_attr_mask) +{ + struct iwcm_id_private *cm_id_priv; + int ret; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + switch (qp_attr->qp_state) { + case IB_QPS_INIT: + case IB_QPS_RTR: + ret = iwcm_init_qp_init_attr(cm_id_priv, + qp_attr, qp_attr_mask); + break; + case IB_QPS_RTS: + ret = iwcm_init_qp_rts_attr(cm_id_priv, + qp_attr, qp_attr_mask); + break; + default: + ret = -EINVAL; + break; + } + return ret; +} +EXPORT_SYMBOL(iw_cm_init_qp_attr); + +static int __init iw_cm_init(void) +{ + iwcm_wq = create_singlethread_workqueue("iw_cm_wq"); + if (!iwcm_wq) + return -ENOMEM; + + return 0; +} + +static void __exit iw_cm_cleanup(void) +{ + destroy_workqueue(iwcm_wq); +} + +module_init(iw_cm_init); +module_exit(iw_cm_cleanup); diff --git a/drivers/infiniband/core/iwcm.h b/drivers/infiniband/core/iwcm.h new file mode 100644 index 0000000..3f6cc82 --- /dev/null +++ b/drivers/infiniband/core/iwcm.h @@ -0,0 +1,62 @@ +/* + * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#ifndef IWCM_H +#define IWCM_H + +enum iw_cm_state { + IW_CM_STATE_IDLE, /* unbound, inactive */ + IW_CM_STATE_LISTEN, /* listen waiting for connect */ + IW_CM_STATE_CONN_RECV, /* inbound waiting for user accept */ + IW_CM_STATE_CONN_SENT, /* outbound waiting for peer accept */ + IW_CM_STATE_ESTABLISHED, /* established */ + IW_CM_STATE_CLOSING, /* disconnect */ + IW_CM_STATE_DESTROYING /* object being deleted */ +}; + +struct iwcm_id_private { + struct iw_cm_id id; + enum iw_cm_state state; + unsigned long flags; + struct ib_qp *qp; + struct completion destroy_comp; + wait_queue_head_t connect_wait; + struct list_head work_list; + spinlock_t lock; + atomic_t refcount; + struct list_head work_free_list; +}; + +#define IWCM_F_CALLBACK_DESTROY 1 +#define IWCM_F_CONNECT_WAIT 2 + +#endif /* IWCM_H */ diff --git a/include/rdma/iw_cm.h b/include/rdma/iw_cm.h new file mode 100644 index 0000000..aeefa9b --- /dev/null +++ b/include/rdma/iw_cm.h @@ -0,0 +1,258 @@ +/* + * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#ifndef IW_CM_H +#define IW_CM_H + +#include +#include + +struct iw_cm_id; + +enum iw_cm_event_type { + IW_CM_EVENT_CONNECT_REQUEST = 1, /* connect request received */ + IW_CM_EVENT_CONNECT_REPLY, /* reply from active connect request */ + IW_CM_EVENT_ESTABLISHED, /* passive side accept successful */ + IW_CM_EVENT_DISCONNECT, /* orderly shutdown */ + IW_CM_EVENT_CLOSE /* close complete */ +}; + +enum iw_cm_event_status { + IW_CM_EVENT_STATUS_OK = 0, /* request successful */ + IW_CM_EVENT_STATUS_ACCEPTED = 0, /* connect request accepted */ + IW_CM_EVENT_STATUS_REJECTED, /* connect request rejected */ + IW_CM_EVENT_STATUS_TIMEOUT, /* the operation timed out */ + IW_CM_EVENT_STATUS_RESET, /* reset from remote peer */ + IW_CM_EVENT_STATUS_EINVAL, /* asynchronous failure for bad parm */ +}; + +struct iw_cm_event { + enum iw_cm_event_type event; + enum iw_cm_event_status status; + struct sockaddr_in local_addr; + struct sockaddr_in remote_addr; + void *private_data; + u8 private_data_len; + void* provider_data; +}; + +/** + * iw_cm_handler - Function to be called by the IW CM when delivering events + * to the client. + * + * @cm_id: The IW CM identifier associated with the event. + * @event: Pointer to the event structure. + */ +typedef int (*iw_cm_handler)(struct iw_cm_id *cm_id, + struct iw_cm_event *event); + +/** + * iw_event_handler - Function called by the provider when delivering provider + * events to the IW CM. Returns either 0 indicating the event was processed + * or -errno if the event could not be processed. + * + * @cm_id: The IW CM identifier associated with the event. + * @event: Pointer to the event structure. 
+ */ +typedef int (*iw_event_handler)(struct iw_cm_id *cm_id, + struct iw_cm_event *event); + +struct iw_cm_id { + iw_cm_handler cm_handler; /* client callback function */ + void *context; /* client cb context */ + struct ib_device *device; + struct sockaddr_in local_addr; + struct sockaddr_in remote_addr; + void *provider_data; /* provider private data */ + iw_event_handler event_handler; /* cb for provider + events */ + /* Used by provider to add and remove refs on IW cm_id */ + void (*add_ref)(struct iw_cm_id *); + void (*rem_ref)(struct iw_cm_id *); +}; + +struct iw_cm_conn_param { + const void *private_data; + u16 private_data_len; + u32 ord; + u32 ird; + u32 qpn; +}; + +struct iw_cm_verbs { + void (*add_ref)(struct ib_qp *qp); + + void (*rem_ref)(struct ib_qp *qp); + + struct ib_qp * (*get_qp)(struct ib_device *device, + int qpn); + + int (*connect)(struct iw_cm_id *cm_id, + struct iw_cm_conn_param *conn_param); + + int (*accept)(struct iw_cm_id *cm_id, + struct iw_cm_conn_param *conn_param); + + int (*reject)(struct iw_cm_id *cm_id, + const void *pdata, u8 pdata_len); + + int (*create_listen)(struct iw_cm_id *cm_id, + int backlog); + + int (*destroy_listen)(struct iw_cm_id *cm_id); +}; + +/** + * iw_create_cm_id - Create an IW CM identifier. + * + * @device: The IB device on which to create the IW CM identifier. + * @cm_handler: User callback invoked to report events associated with the + * returned IW CM identifier. + * @context: User specified context associated with the id. + */ +struct iw_cm_id *iw_create_cm_id(struct ib_device *device, + iw_cm_handler cm_handler, void *context); + +/** + * iw_destroy_cm_id - Destroy an IW CM identifier. + * + * @cm_id: The previously created IW CM identifier to destroy. + * + * The client can assume that no events will be delivered for the CM ID after + * this function returns.
+ */ +void iw_destroy_cm_id(struct iw_cm_id *cm_id); + +/** + * iw_cm_unbind_qp - Unbind the specified IW CM identifier and QP + * + * @cm_id: The IW CM identifier to unbind from the QP. + * @qp: The QP + * + * This is called by the provider when destroying the QP to ensure + * that any references held by the IWCM are released. It may also + * be called by the IWCM when destroying a CM_ID to ensure that any + * references held by the provider are released. + */ +void iw_cm_unbind_qp(struct iw_cm_id *cm_id, struct ib_qp *qp); + +/** + * iw_cm_get_qp - Return the ib_qp associated with a QPN + * + * @device: The IB device + * @qpn: The queue pair number + */ +struct ib_qp *iw_cm_get_qp(struct ib_device *device, int qpn); + +/** + * iw_cm_listen - Listen for incoming connection requests on the + * specified IW CM id. + * + * @cm_id: The IW CM identifier. + * @backlog: The maximum number of outstanding un-accepted inbound listen + * requests to queue. + * + * The source address and port number are specified in the IW CM identifier + * structure. + */ +int iw_cm_listen(struct iw_cm_id *cm_id, int backlog); + +/** + * iw_cm_accept - Called to accept an incoming connect request. + * + * @cm_id: The IW CM identifier associated with the connection request. + * @iw_param: Pointer to a structure containing connection establishment + * parameters. + * + * The specified cm_id will have been provided in the event data for a + * CONNECT_REQUEST event. Subsequent events related to this connection will be + * delivered to the specified IW CM identifier and may occur prior to + * the return of this function. If this function returns a non-zero value, the + * client can assume that no events will be delivered to the specified IW CM + * identifier. + */ +int iw_cm_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param); + +/** + * iw_cm_reject - Reject an incoming connection request. + * + * @cm_id: Connection identifier associated with the request.
+ * @private_data: Pointer to data to deliver to the remote peer as part of the + * reject message. + * @private_data_len: The number of bytes in the private_data parameter. + * + * The client can assume that no events will be delivered to the specified IW + * CM identifier following the return of this function. The private_data + * buffer is available for reuse when this function returns. + */ +int iw_cm_reject(struct iw_cm_id *cm_id, const void *private_data, + u8 private_data_len); + +/** + * iw_cm_connect - Called to request a connection to a remote peer. + * + * @cm_id: The IW CM identifier for the connection. + * @iw_param: Pointer to a structure containing connection establishment + * parameters. + * + * Events may be delivered to the specified IW CM identifier prior to the + * return of this function. If this function returns a non-zero value, the + * client can assume that no events will be delivered to the specified IW CM + * identifier. + */ +int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param); + +/** + * iw_cm_disconnect - Close the specified connection. + * + * @cm_id: The IW CM identifier to close. + * @abrupt: If 0, the connection will be closed gracefully, otherwise, the + * connection will be reset. + * + * The IW CM identifier is still active until the IW_CM_EVENT_CLOSE event is + * delivered. + */ +int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt); + +/** + * iw_cm_init_qp_attr - Called to initialize the attributes of the QP + * associated with an IW CM identifier. + * + * @cm_id: The IW CM identifier associated with the QP + * @qp_attr: Pointer to the QP attributes structure. + * @qp_attr_mask: Pointer to a bit vector specifying which QP attributes are + * valid.
+ */ +int iw_cm_init_qp_attr(struct iw_cm_id *cm_id, struct ib_qp_attr *qp_attr, + int *qp_attr_mask); + +#endif /* IW_CM_H */ -- 1.4.1 From rolandd at cisco.com Fri Sep 8 14:55:41 2006 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 8 Sep 2006 14:55:41 -0700 Subject: [openib-general] [PATCH 2/2] RDMA: iWARP changes to IB core In-Reply-To: <2006981455.AsEvtu6ZdAKrdkcn@cisco.com> Message-ID: <2006981455.5zPhTm8jRQnxTde2@cisco.com> From: Tom Tucker Modifications to the existing rdma header files, core files, drivers, and ulp files to support iWARP, including: - Hook iWARP CM into the build system and use it in rdma_cm. - Convert enum ib_node_type to enum rdma_node_type, which includes the possibility of RDMA_NODE_RNIC, and update everything for this. Signed-off-by: Tom Tucker Signed-off-by: Steve Wise Signed-off-by: Roland Dreier --- drivers/infiniband/core/Makefile | 4 drivers/infiniband/core/addr.c | 18 + drivers/infiniband/core/cache.c | 5 drivers/infiniband/core/cm.c | 3 drivers/infiniband/core/cma.c | 355 +++++++++++++++++++++++--- drivers/infiniband/core/device.c | 4 drivers/infiniband/core/mad.c | 7 - drivers/infiniband/core/sa_query.c | 5 drivers/infiniband/core/smi.c | 16 + drivers/infiniband/core/sysfs.c | 11 - drivers/infiniband/core/ucm.c | 3 drivers/infiniband/core/user_mad.c | 5 drivers/infiniband/core/verbs.c | 17 + drivers/infiniband/hw/ehca/ehca_main.c | 2 drivers/infiniband/hw/ipath/ipath_verbs.c | 2 drivers/infiniband/hw/mthca/mthca_provider.c | 2 drivers/infiniband/ulp/ipoib/ipoib_main.c | 8 + drivers/infiniband/ulp/srp/ib_srp.c | 2 include/rdma/ib_addr.h | 17 + include/rdma/ib_verbs.h | 25 ++ 20 files changed, 430 insertions(+), 81 deletions(-) diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 68e73ec..163d991 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -1,7 +1,7 @@ infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) := ib_addr.o rdma_cm.o obj-$(CONFIG_INFINIBAND) += 
ib_core.o ib_mad.o ib_sa.o \ - ib_cm.o $(infiniband-y) + ib_cm.o iw_cm.o $(infiniband-y) obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o @@ -14,6 +14,8 @@ ib_sa-y := sa_query.o ib_cm-y := cm.o +iw_cm-y := iwcm.o + rdma_cm-y := cma.o ib_addr-y := addr.o diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index d8e54e0..9cbf09e 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -61,12 +61,15 @@ static LIST_HEAD(req_list); static DECLARE_WORK(work, process_req, NULL); static struct workqueue_struct *addr_wq; -static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, - unsigned char *dst_dev_addr) +int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, + const unsigned char *dst_dev_addr) { switch (dev->type) { case ARPHRD_INFINIBAND: - dev_addr->dev_type = IB_NODE_CA; + dev_addr->dev_type = RDMA_NODE_IB_CA; + break; + case ARPHRD_ETHER: + dev_addr->dev_type = RDMA_NODE_RNIC; break; default: return -EADDRNOTAVAIL; @@ -78,6 +81,7 @@ static int copy_addr(struct rdma_dev_add memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN); return 0; } +EXPORT_SYMBOL(rdma_copy_addr); int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr) { @@ -89,7 +93,7 @@ int rdma_translate_ip(struct sockaddr *a if (!dev) return -EADDRNOTAVAIL; - ret = copy_addr(dev_addr, dev, NULL); + ret = rdma_copy_addr(dev_addr, dev, NULL); dev_put(dev); return ret; } @@ -161,7 +165,7 @@ static int addr_resolve_remote(struct so /* If the device does ARP internally, return 'done' */ if (rt->idev->dev->flags & IFF_NOARP) { - copy_addr(addr, rt->idev->dev, NULL); + rdma_copy_addr(addr, rt->idev->dev, NULL); goto put; } @@ -181,7 +185,7 @@ static int addr_resolve_remote(struct so src_in->sin_addr.s_addr = rt->rt_src; } - ret = copy_addr(addr, neigh->dev, neigh->ha); + ret = rdma_copy_addr(addr, neigh->dev, neigh->ha); release: 
neigh_release(neigh); put: @@ -245,7 +249,7 @@ static int addr_resolve_local(struct soc if (ZERONET(src_ip)) { src_in->sin_family = dst_in->sin_family; src_in->sin_addr.s_addr = dst_ip; - ret = copy_addr(addr, dev, dev->dev_addr); + ret = rdma_copy_addr(addr, dev, dev->dev_addr); } else if (LOOPBACK(src_ip)) { ret = rdma_translate_ip((struct sockaddr *)dst_in, addr); if (!ret) diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index 75313ad..20e9f64 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -62,12 +62,13 @@ struct ib_update_work { static inline int start_port(struct ib_device *device) { - return device->node_type == IB_NODE_SWITCH ? 0 : 1; + return (device->node_type == RDMA_NODE_IB_SWITCH) ? 0 : 1; } static inline int end_port(struct ib_device *device) { - return device->node_type == IB_NODE_SWITCH ? 0 : device->phys_port_cnt; + return (device->node_type == RDMA_NODE_IB_SWITCH) ? + 0 : device->phys_port_cnt; } int ib_get_cached_gid(struct ib_device *device, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 1c145fe..e130d2e 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3280,6 +3280,9 @@ static void cm_add_one(struct ib_device int ret; u8 i; + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + cm_dev = kmalloc(sizeof(*cm_dev) + sizeof(*port) * device->phys_port_cnt, GFP_KERNEL); if (!cm_dev) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index f7be5e7..c54c55a 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -35,6 +35,7 @@ #include #include #include #include +#include #include @@ -43,6 +44,7 @@ #include #include #include #include +#include MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("Generic RDMA CM Agent"); @@ -124,6 +126,7 @@ struct rdma_id_private { int query_id; union { struct ib_cm_id *ib; + struct iw_cm_id *iw; } cm_id; u32 seq_num; 
@@ -259,14 +262,23 @@ static void cma_detach_from_dev(struct r id_priv->cma_dev = NULL; } -static int cma_acquire_ib_dev(struct rdma_id_private *id_priv) +static int cma_acquire_dev(struct rdma_id_private *id_priv) { + enum rdma_node_type dev_type = id_priv->id.route.addr.dev_addr.dev_type; struct cma_device *cma_dev; union ib_gid gid; int ret = -ENODEV; - ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid), - + switch (rdma_node_get_transport(dev_type)) { + case RDMA_TRANSPORT_IB: + ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid); + break; + case RDMA_TRANSPORT_IWARP: + iw_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid); + break; + default: + return -ENODEV; + } mutex_lock(&lock); list_for_each_entry(cma_dev, &dev_list, list) { ret = ib_find_cached_gid(cma_dev->device, &gid, @@ -280,16 +292,6 @@ static int cma_acquire_ib_dev(struct rdm return ret; } -static int cma_acquire_dev(struct rdma_id_private *id_priv) -{ - switch (id_priv->id.route.addr.dev_addr.dev_type) { - case IB_NODE_CA: - return cma_acquire_ib_dev(id_priv); - default: - return -ENODEV; - } -} - static void cma_deref_id(struct rdma_id_private *id_priv) { if (atomic_dec_and_test(&id_priv->refcount)) @@ -347,6 +349,16 @@ static int cma_init_ib_qp(struct rdma_id IB_QP_PKEY_INDEX | IB_QP_PORT); } +static int cma_init_iw_qp(struct rdma_id_private *id_priv, struct ib_qp *qp) +{ + struct ib_qp_attr qp_attr; + + qp_attr.qp_state = IB_QPS_INIT; + qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE; + + return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS); +} + int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd, struct ib_qp_init_attr *qp_init_attr) { @@ -362,10 +374,13 @@ int rdma_create_qp(struct rdma_cm_id *id if (IS_ERR(qp)) return PTR_ERR(qp); - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: ret = cma_init_ib_qp(id_priv, qp); break; + case RDMA_TRANSPORT_IWARP: + ret = 
cma_init_iw_qp(id_priv, qp); + break; default: ret = -ENOSYS; break; @@ -451,13 +466,17 @@ int rdma_init_qp_attr(struct rdma_cm_id int ret; id_priv = container_of(id, struct rdma_id_private, id); - switch (id_priv->id.device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id_priv->id.device->node_type)) { + case RDMA_TRANSPORT_IB: ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, qp_attr, qp_attr_mask); if (qp_attr->qp_state == IB_QPS_RTR) qp_attr->rq_psn = id_priv->seq_num; break; + case RDMA_TRANSPORT_IWARP: + ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr, + qp_attr_mask); + break; default: ret = -ENOSYS; break; @@ -590,8 +609,8 @@ static int cma_notify_user(struct rdma_i static void cma_cancel_route(struct rdma_id_private *id_priv) { - switch (id_priv->id.device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id_priv->id.device->node_type)) { + case RDMA_TRANSPORT_IB: if (id_priv->query) ib_sa_cancel_query(id_priv->query_id, id_priv->query); break; @@ -611,11 +630,15 @@ static void cma_destroy_listen(struct rd cma_exch(id_priv, CMA_DESTROYING); if (id_priv->cma_dev) { - switch (id_priv->id.device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id_priv->id.device->node_type)) { + case RDMA_TRANSPORT_IB: if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) ib_destroy_cm_id(id_priv->cm_id.ib); break; + case RDMA_TRANSPORT_IWARP: + if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw)) + iw_destroy_cm_id(id_priv->cm_id.iw); + break; default: break; } @@ -690,11 +713,15 @@ void rdma_destroy_id(struct rdma_cm_id * cma_cancel_operation(id_priv, state); if (id_priv->cma_dev) { - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) ib_destroy_cm_id(id_priv->cm_id.ib); break; + case RDMA_TRANSPORT_IWARP: + if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw)) + 
iw_destroy_cm_id(id_priv->cm_id.iw); + break; default: break; } @@ -869,7 +896,7 @@ static struct rdma_id_private *cma_new_i ib_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid); ib_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid); ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey)); - rt->addr.dev_addr.dev_type = IB_NODE_CA; + rt->addr.dev_addr.dev_type = RDMA_NODE_IB_CA; id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = CMA_CONNECT; @@ -898,7 +925,7 @@ static int cma_req_handler(struct ib_cm_ } atomic_inc(&conn_id->dev_remove); - ret = cma_acquire_ib_dev(conn_id); + ret = cma_acquire_dev(conn_id); if (ret) { ret = -ENODEV; cma_release_remove(conn_id); @@ -982,6 +1009,128 @@ static void cma_set_compare_data(enum rd } } +static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) +{ + struct rdma_id_private *id_priv = iw_id->context; + enum rdma_cm_event_type event = 0; + struct sockaddr_in *sin; + int ret = 0; + + atomic_inc(&id_priv->dev_remove); + + switch (iw_event->event) { + case IW_CM_EVENT_CLOSE: + event = RDMA_CM_EVENT_DISCONNECTED; + break; + case IW_CM_EVENT_CONNECT_REPLY: + sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; + *sin = iw_event->local_addr; + sin = (struct sockaddr_in *) &id_priv->id.route.addr.dst_addr; + *sin = iw_event->remote_addr; + if (iw_event->status) + event = RDMA_CM_EVENT_REJECTED; + else + event = RDMA_CM_EVENT_ESTABLISHED; + break; + case IW_CM_EVENT_ESTABLISHED: + event = RDMA_CM_EVENT_ESTABLISHED; + break; + default: + BUG_ON(1); + } + + ret = cma_notify_user(id_priv, event, iw_event->status, + iw_event->private_data, + iw_event->private_data_len); + if (ret) { + /* Destroy the CM ID by returning a non-zero value. 
*/ + id_priv->cm_id.iw = NULL; + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + rdma_destroy_id(&id_priv->id); + return ret; + } + + cma_release_remove(id_priv); + return ret; +} + +static int iw_conn_req_handler(struct iw_cm_id *cm_id, + struct iw_cm_event *iw_event) +{ + struct rdma_cm_id *new_cm_id; + struct rdma_id_private *listen_id, *conn_id; + struct sockaddr_in *sin; + struct net_device *dev = NULL; + int ret; + + listen_id = cm_id->context; + atomic_inc(&listen_id->dev_remove); + if (!cma_comp(listen_id, CMA_LISTEN)) { + ret = -ECONNABORTED; + goto out; + } + + /* Create a new RDMA id for the new IW CM ID */ + new_cm_id = rdma_create_id(listen_id->id.event_handler, + listen_id->id.context, + RDMA_PS_TCP); + if (!new_cm_id) { + ret = -ENOMEM; + goto out; + } + conn_id = container_of(new_cm_id, struct rdma_id_private, id); + atomic_inc(&conn_id->dev_remove); + conn_id->state = CMA_CONNECT; + + dev = ip_dev_find(iw_event->local_addr.sin_addr.s_addr); + if (!dev) { + ret = -EADDRNOTAVAIL; + cma_release_remove(conn_id); + rdma_destroy_id(new_cm_id); + goto out; + } + ret = rdma_copy_addr(&conn_id->id.route.addr.dev_addr, dev, NULL); + if (ret) { + cma_release_remove(conn_id); + rdma_destroy_id(new_cm_id); + goto out; + } + + ret = cma_acquire_dev(conn_id); + if (ret) { + cma_release_remove(conn_id); + rdma_destroy_id(new_cm_id); + goto out; + } + + conn_id->cm_id.iw = cm_id; + cm_id->context = conn_id; + cm_id->cm_handler = cma_iw_handler; + + sin = (struct sockaddr_in *) &new_cm_id->route.addr.src_addr; + *sin = iw_event->local_addr; + sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr; + *sin = iw_event->remote_addr; + + ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, + iw_event->private_data, + iw_event->private_data_len); + if (ret) { + /* User wants to destroy the CM ID */ + conn_id->cm_id.iw = NULL; + cma_exch(conn_id, CMA_DESTROYING); + cma_release_remove(conn_id); + rdma_destroy_id(&conn_id->id); + } + 
+out: + if (dev) + dev_put(dev); + cma_release_remove(listen_id); + return ret; +} + static int cma_ib_listen(struct rdma_id_private *id_priv) { struct ib_cm_compare_data compare_data; @@ -1011,6 +1160,30 @@ static int cma_ib_listen(struct rdma_id_ return ret; } +static int cma_iw_listen(struct rdma_id_private *id_priv, int backlog) +{ + int ret; + struct sockaddr_in *sin; + + id_priv->cm_id.iw = iw_create_cm_id(id_priv->id.device, + iw_conn_req_handler, + id_priv); + if (IS_ERR(id_priv->cm_id.iw)) + return PTR_ERR(id_priv->cm_id.iw); + + sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; + id_priv->cm_id.iw->local_addr = *sin; + + ret = iw_cm_listen(id_priv->cm_id.iw, backlog); + + if (ret) { + iw_destroy_cm_id(id_priv->cm_id.iw); + id_priv->cm_id.iw = NULL; + } + + return ret; +} + static int cma_listen_handler(struct rdma_cm_id *id, struct rdma_cm_event *event) { @@ -1087,12 +1260,17 @@ int rdma_listen(struct rdma_cm_id *id, i id_priv->backlog = backlog; if (id->device) { - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: ret = cma_ib_listen(id_priv); if (ret) goto err; break; + case RDMA_TRANSPORT_IWARP: + ret = cma_iw_listen(id_priv, backlog); + if (ret) + goto err; + break; default: ret = -ENOSYS; goto err; @@ -1231,6 +1409,23 @@ err: } EXPORT_SYMBOL(rdma_set_ib_paths); +static int cma_resolve_iw_route(struct rdma_id_private *id_priv, int timeout_ms) +{ + struct cma_work *work; + + work = kzalloc(sizeof *work, GFP_KERNEL); + if (!work) + return -ENOMEM; + + work->id = id_priv; + INIT_WORK(&work->work, cma_work_handler, work); + work->old_state = CMA_ROUTE_QUERY; + work->new_state = CMA_ROUTE_RESOLVED; + work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED; + queue_work(cma_wq, &work->work); + return 0; +} + int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) { struct rdma_id_private *id_priv; @@ -1241,10 +1436,13 @@ int rdma_resolve_route(struct 
rdma_cm_id return -EINVAL; atomic_inc(&id_priv->refcount); - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: ret = cma_resolve_ib_route(id_priv, timeout_ms); break; + case RDMA_TRANSPORT_IWARP: + ret = cma_resolve_iw_route(id_priv, timeout_ms); + break; default: ret = -ENOSYS; break; @@ -1649,6 +1847,47 @@ out: return ret; } +static int cma_connect_iw(struct rdma_id_private *id_priv, + struct rdma_conn_param *conn_param) +{ + struct iw_cm_id *cm_id; + struct sockaddr_in* sin; + int ret; + struct iw_cm_conn_param iw_param; + + cm_id = iw_create_cm_id(id_priv->id.device, cma_iw_handler, id_priv); + if (IS_ERR(cm_id)) { + ret = PTR_ERR(cm_id); + goto out; + } + + id_priv->cm_id.iw = cm_id; + + sin = (struct sockaddr_in*) &id_priv->id.route.addr.src_addr; + cm_id->local_addr = *sin; + + sin = (struct sockaddr_in*) &id_priv->id.route.addr.dst_addr; + cm_id->remote_addr = *sin; + + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) { + iw_destroy_cm_id(cm_id); + return ret; + } + + iw_param.ord = conn_param->initiator_depth; + iw_param.ird = conn_param->responder_resources; + iw_param.private_data = conn_param->private_data; + iw_param.private_data_len = conn_param->private_data_len; + if (id_priv->id.qp) + iw_param.qpn = id_priv->qp_num; + else + iw_param.qpn = conn_param->qp_num; + ret = iw_cm_connect(cm_id, &iw_param); +out: + return ret; +} + int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; @@ -1664,10 +1903,13 @@ int rdma_connect(struct rdma_cm_id *id, id_priv->srq = conn_param->srq; } - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: ret = cma_connect_ib(id_priv, conn_param); break; + case RDMA_TRANSPORT_IWARP: + ret = cma_connect_iw(id_priv, conn_param); + break; default: ret = -ENOSYS; break; @@ -1708,6 +1950,28 @@ static int 
cma_accept_ib(struct rdma_id_ return ib_send_cm_rep(id_priv->cm_id.ib, &rep); } +static int cma_accept_iw(struct rdma_id_private *id_priv, + struct rdma_conn_param *conn_param) +{ + struct iw_cm_conn_param iw_param; + int ret; + + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) + return ret; + + iw_param.ord = conn_param->initiator_depth; + iw_param.ird = conn_param->responder_resources; + iw_param.private_data = conn_param->private_data; + iw_param.private_data_len = conn_param->private_data_len; + if (id_priv->id.qp) { + iw_param.qpn = id_priv->qp_num; + } else + iw_param.qpn = conn_param->qp_num; + + return iw_cm_accept(id_priv->cm_id.iw, &iw_param); +} + int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; @@ -1723,13 +1987,16 @@ int rdma_accept(struct rdma_cm_id *id, s id_priv->srq = conn_param->srq; } - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: if (conn_param) ret = cma_accept_ib(id_priv, conn_param); else ret = cma_rep_recv(id_priv); break; + case RDMA_TRANSPORT_IWARP: + ret = cma_accept_iw(id_priv, conn_param); + break; default: ret = -ENOSYS; break; @@ -1756,12 +2023,16 @@ int rdma_reject(struct rdma_cm_id *id, c if (!cma_comp(id_priv, CMA_CONNECT)) return -EINVAL; - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: ret = ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, private_data, private_data_len); break; + case RDMA_TRANSPORT_IWARP: + ret = iw_cm_reject(id_priv->cm_id.iw, + private_data, private_data_len); + break; default: ret = -ENOSYS; break; @@ -1780,16 +2051,18 @@ int rdma_disconnect(struct rdma_cm_id *i !cma_comp(id_priv, CMA_DISCONNECT)) return -EINVAL; - ret = cma_modify_qp_err(id); - if (ret) - goto out; - - switch (id->device->node_type) { - case IB_NODE_CA: + switch 
(rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: + ret = cma_modify_qp_err(id); + if (ret) + goto out; /* Initiate or respond to a disconnect. */ if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0)) ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0); break; + case RDMA_TRANSPORT_IWARP: + ret = iw_cm_disconnect(id_priv->cm_id.iw, 0); + break; default: break; } diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index b2f3cb9..d978fbe 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -505,7 +505,7 @@ int ib_query_port(struct ib_device *devi u8 port_num, struct ib_port_attr *port_attr) { - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { if (port_num) return -EINVAL; } else if (port_num < 1 || port_num > device->phys_port_cnt) @@ -580,7 +580,7 @@ int ib_modify_port(struct ib_device *dev u8 port_num, int port_modify_mask, struct ib_port_modify *port_modify) { - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { if (port_num) return -EINVAL; } else if (port_num < 1 || port_num > device->phys_port_cnt) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 32d3028..082f03c 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -2876,7 +2876,10 @@ static void ib_mad_init_device(struct ib { int start, end, i; - if (device->node_type == IB_NODE_SWITCH) { + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + + if (device->node_type == RDMA_NODE_IB_SWITCH) { start = 0; end = 0; } else { @@ -2923,7 +2926,7 @@ static void ib_mad_remove_device(struct { int i, num_ports, cur_port; - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { num_ports = 1; cur_port = 0; } else { diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index df762ba..ca8760a 100644 --- 
a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -919,7 +919,10 @@ static void ib_sa_add_one(struct ib_devi struct ib_sa_device *sa_dev; int s, e, i; - if (device->node_type == IB_NODE_SWITCH) + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + + if (device->node_type == RDMA_NODE_IB_SWITCH) s = e = 0; else { s = 1; diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c index 35852e7..54b81e1 100644 --- a/drivers/infiniband/core/smi.c +++ b/drivers/infiniband/core/smi.c @@ -64,7 +64,7 @@ int smi_handle_dr_smp_send(struct ib_smp /* C14-9:2 */ if (hop_ptr && hop_ptr < hop_cnt) { - if (node_type != IB_NODE_SWITCH) + if (node_type != RDMA_NODE_IB_SWITCH) return 0; /* smp->return_path set when received */ @@ -77,7 +77,7 @@ int smi_handle_dr_smp_send(struct ib_smp if (hop_ptr == hop_cnt) { /* smp->return_path set when received */ smp->hop_ptr++; - return (node_type == IB_NODE_SWITCH || + return (node_type == RDMA_NODE_IB_SWITCH || smp->dr_dlid == IB_LID_PERMISSIVE); } @@ -95,7 +95,7 @@ int smi_handle_dr_smp_send(struct ib_smp /* C14-13:2 */ if (2 <= hop_ptr && hop_ptr <= hop_cnt) { - if (node_type != IB_NODE_SWITCH) + if (node_type != RDMA_NODE_IB_SWITCH) return 0; smp->hop_ptr--; @@ -107,7 +107,7 @@ int smi_handle_dr_smp_send(struct ib_smp if (hop_ptr == 1) { smp->hop_ptr--; /* C14-13:3 -- SMPs destined for SM shouldn't be here */ - return (node_type == IB_NODE_SWITCH || + return (node_type == RDMA_NODE_IB_SWITCH || smp->dr_slid == IB_LID_PERMISSIVE); } @@ -142,7 +142,7 @@ int smi_handle_dr_smp_recv(struct ib_smp /* C14-9:2 -- intermediate hop */ if (hop_ptr && hop_ptr < hop_cnt) { - if (node_type != IB_NODE_SWITCH) + if (node_type != RDMA_NODE_IB_SWITCH) return 0; smp->return_path[hop_ptr] = port_num; @@ -156,7 +156,7 @@ int smi_handle_dr_smp_recv(struct ib_smp smp->return_path[hop_ptr] = port_num; /* smp->hop_ptr updated when sending */ - return (node_type == IB_NODE_SWITCH || + return 
(node_type == RDMA_NODE_IB_SWITCH || smp->dr_dlid == IB_LID_PERMISSIVE); } @@ -175,7 +175,7 @@ int smi_handle_dr_smp_recv(struct ib_smp /* C14-13:2 */ if (2 <= hop_ptr && hop_ptr <= hop_cnt) { - if (node_type != IB_NODE_SWITCH) + if (node_type != RDMA_NODE_IB_SWITCH) return 0; /* smp->hop_ptr updated when sending */ @@ -190,7 +190,7 @@ int smi_handle_dr_smp_recv(struct ib_smp return 1; } /* smp->hop_ptr updated when sending */ - return (node_type == IB_NODE_SWITCH); + return (node_type == RDMA_NODE_IB_SWITCH); } /* C14-13:4 -- hop_ptr = 0 -> give to SM */ diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c index fb66605..709323c 100644 --- a/drivers/infiniband/core/sysfs.c +++ b/drivers/infiniband/core/sysfs.c @@ -589,10 +589,11 @@ static ssize_t show_node_type(struct cla return -ENODEV; switch (dev->node_type) { - case IB_NODE_CA: return sprintf(buf, "%d: CA\n", dev->node_type); - case IB_NODE_SWITCH: return sprintf(buf, "%d: switch\n", dev->node_type); - case IB_NODE_ROUTER: return sprintf(buf, "%d: router\n", dev->node_type); - default: return sprintf(buf, "%d: \n", dev->node_type); + case RDMA_NODE_IB_CA: return sprintf(buf, "%d: CA\n", dev->node_type); + case RDMA_NODE_RNIC: return sprintf(buf, "%d: RNIC\n", dev->node_type); + case RDMA_NODE_IB_SWITCH: return sprintf(buf, "%d: switch\n", dev->node_type); + case RDMA_NODE_IB_ROUTER: return sprintf(buf, "%d: router\n", dev->node_type); + default: return sprintf(buf, "%d: \n", dev->node_type); } } @@ -708,7 +709,7 @@ int ib_device_register_sysfs(struct ib_d if (ret) goto err_put; - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { ret = add_port(device, 0); if (ret) goto err_put; diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c index e74c964..ad4f4d5 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -1247,7 +1247,8 @@ static void ib_ucm_add_one(struct ib_dev { struct ib_ucm_device 
*ucm_dev; - if (!device->alloc_ucontext) + if (!device->alloc_ucontext || + rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) return; ucm_dev = kzalloc(sizeof *ucm_dev, GFP_KERNEL); diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index 8a455ae..807fbd6 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -1032,7 +1032,10 @@ static void ib_umad_add_one(struct ib_de struct ib_umad_device *umad_dev; int s, e, i; - if (device->node_type == IB_NODE_SWITCH) + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + + if (device->node_type == RDMA_NODE_IB_SWITCH) s = e = 0; else { s = 1; diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 06f98e9..8b5dd36 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -79,6 +79,23 @@ enum ib_rate mult_to_ib_rate(int mult) } EXPORT_SYMBOL(mult_to_ib_rate); +enum rdma_transport_type +rdma_node_get_transport(enum rdma_node_type node_type) +{ + switch (node_type) { + case RDMA_NODE_IB_CA: + case RDMA_NODE_IB_SWITCH: + case RDMA_NODE_IB_ROUTER: + return RDMA_TRANSPORT_IB; + case RDMA_NODE_RNIC: + return RDMA_TRANSPORT_IWARP; + default: + BUG(); + return 0; + } +} +EXPORT_SYMBOL(rdma_node_get_transport); + /* Protection domains */ struct ib_pd *ib_alloc_pd(struct ib_device *device) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index a2a76c3..159b0be 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -268,7 +268,7 @@ int ehca_register_device(struct ehca_shc (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST) | (1ull << IB_USER_VERBS_CMD_DETACH_MCAST); - shca->ib_device.node_type = IB_NODE_CA; + shca->ib_device.node_type = RDMA_NODE_IB_CA; shca->ib_device.phys_port_cnt = shca->num_ports; shca->ib_device.dma_device = &shca->ibmebus_dev->ofdev.dev; shca->ib_device.query_device = 
ehca_query_device; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index fbda773..b8381c5 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -1538,7 +1538,7 @@ int ipath_register_ib_device(struct ipat (1ull << IB_USER_VERBS_CMD_QUERY_SRQ) | (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ) | (1ull << IB_USER_VERBS_CMD_POST_SRQ_RECV); - dev->node_type = IB_NODE_CA; + dev->node_type = RDMA_NODE_IB_CA; dev->phys_port_cnt = 1; dev->dma_device = &dd->pcidev->dev; dev->class_dev.dev = dev->dma_device; diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index 265b1d1..981fe2e 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -1288,7 +1288,7 @@ int mthca_register_device(struct mthca_d (1ull << IB_USER_VERBS_CMD_DESTROY_QP) | (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST) | (1ull << IB_USER_VERBS_CMD_DETACH_MCAST); - dev->ib_dev.node_type = IB_NODE_CA; + dev->ib_dev.node_type = RDMA_NODE_IB_CA; dev->ib_dev.phys_port_cnt = dev->limits.num_ports; dev->ib_dev.dma_device = &dev->pdev->dev; dev->ib_dev.class_dev.dev = &dev->pdev->dev; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 36d7698..e9a7659 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1111,13 +1111,16 @@ static void ipoib_add_one(struct ib_devi struct ipoib_dev_priv *priv; int s, e, p; + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL); if (!dev_list) return; INIT_LIST_HEAD(dev_list); - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { s = 0; e = 0; } else { @@ -1141,6 +1144,9 @@ static void ipoib_remove_one(struct ib_d struct ipoib_dev_priv *priv, *tmp; struct list_head *dev_list; 
+ if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + dev_list = ib_get_client_data(device, &ipoib_client); list_for_each_entry_safe(priv, tmp, dev_list, list) { diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 4f1775d..297c9ff 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1913,7 +1913,7 @@ static void srp_add_one(struct ib_device if (IS_ERR(srp_dev->fmr_pool)) srp_dev->fmr_pool = NULL; - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { s = 0; e = 0; } else { diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index 0ff6739..81b6230 100644 --- a/include/rdma/ib_addr.h +++ b/include/rdma/ib_addr.h @@ -40,7 +40,7 @@ struct rdma_dev_addr { unsigned char src_dev_addr[MAX_ADDR_LEN]; unsigned char dst_dev_addr[MAX_ADDR_LEN]; unsigned char broadcast[MAX_ADDR_LEN]; - enum ib_node_type dev_type; + enum rdma_node_type dev_type; }; /** @@ -72,6 +72,9 @@ int rdma_resolve_ip(struct sockaddr *src void rdma_addr_cancel(struct rdma_dev_addr *addr); +int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, + const unsigned char *dst_dev_addr); + static inline int ip_addr_size(struct sockaddr *addr) { return addr->sa_family == AF_INET6 ? 
@@ -113,4 +116,16 @@ static inline void ib_addr_set_dgid(stru memcpy(dev_addr->dst_dev_addr + 4, gid, sizeof *gid); } +static inline void iw_addr_get_sgid(struct rdma_dev_addr *dev_addr, + union ib_gid *gid) +{ + memcpy(gid, dev_addr->src_dev_addr, sizeof *gid); +} + +static inline void iw_addr_get_dgid(struct rdma_dev_addr *dev_addr, + union ib_gid *gid) +{ + memcpy(gid, dev_addr->dst_dev_addr, sizeof *gid); +} + #endif /* IB_ADDR_H */ diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 61eed39..8eacc35 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -56,12 +56,22 @@ union ib_gid { } global; }; -enum ib_node_type { - IB_NODE_CA = 1, - IB_NODE_SWITCH, - IB_NODE_ROUTER +enum rdma_node_type { + /* IB values map to NodeInfo:NodeType. */ + RDMA_NODE_IB_CA = 1, + RDMA_NODE_IB_SWITCH, + RDMA_NODE_IB_ROUTER, + RDMA_NODE_RNIC }; +enum rdma_transport_type { + RDMA_TRANSPORT_IB, + RDMA_TRANSPORT_IWARP +}; + +enum rdma_transport_type +rdma_node_get_transport(enum rdma_node_type node_type) __attribute_const__; + enum ib_device_cap_flags { IB_DEVICE_RESIZE_MAX_WR = 1, IB_DEVICE_BAD_PKEY_CNTR = (1<<1), @@ -78,6 +88,9 @@ enum ib_device_cap_flags { IB_DEVICE_RC_RNR_NAK_GEN = (1<<12), IB_DEVICE_SRQ_RESIZE = (1<<13), IB_DEVICE_N_NOTIFY_CQ = (1<<14), + IB_DEVICE_ZERO_STAG = (1<<15), + IB_DEVICE_SEND_W_INV = (1<<16), + IB_DEVICE_MEM_WINDOW = (1<<17) }; enum ib_atomic_cap { @@ -835,6 +848,8 @@ struct ib_cache { u8 *lmc_cache; }; +struct iw_cm_verbs; + struct ib_device { struct device *dma_device; @@ -851,6 +866,8 @@ struct ib_device { u32 flags; + struct iw_cm_verbs *iwcm; + int (*query_device)(struct ib_device *device, struct ib_device_attr *device_attr); int (*query_port)(struct ib_device *device, -- 1.4.1 From rolandd at cisco.com Fri Sep 8 14:55:40 2006 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 8 Sep 2006 14:55:40 -0700 Subject: [openib-general] [PATCH 0/2] RDMA: merge iWARP support Message-ID: 
<2006981455.F7Cau4RN2pBSAVMu@cisco.com> Here is a series of patches that adds iWARP (RDMA over IP) support to the InfiniBand support already in the kernel. Since the iWARP RDMA model is quite close to the InfiniBand model, the changes are not that large. The biggest difference is in how connections are established, since iWARP connections are TCP connections, while IB uses a different (native IB) mechanism for establishing a connection. The first patch in the series adds an iWARP connection manager, which handles establishing and tearing down connections for iWARP devices. The second patch is all the small changes required to hook in the connection manager and make the rest of the IB stuff also work with iWARP devices. The third patch (compressed due to its size) adds the first driver for an iWARP device, the Ammasso 1100 1 Gb/sec RNIC. My current plan is to merge this stuff for 2.6.19. Please let me know if you see anything (major or minor) that needs to be fixed up. Thanks, Roland From rdreier at cisco.com Fri Sep 8 14:58:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 08 Sep 2006 14:58:08 -0700 Subject: [openib-general] [PATCH 3/2] RDMA: Ammasso 1100 RNIC driver In-Reply-To: <2006981455.5zPhTm8jRQnxTde2@cisco.com> (Roland Dreier's message of "Fri, 8 Sep 2006 14:55:41 -0700") References: <2006981455.5zPhTm8jRQnxTde2@cisco.com> Message-ID: Here's the compressed patch adding the amso1100 driver. You can also find this in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git in the for-2.6.19 branch. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 0003-RDMA-amso1100-Add-driver-for-Ammasso-1100-RNIC.txt.bz2 Type: application/x-bzip Size: 42201 bytes Desc: not available URL: From Terry.Yoder at qlogic.com Fri Sep 8 16:48:14 2006 From: Terry.Yoder at qlogic.com (Terry Yoder) Date: Fri, 8 Sep 2006 16:48:14 -0700 Subject: [openib-general] svn iwarp and OFED Message-ID: Is the svn iwarp branch in sync with OFED 1.1 rc3? Terry -------------- next part -------------- An HTML attachment was scrubbed... URL: From shahanse at cisco.com Fri Sep 8 17:28:45 2006 From: shahanse at cisco.com (Shawn Hansen (shahanse)) Date: Fri, 8 Sep 2006 17:28:45 -0700 Subject: [openib-general] Goodbye and Transition Message-ID: All, FYI: I've decided to relocate my family to Seattle, and will be leaving Cisco. I plan to join Microsoft's Server and Tools division at the end of this month. I would like to recommend Jamie Riotto, Senior Director of Engineering, as my EWG replacement. Jamie is responsible for all engineering for Cisco's Server Networking and Virtualization Business Unit, including Cisco's host driver and RDMA development efforts. Please stay in touch, and I wish the team the best. Regards, --Shawn ---------------------------- Shawn Hansen Director, Product Management Cisco Systems From swise at opengridcomputing.com Sat Sep 9 05:44:54 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Sat, 9 Sep 2006 07:44:54 -0500 Subject: [openib-general] svn iwarp and OFED References: Message-ID: <003a01c6d40d$c0a96c70$020010ac@haggard> No. It is at trunk revision 7626. Merged 6/2/2006 under revision 7631. Steve. ----- Original Message ----- From: "Terry Yoder" To: Sent: Friday, September 08, 2006 6:48 PM Subject: [openib-general] svn iwarp and OFED Is the svn iwarp branch in sync with OFED 1.1 rc3? 
Terry -------------------------------------------------------------------------------- > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From kliteyn at dev.mellanox.co.il Sat Sep 9 23:35:54 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 10 Sep 2006 09:35:54 +0300 Subject: [openib-general] [PATCH] osm: OSM bug fix with --run-once option Message-ID: <1157870154.29270.42.camel@kliteynik.yok.mtl.com> Hi Hal This patch fixes the bug that was occurring when OSM was running with --run-once option (-o) and the SM port was down. In that case, OSM would be stuck in cond_wait forever (or until the port will become active), and could not be terminated, other than by SIGKILL. Yevgeny Signed-off-by: Yevgeny Kliteynik Index: opensm/main.c =================================================================== --- opensm/main.c (revision 9354) +++ opensm/main.c (working copy) @@ -908,9 +908,13 @@ main( if( run_once_flag == TRUE ) { - status = osm_opensm_wait_for_subnet_up( - &osm, EVENT_NO_TIMEOUT, TRUE ); - osm_exit_flag = 1; + while (!osm_exit_flag) + { + status = osm_opensm_wait_for_subnet_up( + &osm, osm.subn.opt.sweep_interval * 1000000, TRUE ); + if (!status) + osm_exit_flag = 1; + } } else { From dotanb at dev.mellanox.co.il Sat Sep 9 23:43:56 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Sun, 10 Sep 2006 09:43:56 +0300 Subject: [openib-general] HCAs with and without memory In-Reply-To: References: Message-ID: <4503B42C.60405@dev.mellanox.co.il> Hi john. john t wrote: > Hi OpenIB group, > > What is the difference between HCAs with memory and without memory. > How is the on-board memory used by HCAs? Is it that data is first > copied into this memory and then into physical memory? > > Regards, > John T. 
If you are asking about Mellanox HCAs, I can answer: the difference is in the technology the HCAs use. The HCAs without attached memory use the memfree technology. The main difference between the two kinds of HCA is where the context of the various resources is located: in host memory or in the attached memory. The data itself is not stored in the attached memory at any point during data movement. Dotan From dotanb at dev.mellanox.co.il Sat Sep 9 23:55:31 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Sun, 10 Sep 2006 09:55:31 +0300 Subject: [openib-general] ibv_poll_cq after ibv_post_send does not work In-Reply-To: References: Message-ID: <4503B6E3.3070506@dev.mellanox.co.il> Bub Thomas wrote: > Dortan Barak wrote: > If you are using RC QP: > the reason for not getting any completion in the CQ is that > > Did you post any RR (Receive Request) at the listener side? > > > Dotan, > with the cmpost.c example I now get a cm connection even with another > machine. > However I don't get the cq event, on the sender side, when the > IBV_WR_SEND is done. Is this correct? Is this what you are saying below? > If it is correct this is different from gen1 drivers where I got a > VAPI_SUCCESS cq event. Is there a way to get this back? > > On the receiver side I get an cq event for the receive request. > > Thanks > Thomas > > > What do you mean when you say that you don't get the cq event? I assume that you are talking about completions: on the receiver side there is always a completion. On the sender side there may be a completion, depending on the QP / WR configuration: if you want a completion, you need either to set sq_sig_all at QP creation (if you want completions to be created for all of the sends posted to this QP) or to set IBV_SEND_SIGNALED in the send_flags of the WR that you are posting. 
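The sender-side rule above can be written down as a small predicate. What follows is only a sketch, not libibverbs code: the struct and constant names are hypothetical stand-ins that mirror the sq_sig_all setting and the IBV_SEND_SIGNALED flag, and it only covers a send that completes successfully.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the IBV_SEND_SIGNALED bit in a work request's send_flags.
 * The value is illustrative only and is not taken from verbs.h. */
#define SKETCH_SEND_SIGNALED 0x2u

/* Hypothetical, trimmed-down view of the QP setting that matters here. */
struct sketch_qp {
	bool sq_sig_all;         /* chosen at QP creation: signal every send */
};

/* Hypothetical, trimmed-down view of a send work request. */
struct sketch_send_wr {
	unsigned int send_flags; /* per-WR flags */
};

/*
 * Returns true when a successfully completed send is expected to leave
 * an entry in the send CQ: either the QP signals all sends, or this
 * particular WR asked for a completion.
 */
static bool send_generates_completion(const struct sketch_qp *qp,
				      const struct sketch_send_wr *wr)
{
	assert(qp != NULL && wr != NULL);
	return qp->sq_sig_all || (wr->send_flags & SKETCH_SEND_SIGNALED) != 0;
}
```

In a real program the two inputs would come from the sq_sig_all member of the QP init attributes and from the send_flags of the WR passed to ibv_post_send; a sender that sets neither should not expect ibv_poll_cq to return anything for a send that succeeded.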
Dotan From greg.lindahl at qlogic.com Sun Sep 10 00:24:17 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Sun, 10 Sep 2006 00:24:17 -0700 Subject: [openib-general] HCAs with and without memory In-Reply-To: References: Message-ID: <20060910072417.GC1252@greglaptop.hsd1.ca.comcast.net> On Fri, Sep 08, 2006 at 03:49:57PM +0530, john t wrote: > What is the difference between HCAs with memory and without memory. And to answer for QLogic InfiniPath HCAs, we don't sell HCAs with memory. We don't need it. There's actually a small amount of memory within the single chip that makes up our HCA, and that's all that's necessary. -- greg From mst at mellanox.co.il Sun Sep 10 00:58:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 10 Sep 2006 10:58:18 +0300 Subject: [openib-general] why sdp connections cost so much memory In-Reply-To: <20060901102639.55709.qmail@web36915.mail.mud.yahoo.com> References: <20060830045927.GB25478@mellanox.co.il> <20060901102639.55709.qmail@web36915.mail.mud.yahoo.com> Message-ID: <20060910075818.GV6928@mellanox.co.il> Quoting r. zhu shi song : > Subject: Re: why sdp connections cost so much memory > > OFED-1.1-rc3 has passed my tests. I have to adjust > Post buffer size to 0x4 and use your patch for me. > Can you make it fixed not to do these myself manually? > > zhu I plan to add the following patch to OFED. Could you please verify that it fixes the issue for you, without tweaking the ring size? Signed-off-by: Michael S. 
Tsirkin diff --git a/drivers/infiniband/ulp/sdp/sdp_bcopy.c b/drivers/infiniband/ulp/sdp/sdp_bcopy.c index b30d2a0..4540fa4 100644 --- a/drivers/infiniband/ulp/sdp/sdp_bcopy.c +++ b/drivers/infiniband/ulp/sdp/sdp_bcopy.c @@ -37,6 +37,10 @@ #include #include #include "sdp.h" +static int rcvbuf_scale = 0x1; +module_param_named(rcvbuf_scale, rcvbuf_scale, int, 0644); +MODULE_PARM_DESC(rcvbuf_scale, "Receive buffer size scale factor."); + /* Like tcp_fin */ static void sdp_fin(struct sock *sk) { @@ -237,7 +241,7 @@ void sdp_post_recvs(struct sdp_sock *ssk while ((likely(ssk->rx_head - ssk->rx_tail < SDP_RX_SIZE) && (ssk->rx_head - ssk->rx_tail - SDP_MIN_BUFS) * SDP_MAX_SEND_SKB_FRAGS * PAGE_SIZE + rmem < - ssk->isk.sk.sk_rcvbuf * 0x10) || + ssk->isk.sk.sk_rcvbuf * rcvbuf_scale) || unlikely(ssk->rx_head - ssk->rx_tail < SDP_MIN_BUFS)) sdp_post_recv(ssk); } -- MST From erezz at voltaire.com Sun Sep 10 02:08:48 2006 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 10 Sep 2006 12:08:48 +0300 Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' In-Reply-To: References: <200609071902.57379.toralf.foerster@gmx.de> <15ddcffd0609071252o477eeabfl31366719d0d3d9f0@mail.gmail.com> Message-ID: <4503D620.4000602@voltaire.com> Roland Dreier wrote: > Or> you need to have CONFIG_INFINIBAND_ADDR_TRANS=m defined also i > Or> think you are missing CONFIG_INET=m > > Seems like a bug in the iSER Kconfig -- it shouldn't be possible to > select iSER without everything it needs to compile. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > I wasn't able to reproduce this behavior. This is what I did: 1. I deleted my .config file. 2. make menuconfig 3. 
Selected Device drivers/InfiniBand support = y (if I don't do that, I cannot select iSER). 4. Selected ISCSI RDMA protocol = y I've attached the .config file. Note that CONFIG_INFINIBAND_ADDR_TRANS=y. This is because in drivers/infiniband/Kconfig it says: config INFINIBAND_ADDR_TRANS bool depends on INFINIBAND && INET default y Therefore, I don't understand how did this behavior happen. Can you reproduce it and send the list of steps? Erez -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: .config URL: From eli at dev.mellanox.co.il Sun Sep 10 02:30:28 2006 From: eli at dev.mellanox.co.il (Eli cohen) Date: Sun, 10 Sep 2006 12:30:28 +0300 Subject: [openib-general] PXE + infiniband? In-Reply-To: <003e01c6d24d$f19caed0$8000a8c0@blorp> References: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com> <1157610538.30038.35.camel@localhost> <003e01c6d24d$f19caed0$8000a8c0@blorp> Message-ID: <1157880628.5386.13.camel@localhost> On Thu, 2006-09-07 at 08:19 +0100, Paul Baxter wrote: > > There is an implementation of PXE for Mellanox's HCAs that can be found > > here: http://sourceforge.net/forum/forum.php?forum_id=494529 > > Thanks for the tip > > I, too, am interested in this. > > Do you have a more direct link as I wandered around etherboot's project site > and couldn't find anything IB-specific. > > Paul Baxter Hi, Please use the following link http://kent.dl.sourceforge.net/sourceforge/etherboot/etherboot-5.4.2.tar.bz2 to download the package. Unpack the package and cd to the src dir. Use an x86 arch machine to build the binaries. The infiniband drivers are located at src/drivers/net/mlx_ipoib/ where you can find a readme file in the doc directory. To build. cd src make bin/MT23108.zrom // for MT230108 make bin/MT25208.zrom make bin/MT25218.zrom This covers all Mellanox HCAs. Please let me know if you need more assistance. 
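
[Editor's sketch, not part of the original message.] The build steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the directory name etherboot-5.4.2/src is a guess at how the tarball unpacks, and ETHERBOOT_SRC, DRY_RUN and build_roms are names invented here, not anything from the etherboot tree. By default it only prints the make commands; setting DRY_RUN=0 would run them.

```shell
#!/bin/sh
# Sketch only: print (or run) the etherboot build commands for the three
# ROM targets named in the message above.
# Assumptions (not from the original mail): the tarball has been unpacked
# and ETHERBOOT_SRC points at its src directory; ETHERBOOT_SRC, DRY_RUN
# and build_roms are hypothetical names used for illustration.

ROM_TARGETS="MT23108 MT25208 MT25218"
DRY_RUN="${DRY_RUN:-1}"
ETHERBOOT_SRC="${ETHERBOOT_SRC:-etherboot-5.4.2/src}"

build_roms() {
    for hca in $ROM_TARGETS; do
        if [ "$DRY_RUN" = "1" ]; then
            # Dry run: show the command that would be executed.
            echo "make -C ${ETHERBOOT_SRC} bin/${hca}.zrom"
        else
            # Real run: stop at the first target that fails to build.
            make -C "${ETHERBOOT_SRC}" "bin/${hca}.zrom" || return 1
        fi
    done
}

build_roms
```

Running the script as-is lists one make invocation per HCA model, which is a cheap way to check the paths before starting a real build on an x86 machine.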

From mst at mellanox.co.il Sun Sep 10 02:37:47 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 10 Sep 2006 12:37:47 +0300
Subject: [openib-general] [PATCH] ehca for OFED 1.1-rc4
In-Reply-To:
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7892@mtlexch01.mtl.com>
Message-ID: <20060910093747.GA11625@mellanox.co.il>

Quoting r. Hoang-Nam Nguyen :
Subject: [PATCH] ehca for OFED 1.1-rc4
> Hello Tziporet!
> Below is a patch of ehca against the ofed git tree branch ehca-branch in order to upgrade it to the same code level of Roland's git tree branch for-2.6.19, which has been posted for a while. The main code changes are:
> - Replace the "huge" EDEB macro by a simpler wrapper based on dev_err/dbg
> - Remove superfluous variables initialization and arguments checking
> - Replace struct ehca_module by static member variables in appropriate files, where they are accessed
> - Rename module name to ib_ehca.ko
> Thanks!
> Nam Nguyen

Unfortunately, the patch doesn't apply against either ofed_1_1 or
ehca_branch. Please check that it does, before re-posting.

One additional request: your message included a copy of the patch in
both plain text and html format. Please post plain text only.

Thanks,

-- MST

From vlad at dev.mellanox.co.il Sun Sep 10 04:00:18 2006
From: vlad at dev.mellanox.co.il (vlad at dev.mellanox.co.il)
Date: Sun, 10 Sep 2006 14:00:18 +0300 (IDT)
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 status
In-Reply-To:
References:
Message-ID: <14781.194.90.237.34.1157886018.squirrel@dev.mellanox.co.il>

Hello Nam Nguyen,
See my comments regarding OFED-1.1-rc3 below.
Please also check the libehca compilation issue:
http://openib.org/bugzilla/show_bug.cgi?id=228

Regards,
Vladimir

> Hello Tziporet!
> First sorry for this late response regarding ehca build test in OFED 1.1
> rc3.
>
> 1) The userspace lib dir for libehca contains only a few c-files, but no
> header files.
> On svn dir branches/1.1/src/userspace/libehca/src/ I saw all files needed.
> Please correct
> this for rc4!
> Will you pick new version of libehca from that dir?
>

There was a missing EXTRA_DIST parameter in libehca/Makefile.am.
I will fix it in the trunk and branches/1.1.

> 2) When I used the install.sh script to install the software packages or
> compile
> them on ppc64, kernel 2.6.18-rc5/6 I got the following error messages:
>
> gcc -m64 -Wp,-MD,/var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/core/.ib_addr.mod.o.d -nos
> M/BUILD/openib-1.1/include -I/var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/include -Iinclu
> oft-float -pipe -mminimal-toc -mtraceback=none -mcall-aixdesc
> -mtune=power4 -mno-altivec -funit-at
> lude -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include
> -I/var/tmp/OFEDRPM/BUILD/openi
> g -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(ib_addr.mod)"
> -D"KBUILD_MODNAME=KBUILD_STR(
> o /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/ib_addr.mod.c
> In file included from include/asm/system.h:9,
> from include/linux/spinlock.h:56,
> from include/linux/capability.h:45,
> from include/linux/sched.h:44,
> from include/linux/module.h:9,
> from /var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/core/ib_addr.mod.c:1:
> include/asm/hw_irq.h: In function `local_irq_disable':
> include/asm/hw_irq.h:51: warning: implicit declaration of function
> `__mtmsrd'
> In file included from include/asm/current.h:15,
> from include/linux/capability.h:46,
> from include/linux/sched.h:44,
> from include/linux/module.h:9,
> from /var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/core/ib_addr.mod.c:1:
> include/asm/paca.h: At top level:
> include/asm/paca.h:84: error: `SLB_CACHE_ENTRIES' undeclared here (not in
> a
> function)
> In file included from include/linux/sched.h:49,
> from include/linux/module.h:9,
> from /var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/core/ib_addr.mod.c:1:
> include/linux/jiffies.h:18:5: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:20:7: warning: "CONFIG_HZ" is not defined
>
> If I use the kernel Makefile in /usr/src/linux-2.6.18-rc5 to compile e.g.
> make -C /usr/src/linux-2.6.18-rc5
> SUBDIRS=/var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/core
> then it works fine. We found out that the top-level kernel Makefile does
> the following settings
>
> LINUXINCLUDE := -Iinclude \
> $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \
> -include include/linux/autoconf.h
> CPPFLAGS := -D__KERNEL__ $(LINUXINCLUDE)
>
> that include autoconf.h with all configured kernel configs like
> CONFIG_PPC64 etc. And obviously those
> config defines are lost if one uses
> /usr/src/linux-2.6.18-rc5/scripts/Makefile.build as OFED install.sh
> does. I'm wondering if anyone else also sees this problem on other
> architectures?
> Is there any reasons not to use the top-level kernel Makefile?
>

We are using the top-level kernel Makefile. This was an issue in
OFED-1.1-rc3 with 2.6.18 kernels. It is fixed in OFED-1.1-rc4.

> Thanks!
> Nam Nguyen
>
> openib-general-bounces at openib.org wrote on 07.09.2006 22:01:30:
>
>> Hi,
>> OFED 1.1 RC4 will be published on Monday 11-Sep.
>> We currently work on several showstoppers:
>> 1. 223: mthca.so not properly linked to libibverbs – Vlad & Jack
>> 2. 221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - Roland
>> 3. 219: OFED 1.1rc3 contains prerelease unstable libibverbs code – Vlad & Jack
>>
>> Thus final release date will be delayed to end of next week
>>
>>
>> Tziporet Koren
>> Software Director
>> Mellanox Technologies
>> mailto: tziporet at mellanox.co.il
>> Tel +972-4-9097200, ext 380
>

From mst at mellanox.co.il Sun Sep 10 04:11:45 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 10 Sep 2006 14:11:45 +0300
Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure cases.
In-Reply-To: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com>
References: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com>
Message-ID: <20060910111145.GA12111@mellanox.co.il>

Quoting r. Krishna Kumar :
> Subject: [PATCH] cma_connect_ib leaks memory in failure cases.
>
> cma_connect_ib leaks an struct ib_cm_id* in failure cases.
>
> Signed-off-by: Krishna Kumar

This one looks like it might be good for 2.6.18. Sean?
-- MST

From toralf.foerster at gmx.de Sun Sep 10 04:43:00 2006
From: toralf.foerster at gmx.de (Toralf Förster)
Date: Sun, 10 Sep 2006 13:43:00 +0200
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
In-Reply-To: <4503D620.4000602@voltaire.com>
References: <200609071902.57379.toralf.foerster@gmx.de> <4503D620.4000602@voltaire.com>
Message-ID: <200609101343.02740.toralf.foerster@gmx.de>

I copied the config file to .config, then ran "make oldconfig && make"
against the current sources, 2.6.18-rc6-git3.

BTW, I attach another .config where a similar problem occurred.

On Sunday 10 September 2006 11:08, Erez Zilber wrote:
> Roland Dreier wrote:
> > Or> you need to have CONFIG_INFINIBAND_ADDR_TRANS=m defined also i
> > Or> think you are missing CONFIG_INET=m
> >
> > Seems like a bug in the iSER Kconfig -- it shouldn't be possible to
> > select iSER without everything it needs to compile.
> >
>
> I wasn't able to reproduce this behavior. This is what I did:
>
> 1. I deleted my .config file.
> 2. make menuconfig
> 3. Selected Device drivers/InfiniBand support = y (if I don't do
> that, I cannot select iSER).
> 4. Selected ISCSI RDMA protocol = y
>
> I've attached the .config file. Note that
> CONFIG_INFINIBAND_ADDR_TRANS=y. This is because in
> drivers/infiniband/Kconfig it says:
>
> config INFINIBAND_ADDR_TRANS
> bool
> depends on INFINIBAND && INET
> default y
>
> Therefore, I don't understand how this behavior happened. Can you
> reproduce it and send the list of steps?
> > Erez > -- MfG/Sincerely Toralf Förster -------------- next part -------------- # # Automatically generated make config: don't edit # Linux kernel version: 2.6.18-rc6-git3 # Sun Sep 10 13:26:49 2006 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # Code maturity level options # # CONFIG_EXPERIMENTAL is not set CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 # # General setup # CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set # CONFIG_SWAP is not set CONFIG_SYSVIPC=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set CONFIG_SYSCTL=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_RELAY=y CONFIG_INITRAMFS_SOURCE="" CONFIG_UID16=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set CONFIG_KALLSYMS_EXTRA_PASS=y CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_RT_MUTEXES=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_SLAB=y CONFIG_VM_EVENT_COUNTERS=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # CONFIG_SLOB is not set # # Loadable module support # # CONFIG_MODULES is not set # # Block layer # CONFIG_LBD=y CONFIG_BLK_DEV_IO_TRACE=y CONFIG_LSF=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="cfq" # # Processor type and features # # CONFIG_SMP is not set CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not 
set # CONFIG_X86_ES7000 is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set CONFIG_MPENTIUMM=y # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MGEODEGX1 is not set # CONFIG_MGEODE_LX is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_X86_GENERIC is not set CONFIG_X86_CMPXCHG=y CONFIG_X86_XADD=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_CMPXCHG64=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_INTEL_USERCOPY=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_TSC=y CONFIG_HPET_TIMER=y # CONFIG_PREEMPT_NONE is not set CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set # CONFIG_X86_UP_APIC is not set CONFIG_X86_MCE=y # CONFIG_X86_MCE_NONFATAL is not set CONFIG_VM86=y # CONFIG_TOSHIBA is not set # CONFIG_I8K is not set CONFIG_X86_REBOOTFIXUPS=y # CONFIG_MICROCODE is not set CONFIG_X86_MSR=y CONFIG_X86_CPUID=y # # Firmware Drivers # # CONFIG_EDD is not set # CONFIG_EFI_VARS is not set # CONFIG_DELL_RBU is not set # CONFIG_DCDBAS is not set CONFIG_NOHIGHMEM=y # CONFIG_HIGHMEM4G is not set # CONFIG_HIGHMEM64G is not set CONFIG_PAGE_OFFSET=0xC0000000 CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_SPLIT_PTLOCK_CPUS=4 # CONFIG_RESOURCES_64BIT is not set CONFIG_MATH_EMULATION=y # CONFIG_MTRR is not set CONFIG_EFI=y CONFIG_BOOT_IOREMAP=y CONFIG_REGPARM=y CONFIG_SECCOMP=y # CONFIG_HZ_100 is not set CONFIG_HZ_250=y # CONFIG_HZ_1000 is not set CONFIG_HZ=250 CONFIG_PHYSICAL_START=0x100000 # 
CONFIG_COMPAT_VDSO is not set # # Power management options (ACPI, APM) # CONFIG_PM=y CONFIG_PM_LEGACY=y # CONFIG_PM_DEBUG is not set # # ACPI (Advanced Configuration and Power Interface) Support # CONFIG_ACPI=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_SLEEP_PROC_FS=y # CONFIG_ACPI_SLEEP_PROC_SLEEP is not set # CONFIG_ACPI_AC is not set CONFIG_ACPI_BATTERY=y CONFIG_ACPI_BUTTON=y CONFIG_ACPI_VIDEO=y CONFIG_ACPI_FAN=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_THERMAL=y # CONFIG_ACPI_ASUS is not set # CONFIG_ACPI_IBM is not set # CONFIG_ACPI_TOSHIBA is not set CONFIG_ACPI_BLACKLIST_YEAR=0 CONFIG_ACPI_DEBUG=y CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y CONFIG_ACPI_SYSTEM=y CONFIG_X86_PM_TIMER=y # # APM (Advanced Power Management) BIOS Support # # CONFIG_APM is not set # # CPU Frequency scaling # # CONFIG_CPU_FREQ is not set # # Bus options (PCI, PCMCIA, EISA, MCA, ISA) # CONFIG_PCI=y # CONFIG_PCI_GOBIOS is not set CONFIG_PCI_GOMMCONFIG=y # CONFIG_PCI_GODIRECT is not set # CONFIG_PCI_GOANY is not set CONFIG_PCI_MMCONFIG=y # CONFIG_PCIEPORTBUS is not set CONFIG_PCI_DEBUG=y CONFIG_ISA_DMA_API=y # CONFIG_ISA is not set # CONFIG_MCA is not set CONFIG_SCx200=y CONFIG_SCx200HR_TIMER=y # # PCCARD (PCMCIA/CardBus) support # CONFIG_PCCARD=y CONFIG_PCMCIA_DEBUG=y CONFIG_PCMCIA=y CONFIG_PCMCIA_IOCTL=y # CONFIG_CARDBUS is not set # # PC-card bridges # # CONFIG_YENTA is not set CONFIG_PD6729=y # CONFIG_I82092 is not set CONFIG_PCCARD_NONSTATIC=y # # PCI Hotplug Support # # # Executable file formats # # CONFIG_BINFMT_ELF is not set CONFIG_BINFMT_AOUT=y # CONFIG_BINFMT_MISC is not set # # Networking # # CONFIG_NET is not set # # Device Drivers # # # Generic Driver Options # CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y CONFIG_DEBUG_DRIVER=y # CONFIG_SYS_HYPERVISOR is not set # # Connector - unified userspace <-> kernelspace linker # # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # CONFIG_PARPORT=y CONFIG_PARPORT_PC=y CONFIG_PARPORT_SERIAL=y # 
CONFIG_PARPORT_PC_PCMCIA is not set CONFIG_PARPORT_NOT_PC=y # CONFIG_PARPORT_GSC is not set CONFIG_PARPORT_AX88796=y CONFIG_PARPORT_1284=y # # Plug and Play support # # CONFIG_PNP is not set # # Block devices # CONFIG_BLK_DEV_FD=y CONFIG_PARIDE=y CONFIG_PARIDE_PARPORT=y # # Parallel IDE high-level drivers # # CONFIG_PARIDE_PD is not set # CONFIG_PARIDE_PCD is not set # CONFIG_PARIDE_PF is not set # CONFIG_PARIDE_PT is not set CONFIG_PARIDE_PG=y # # Parallel IDE protocol modules # CONFIG_PARIDE_ATEN=y CONFIG_PARIDE_BPCK=y CONFIG_PARIDE_BPCK6=y CONFIG_PARIDE_COMM=y CONFIG_PARIDE_DSTR=y # CONFIG_PARIDE_FIT2 is not set # CONFIG_PARIDE_FIT3 is not set # CONFIG_PARIDE_EPAT is not set CONFIG_PARIDE_EPIA=y CONFIG_PARIDE_FRIQ=y CONFIG_PARIDE_FRPW=y # CONFIG_PARIDE_KBIC is not set CONFIG_PARIDE_KTTI=y CONFIG_PARIDE_ON20=y # CONFIG_PARIDE_ON26 is not set CONFIG_BLK_CPQ_DA=y # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_COW_COMMON is not set # CONFIG_BLK_DEV_LOOP is not set CONFIG_BLK_DEV_SX8=y # CONFIG_BLK_DEV_RAM is not set # CONFIG_BLK_DEV_INITRD is not set # CONFIG_CDROM_PKTCDVD is not set # # ATA/ATAPI/MFM/RLL support # CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_BLK_DEV_HD_IDE is not set CONFIG_BLK_DEV_IDEDISK=y # CONFIG_IDEDISK_MULTI_MODE is not set # CONFIG_BLK_DEV_IDECS is not set # CONFIG_BLK_DEV_IDECD is not set CONFIG_BLK_DEV_IDEFLOPPY=y # CONFIG_BLK_DEV_IDESCSI is not set CONFIG_IDE_TASK_IOCTL=y # # IDE chipset support/bugfixes # CONFIG_IDE_GENERIC=y # CONFIG_BLK_DEV_CMD640 is not set CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y CONFIG_BLK_DEV_OFFBOARD=y CONFIG_BLK_DEV_GENERIC=y CONFIG_BLK_DEV_RZ1000=y CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set CONFIG_BLK_DEV_AEC62XX=y # CONFIG_BLK_DEV_ALI15X3 is not set # CONFIG_BLK_DEV_AMD74XX 
is not set CONFIG_BLK_DEV_ATIIXP=y CONFIG_BLK_DEV_CMD64X=y # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_CS5535 is not set # CONFIG_BLK_DEV_HPT34X is not set CONFIG_BLK_DEV_HPT366=y CONFIG_BLK_DEV_SC1200=y CONFIG_BLK_DEV_PIIX=y # CONFIG_BLK_DEV_IT821X is not set # CONFIG_BLK_DEV_NS87415 is not set CONFIG_BLK_DEV_PDC202XX_OLD=y # CONFIG_PDC202XX_BURST is not set CONFIG_BLK_DEV_PDC202XX_NEW=y # CONFIG_BLK_DEV_SVWKS is not set CONFIG_BLK_DEV_SIIMAGE=y # CONFIG_BLK_DEV_SIS5513 is not set CONFIG_BLK_DEV_SLC90E66=y CONFIG_BLK_DEV_TRM290=y # CONFIG_BLK_DEV_VIA82CXXX is not set # CONFIG_IDE_ARM is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # SCSI device support # CONFIG_RAID_ATTRS=y CONFIG_SCSI=y CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=y CONFIG_CHR_DEV_ST=y # CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y CONFIG_CHR_DEV_SCH=y # # Some SCSI devices (e.g. 
CD jukebox) support multiple LUNs # CONFIG_SCSI_MULTI_LUN=y # CONFIG_SCSI_CONSTANTS is not set CONFIG_SCSI_LOGGING=y # # SCSI Transport Attributes # CONFIG_SCSI_SPI_ATTRS=y CONFIG_SCSI_FC_ATTRS=y CONFIG_SCSI_ISCSI_ATTRS=y CONFIG_SCSI_SAS_ATTRS=y # # SCSI low-level drivers # # CONFIG_BLK_DEV_3W_XXXX_RAID is not set CONFIG_SCSI_3W_9XXX=y # CONFIG_SCSI_ACARD is not set CONFIG_SCSI_AACRAID=y CONFIG_SCSI_AIC7XXX=y CONFIG_AIC7XXX_CMDS_PER_DEVICE=32 CONFIG_AIC7XXX_RESET_DELAY_MS=5000 # CONFIG_AIC7XXX_DEBUG_ENABLE is not set CONFIG_AIC7XXX_DEBUG_MASK=0 # CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set # CONFIG_SCSI_AIC7XXX_OLD is not set CONFIG_SCSI_AIC79XX=y CONFIG_AIC79XX_CMDS_PER_DEVICE=32 CONFIG_AIC79XX_RESET_DELAY_MS=5000 # CONFIG_AIC79XX_ENABLE_RD_STRM is not set CONFIG_AIC79XX_DEBUG_ENABLE=y CONFIG_AIC79XX_DEBUG_MASK=0 # CONFIG_AIC79XX_REG_PRETTY_PRINT is not set # CONFIG_SCSI_DPT_I2O is not set CONFIG_SCSI_ADVANSYS=y # CONFIG_MEGARAID_NEWGEN is not set CONFIG_MEGARAID_LEGACY=y CONFIG_MEGARAID_SAS=y # CONFIG_SCSI_SATA is not set CONFIG_SCSI_HPTIOP=y CONFIG_SCSI_BUSLOGIC=y # CONFIG_SCSI_OMIT_FLASHPOINT is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set CONFIG_SCSI_FUTURE_DOMAIN=y CONFIG_SCSI_GDTH=y # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set CONFIG_SCSI_PPA=y CONFIG_SCSI_IMM=y # CONFIG_SCSI_IZIP_EPP16 is not set CONFIG_SCSI_IZIP_SLOW_CTR=y # CONFIG_SCSI_SYM53C8XX_2 is not set # CONFIG_SCSI_IPR is not set CONFIG_SCSI_QLOGIC_1280=y CONFIG_SCSI_QLA_FC=y # CONFIG_SCSI_LPFC is not set CONFIG_SCSI_DC390T=y CONFIG_SCSI_NSP32=y # CONFIG_SCSI_DEBUG is not set # # Multi-device support (RAID and LVM) # # CONFIG_MD is not set # # Fusion MPT device support # # CONFIG_FUSION is not set # CONFIG_FUSION_SPI is not set # CONFIG_FUSION_FC is not set # CONFIG_FUSION_SAS is not set # # IEEE 1394 (FireWire) support # # CONFIG_IEEE1394 is not set # # I2O device support # CONFIG_I2O=y CONFIG_I2O_LCT_NOTIFY_ON_CHANGES=y 
CONFIG_I2O_EXT_ADAPTEC=y CONFIG_I2O_CONFIG=y CONFIG_I2O_CONFIG_OLD_IOCTL=y CONFIG_I2O_BUS=y CONFIG_I2O_BLOCK=y CONFIG_I2O_SCSI=y CONFIG_I2O_PROC=y # # ISDN subsystem # # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y # CONFIG_INPUT_MOUSEDEV_PSAUX is not set CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set CONFIG_INPUT_EVDEV=y CONFIG_INPUT_EVBUG=y # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set CONFIG_KEYBOARD_LKKBD=y CONFIG_KEYBOARD_XTKBD=y CONFIG_KEYBOARD_NEWTON=y # CONFIG_INPUT_MOUSE is not set CONFIG_INPUT_JOYSTICK=y CONFIG_JOYSTICK_ANALOG=y # CONFIG_JOYSTICK_A3D is not set # CONFIG_JOYSTICK_ADI is not set # CONFIG_JOYSTICK_COBRA is not set CONFIG_JOYSTICK_GF2K=y # CONFIG_JOYSTICK_GRIP is not set CONFIG_JOYSTICK_GRIP_MP=y CONFIG_JOYSTICK_GUILLEMOT=y CONFIG_JOYSTICK_INTERACT=y # CONFIG_JOYSTICK_SIDEWINDER is not set # CONFIG_JOYSTICK_TMDC is not set # CONFIG_JOYSTICK_IFORCE is not set # CONFIG_JOYSTICK_WARRIOR is not set # CONFIG_JOYSTICK_MAGELLAN is not set # CONFIG_JOYSTICK_SPACEORB is not set CONFIG_JOYSTICK_SPACEBALL=y # CONFIG_JOYSTICK_STINGER is not set CONFIG_JOYSTICK_TWIDJOY=y CONFIG_JOYSTICK_DB9=y CONFIG_JOYSTICK_GAMECON=y # CONFIG_JOYSTICK_TURBOGRAFX is not set CONFIG_JOYSTICK_JOYDUMP=y # CONFIG_INPUT_TOUCHSCREEN is not set CONFIG_INPUT_MISC=y # CONFIG_INPUT_PCSPKR is not set # CONFIG_INPUT_WISTRON_BTNS is not set # CONFIG_INPUT_UINPUT is not set # # Hardware I/O ports # CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set CONFIG_SERIO_PARKBD=y # CONFIG_SERIO_PCIPS2 is not set CONFIG_SERIO_LIBPS2=y # CONFIG_SERIO_RAW is not set CONFIG_GAMEPORT=y CONFIG_GAMEPORT_NS558=y # CONFIG_GAMEPORT_L4 is not set # CONFIG_GAMEPORT_EMU10K1 is not set CONFIG_GAMEPORT_FM801=y # # 
Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y CONFIG_VT_HW_CONSOLE_BINDING=y CONFIG_SERIAL_NONSTANDARD=y # CONFIG_COMPUTONE is not set # CONFIG_ROCKETPORT is not set CONFIG_CYCLADES=y # CONFIG_DIGIEPCA is not set # CONFIG_MOXA_INTELLIO is not set CONFIG_MOXA_SMARTIO=y CONFIG_ISI=y CONFIG_SYNCLINK=y CONFIG_SYNCLINKMP=y CONFIG_SYNCLINK_GT=y CONFIG_N_HDLC=y # CONFIG_RISCOM8 is not set # CONFIG_SPECIALIX is not set CONFIG_SX=y # CONFIG_RIO is not set CONFIG_STALDRV=y # CONFIG_STALLION is not set # CONFIG_ISTALLION is not set # # Serial drivers # CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_PCI=y # CONFIG_SERIAL_8250_CS is not set CONFIG_SERIAL_8250_NR_UARTS=4 CONFIG_SERIAL_8250_RUNTIME_UARTS=4 # CONFIG_SERIAL_8250_EXTENDED is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y # CONFIG_SERIAL_JSM is not set CONFIG_UNIX98_PTYS=y # CONFIG_LEGACY_PTYS is not set CONFIG_PRINTER=y # CONFIG_LP_CONSOLE is not set # CONFIG_PPDEV is not set # CONFIG_TIPAR is not set # # IPMI # CONFIG_IPMI_HANDLER=y # CONFIG_IPMI_PANIC_EVENT is not set CONFIG_IPMI_DEVICE_INTERFACE=y CONFIG_IPMI_SI=y CONFIG_IPMI_WATCHDOG=y # CONFIG_IPMI_POWEROFF is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set CONFIG_HW_RANDOM=y # CONFIG_HW_RANDOM_INTEL is not set CONFIG_HW_RANDOM_AMD=y CONFIG_HW_RANDOM_GEODE=y CONFIG_HW_RANDOM_VIA=y CONFIG_NVRAM=y # CONFIG_RTC is not set CONFIG_GEN_RTC=y # CONFIG_GEN_RTC_X is not set # CONFIG_DTLK is not set CONFIG_R3964=y # CONFIG_APPLICOM is not set # # Ftape, the floppy tape device driver # # CONFIG_FTAPE is not set # CONFIG_AGP is not set # CONFIG_DRM is not set # # PCMCIA character devices # # CONFIG_SYNCLINK_CS is not set # CONFIG_CARDMAN_4000 is not set # CONFIG_CARDMAN_4040 is not set CONFIG_MWAVE=y # CONFIG_SCx200_GPIO is not set # CONFIG_PC8736x_GPIO is not set # CONFIG_NSC_GPIO is not set CONFIG_CS5535_GPIO=y # CONFIG_RAW_DRIVER is not set CONFIG_HPET=y # 
CONFIG_HPET_RTC_IRQ is not set # CONFIG_HPET_MMAP is not set # CONFIG_HANGCHECK_TIMER is not set # # TPM devices # # # I2C support # CONFIG_I2C=y CONFIG_I2C_CHARDEV=y # # I2C Algorithms # CONFIG_I2C_ALGOBIT=y CONFIG_I2C_ALGOPCF=y # CONFIG_I2C_ALGOPCA is not set # # I2C Hardware Bus support # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set # CONFIG_I2C_AMD8111 is not set # CONFIG_I2C_I801 is not set CONFIG_I2C_I810=y CONFIG_I2C_PIIX4=y # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT is not set # CONFIG_I2C_PARPORT_LIGHT is not set CONFIG_I2C_PROSAVAGE=y # CONFIG_SCx200_ACB is not set # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set # CONFIG_I2C_PCA_ISA is not set # # Miscellaneous I2C Chip support # # CONFIG_I2C_DEBUG_CORE is not set CONFIG_I2C_DEBUG_ALGO=y # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # SPI support # # CONFIG_SPI is not set # CONFIG_SPI_MASTER is not set # # Dallas's 1-wire bus # # # Hardware Monitoring support # # CONFIG_HWMON is not set # CONFIG_HWMON_VID is not set # # Misc devices # # # Multimedia devices # # CONFIG_VIDEO_DEV is not set CONFIG_VIDEO_V4L2=y # # Digital Video Broadcasting Devices # # # Graphics support # # CONFIG_FIRMWARE_EDID is not set CONFIG_FB=y CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y # CONFIG_FB_MACMODES is not set # CONFIG_FB_BACKLIGHT is not set CONFIG_FB_MODE_HELPERS=y CONFIG_FB_TILEBLITTING=y # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set CONFIG_FB_CYBER2000=y CONFIG_FB_ARC=y CONFIG_FB_ASILIANT=y CONFIG_FB_IMSTT=y CONFIG_FB_VGA16=y # CONFIG_FB_VESA is not set # CONFIG_FB_IMAC is not set # CONFIG_FB_HGA is not set CONFIG_FB_S1D13XXX=y # CONFIG_FB_NVIDIA is not set CONFIG_FB_RIVA=y CONFIG_FB_RIVA_I2C=y CONFIG_FB_RIVA_DEBUG=y CONFIG_FB_MATROX=y # CONFIG_FB_MATROX_MILLENIUM is not set 
CONFIG_FB_MATROX_MYSTIQUE=y CONFIG_FB_MATROX_G=y CONFIG_FB_MATROX_I2C=y # CONFIG_FB_MATROX_MAVEN is not set CONFIG_FB_MATROX_MULTIHEAD=y CONFIG_FB_RADEON=y # CONFIG_FB_RADEON_I2C is not set CONFIG_FB_RADEON_DEBUG=y # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set # CONFIG_FB_SIS is not set CONFIG_FB_NEOMAGIC=y CONFIG_FB_KYRO=y CONFIG_FB_3DFX=y # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_CYBLA is not set CONFIG_FB_TRIDENT=y CONFIG_FB_VIRTUAL=y # # Console display driver support # CONFIG_VGA_CONSOLE=y CONFIG_VGACON_SOFT_SCROLLBACK=y CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64 CONFIG_VIDEO_SELECT=y CONFIG_DUMMY_CONSOLE=y # CONFIG_FRAMEBUFFER_CONSOLE is not set # # Logo configuration # # CONFIG_LOGO is not set # CONFIG_BACKLIGHT_LCD_SUPPORT is not set # # Sound # CONFIG_SOUND=y # # Advanced Linux Sound Architecture # CONFIG_SND=y CONFIG_SND_TIMER=y CONFIG_SND_PCM=y CONFIG_SND_HWDEP=y CONFIG_SND_RAWMIDI=y CONFIG_SND_SEQUENCER=y CONFIG_SND_SEQ_DUMMY=y CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=y CONFIG_SND_PCM_OSS=y # CONFIG_SND_PCM_OSS_PLUGINS is not set # CONFIG_SND_SEQUENCER_OSS is not set # CONFIG_SND_DYNAMIC_MINORS is not set CONFIG_SND_SUPPORT_OLD_API=y CONFIG_SND_VERBOSE_PROCFS=y # CONFIG_SND_VERBOSE_PRINTK is not set # CONFIG_SND_DEBUG is not set # # Generic devices # CONFIG_SND_MPU401_UART=y CONFIG_SND_OPL3_LIB=y CONFIG_SND_VX_LIB=y CONFIG_SND_AC97_CODEC=y CONFIG_SND_AC97_BUS=y # CONFIG_SND_DUMMY is not set CONFIG_SND_VIRMIDI=y # CONFIG_SND_MTPAV is not set # CONFIG_SND_SERIAL_U16550 is not set CONFIG_SND_MPU401=y # # PCI devices # # CONFIG_SND_AD1889 is not set CONFIG_SND_ALS300=y CONFIG_SND_ALS4000=y CONFIG_SND_ALI5451=y # CONFIG_SND_ATIIXP is not set CONFIG_SND_ATIIXP_MODEM=y CONFIG_SND_AU8810=y CONFIG_SND_AU8820=y CONFIG_SND_AU8830=y CONFIG_SND_BT87X=y # CONFIG_SND_BT87X_OVERCLOCK is not set # CONFIG_SND_CA0106 is not set CONFIG_SND_CMIPCI=y # CONFIG_SND_CS4281 is not set # CONFIG_SND_CS46XX is not set CONFIG_SND_CS5535AUDIO=y # CONFIG_SND_DARLA20 is not set 
# CONFIG_SND_GINA20 is not set # CONFIG_SND_LAYLA20 is not set CONFIG_SND_DARLA24=y # CONFIG_SND_GINA24 is not set CONFIG_SND_LAYLA24=y # CONFIG_SND_MONA is not set # CONFIG_SND_MIA is not set # CONFIG_SND_ECHO3G is not set CONFIG_SND_INDIGO=y # CONFIG_SND_INDIGOIO is not set CONFIG_SND_INDIGODJ=y # CONFIG_SND_EMU10K1 is not set CONFIG_SND_EMU10K1X=y # CONFIG_SND_ENS1370 is not set # CONFIG_SND_ENS1371 is not set CONFIG_SND_ES1938=y # CONFIG_SND_ES1968 is not set # CONFIG_SND_FM801 is not set # CONFIG_SND_HDA_INTEL is not set CONFIG_SND_HDSP=y CONFIG_SND_HDSPM=y # CONFIG_SND_ICE1712 is not set CONFIG_SND_ICE1724=y CONFIG_SND_INTEL8X0=y CONFIG_SND_INTEL8X0M=y CONFIG_SND_KORG1212=y CONFIG_SND_MAESTRO3=y # CONFIG_SND_MIXART is not set CONFIG_SND_NM256=y # CONFIG_SND_PCXHR is not set CONFIG_SND_RIPTIDE=y CONFIG_SND_RME32=y CONFIG_SND_RME96=y # CONFIG_SND_RME9652 is not set CONFIG_SND_SONICVIBES=y CONFIG_SND_TRIDENT=y # CONFIG_SND_VIA82XX is not set # CONFIG_SND_VIA82XX_MODEM is not set # CONFIG_SND_VX222 is not set CONFIG_SND_YMFPCI=y # # PCMCIA devices # CONFIG_SND_VXPOCKET=y # CONFIG_SND_PDAUDIOCF is not set # # Open Sound System # # CONFIG_SOUND_PRIME is not set # # USB support # CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y CONFIG_USB_ARCH_HAS_EHCI=y # CONFIG_USB is not set # # NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' # # # USB Gadget Support # CONFIG_USB_GADGET=y CONFIG_USB_GADGET_DEBUG_FILES=y CONFIG_USB_GADGET_SELECTED=y CONFIG_USB_GADGET_NET2280=y CONFIG_USB_NET2280=y # CONFIG_USB_GADGET_PXA2XX is not set # CONFIG_USB_GADGET_GOKU is not set # CONFIG_USB_GADGET_LH7A40X is not set # CONFIG_USB_GADGET_OMAP is not set # CONFIG_USB_GADGET_AT91 is not set # CONFIG_USB_GADGET_DUMMY_HCD is not set CONFIG_USB_GADGET_DUALSPEED=y # CONFIG_USB_ZERO is not set # CONFIG_USB_ETH is not set # CONFIG_USB_GADGETFS is not set CONFIG_USB_FILE_STORAGE=y CONFIG_USB_FILE_STORAGE_TEST=y # CONFIG_USB_G_SERIAL is not set # # MMC/SD Card support # CONFIG_MMC=y 
CONFIG_MMC_DEBUG=y # CONFIG_MMC_BLOCK is not set CONFIG_MMC_WBSD=y # # LED devices # # CONFIG_NEW_LEDS is not set # # LED drivers # # # LED Triggers # # # InfiniBand support # CONFIG_INFINIBAND=y CONFIG_INFINIBAND_USER_MAD=y CONFIG_INFINIBAND_USER_ACCESS=y CONFIG_INFINIBAND_MTHCA=y CONFIG_INFINIBAND_MTHCA_DEBUG=y CONFIG_INFINIBAND_SRP=y CONFIG_INFINIBAND_ISER=y # # EDAC - error detection and reporting (RAS) (EXPERIMENTAL) # # # Real Time Clock # # # DMA Engine support # CONFIG_DMA_ENGINE=y # # DMA Clients # # # DMA Devices # CONFIG_INTEL_IOATDMA=y # # File systems # # CONFIG_EXT2_FS is not set CONFIG_EXT3_FS=y # CONFIG_EXT3_FS_XATTR is not set CONFIG_JBD=y CONFIG_JBD_DEBUG=y CONFIG_REISERFS_FS=y # CONFIG_REISERFS_CHECK is not set CONFIG_REISERFS_PROC_INFO=y # CONFIG_REISERFS_FS_XATTR is not set CONFIG_JFS_FS=y CONFIG_JFS_POSIX_ACL=y # CONFIG_JFS_SECURITY is not set # CONFIG_JFS_DEBUG is not set # CONFIG_JFS_STATISTICS is not set CONFIG_FS_POSIX_ACL=y # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_INOTIFY is not set # CONFIG_QUOTA is not set CONFIG_DNOTIFY=y CONFIG_AUTOFS_FS=y CONFIG_AUTOFS4_FS=y CONFIG_FUSE_FS=y # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=y CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems # # CONFIG_MSDOS_FS is not set # CONFIG_VFAT_FS is not set # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y CONFIG_RAMFS=y # # Miscellaneous filesystems # # CONFIG_HFSPLUS_FS is not set # CONFIG_CRAMFS is not set # CONFIG_VXFS_FS is not set CONFIG_HPFS_FS=y # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set CONFIG_UFS_FS=y # CONFIG_UFS_DEBUG is not set # # Partition Types # CONFIG_PARTITION_ADVANCED=y CONFIG_ACORN_PARTITION=y # CONFIG_ACORN_PARTITION_CUMANA is not set # CONFIG_ACORN_PARTITION_EESOX is not set # CONFIG_ACORN_PARTITION_ICS 
is not set # CONFIG_ACORN_PARTITION_ADFS is not set # CONFIG_ACORN_PARTITION_POWERTEC is not set CONFIG_ACORN_PARTITION_RISCIX=y # CONFIG_OSF_PARTITION is not set # CONFIG_AMIGA_PARTITION is not set # CONFIG_ATARI_PARTITION is not set # CONFIG_MAC_PARTITION is not set CONFIG_MSDOS_PARTITION=y CONFIG_BSD_DISKLABEL=y # CONFIG_MINIX_SUBPARTITION is not set CONFIG_SOLARIS_X86_PARTITION=y # CONFIG_UNIXWARE_DISKLABEL is not set # CONFIG_LDM_PARTITION is not set CONFIG_SGI_PARTITION=y # CONFIG_ULTRIX_PARTITION is not set # CONFIG_SUN_PARTITION is not set # CONFIG_KARMA_PARTITION is not set CONFIG_EFI_PARTITION=y # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" # CONFIG_NLS_CODEPAGE_437 is not set CONFIG_NLS_CODEPAGE_737=y CONFIG_NLS_CODEPAGE_775=y # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set CONFIG_NLS_CODEPAGE_855=y CONFIG_NLS_CODEPAGE_857=y CONFIG_NLS_CODEPAGE_860=y CONFIG_NLS_CODEPAGE_861=y CONFIG_NLS_CODEPAGE_862=y # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set CONFIG_NLS_CODEPAGE_865=y # CONFIG_NLS_CODEPAGE_866 is not set CONFIG_NLS_CODEPAGE_869=y # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set CONFIG_NLS_CODEPAGE_932=y # CONFIG_NLS_CODEPAGE_949 is not set CONFIG_NLS_CODEPAGE_874=y CONFIG_NLS_ISO8859_8=y # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set # CONFIG_NLS_ASCII is not set # CONFIG_NLS_ISO8859_1 is not set CONFIG_NLS_ISO8859_2=y # CONFIG_NLS_ISO8859_3 is not set CONFIG_NLS_ISO8859_4=y CONFIG_NLS_ISO8859_5=y # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set CONFIG_NLS_ISO8859_13=y # CONFIG_NLS_ISO8859_14 is not set CONFIG_NLS_ISO8859_15=y # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set # CONFIG_NLS_UTF8 is not set # # Kernel hacking # CONFIG_TRACE_IRQFLAGS_SUPPORT=y # CONFIG_PRINTK_TIME is not set CONFIG_MAGIC_SYSRQ=y CONFIG_UNUSED_SYMBOLS=y CONFIG_DEBUG_KERNEL=y 
CONFIG_LOG_BUF_SHIFT=15 CONFIG_DETECT_SOFTLOCKUP=y CONFIG_SCHEDSTATS=y CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_SLAB_LEAK=y # CONFIG_DEBUG_RT_MUTEXES is not set # CONFIG_RT_MUTEX_TESTER is not set CONFIG_DEBUG_SPINLOCK=y # CONFIG_DEBUG_MUTEXES is not set # CONFIG_DEBUG_RWSEMS is not set # CONFIG_DEBUG_LOCK_ALLOC is not set # CONFIG_PROVE_LOCKING is not set # CONFIG_DEBUG_SPINLOCK_SLEEP is not set # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set CONFIG_DEBUG_KOBJECT=y CONFIG_DEBUG_BUGVERBOSE=y CONFIG_DEBUG_INFO=y CONFIG_DEBUG_FS=y CONFIG_DEBUG_VM=y CONFIG_FRAME_POINTER=y # CONFIG_UNWIND_INFO is not set # CONFIG_FORCED_INLINING is not set CONFIG_RCU_TORTURE_TEST=y CONFIG_EARLY_PRINTK=y CONFIG_DEBUG_STACKOVERFLOW=y # CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUG_RODATA is not set CONFIG_4KSTACKS=y CONFIG_DOUBLEFAULT=y # # Security options # CONFIG_KEYS=y # CONFIG_KEYS_DEBUG_PROC_KEYS is not set # CONFIG_SECURITY is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y # CONFIG_CRYPTO_NULL is not set CONFIG_CRYPTO_MD4=y CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=y # CONFIG_CRYPTO_SHA256 is not set CONFIG_CRYPTO_SHA512=y # CONFIG_CRYPTO_WP512 is not set # CONFIG_CRYPTO_TGR192 is not set # CONFIG_CRYPTO_DES is not set # CONFIG_CRYPTO_BLOWFISH is not set CONFIG_CRYPTO_TWOFISH=y CONFIG_CRYPTO_SERPENT=y CONFIG_CRYPTO_AES=y CONFIG_CRYPTO_AES_586=y CONFIG_CRYPTO_CAST5=y # CONFIG_CRYPTO_CAST6 is not set # CONFIG_CRYPTO_TEA is not set # CONFIG_CRYPTO_ARC4 is not set CONFIG_CRYPTO_KHAZAD=y # CONFIG_CRYPTO_ANUBIS is not set CONFIG_CRYPTO_DEFLATE=y CONFIG_CRYPTO_MICHAEL_MIC=y # CONFIG_CRYPTO_CRC32C is not set # # Hardware crypto devices # # CONFIG_CRYPTO_DEV_PADLOCK is not set # # Library routines # CONFIG_CRC_CCITT=y # CONFIG_CRC16 is not set CONFIG_CRC32=y # CONFIG_LIBCRC32C is not set CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=y CONFIG_PLIST=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_X86_BIOS_REBOOT=y CONFIG_KTIME_SCALAR=y -------------- next part 
-------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From halr at voltaire.com Sun Sep 10 04:48:11 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Sep 2006 07:48:11 -0400 Subject: [openib-general] [PATCH] osm: OSM bug fix with --run-once option In-Reply-To: <1157870154.29270.42.camel@kliteynik.yok.mtl.com> References: <1157870154.29270.42.camel@kliteynik.yok.mtl.com> Message-ID: <1157888830.27427.49152.camel@hal.voltaire.com> Hi Yevgeny, On Sun, 2006-09-10 at 02:35, Yevgeny Kliteynik wrote: > Hi Hal > > This patch fixes the bug that was occurring when OSM was > running with --run-once option (-o) and the SM port was down. > In that case, OSM would be stuck in cond_wait forever (or until > the port will become active), and could not be terminated, > other than by SIGKILL. > > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Should this go to 1.1 as well as trunk ? How critical for 1.1 ? -- Hal From erezz at voltaire.com Sun Sep 10 05:14:38 2006 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 10 Sep 2006 15:14:38 +0300 Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' In-Reply-To: <200609101343.02740.toralf.foerster@gmx.de> References: <200609071902.57379.toralf.foerster@gmx.de> <4503D620.4000602@voltaire.com> <200609101343.02740.toralf.foerster@gmx.de> Message-ID: <450401AE.2030606@voltaire.com> Toralf Förster wrote: > I copied the config file to .config, made then a "make oldconfig && make" against current sources 2.6.18-rc6-git3. > BTW, I attach another .config where the similar problem occurred > Where did you get this config file from? I don't think that this config file was generated by a tool like 'make menuconfig'. As I explained before, if you select CONFIG_INFINIBAND=y using the menuconfig tool, it also sets CONFIG_INFINIBAND_ADDR_TRANS=y. 
Therefore, I can only guess that this config file was generated manually (or at least modified manually). If you can explain how I can generate the config file that you used, maybe I can reproduce it. Else, I suggest that you delete your .config file and run 'make menuconfig'. Then, select InfiniBand & iSER and it should work fine. Let me know if it works for you. Erez From toralf.foerster at gmx.de Sun Sep 10 07:45:19 2006 From: toralf.foerster at gmx.de (Toralf Förster) Date: Sun, 10 Sep 2006 16:45:19 +0200 Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' In-Reply-To: <450401AE.2030606@voltaire.com> References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> Message-ID: <200609101645.22695.toralf.foerster@gmx.de> You're right, sorry, I forgot to say that: - first I created a random .config file using "make rndconfig" - after that I removed all options not fitting my system - then I prepended some options commonly used by me on top of the .config file - and finally ran "make oldconfig" to (hopefully) get a clean config Doesn't "make oldconfig" make a clean .config file ? On Sunday 10 September 2006 14:14, Erez Zilber wrote: > Toralf Förster wrote: > > I copied the config file to .config, made then a "make oldconfig && make" against current sources 2.6.18-rc6-git3. > > BTW, I attach another .config where the similar problem occurred > > > Where did you get this config file from? I don't think that this config > file was generated by a tool like 'make menuconfig'. As I explained > before, if you select CONFIG_INFINIBAND=y using the menuconfig tool, it > also sets CONFIG_INFINIBAND_ADDR_TRANS=y. Therefore, I can only guess > that this config file was generated manually (or at least modified > manually). 
> > If you can explain how I can generate the config file that you used, > maybe I can reproduce it. Else, I suggest that you delete your .config > file and run 'make menuconfig'. Then, select InfiniBand & iSER and it > should work fine. Let me know if it works for you. > > Erez > > -- MfG/Sincerely Toralf Förster From eitan at mellanox.co.il Sun Sep 10 11:46:22 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 10 Sep 2006 21:46:22 +0300 Subject: [openib-general] [PATCH] osm: OSM bug fix with --run-once option In-Reply-To: <1157888830.27427.49152.camel@hal.voltaire.com> References: <1157870154.29270.42.camel@kliteynik.yok.mtl.com> <1157888830.27427.49152.camel@hal.voltaire.com> Message-ID: <45045D7E.6090908@mellanox.co.il> Hal Rosenstock wrote: >Hi Yevgeny, > >On Sun, 2006-09-10 at 02:35, Yevgeny Kliteynik wrote: > > >>Hi Hal >> >>This patch fixes the bug that was occurring when OSM was >>running with --run-once option (-o) and the SM port was down. >>In that case, OSM would be stuck in cond_wait forever (or until >>the port will become active), and could not be terminated, >>other than by SIGKILL. >> >>Yevgeny >> >>Signed-off-by: Yevgeny Kliteynik >> >> > >Should this go to 1.1 as well as trunk ? How critical for 1.1 ? > > IMO this should only be applied to the trunk, as --run-once is not a user mode but rather a testing mode. So it is not critical for the branch. 
EZ >-- Hal From zhushisongzhu at yahoo.com Sun Sep 10 21:37:07 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Sun, 10 Sep 2006 21:37:07 -0700 (PDT) Subject: [openib-general] why sdp connections cost so much memory In-Reply-To: <20060910075818.GV6928@mellanox.co.il> Message-ID: <20060911043707.54946.qmail@web36909.mail.mud.yahoo.com> I ran the test 10 times, each time issuing 5000 concurrent connections to access the Internet through SDP. So far I haven't found any problem. I hope you can also reduce the memory used by one SDP connection. Another question: when I call ib_query_gid in the sdp_init function, it returns -EINVAL. How do I use ib_query_gid correctly? If I want to make the SDP module independent of the ipoib module, is it difficult? tks zhu --- "Michael S. Tsirkin" wrote: > Quoting r. zhu shi song : > > Subject: Re: why sdp connections cost so much > memory > > > > OFED-1.1-rc3 has passed my tests. I have to adjust > > Post buffer size to 0x4 and use your patch for me. > > > Can you make it fixed not to do these myself > manually? > > > > zhu > > I plan to add the following patch to OFED. Could you > please verify > that it fixes the issue for you, without tweaking > the ring size? > > Signed-off-by: Michael S. 
Tsirkin > > > diff --git a/drivers/infiniband/ulp/sdp/sdp_bcopy.c > b/drivers/infiniband/ulp/sdp/sdp_bcopy.c > index b30d2a0..4540fa4 100644 > --- a/drivers/infiniband/ulp/sdp/sdp_bcopy.c > +++ b/drivers/infiniband/ulp/sdp/sdp_bcopy.c > @@ -37,6 +37,10 @@ #include > #include > #include "sdp.h" > > +static int rcvbuf_scale = 0x1; > +module_param_named(rcvbuf_scale, rcvbuf_scale, int, > 0644); > +MODULE_PARM_DESC(srcvbuf_scale, "Receive buffer > size scale factor."); > + > /* Like tcp_fin */ > static void sdp_fin(struct sock *sk) > { > @@ -237,7 +241,7 @@ void sdp_post_recvs(struct > sdp_sock *ssk > while ((likely(ssk->rx_head - ssk->rx_tail < > SDP_RX_SIZE) && > (ssk->rx_head - ssk->rx_tail - SDP_MIN_BUFS) * > SDP_MAX_SEND_SKB_FRAGS * PAGE_SIZE + rmem < > - ssk->isk.sk.sk_rcvbuf * 0x10) || > + ssk->isk.sk.sk_rcvbuf * rcvbuf_scale) || > unlikely(ssk->rx_head - ssk->rx_tail < > SDP_MIN_BUFS)) > sdp_post_recv(ssk); > } > > -- > MST > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From mst at mellanox.co.il Sun Sep 10 21:57:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 11 Sep 2006 07:57:56 +0300 Subject: [openib-general] why sdp connections cost so much memory In-Reply-To: <20060911043707.54946.qmail@web36909.mail.mud.yahoo.com> References: <20060911043707.54946.qmail@web36909.mail.mud.yahoo.com> Message-ID: <20060911045756.GA8709@mellanox.co.il> Quoting r. zhu shi song : > Subject: Re: why sdp connections cost so much memory > > I tested for 10 times and every time issued 5000 > concurrent connections to access Internet through SDP. > Until now I haven't found any problem. I hope you can > also reduce memory used by one SDP connection. You mean - when only a single socket is open? > Another question, When I use ib_query_gid in > sdp_init function, it return -EINVAL. How to use > ib_query_gid correctly? 
Looks like you are passing in an invalid gid. > If I want to let SDP module is independent of ipoib > module, it it difficult? The SDP protocol uses ARP over IPoIB for its address resolution. So you'd need to find some other way to perform address resolution. -- MST From sweitzen at cisco.com Sun Sep 10 22:27:10 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 10 Sep 2006 22:27:10 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 status Message-ID: Please make sure 1. and 3. are fixed before you release rc4, thanks. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Thursday, September 07, 2006 1:02 PM To: EWG Cc: openib Subject: [openfabrics-ewg] OFED 1.1 status Hi, OFED 1.1 RC4 will be published on Monday 11-Sep. We currently work on several showstoppers: 1. 223: mthca.so not properly linked to libibverbs - Vlad & Jack 2. 221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - Roland 3. 219: OFED 1.1rc3 contains prerelease unstable libibverbs code - Vlad & Jack Thus final release date will be delayed to end of next week Tziporet Koren Software Director Mellanox Technologies mailto: tziporet at mellanox.co.il Tel +972-4-9097200, ext 380 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweitzen at cisco.com Sun Sep 10 22:47:24 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 10 Sep 2006 22:47:24 -0700 Subject: [openib-general] is there a plan for getting SDP into kernel.org? Message-ID: I would like to see netstat support, zcopy support, and ideally AIO support get added first... Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tziporet at mellanox.co.il Sun Sep 10 23:02:39 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 11 Sep 2006 09:02:39 +0300 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 status Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA78BA@mtlexch01.mtl.com> both already fixed. Tziporet -----Original Message----- From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Monday, September 11, 2006 8:27 AM To: Tziporet Koren; EWG Cc: openib Subject: RE: [openfabrics-ewg] OFED 1.1 status Please make sure 1. and 3. are fixed before you release rc4, thanks. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Thursday, September 07, 2006 1:02 PM To: EWG Cc: openib Subject: [openfabrics-ewg] OFED 1.1 status Hi, OFED 1.1 RC4 will be published on Monday 11-Sep. We currently work on several showstoppers: 1.223: mthca.so not properly linked to libibverbs - Vlad & Jack 2.221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - Roland 3.219: OFED 1.1rc3 contains prerelease unstable libibverbs code - Vlad & Jack Thus final release date will be delayed to end of next week Tziporet Koren Software Director Mellanox Technologies mailto: tziporet at mellanox.co.il Tel +972-4-9097200, ext 380 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Sun Sep 10 23:18:57 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 11 Sep 2006 09:18:57 +0300 Subject: [openib-general] is there a plan for getting SDP into kernel.org? In-Reply-To: References: Message-ID: <20060911061857.GA8948@mellanox.co.il> Quoting r. Scott Weitzenkamp (sweitzen) : > Subject: is there a plan for getting SDP into kernel.org? 
> > I would like to see netstat support, zcopy support, and ideally AIO support get added first... > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems IMO, only netstat support makes some sense at this point. -- MST From erezz at voltaire.com Sun Sep 10 23:33:15 2006 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 11 Sep 2006 09:33:15 +0300 Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' In-Reply-To: <200609101645.22695.toralf.foerster@gmx.de> References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> Message-ID: <4505032B.3050706@voltaire.com> Toralf Förster wrote: > Your're right, sorry, > > I forgot this to say that: > - first I created a random .config file using "make rndconfig" > - after that I removed all options not fitting my system > - then I prepended some options commonly used by me on top of the .config file > - and run finally a "make oldconfig" to-hopefully- get a clean config > > Doesn't "make oldconfig" make a clean .config file ? > > Here's what the kernel's README file has to say about it: "make oldconfig": Default all questions based on the contents of your existing ./.config file and asking about new config symbols. I guess that 'make rndconfig' selected CONFIG_INFINIBAND=y, but didn't select CONFIG_INFINIBAND_ADDR_TRANS=y. Then, 'make oldconfig' asked you about new symbols. I guess that running 'make rndconfig' may create scenarios like this, but I don't think that there's a bug in iSER's Kconfig file. If you still want to use your .config file, reselect InfiniBand in 'make menuconfig'. It will set CONFIG_INFINIBAND_ADDR_TRANS=y. I hope this helps. 
Erez From bugzilla-daemon at openib.org Mon Sep 11 00:15:00 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 11 Sep 2006 00:15:00 -0700 (PDT) Subject: [openib-general] [Bug 229] New: heavy CPU load can starve ib_mad thread on latest processors Message-ID: <20060911071500.4CB962283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=229 Summary: heavy CPU load can starve ib_mad thread on latest processors Product: OpenFabrics Linux Version: 1.1rc3 Platform: All OS/Version: RHEL 4 Status: NEW Severity: normal Priority: P3 Component: IB Core AssignedTo: bugzilla at openib.org ReportedBy: sweitzen at cisco.com RHEL4 U3 x86_64 We have a proprietary test tool that places a very heavy CPU load on system. When this test is run on an IB host on Intel Woodcrest, AMD Opteron (Rev F, I believe, not sure about Rev G), and PowerPC JS21 systems, IB port goes from ACTIVE to INIT state. The workaround is to renice the ib_mad thread to highest priority, we recommend changing the OpenIB code to do this when ib_mad thread is created. This does not seem to happen on older Intel or AMD processors, dunno about PowerPC. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From zhushisongzhu at yahoo.com Mon Sep 11 00:43:05 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Mon, 11 Sep 2006 00:43:05 -0700 (PDT) Subject: [openib-general] why sdp connections cost so much memory In-Reply-To: <20060911045756.GA8709@mellanox.co.il> Message-ID: <20060911074305.52197.qmail@web36905.mail.mud.yahoo.com> >> You mean - when only a single socket is open? Every one connection will cost 2M RAM. So I make the following changes: #define SDP_TX_SIZE 0x4 #define SDP_RX_SIZE 0x4 > The SDP protocol uses ARP over IPoIB for its address > resolution. > So you'd need to find some other way to perform > address resolution. 
I'll try pre-resolute the address, So I can remove ARP from ipoib zhu __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From mst at mellanox.co.il Mon Sep 11 00:50:38 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 11 Sep 2006 10:50:38 +0300 Subject: [openib-general] why sdp connections cost so much memory In-Reply-To: <20060911074305.52197.qmail@web36905.mail.mud.yahoo.com> References: <20060911074305.52197.qmail@web36905.mail.mud.yahoo.com> Message-ID: <20060911075038.GC10024@mellanox.co.il> Quoting r. zhu shi song : > Subject: Re: why sdp connections cost so much memory > > > > >> You mean - when only a single socket is open? > Every one connection will cost 2M RAM. So I make the > following changes: > #define SDP_TX_SIZE 0x4 > #define SDP_RX_SIZE 0x4 You should not need this change with the scale patch I posted - after applying this, and setting the scale parameter to 0x1, each connection should use around 128K for RX. Please confirm. > > The SDP protocol uses ARP over IPoIB for its address > > resolution. > > So you'd need to find some other way to perform > > address resolution. > > > I'll try pre-resolute the address, So I can remove ARP > from ipoib But you'll still need the ipoib module loaded. -- MST From bugzilla-daemon at openib.org Mon Sep 11 00:58:40 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 11 Sep 2006 00:58:40 -0700 (PDT) Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors Message-ID: <20060911075840.8800A2283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=229 ------- Comment #1 from amitk at mellanox.co.il 2006-09-11 00:58 ------- Which SM are you running ? ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
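Coming back to the SDP memory thread above: the posting condition in the rcvbuf_scale patch can be played through in user space to see how the scale factor drives the per-connection receive memory. This is only a rough sketch under stated assumptions, not the driver code. The 8 fragments per buffer come from the remark that SDP_MAX_SEND_SKB_FRAGS is currently 8; the page size (4096), minimum buffer count (2), ring size (64) and the 128KB sk_rcvbuf used in the usage note are assumed values for illustration, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values for illustration only; the real driver constants may differ. */
#define PAGE_SIZE_BYTES 4096L  /* assumed page size */
#define FRAGS_PER_BUF   8L     /* SDP_MAX_SEND_SKB_FRAGS as mentioned in the thread */
#define MIN_BUFS        2L     /* hypothetical stand-in for SDP_MIN_BUFS */
#define RX_RING         64L    /* hypothetical stand-in for SDP_RX_SIZE */

/*
 * Mirror the loop condition from the patch: keep posting receive buffers
 * while the ring has room and the posted bytes (beyond the minimum) plus
 * rmem stay below rcvbuf * scale, or while fewer than MIN_BUFS are posted.
 */
static long rx_bufs_posted(long rcvbuf, long scale, long rmem)
{
    long posted = 0;

    while ((posted < RX_RING &&
            (posted - MIN_BUFS) * FRAGS_PER_BUF * PAGE_SIZE_BYTES + rmem <
                rcvbuf * scale) ||
           posted < MIN_BUFS)
        posted++;

    return posted;
}

/* Bytes of receive memory pinned by the posted buffers. */
static long rx_bytes_posted(long rcvbuf, long scale, long rmem)
{
    return rx_bufs_posted(rcvbuf, scale, rmem) * FRAGS_PER_BUF * PAGE_SIZE_BYTES;
}
```

Under these assumptions, rx_bytes_posted(131072, 16, 0) fills the whole 64-entry ring (2MB), which is in line with the 2M per connection reported above, while rx_bytes_posted(131072, 1, 0) stops at 6 buffers (192KB), the same order of magnitude as the "around 128K" estimate.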
From HNGUYEN at de.ibm.com Mon Sep 11 01:30:57 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Mon, 11 Sep 2006 10:30:57 +0200 Subject: [openib-general] [PATCH] ehca for OFED 1.1-rc4 In-Reply-To: <20060910093747.GA11625@mellanox.co.il> Message-ID: I guess my email client must have wrapped the lines so that the patch is not applicable any more. Sorry for that! Need little time to fix that problem. For now I'm sending you the patch file as attachment that I could apply without errors. Thanks, Nam Nguyen (See attached file: ofed_svnehca_0015.patch) openib-general-bounces at openib.org wrote on 10.09.2006 11:37:47: > Quoting r. Hoang-Nam Nguyen : > Subject: [PATCH] ehca for OFED 1.1-rc4 > > > > Hello Tziporet! > > Below is a patch of ehca against the ofed git tree branch ehca- > branch in order to upgrade it to the same code level of Roland's git > tree branch for-2.6.19, which has been posted for a while. The main > code changes are: > > - Replace the "huge" EDEB macro by a simpler wrapper based on dev_err/dbg > > - Remove superfluous variables initialization and arguments checking > > - Replace struct ehca_module by static member variables in > appropriate files, where they are accessed > > - Rename module name to ib_ehca.ko > > Thanks! > > Nam Nguyen > > Unfortunately, the patch doesn't apply against either ofed_1_1 or ehca_branch. > Please check that it does, before re-posting. > > One additional request: your message included a copy > of patch in both plain text and html format. Please > post plain text only. > > Thanks, > > -- > MST > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ofed_svnehca_0015.patch Type: application/octet-stream Size: 291382 bytes Desc: not available URL: From zhushisongzhu at yahoo.com Mon Sep 11 01:36:54 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Mon, 11 Sep 2006 01:36:54 -0700 (PDT) Subject: [openib-general] why sdp connections cost so much memory In-Reply-To: <20060911075038.GC10024@mellanox.co.il> Message-ID: <20060911083655.6871.qmail@web36911.mail.mud.yahoo.com> --- "Michael S. Tsirkin" wrote: > Quoting r. zhu shi song : > > Subject: Re: why sdp connections cost so much > memory > > > > >> You mean - when only a single socket is open? > > Every one connection will cost 2M RAM. So I make > the > > following changes: > > #define SDP_TX_SIZE 0x4 > > #define SDP_RX_SIZE 0x4 > > You should not need this change with the scale patch > I posted - after applying > this, and setting the scale parameter to 0x1, each > connection should use around > 128K for RX. Please confirm. can each connection use 64K memory? > > > The SDP protocol uses ARP over IPoIB for its > address > > > resolution. > > > So you'd need to find some other way to perform > > > address resolution. > > > > > I'll try pre-resolute the address, So I can remove > ARP > > from ipoib > > But you'll still need the ipoib module loaded. > is it difficult not to load ipoib module? zhu __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From toralf.foerster at gmx.de Mon Sep 11 01:40:32 2006 From: toralf.foerster at gmx.de (Toralf =?iso-8859-1?q?F=F6rster?=) Date: Mon, 11 Sep 2006 10:40:32 +0200 Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' In-Reply-To: <4505032B.3050706@voltaire.com> References: <200609071902.57379.toralf.foerster@gmx.de> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com> Message-ID: <200609111040.36277.toralf.foerster@gmx.de> Ah, thanks for clarifying this. Unfortunately this means, that there is a small chance, that "make oldconfig" will not work correctly in all cases, eg. upgrading a kernel to a newer version could yield into such failed compile step :-( Am Monday 11 September 2006 08:33 schrieb Erez Zilber: > Toralf Förster wrote: > > Your're right, sorry, > > > > I forgot this to say that: > > - first I created a random .config file using "make rndconfig" > > - after that I removed all options not fitting my system > > - then I prepended some options commonly used by me on top of the .config file > > - and run finally a "make oldconfig" to-hopefully- get a clean config > > > > Doesn't "make oldconfig" make a clean .config file ? > > > > > Here's what the kernel's README file has to say about it: > "make oldconfig": Default all questions based on the contents of your > existing ./.config file and asking about new config symbols. > > I guess that 'make rndconfig' selected CONFIG_INFINIBAND=y, but didn't > select CONFIG_INFINIBAND_ADDR_TRANS=y. Then, 'make oldconfig' asked you > about new symbols. I guess that running 'make rndconfig' may create > scenarios like this, but I don't think that there's a bug in iSER's > Kconfig file. If you still want to use your .config file, reselect > InfiniBand in 'make menuconfig'. It will set CONFIG_INFINIBAND_ADDR_TRANS=y. > > I hope this helps. 
> > Erez > > -- MfG/Sincerely Toralf Förster -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From mst at mellanox.co.il Mon Sep 11 02:07:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 11 Sep 2006 12:07:54 +0300 Subject: [openib-general] why sdp connections cost so much memory In-Reply-To: <20060911083655.6871.qmail@web36911.mail.mud.yahoo.com> References: <20060911075038.GC10024@mellanox.co.il> <20060911083655.6871.qmail@web36911.mail.mud.yahoo.com> Message-ID: <20060911090754.GA10480@mellanox.co.il> Quoting r. zhu shi song : > Subject: Re: why sdp connections cost so much memory > > > > --- "Michael S. Tsirkin" wrote: > > > Quoting r. zhu shi song : > > > Subject: Re: why sdp connections cost so much > > memory > > > > > > >> You mean - when only a single socket is open? > > > Every one connection will cost 2M RAM. So I make > > the > > > following changes: > > > #define SDP_TX_SIZE 0x4 > > > #define SDP_RX_SIZE 0x4 > > > > You should not need this change with the scale patch > > I posted - after applying > > this, and setting the scale parameter to 0x1, each > > connection should use around > > 128K for RX. Please confirm. Could you please confirm that setting scale factor to 1 works for you, without changing SDP_TX_SIZE/SDP_RX_SIZE? > can each connection use 64K memory? SDP_MAX_SEND_SKB_FRAGS controls the number of pages per descriptor. You need at least 4 of these. I have it at 8 at the moment, try scaling it down. -- MST From erezz at voltaire.com Mon Sep 11 02:17:36 2006 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 11 Sep 2006 12:17:36 +0300 (IDT) Subject: [openib-general] [PATCH 0/5] IB/iser: iSER code changes for 2.6.19 Message-ID: Here is a series of patches that fix some bugs that were found in iSER during testing (some were found while testing iSER on architectures like ia64). 
All of them are related to memory registration. Thanks Erez From erezz at voltaire.com Mon Sep 11 02:19:17 2006 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 11 Sep 2006 12:19:17 +0300 (IDT) Subject: [openib-general] [PATCH 1/5] IB/iser: fix a check of SG alignment for RDMA In-Reply-To: Message-ID: DMA mapping may include a "compaction" of the sg associated with a scsi command. Hence, the size of the maximal prefix of the SG which is aligned for rdma must be compared against the length of the dma mapped sg (mem->dma_nents) and not against the size of it before it was mapped (mem->size). Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iser_memory.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) 5301a4bb4f73250a93bc0c103839ae527f6b4110 diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 31950a5..53af956 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -378,7 +378,7 @@ int iser_reg_rdma_mem(struct iscsi_iser_ regd_buf = &iser_ctask->rdma_regd[cmd_dir]; aligned_len = iser_data_buf_aligned_len(mem); - if (aligned_len != mem->size) { + if (aligned_len != mem->dma_nents) { iser_err("rdma alignment violation %d/%d aligned\n", aligned_len, mem->size); iser_data_buf_dump(mem); -- 1.2.6 From erezz at voltaire.com Mon Sep 11 02:20:54 2006 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 11 Sep 2006 12:20:54 +0300 (IDT) Subject: [openib-general] [PATCH 2/5] IB/iser: Limit the max size of a scsi command In-Reply-To: Message-ID: Currently, the data length of a command coming down from scsi-ml is limited only by the size of its sg list (sg_tablesize). The max data length may be different for different page size values. By setting max_sectors, we limit the data length to max_sectors*512 bytes. 
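To make the arithmetic concrete, here is a minimal sketch that is not part of the patch. It assumes a simplified model in which each sg entry covers at most one page; the helper names (sg_limit_bytes, sector_limit_bytes, max_cmd_bytes) are hypothetical, and the 128-entry sg table used in the usage note is an illustrative figure.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u /* bytes per sector, as in the commit message */

/*
 * Bound implied by the sg list alone, under the simplifying assumption
 * that every sg entry maps at most one page.
 */
static uint64_t sg_limit_bytes(uint32_t sg_tablesize, uint32_t page_size)
{
    return (uint64_t)sg_tablesize * page_size;
}

/* Bound implied by max_sectors: max_sectors * 512 bytes. */
static uint64_t sector_limit_bytes(uint32_t max_sectors)
{
    return (uint64_t)max_sectors * SECTOR_SIZE;
}

/* The effective cap on a command's data length is the smaller bound. */
static uint64_t max_cmd_bytes(uint32_t sg_tablesize, uint32_t page_size,
                              uint32_t max_sectors)
{
    uint64_t by_sg = sg_limit_bytes(sg_tablesize, page_size);
    uint64_t by_sectors = sector_limit_bytes(max_sectors);

    return by_sg < by_sectors ? by_sg : by_sectors;
}
```

With max_sectors = 1024 the cap is 1024 * 512 = 512KB whatever the page size, whereas under the one-page-per-entry assumption a 128-entry sg list on its own would allow 2MB with 16KB pages.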
Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iscsi_iser.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) 522817c2dbb865c98465f3d17978dbdc8c4ff100 diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 101e407..2a14fe2 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -545,6 +545,7 @@ static struct scsi_host_template iscsi_i .queuecommand = iscsi_queuecommand, .can_queue = ISCSI_XMIT_CMDS_MAX - 1, .sg_tablesize = ISCSI_ISER_SG_TABLESIZE, + .max_sectors = 1024, .cmd_per_lun = ISCSI_MAX_CMD_PER_LUN, .eh_abort_handler = iscsi_eh_abort, .eh_host_reset_handler = iscsi_eh_host_reset, -- 1.2.6 From erezz at voltaire.com Mon Sep 11 02:22:30 2006 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 11 Sep 2006 12:22:30 +0300 (IDT) Subject: [openib-general] [PATCH 3/5] IB/iser: make FMR "page size" be 4K and not PAGE_SIZE In-Reply-To: Message-ID: As iser is able to use at most one rdma operation for the execution of a scsi command, and registration of the sg associated with scsi command has its restrictions, the code checks if an sg is "aligned for rdma". Alignment for rdma is measured in "fmr page" units whose possible resolutions are different between HCAs and can be smaller, equal or bigger to the system page size. When the system page size is bigger than 4KB (eg the default with ia64 kernels) there a bigger chance that an sg would be aligned for rdma if the fmr page size is 4KB. Change the code to create FMR whose pages are of size 4KB and to take that into account when processing the sg. 
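The alignment argument can be illustrated with a small user-space sketch. It is not the iSER code: the macro names follow the patch, but they are defined here with a 64-bit literal so the sketch behaves the same on 32-bit hosts, and the helpers is_aligned and fmr_pages_4k are hypothetical names introduced for illustration.

```c
#include <stdint.h>

#define SHIFT_4K 12
#define SIZE_4K  (1ULL << SHIFT_4K)
#define MASK_4K  (~(SIZE_4K - 1))

/* Return 1 if addr is a multiple of (1 << shift), else 0. */
static int is_aligned(uint64_t addr, unsigned shift)
{
    return (addr & ((1ULL << shift) - 1)) == 0;
}

/* Number of 4KB FMR pages needed to cover the range [addr, addr + len). */
static unsigned long fmr_pages_4k(uint64_t addr, uint64_t len)
{
    uint64_t first = addr & MASK_4K; /* round start down to a 4KB boundary */
    uint64_t last = addr + len;      /* one past the final byte */

    return (unsigned long)((last - first + SIZE_4K - 1) >> SHIFT_4K);
}
```

For example, address 0x5000 is aligned at 4KB granularity (is_aligned(0x5000, 12)) but not at 16KB granularity (is_aligned(0x5000, 14)), so a buffer starting there would count as aligned for rdma only when the FMR page size is 4KB.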
Signed-off-by: Erez Zilber

---

 drivers/infiniband/ulp/iser/iscsi_iser.h  |    6 +++++-
 drivers/infiniband/ulp/iser/iser_memory.c |   31 +++++++++++++++++++----------
 drivers/infiniband/ulp/iser/iser_verbs.c  |    4 ++--
 3 files changed, 27 insertions(+), 14 deletions(-)

1f90243f796772fcaea6ad059876a0aad6a06d52
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 7c3d0c9..2c8bc67 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -82,8 +82,12 @@
 				__func__ , ## arg);		\
 	} while (0)
 
+#define SHIFT_4K	12
+#define SIZE_4K	(1UL << SHIFT_4K)
+#define MASK_4K	(~(SIZE_4K-1))
+
 					/* support upto 512KB in one RDMA */
-#define ISCSI_ISER_SG_TABLESIZE         (0x80000 >> PAGE_SHIFT)
+#define ISCSI_ISER_SG_TABLESIZE         (0x80000 >> SHIFT_4K)
 
 #define ISCSI_ISER_MAX_LUN		256
 #define ISCSI_ISER_MAX_CMD_LEN		16
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 53af956..bcef0d3 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -42,6 +42,7 @@
 #include "iscsi_iser.h"
 
 #define ISER_KMALLOC_THRESHOLD 0x20000 /* 128K - kmalloc limit */
+
 /**
  * Decrements the reference count for the
  * registered buffer & releases it
@@ -239,7 +240,7 @@ static int iser_sg_to_page_vec(struct is
 	int i;
 
 	/* compute the offset of first element */
-	page_vec->offset = (u64) sg[0].offset;
+	page_vec->offset = (u64) sg[0].offset & ~MASK_4K;
 
 	for (i = 0; i < data->dma_nents; i++) {
 		total_sz += sg_dma_len(&sg[i]);
@@ -247,21 +248,30 @@ static int iser_sg_to_page_vec(struct is
 		first_addr = sg_dma_address(&sg[i]);
 		last_addr = first_addr + sg_dma_len(&sg[i]);
 
-		start_aligned = !(first_addr & ~PAGE_MASK);
-		end_aligned = !(last_addr & ~PAGE_MASK);
+		start_aligned = !(first_addr & ~MASK_4K);
+		end_aligned = !(last_addr & ~MASK_4K);
 
 		/* continue to collect page fragments till aligned or SG ends */
 		while (!end_aligned && (i + 1 <
data->dma_nents)) {
 			i++;
 			total_sz += sg_dma_len(&sg[i]);
 			last_addr = sg_dma_address(&sg[i]) + sg_dma_len(&sg[i]);
-			end_aligned = !(last_addr & ~PAGE_MASK);
+			end_aligned = !(last_addr & ~MASK_4K);
 		}
 
-		first_addr = first_addr & PAGE_MASK;
-
-		for (page = first_addr; page < last_addr; page += PAGE_SIZE)
-			page_vec->pages[cur_page++] = page;
+		/* handle the 1st page in the 1st DMA element */
+		if (cur_page == 0) {
+			page = first_addr & MASK_4K;
+			page_vec->pages[cur_page] = page;
+			cur_page++;
+			page += SIZE_4K;
+		} else
+			page = first_addr;
+
+		for (; page < last_addr; page += SIZE_4K) {
+			page_vec->pages[cur_page] = page;
+			cur_page++;
+		}
 
 	}
 	page_vec->data_size = total_sz;
@@ -269,8 +279,7 @@ static int iser_sg_to_page_vec(struct is
 	return cur_page;
 }
 
-#define MASK_4K			((1UL << 12) - 1) /* 0xFFF */
-#define IS_4K_ALIGNED(addr)	((((unsigned long)addr) & MASK_4K) == 0)
+#define IS_4K_ALIGNED(addr)	((((unsigned long)addr) & ~MASK_4K) == 0)
 
 /**
  * iser_data_buf_aligned_len - Tries to determine the maximal correctly aligned
@@ -352,7 +361,7 @@ static void iser_page_vec_build(struct i
 
 	page_vec->length = page_vec_len;
 
-	if (page_vec_len * PAGE_SIZE < page_vec->data_size) {
+	if (page_vec_len * SIZE_4K < page_vec->data_size) {
 		iser_err("page_vec too short to hold this SG\n");
 		iser_data_buf_dump(data);
 		iser_dump_page_vec(page_vec);
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 72febf1..9b27a7c 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -150,7 +150,7 @@ static int iser_create_ib_conn_res(struc
 	}
 	ib_conn->page_vec->pages = (u64 *) (ib_conn->page_vec + 1);
 
-	params.page_shift = PAGE_SHIFT;
+	params.page_shift = SHIFT_4K;
 	/* when the first/last SG element are not start/end *
 	 * page aligned, the map whould be of N+1 pages     */
 	params.max_pages_per_fmr = ISCSI_ISER_SG_TABLESIZE + 1;
@@ -604,7 +604,7 @@ int iser_reg_page_vec(struct iser_conn
 
 	mem_reg->lkey =
mem->fmr->lkey;
 	mem_reg->rkey = mem->fmr->rkey;
-	mem_reg->len = page_vec->length * PAGE_SIZE;
+	mem_reg->len = page_vec->length * SIZE_4K;
 	mem_reg->va = io_addr;
 	mem_reg->mem_h = (void *)mem;
 
-- 
1.2.6

From erezz at voltaire.com Mon Sep 11 02:24:00 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 12:24:00 +0300 (IDT)
Subject: [openib-general] [PATCH 4/5] IB/iser: fix some debug prints
In-Reply-To:
Message-ID:

fix and add some debug prints related to iser handling of memory for rdma.

Signed-off-by: Erez Zilber

---

 drivers/infiniband/ulp/iser/iser_memory.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

00703cf2800ce3ac864b149ce75435b00480d9d2
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index bcef0d3..8fea0bc 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -329,9 +329,9 @@ static void iser_data_buf_dump(struct is
 	struct scatterlist *sg = (struct scatterlist *)data->buf;
 	int i;
 
-	for (i = 0; i < data->size; i++)
+	for (i = 0; i < data->dma_nents; i++)
 		iser_err("sg[%d] dma_addr:0x%lX page:0x%p "
-			 "off:%d sz:%d dma_len:%d\n",
+			 "off:0x%x sz:0x%x dma_len:0x%x\n",
 			 i, (unsigned long)sg_dma_address(&sg[i]),
 			 sg[i].page, sg[i].offset,
 			 sg[i].length,sg_dma_len(&sg[i]));
@@ -383,6 +383,7 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 	struct iser_regd_buf *regd_buf;
 	int aligned_len;
 	int err;
+	int i;
 
 	regd_buf = &iser_ctask->rdma_regd[cmd_dir];
 
@@ -400,8 +401,18 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 
 	iser_page_vec_build(mem, ib_conn->page_vec);
 	err = iser_reg_page_vec(ib_conn, ib_conn->page_vec, &regd_buf->reg);
-	if (err)
+	if (err) {
+		iser_data_buf_dump(mem);
+		iser_err("mem->dma_nents = %d (dlength = 0x%x)\n", mem->dma_nents,
+			 ntoh24(iser_ctask->desc.iscsi_header.dlength));
+		iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
+			 ib_conn->page_vec->data_size, ib_conn->page_vec->length,
+
			 ib_conn->page_vec->offset);
+		for (i=0 ; i<ib_conn->page_vec->length ; i++) {
+			iser_err("page_vec[%d] = 0x%lx\n", i, ib_conn->page_vec->pages[i]);
+		}
 		return err;
+	}
 
 	/* take a reference on this regd buf such that it will not be released *
 	 * (eg in send dto completion) before we get the scsi response         */
-- 
1.2.6

From erezz at voltaire.com Mon Sep 11 02:26:33 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 12:26:33 +0300 (IDT)
Subject: [openib-general] [PATCH 5/5] IB/iser: Do not use FMR for a single dma entry sg
In-Reply-To:
Message-ID:

Fast Memory Registration (fmr) is used to register for rdma an sg whose
elements are not linearly sequential after dma mapping. The IB verbs layer
provides an "all dma memory MR (memory region)" which can be used for
RDMA-ing a dma linearly sequential buffer.

Change the code to use the dma mr instead of doing fmr when dma mapping
produces a single dma entry sg.

Signed-off-by: Erez Zilber

---

 drivers/infiniband/ulp/iser/iscsi_iser.h  |    1 +
 drivers/infiniband/ulp/iser/iser_memory.c |   48 +++++++++++++++++++++--------
 drivers/infiniband/ulp/iser/iser_verbs.c  |    6 ++--
 3 files changed, 39 insertions(+), 16 deletions(-)

c403e930977afb2838588523d10819ce586951a2
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 2c8bc67..2cf9ae0 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -175,6 +175,7 @@ struct iser_mem_reg {
 	u64 va;
 	u64 len;
 	void *mem_h;
+	int is_fmr;
 };
 
 struct iser_regd_buf {
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 8fea0bc..e0d4347 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -56,7 +56,7 @@ int iser_regd_buff_release(struct iser_r
 	if ((atomic_read(&regd_buf->ref_count) == 0) ||
 	    atomic_dec_and_test(&regd_buf->ref_count)) {
 		/* if we used the dma mr, unreg is just NOP */
-		if (regd_buf->reg.rkey != 0)
+		if (regd_buf->reg.is_fmr)
 			iser_unreg_mem(&regd_buf->reg);
 
 		if (regd_buf->dma_addr) {
@@ -91,9 +91,9 @@ void iser_reg_single(struct iser_device
 	BUG_ON(dma_mapping_error(dma_addr));
 
 	regd_buf->reg.lkey = device->mr->lkey;
-	regd_buf->reg.rkey = 0; /* indicate there's no need to unreg */
 	regd_buf->reg.len = regd_buf->data_size;
 	regd_buf->reg.va = dma_addr;
+	regd_buf->reg.is_fmr = 0;
 
 	regd_buf->dma_addr = dma_addr;
 	regd_buf->direction = direction;
@@ -379,11 +379,13 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 		      enum iser_data_dir cmd_dir)
 {
 	struct iser_conn *ib_conn = iser_ctask->iser_conn->ib_conn;
+	struct iser_device *device = ib_conn->device;
 	struct iser_data_buf *mem = &iser_ctask->data[cmd_dir];
 	struct iser_regd_buf *regd_buf;
 	int aligned_len;
 	int err;
 	int i;
+	struct scatterlist *sg;
 
 	regd_buf = &iser_ctask->rdma_regd[cmd_dir];
 
@@ -399,19 +401,37 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 		mem = &iser_ctask->data_copy[cmd_dir];
 	}
 
-	iser_page_vec_build(mem, ib_conn->page_vec);
-	err = iser_reg_page_vec(ib_conn, ib_conn->page_vec, &regd_buf->reg);
-	if (err) {
-		iser_data_buf_dump(mem);
-		iser_err("mem->dma_nents = %d (dlength = 0x%x)\n", mem->dma_nents,
-			 ntoh24(iser_ctask->desc.iscsi_header.dlength));
-		iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
-			 ib_conn->page_vec->data_size, ib_conn->page_vec->length,
-			 ib_conn->page_vec->offset);
-		for (i=0 ; i<ib_conn->page_vec->length ; i++) {
-			iser_err("page_vec[%d] = 0x%lx\n", i, ib_conn->page_vec->pages[i]);
+	/* if there a single dma entry, FMR is not needed */
+	if (mem->dma_nents == 1) {
+		sg = (struct scatterlist *)mem->buf;
+
+		regd_buf->reg.lkey = device->mr->lkey;
+		regd_buf->reg.rkey = device->mr->rkey;
+		regd_buf->reg.len = sg_dma_len(&sg[0]);
+		regd_buf->reg.va = sg_dma_address(&sg[0]);
+		regd_buf->reg.is_fmr = 0;
+
+		iser_dbg("PHYSICAL Mem.register: lkey: 0x%08X rkey: 0x%08X "
+			 "va: 0x%08lX sz: %ld]\n",
+			 (unsigned int)regd_buf->reg.lkey,
+			 (unsigned int)regd_buf->reg.rkey,
+			 (unsigned
long)regd_buf->reg.va,
+			 (unsigned long)regd_buf->reg.len);
+	} else { /* use FMR for multiple dma entries */
+		iser_page_vec_build(mem, ib_conn->page_vec);
+		err = iser_reg_page_vec(ib_conn, ib_conn->page_vec, &regd_buf->reg);
+		if (err) {
+			iser_data_buf_dump(mem);
+			iser_err("mem->dma_nents = %d (dlength = 0x%x)\n", mem->dma_nents,
+				 ntoh24(iser_ctask->desc.iscsi_header.dlength));
+			iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
+				 ib_conn->page_vec->data_size, ib_conn->page_vec->length,
+				 ib_conn->page_vec->offset);
+			for (i=0 ; i<ib_conn->page_vec->length ; i++) {
+				iser_err("page_vec[%d] = 0x%lx\n", i, ib_conn->page_vec->pages[i]);
+			}
+			return err;
 		}
-		return err;
 	}
 
 	/* take a reference on this regd buf such that it will not be released *
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 9b27a7c..ecdca7f 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -88,8 +88,9 @@ static int iser_create_device_ib_res(str
 		     iser_cq_tasklet_fn,
 		     (unsigned long)device);
 
-	device->mr = ib_get_dma_mr(device->pd,
-				   IB_ACCESS_LOCAL_WRITE);
+	device->mr = ib_get_dma_mr(device->pd, IB_ACCESS_LOCAL_WRITE |
+				   IB_ACCESS_REMOTE_WRITE |
+				   IB_ACCESS_REMOTE_READ);
 	if (IS_ERR(device->mr))
 		goto dma_mr_err;
 
@@ -606,6 +607,7 @@ int iser_reg_page_vec(struct iser_conn
 	mem_reg->rkey = mem->fmr->rkey;
 	mem_reg->len = page_vec->length * SIZE_4K;
 	mem_reg->va = io_addr;
+	mem_reg->is_fmr = 1;
 	mem_reg->mem_h = (void *)mem;
 
 	mem_reg->va += page_vec->offset;
-- 
1.2.6

From mst at mellanox.co.il Mon Sep 11 02:51:19 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 12:51:19 +0300
Subject: [openib-general] [PATCH] ehca for OFED 1.1-rc4
In-Reply-To:
References: <20060910093747.GA11625@mellanox.co.il>
Message-ID: <20060911095119.GA11825@mellanox.co.il>

Quoting r.
Hoang-Nam Nguyen :
> Subject: Re: [PATCH] ehca for OFED 1.1-rc4
>
> I guess my email client must have wrapped the lines so that the patch is
> not applicable any more. Sorry for that!

You also want to fix the mail format - you currently send each mail
in both HTML and plain text - make it plaintext only.

> Need little time to fix that problem. For now I'm sending you the patch
> file as attachment that I could apply without errors.
> Thanks,
> Nam Nguyen
>
> (See attached file: ofed_svnehca_0015.patch)

OK, applied and will be pushed out.

-- 
MST

From zhushisongzhu at yahoo.com Mon Sep 11 02:53:11 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Mon, 11 Sep 2006 02:53:11 -0700 (PDT)
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911075038.GC10024@mellanox.co.il>
Message-ID: <20060911095311.71272.qmail@web36909.mail.mud.yahoo.com>

> You should not need this change with the scale patch
> I posted - after applying
> this, and setting the scale parameter to 0x1, each
> connection should use around
> 128K for RX. Please confirm.

I have tested it again, yes, you are right. I just set
the scale parameter to 0x1, each connection cost about
128K memory.
zhu

From halr at voltaire.com Mon Sep 11 02:49:37 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Sep 2006 05:49:37 -0400
Subject: [openib-general] [PATCH] osm: OSM bug fix with --run-once option
In-Reply-To: <1157870154.29270.42.camel@kliteynik.yok.mtl.com>
References: <1157870154.29270.42.camel@kliteynik.yok.mtl.com>
Message-ID: <1157968170.27427.97391.camel@hal.voltaire.com>

Hi Yevgeny,

On Sun, 2006-09-10 at 02:35, Yevgeny Kliteynik wrote:
> Hi Hal
>
> This patch fixes the bug that was occurring when OSM was
> running with --run-once option (-o) and the SM port was down.
> In that case, OSM would be stuck in cond_wait forever (or until
> the port will become active), and could not be terminated,
> other than by SIGKILL.
>
> Yevgeny
>
> Signed-off-by: Yevgeny Kliteynik

Thanks. Applied to trunk only.

-- Hal

From zhushisongzhu at yahoo.com Mon Sep 11 02:59:41 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Mon, 11 Sep 2006 02:59:41 -0700 (PDT)
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911075038.GC10024@mellanox.co.il>
Message-ID: <20060911095941.87733.qmail@web36903.mail.mud.yahoo.com>

> You should not need this change with the scale patch
> I posted - after applying
> this, and setting the scale parameter to 0x1, each
> connection should use around
> 128K for RX. Please confirm.

Just setting the scale parameter to 0x1, memory
reduction is OK. But there occurred one bug,
sometimes my kernel crashed. So I think PRE POST buf
size should be changed either.
zhu

From mst at mellanox.co.il Mon Sep 11 04:05:24 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 14:05:24 +0300
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911095941.87733.qmail@web36903.mail.mud.yahoo.com>
References: <20060911095941.87733.qmail@web36903.mail.mud.yahoo.com>
Message-ID: <20060911110524.GB11825@mellanox.co.il>

Quoting r. zhu shi song :
> Subject: Re: why sdp connections cost so much memory
>
> > You should not need this change with the scale patch
> > I posted - after applying
> > this, and setting the scale parameter to 0x1, each
> > connection should use around
> > 128K for RX. Please confirm.
> Just setting the scale parameter to 0x1, memory
> reduction is OK. But there occurred one bug,
> sometimes my kernel crashed.

Shouldn't happen. Backtrace?
> So I think PRE POST buf
> size should be changed either.
> zhu

Hmm. I don't really see how this would help.
Is it true that changing just the RX size fixes the crashes for you?
If yes I'd like to investigate.

-- 
MST

From johnt1johnt2 at gmail.com Mon Sep 11 05:18:40 2006
From: johnt1johnt2 at gmail.com (john t)
Date: Mon, 11 Sep 2006 17:48:40 +0530
Subject: [openib-general] kernel mode
Message-ID:

Hi,

A general doubt. If I write a kernel program (linux kernel module) to send
and receive data using IB, will it perform better than its user mode
counterpart?

Unlike user mode, in kernel mode, I think it is possible to allocate
physically contiguous memory using "kmalloc or alloc_pages" which means HCAs
need not do any address translation (i.e. no need of page table lookup as I
guess in this case virtual address and physical address will differ only by
a fixed offset) for copying data into main memory. Besides I think
traditional DMAs give better performance with contiguous memory and use a
special GFP_DMA zone. Moreover polling a CQ may be more efficient in kernel.
Is this correct?

Regards,
John T.

From halr at voltaire.com Mon Sep 11 06:23:37 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Sep 2006 09:23:37 -0400
Subject: [openib-general] [PATCH][TRIVIAL] OpenSM: Change QoS syntax for CA ports
Message-ID: <1157981006.27427.104217.camel@hal.voltaire.com>

OpenSM: Change QoS syntax for CA ports

Change names from hca_ to ca_ to make it clearer that these are for
both HCAs and TCAs.

Signed-off-by: Hal Rosenstock

Index: doc/qos-config.txt
===================================================================
--- doc/qos-config.txt	(revision 9347)
+++ doc/qos-config.txt	(working copy)
@@ -28,11 +28,11 @@ values may be stored in OpenSM config fi
 
 In addition to the above, we may define separate QoS configuration
 parameters sets for various target types.
As targets, we currently support
-HCA, routers, switch external ports, and switch's enhanced port 0. The
+CAs, routers, switch external ports, and switch's enhanced port 0. The
 names of such specialized parameters are prefixed by "qos_<type>_"
 string. Here is a full list of the currently supported sets:
 
-    qos_hca_ - QoS configuration parameters set for HCAs.
+    qos_ca_ - QoS configuration parameters set for CAs.
     qos_rtr_ - parameters set for routers.
     qos_sw0_ - parameters set for switches' port 0.
     qos_swe_ - parameters set for switches' external ports.
@@ -40,5 +40,5 @@ string. Here is a full list of the curre
 Examples:
     qos_sw0_max_vls=2
-    qos_hca_sl2vl=0,1,2,3,5,5,5,12,12,0,
+    qos_ca_sl2vl=0,1,2,3,5,5,5,12,12,0,
     qos_swe_high_limit=0
Index: man/opensm.8
===================================================================
--- man/opensm.8	(revision 9347)
+++ man/opensm.8	(working copy)
@@ -1,4 +1,4 @@
-.TH OPENSM 8 "Setpember 6, 2006" "OpenIB" "OpenIB Management"
+.TH OPENSM 8 "Setpember 11, 2006" "OpenIB" "OpenIB Management"
 
 .SH NAME
 opensm \- InfiniBand subnet manager and administration (SM/SA)
@@ -365,18 +365,18 @@ values may be stored in OpenSM config fi
 
 In addition to the above, we may define separate QoS configuration
 parameters sets for various target types. As targets, we currently support
-HCA, routers, switch external ports, and switch's enhanced port 0. The
+CAs, routers, switch external ports, and switch's enhanced port 0. The
 names of such specialized parameters are prefixed by "qos_<type>_"
 string. Here is a full list of the currently supported sets:
 
-    qos_hca_ - QoS configuration parameters set for HCAs.
+    qos_ca_ - QoS configuration parameters set for CAs.
     qos_rtr_ - parameters set for routers.
     qos_sw0_ - parameters set for switches' port 0.
     qos_swe_ - parameters set for switches' external ports.
 
 Examples:
     qos_sw0_max_vls=2
-    qos_hca_sl2vl=0,1,2,3,5,5,5,12,12,0,
+    qos_ca_sl2vl=0,1,2,3,5,5,5,12,12,0,
     qos_swe_high_limit=0
 
 .SH ROUTING
Index: include/opensm/osm_subnet.h
===================================================================
--- include/opensm/osm_subnet.h	(revision 9351)
+++ include/opensm/osm_subnet.h	(working copy)
@@ -282,7 +282,7 @@ typedef struct _osm_subn_opt
   boolean_t              exit_on_fatal;
   boolean_t              honor_guid2lid_file;
   osm_qos_options_t      qos_options;
-  osm_qos_options_t      qos_hca_options;
+  osm_qos_options_t      qos_ca_options;
   osm_qos_options_t      qos_sw0_options;
   osm_qos_options_t      qos_swe_options;
   osm_qos_options_t      qos_rtr_options;
@@ -457,8 +457,8 @@ typedef struct _osm_subn_opt
 *	qos_options
 *		Default set of QoS options
 *
-*	qos_hca_options
-*		QoS options for HCA ports
+*	qos_ca_options
+*		QoS options for CA ports
 *
 *	qos_sw0_options
 *		QoS options for switches' port 0
Index: opensm/osm_subnet.c
===================================================================
--- opensm/osm_subnet.c	(revision 9351)
+++ opensm/osm_subnet.c	(working copy)
@@ -495,7 +495,7 @@ osm_subn_set_default_opt(
   p_opt->updn_guid_file = NULL;
   p_opt->exit_on_fatal = TRUE;
   subn_set_default_qos_options(&p_opt->qos_options);
-  subn_set_default_qos_options(&p_opt->qos_hca_options);
+  subn_set_default_qos_options(&p_opt->qos_ca_options);
   subn_set_default_qos_options(&p_opt->qos_sw0_options);
   subn_set_default_qos_options(&p_opt->qos_swe_options);
   subn_set_default_qos_options(&p_opt->qos_rtr_options);
@@ -737,8 +737,8 @@ osm_subn_rescan_conf_file(
       subn_parse_qos_options("qos",
                              p_key, p_val, &p_opts->qos_options);
 
-      subn_parse_qos_options("qos_hca",
-                             p_key, p_val, &p_opts->qos_hca_options);
+      subn_parse_qos_options("qos_ca",
+                             p_key, p_val, &p_opts->qos_ca_options);
 
       subn_parse_qos_options("qos_sw0",
                              p_key, p_val, &p_opts->qos_sw0_options);
@@ -967,8 +967,8 @@ osm_subn_parse_conf_file(
       subn_parse_qos_options("qos",
                              p_key, p_val, &p_opts->qos_options);
 
-      subn_parse_qos_options("qos_hca",
-
                             p_key, p_val, &p_opts->qos_hca_options);
+      subn_parse_qos_options("qos_ca",
+                             p_key, p_val, &p_opts->qos_ca_options);
 
       subn_parse_qos_options("qos_sw0",
                              p_key, p_val, &p_opts->qos_sw0_options);
@@ -1211,7 +1211,7 @@ osm_subn_write_conf_file(
     "QoS default options", "qos", &p_opts->qos_options);
   fprintf(opts_file, "\n");
   subn_dump_qos_options(opts_file,
-    "QoS HCA options", "qos_hca", &p_opts->qos_hca_options);
+    "QoS CA options", "qos_ca", &p_opts->qos_ca_options);
   fprintf(opts_file, "\n");
   subn_dump_qos_options(opts_file,
     "QoS Switch Port 0 options", "qos_sw0", &p_opts->qos_sw0_options);
Index: opensm/osm_qos.c
===================================================================
--- opensm/osm_qos.c	(revision 9347)
+++ opensm/osm_qos.c	(working copy)
@@ -318,7 +318,7 @@ static ib_api_status_t qos_physp_setup(o
 
 osm_signal_t osm_qos_setup(osm_opensm_t * p_osm)
 {
-	struct qos_config hca_config, sw0_config, swe_config, rtr_config;
+	struct qos_config ca_config, sw0_config, swe_config, rtr_config;
 	struct qos_config *cfg;
 	osm_switch_t *p_sw;
 	ib_switch_info_t *p_si;
@@ -336,7 +336,7 @@ osm_signal_t osm_qos_setup(osm_opensm_t
 
 	OSM_LOG_ENTER(&p_osm->log, osm_qos_setup);
 
-	qos_build_config(&hca_config, &p_osm->subn.opt.qos_hca_options,
+	qos_build_config(&ca_config, &p_osm->subn.opt.qos_ca_options,
 			 &p_osm->subn.opt.qos_options);
 	qos_build_config(&sw0_config, &p_osm->subn.opt.qos_sw0_options,
 			 &p_osm->subn.opt.qos_options);
@@ -376,7 +376,7 @@ osm_signal_t osm_qos_setup(osm_opensm_t
 		else if (node_type == IB_NODE_TYPE_ROUTER)
 			cfg = &rtr_config;
 		else
-			cfg = &hca_config;
+			cfg = &ca_config;
 
 		p_physp = osm_port_get_default_phys_ptr(p_port);
 		if (!osm_physp_is_valid(p_physp))

From jim.ryan at intel.com Mon Sep 11 07:22:28 2006
From: jim.ryan at intel.com (Ryan, Jim)
Date: Mon, 11 Sep 2006 07:22:28 -0700
Subject: [openib-general] [openfabrics-ewg] Goodbye and Transition
Message-ID:

Shawn, thanks for the note and best of luck at Microsoft.
I suggest we take Shawn's recommendation and ask Jamie to continue Shawn's
leadership of the EWG.

Jim

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Shawn Hansen (shahanse)
Sent: Friday, September 08, 2006 5:29 PM
To: OpenFabricsEWG; openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: [openfabrics-ewg] Goodbye and Transition

All,

FYI: I've decided to relocate my family to Seattle, and will be leaving
Cisco. I plan to join Microsoft's Server and Tools division at the end of
this month.

I would like to recommend Jamie Riotto, Senior Director of Engineering, as
my EWG replacement. Jamie is responsible for all engineering for Cisco's
Server Networking and Virtualization Business Unit, including Cisco's host
driver and RDMA development efforts.

Please stay in touch, and I wish the team the best.

Regards,

--Shawn

----------------------------
Shawn Hansen
Director, Product Management
Cisco Systems

_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg

From rdreier at cisco.com Mon Sep 11 07:37:00 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 07:37:00 -0700
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
In-Reply-To: <4505032B.3050706@voltaire.com> (Erez Zilber's message of "Mon, 11 Sep 2006 09:33:15 +0300")
References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com>
Message-ID:

There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
file. ISER only depends on INFINIBAND && SCSI.
However it is easily
possible to enable INFINIBAND and SCSI without enabling INET (in fact
they can be enabled without NET as in the original config in this thread).

iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
depends on, so this alone will result in a broken config. However
nothing will enable INET (which I think you said iser depends on). So
something like the below is required, I think. Although it would
probably be better to make iser depend on INET (as ISCSI_TCP does)
rather than selecting NET and INET.

Toralf, can you confirm that applying this patch and doing make
oldconfig and make with your original config works OK?

Thanks,
  Roland

diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
index fead87d..a122bb4 100644
--- a/drivers/infiniband/ulp/iser/Kconfig
+++ b/drivers/infiniband/ulp/iser/Kconfig
@@ -1,6 +1,8 @@
 config INFINIBAND_ISER
 	tristate "ISCSI RDMA Protocol"
 	depends on INFINIBAND && SCSI
+	select NET
+	select INET
 	select SCSI_ISCSI_ATTRS
 	---help---
 	  Support for the ISCSI RDMA Protocol over InfiniBand.  This

From mst at mellanox.co.il Mon Sep 11 07:44:38 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 17:44:38 +0300
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
In-Reply-To:
References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com>
Message-ID: <20060911144438.GA13919@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
>
> There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
> file. ISER only depends on INFINIBAND && SCSI.
However it is easily
> possible to enable INFINIBAND and SCSI without enabling INET (in fact
> they can be enabled without NET as in the original config in this thread).
>
> iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
> depends on, so this alone will result in a broken config. However
> nothing will enable INET (which I think you said iser depends on). So
> something like the below is required, I think. Although it would
> probably be better to make iser depend on INET (as ISCSI_TCP does)
> rather than selecting NET and INET.

Maybe just make iser depend on CMA since that is what it really needs?

-- 
MST

From rdreier at cisco.com Mon Sep 11 07:52:38 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 07:52:38 -0700
Subject: [openib-general] [openfabrics-ewg] is there a plan for getting SDP into kernel.org?
In-Reply-To: (Scott Weitzenkamp's message of "Sun, 10 Sep 2006 22:47:24 -0700")
References:
Message-ID:

    Scott> I would like to see netstat support, zcopy support, and
    Scott> ideally AIO support get added first...

Better to merge first and then add features I think.

 - R.

From rdreier at cisco.com Mon Sep 11 07:54:33 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 07:54:33 -0700
Subject: [openib-general] [PATCH 5/5] IB/iser: Do not use FMR for a single dma entry sg
In-Reply-To: (Erez Zilber's message of "Mon, 11 Sep 2006 12:26:33 +0300 (IDT)")
References:
Message-ID:

Thanks, applied 1-5 with this minor fix for a compile warning:

--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -427,9 +427,9 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 			iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
 				 ib_conn->page_vec->data_size, ib_conn->page_vec->length,
 				 ib_conn->page_vec->offset);
-			for (i=0 ; i<ib_conn->page_vec->length ; i++) {
-				iser_err("page_vec[%d] = 0x%lx\n", i, ib_conn->page_vec->pages[i]);
-			}
+			for (i=0 ; i<ib_conn->page_vec->length ; i++)
+				iser_err("page_vec[%d] = 0x%llx\n", i,
+					 (unsigned long long) ib_conn->page_vec->pages[i]);
 			return err;
 		}
 	}

From erezz at voltaire.com Mon Sep 11 08:19:05 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 18:19:05 +0300
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
In-Reply-To:
References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com>
Message-ID: <45057E69.6040503@voltaire.com>

Roland Dreier wrote:
> There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
> file. ISER only depends on INFINIBAND && SCSI. However it is easily
> possible to enable INFINIBAND and SCSI without enabling INET (in fact
> they can be enabled without NET as in the original config in this thread).
>
> iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
> depends on, so this alone will result in a broken config.
However
> nothing will enable INET (which I think you said iser depends on). So
> something like the below is required, I think. Although it would
> probably be better to make iser depend on INET (as ISCSI_TCP does)
> rather than selecting NET and INET.
>
>

Let me make sure that I understand: If INET is disabled and we enable
INFINIBAND, INFINIBAND_ADDR_TRANS will not be enabled (because INET is
disabled). This results in the scenario that Toralf is in. If this is
correct, I agree with your patch.

Thanks
Erez

From erezz at voltaire.com Mon Sep 11 08:19:57 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 18:19:57 +0300
Subject: [openib-general] [PATCH 5/5] IB/iser: Do not use FMR for a single dma entry sg
In-Reply-To:
References:
Message-ID: <45057E9D.7030502@voltaire.com>

Roland Dreier wrote:
> Thanks, applied 1-5 with this minor fix for a compile warning:
>
> --- a/drivers/infiniband/ulp/iser/iser_memory.c
> +++ b/drivers/infiniband/ulp/iser/iser_memory.c
> @@ -427,9 +427,9 @@ int iser_reg_rdma_mem(struct iscsi_iser_
>  			iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
>  				 ib_conn->page_vec->data_size, ib_conn->page_vec->length,
>  				 ib_conn->page_vec->offset);
> -			for (i=0 ; i<ib_conn->page_vec->length ; i++) {
> -				iser_err("page_vec[%d] = 0x%lx\n", i, ib_conn->page_vec->pages[i]);
> -			}
> +			for (i=0 ; i<ib_conn->page_vec->length ; i++)
> +				iser_err("page_vec[%d] = 0x%llx\n", i,
> +					 (unsigned long long) ib_conn->page_vec->pages[i]);
>  			return err;
>  		}
>  	}
>

OK, thanks.

From rdreier at cisco.com Mon Sep 11 08:24:18 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 08:24:18 -0700
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
In-Reply-To: <45057E69.6040503@voltaire.com> (Erez Zilber's message of "Mon, 11 Sep 2006 18:19:05 +0300")
References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com> <45057E69.6040503@voltaire.com>
Message-ID:

    Erez> Let me make sure that I understand: If INET is disabled and
    Erez> we enable INFINIBAND, INFINIBAND_ADDR_TRANS will not be
    Erez> enabled (because INET is disabled). This results in the
    Erez> scenario that Toralf is in. If this is correct, I agree with
    Erez> your patch.

Yes, that's right.

 - R.

From Sujal at Mellanox.com Mon Sep 11 09:28:04 2006
From: Sujal at Mellanox.com (Sujal Das)
Date: Mon, 11 Sep 2006 09:28:04 -0700
Subject: [openib-general] [openfabrics-ewg] Goodbye and Transition
Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F31DE2D@mtiexch01.mti.com>

Sounds like a good idea. Not sure if the EWG community knows Jamie (I do
not, for example) - it might be a good idea if Jamie introduces himself,
and specifically highlights his roles and contributions to OFA in the past
and what his vision is for OFED and its adoption by OSVs, ISVs, HPC and
enterprise customers.

-Sujal

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Ryan, Jim
Sent: Monday, September 11, 2006 7:22 AM
To: Shawn Hansen (shahanse); OpenFabricsEWG; openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: Re: [openfabrics-ewg] Goodbye and Transition

Shawn, thanks for the note and best of luck at Microsoft.
I suggest we take Shawn's recommendation and ask Jamie to continue Shawn's leadership of the EWG. Jim -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Shawn Hansen (shahanse) Sent: Friday, September 08, 2006 5:29 PM To: OpenFabricsEWG; openib-general at openib.org Cc: Jamie Riotto (jriotto) Subject: [openfabrics-ewg] Goodbye and Transition All, FYI: I've decided to relocate my family to Seattle, and will be leaving Cisco. I plan to join Microsoft's Server and Tools division at the end of this month. I would like to recommend Jamie Riotto, Senior Director of Engineering, as my EWG replacement. Jamie is responsible for all engineering for Cisco's Server Networking and Virtualization Business Unit, including Cisco's host driver and RDMA development efforts. Please stay in touch, and I wish the team the best. Regards, --Shawn ---------------------------- Shawn Hansen Director, Product Management Cisco Systems _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg From stan.smith at intel.com Mon Sep 11 09:27:28 2006 From: stan.smith at intel.com (Smith, Stan) Date: Mon, 11 Sep 2006 09:27:28 -0700 Subject: [openib-general] PXE + infiniband? Message-ID: Eli cohen wrote: > On Thu, 2006-09-07 at 08:19 +0100, Paul Baxter wrote: >>> There is an implementation of PXE for Mellanox's HCAs that can be >>> found here: http://sourceforge.net/forum/forum.php?forum_id=494529 >> >> Thanks for the tip >> >> I, too, am interested in this. >> >> Do you have a more direct link as I wandered around etherboot's >> project site and couldn't find anything IB-specific. 
>> >> Paul Baxter > Hi, > > Please use the following link > http://kent.dl.sourceforge.net/sourceforge/etherboot/etherboot-5.4.2.tar.bz2 > to download the package. Unpack the package and cd to the src dir. > Use an x86 arch machine to build the binaries. The infiniband drivers > are located at src/drivers/net/mlx_ipoib/ where you can find a readme > file in the doc directory. To build: > > cd src > make bin/MT23108.zrom // for MT23108 > make bin/MT25208.zrom > make bin/MT25218.zrom > > This covers all Mellanox HCAs. Please let me know if you need more > assistance. > A less involved solution is to use ROM-o-matic http://rom-o-matic.net/ . The Etherboot 5.4.2 image for MT23108 works nicely. > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From jim.ryan at intel.com Mon Sep 11 09:34:34 2006 From: jim.ryan at intel.com (Ryan, Jim) Date: Mon, 11 Sep 2006 09:34:34 -0700 Subject: [openib-general] [openfabrics-ewg] Goodbye and Transition Message-ID: Sujal, yes, thanks, makes sense. I got a "no longer there" response from my earlier email, so Shawn won't be around to do a handoff Jim -----Original Message----- From: Sujal Das [mailto:Sujal at Mellanox.com] Sent: Monday, September 11, 2006 9:28 AM To: Ryan, Jim; Shawn Hansen (shahanse); OpenFabricsEWG; openib-general at openib.org Cc: Jamie Riotto (jriotto) Subject: RE: [openfabrics-ewg] Goodbye and Transition Sounds like a good idea. Not sure if the EWG community knows Jamie (I do not, for example) - it might be a good idea if Jamie introduces himself, and specifically highlights his roles and contributions to OFA in the past and what his vision is for OFED and its adoption by OSVs, ISVs, HPC and enterprise customers.
-Sujal -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Ryan, Jim Sent: Monday, September 11, 2006 7:22 AM To: Shawn Hansen (shahanse); OpenFabricsEWG; openib-general at openib.org Cc: Jamie Riotto (jriotto) Subject: Re: [openfabrics-ewg] Goodbye and Transition Shawn, thanks for the note and best of luck at Microsoft. I suggest we take Shawn's recommendation and ask Jamie to continue Shawn's leadership of the EWG. Jim -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Shawn Hansen (shahanse) Sent: Friday, September 08, 2006 5:29 PM To: OpenFabricsEWG; openib-general at openib.org Cc: Jamie Riotto (jriotto) Subject: [openfabrics-ewg] Goodbye and Transition All, FYI: I've decided to relocate my family to Seattle, and will be leaving Cisco. I plan to join Microsoft's Server and Tools division at the end of this month. I would like to recommend Jamie Riotto, Senior Director of Engineering, as my EWG replacement. Jamie is responsible for all engineering for Cisco's Server Networking and Virtualization Business Unit, including Cisco's host driver and RDMA development efforts. Please stay in touch, and I wish the team the best. Regards, --Shawn ---------------------------- Shawn Hansen Director, Product Management Cisco Systems _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg From sweitzen at cisco.com Mon Sep 11 09:38:35 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 11 Sep 2006 09:38:35 -0700 Subject: [openib-general] [openfabrics-ewg] is there a plan for getting SDP into kernel.org? 
Message-ID: > Scott> I would like to see netstat support, zcopy support, and > Scott> ideally AIO support get added first... > > Better to merge first and then add features I think. > > - R. > How about just adding netstat before the merge, so we have some visibility into what SDP connections are in use? Scott From mshefty at ichips.intel.com Mon Sep 11 10:11:06 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 11 Sep 2006 10:11:06 -0700 Subject: [openib-general] [PATCH v3] ib_sa: require SA registration In-Reply-To: References: <000501c6c57b$2594dd00$8698070a@amr.corp.intel.com> <44F38370.7050809@ichips.intel.com> Message-ID: <450598AA.1070003@ichips.intel.com> Roland Dreier wrote: > I haven't really read the later patches but I am planning on merging > at least the registration stuff for 2.6.19. I'd like to commit the SA related patches soon. There have been several e-mails recently about using IB multicast and the IB CM directly. - Sean From mshefty at ichips.intel.com Mon Sep 11 10:18:08 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 11 Sep 2006 10:18:08 -0700 Subject: [openib-general] Wrong byte order in lid of struct ibv_port_attr reported by ibv_query port!? In-Reply-To: References: Message-ID: <45059A50.1030405@ichips.intel.com> Bub Thomas wrote: > with the help of your modified cmpost.c example I found out that the > byte order in the lid your query_for_path in cmpost.c is getting into > the ib_sa_path_rec is the opposite to the one reported by ibv_query_port. The path record defines all fields in network-byte order. The verb calls use host-byte order. Typically, the path record information will come directly from the SA, which defines the fields in network-byte order, which is why it isn't converted to host-order. - Sean From mst at mellanox.co.il Mon Sep 11 10:28:24 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 11 Sep 2006 20:28:24 +0300 Subject: [openib-general] [PATCH v3] ib_sa: require SA registration In-Reply-To: References: Message-ID: <20060911172824.GB15556@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH v3] ib_sa: require SA registration > > Sean> Roland, Not sure if you've had a chance to review the SA > Sean> patches, but any comments on any of the SA related patches? > Sean> (SA registration, generic RMPP query support, or userspace > Sean> SA) > > I haven't really read the later patches but I am planning on merging > at least the registration stuff for 2.6.19. Yes, the registration stuff is clearly safe -- MST From rdreier at cisco.com Mon Sep 11 10:30:58 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 11 Sep 2006 10:30:58 -0700 Subject: [openib-general] [openfabrics-ewg] is there a plan for getting SDP into kernel.org? In-Reply-To: (Scott Weitzenkamp's message of "Mon, 11 Sep 2006 09:38:35 -0700") References: Message-ID: Scott> How about just adding netstat before the merge, so we have Scott> some visibility into what SDP connections are in use? That's fine. Merging upstream is somewhat long-term anyway, since Michael has not even posted a first candidate for review -- I expect SDP will require several go-arounds to get merged. - R. From sweitzen at cisco.com Mon Sep 11 10:34:25 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 11 Sep 2006 10:34:25 -0700 Subject: [openib-general] [openfabrics-ewg] is there a plan for getting SDP into kernel.org? Message-ID: > Scott> How about just adding netstat before the merge, so we have > Scott> some visibility into what SDP connections are in use? > > That's fine. Merging upstream is somewhat long-term anyway, since > Michael has not even posted a first candidate for review -- I expect > SDP will require several go-arounds to get merged. > > - R. Michael, when do you expect to post a first candidate for review? 
Scott From mshefty at ichips.intel.com Mon Sep 11 10:38:47 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 11 Sep 2006 10:38:47 -0700 Subject: [openib-general] RDMA CMA and C++ In-Reply-To: <45003711.3040108@dev.mellanox.co.il> References: <1157640982.20399.5.camel@trinity.ogc.int> <45003711.3040108@dev.mellanox.co.il> Message-ID: <45059F27.8050805@ichips.intel.com> Dotan Barak wrote: >>The user-mode cm header files don't have the C++ stuff to identify all >>the declarations as C. The verbs.h file has it and works fine if you >>wanted to copy it, but all you really need is ... >> > Sean, please add those definitions to the libibcm header as well. I've updated the libibcm and librdmacm header files. Thanks. - Sean From mshefty at ichips.intel.com Mon Sep 11 10:44:46 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 11 Sep 2006 10:44:46 -0700 Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure cases. In-Reply-To: <20060910111145.GA12111@mellanox.co.il> References: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com> <20060910111145.GA12111@mellanox.co.il> Message-ID: <4505A08E.5000705@ichips.intel.com> Michael S. Tsirkin wrote: >>cma_connect_ib leaks an struct ib_cm_id* in failure cases. >> >>Signed-off-by: Krishna Kumar > > > This one looks like it might be good for 2.6.18. Sean? The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a second call is not made to rdma_connect after the first call fails. So we're probably safe deferring this until 2.6.19, unless someone has code which calls rdma_connect twice. 
- Sean From toralf.foerster at gmx.de Mon Sep 11 10:45:59 2006 From: toralf.foerster at gmx.de (Toralf Förster) Date: Mon, 11 Sep 2006 19:45:59 +0200 Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id' In-Reply-To: References: <200609071902.57379.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com> Message-ID: <200609111946.03315.toralf.foerster@gmx.de> Yep, that patch fixes the bug :-) Thanks On Monday 11 September 2006 16:37, Roland Dreier wrote: > There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig > file. ISER only depends on INFINIBAND && SCSI. However it is easily > possible to enable INFINIBAND and SCSI without enabling INET (in fact > they can be enabled without NET as in the original config in this thread). > > iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it > depends on, so this alone will result in a broken config. However > nothing will enable INET (which I think you said iser depends on). So > something like the below is required, I think. Although it would > probably be better to make iser depend on INET (as ISCSI_TCP does) > rather than selecting NET and INET. > > Toralf, can you confirm that applying this patch and doing make > oldconfig and make with your original config works OK? > > Thanks, > Roland > > diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig > index fead87d..a122bb4 100644 > --- a/drivers/infiniband/ulp/iser/Kconfig > +++ b/drivers/infiniband/ulp/iser/Kconfig > @@ -1,6 +1,8 @@ > config INFINIBAND_ISER > tristate "ISCSI RDMA Protocol" > depends on INFINIBAND && SCSI > + select NET > + select INET > select SCSI_ISCSI_ATTRS > ---help--- > Support for the ISCSI RDMA Protocol over InfiniBand. This > > -- Sincerely Toralf Förster -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From mshefty at ichips.intel.com Mon Sep 11 10:50:31 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 11 Sep 2006 10:50:31 -0700 Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure cases. In-Reply-To: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com> References: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com> Message-ID: <4505A1E7.1060007@ichips.intel.com> Krishna Kumar wrote: > cma_connect_ib leaks an struct ib_cm_id* in failure cases. Thanks - committed. - Sean From stephanieh at owenmedia.com Mon Sep 11 10:52:19 2006 From: stephanieh at owenmedia.com (Stephanie Howard) Date: Mon, 11 Sep 2006 10:52:19 -0700 Subject: [openib-general] InfiniBand DevCon Conference Message-ID: Hello, Attached is the final reminder for the InfiniBand DevCon conference. If you have any questions, please let me know. Thank you, Stephanie Stephanie Howard Owen Media 206.322.1167 ext. 102 StephanieH at owenmedia.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: InfiniBand DevCon Blast Final Blast - OFA.doc Type: application/msword Size: 32256 bytes Desc: InfiniBand DevCon Blast Final Blast - OFA.doc URL: From mst at mellanox.co.il Mon Sep 11 10:53:12 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 11 Sep 2006 20:53:12 +0300 Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure cases. In-Reply-To: <4505A08E.5000705@ichips.intel.com> References: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com> <20060910111145.GA12111@mellanox.co.il> <4505A08E.5000705@ichips.intel.com> Message-ID: <20060911175312.GC15556@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] cma_connect_ib leaks memory in failure cases. > > Michael S.
Tsirkin wrote: > >>cma_connect_ib leaks an struct ib_cm_id* in failure cases. > >> > >>Signed-off-by: Krishna Kumar > > > > > > This one looks like it might be good for 2.6.18. Sean? > > The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a > second call is not made to rdma_connect after the first call fails. So we're > probably safe deferring this until 2.6.19, unless someone has code which calls > rdma_connect twice. SDP can do this I think. -- MST From john.blackwood at ccur.com Mon Sep 11 10:53:47 2006 From: john.blackwood at ccur.com (John Blackwood) Date: Mon, 11 Sep 2006 13:53:47 -0400 Subject: [openib-general] ib_madeye kfree() problem on module unload Message-ID: <4505A2AB.3000603@ccur.com> When using OFED-1.1-rc3 on an x86_64 system running a 2.6.17.3 debug kernel in a RHEL4 U2 environment, I see the following console warning messages when I unload the ib_madeye kernel module: modprobe ib_madeye modprobe -r ib_madeye ------------------------------------ console messages ------------------------------------ slab error in cache_free_debugcheck(): cache `size-32': double free, or memory outside object was overwritten Call Trace: {__slab_error+36} {cache_free_debugcheck+365} {kfree+136} {:ib_madeye:madeye_remove_one+123} {:ib_core:ib_unregister_client+75} {:ib_madeye:ib_madeye_cleanup+16} {sys_delete_module+446} {tracesys+113} {tracesys+209} ffff81007834bd48: redzone 1:0x170fc2a5, redzone 2:0xffff8100400929c8 From mshefty at ichips.intel.com Mon Sep 11 11:35:51 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 11 Sep 2006 11:35:51 -0700 Subject: [openib-general] [PATCH] Modify callers of cma_get_net_info for better error handling.
In-Reply-To: <20060908051301.5221.63041.sendpatchset@K50wks273895wss.in.ibm.com> References: <20060908051301.5221.63041.sendpatchset@K50wks273895wss.in.ibm.com> Message-ID: <4505AC87.4070309@ichips.intel.com> Krishna Kumar wrote: > Re-organize code relating to cma_get_net_info() and rdma_create_id() to > optimize error case handling (no need to alloc memory/etc as part of > rdma_create_id() if input parameters are wrong). Thanks! Committed with a minor adjustment to rename 'out' label 'err'. - Sean From mshefty at ichips.intel.com Mon Sep 11 11:52:24 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 11 Sep 2006 11:52:24 -0700 Subject: [openib-general] [PATCH] Optimize cma_process_remove() In-Reply-To: <20060908051439.5229.71544.sendpatchset@K50wks273895wss.in.ibm.com> References: <20060908051439.5229.71544.sendpatchset@K50wks273895wss.in.ibm.com> Message-ID: <4505B068.8050501@ichips.intel.com> Krishna Kumar wrote: > static void cma_process_remove(struct cma_device *cma_dev) > { > struct list_head remove_list; > - struct rdma_id_private *id_priv; > + struct rdma_id_private *id_priv, *tmp; > int ret; > > INIT_LIST_HEAD(&remove_list); > @@ -2344,22 +2344,20 @@ static void cma_process_remove(struct cm > > if (cma_internal_listen(id_priv)) { > cma_destroy_listen(id_priv); > - continue; > + } else { > + list_del(&id_priv->list); > + list_add_tail(&id_priv->list, &remove_list); > } > + } > + mutex_unlock(&lock); > > - list_del(&id_priv->list); > - list_add_tail(&id_priv->list, &remove_list); > + list_for_each_entry_safe(id_priv, tmp, &remove_list, list) { > atomic_inc(&id_priv->refcount); > - mutex_unlock(&lock); > - I don't think that this will work. The issue is that we need to walk a list of IDs associated with a particular device to notify the user that the device is being removed. While we're doing that, the user could try to destroy the ID, which removes the ID from the device list. 
The original code takes a reference on the ID before removing it from cma_dev's list to ensure that the ID will be valid while we process it. The remove list ensures that the user is only notified once of a device removal. (We don't know where the thread calling rdma_destroy_id() is at.) We can eliminate the remove_list by calling list_del_init(). - Sean From jriotto at cisco.com Mon Sep 11 12:26:10 2006 From: jriotto at cisco.com (Jamie Riotto (jriotto)) Date: Mon, 11 Sep 2006 12:26:10 -0700 Subject: [openib-general] [openfabrics-ewg] Goodbye and Transition Message-ID: <944AD9DA9232E346ADF590C41BFFEC410294E3FF@xmb-sjc-232.amer.cisco.com> Hi everyone. Just wanted to respond and say I'm on the alias, and will prepare a small statement in line with what has been asked below. I should be able to get this out in a day or two. Looking forward to working with you all. Cheers - jamie Jamie Riotto Sr. Director Engineering Server Virtualization Business Unit (SVBU) Cisco Communications 408-853-7813 jriotto at cisco.com -----Original Message----- From: Ryan, Jim [mailto:jim.ryan at intel.com] Sent: Monday, September 11, 2006 9:35 AM To: Sujal Das; OpenFabricsEWG; openib-general at openib.org Cc: Jamie Riotto (jriotto) Subject: RE: [openfabrics-ewg] Goodbye and Transition Sujal, yes, thanks, makes sense. I got a "no longer there" response from my earlier email, so Shawn won't be around to do a handoff Jim -----Original Message----- From: Sujal Das [mailto:Sujal at Mellanox.com] Sent: Monday, September 11, 2006 9:28 AM To: Ryan, Jim; Shawn Hansen (shahanse); OpenFabricsEWG; openib-general at openib.org Cc: Jamie Riotto (jriotto) Subject: RE: [openfabrics-ewg] Goodbye and Transition Sounds like a good idea.
Not sure if the EWG community knows Jamie (I do not, for example) - it might be a good idea if Jamie introduces himself, and specifically highlights his roles and contributions to OFA in the past and what his vision is for OFED and its adoption by OSVs, ISVs, HPC and enterprise customers. -Sujal -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Ryan, Jim Sent: Monday, September 11, 2006 7:22 AM To: Shawn Hansen (shahanse); OpenFabricsEWG; openib-general at openib.org Cc: Jamie Riotto (jriotto) Subject: Re: [openfabrics-ewg] Goodbye and Transition Shawn, thanks for the note and best of luck at Microsoft. I suggest we take Shawn's recommendation and ask Jamie to continue Shawn's leadership of the EWG. Jim -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Shawn Hansen (shahanse) Sent: Friday, September 08, 2006 5:29 PM To: OpenFabricsEWG; openib-general at openib.org Cc: Jamie Riotto (jriotto) Subject: [openfabrics-ewg] Goodbye and Transition All, FYI: I've decided to relocate my family to Seattle, and will be leaving Cisco. I plan to join Microsoft's Server and Tools division at the end of this month. I would like to recommend Jamie Riotto, Senior Director of Engineering, as my EWG replacement. Jamie is responsible for all engineering for Cisco's Server Networking and Virtualization Business Unit, including Cisco's host driver and RDMA development efforts. Please stay in touch, and I wish the team the best. 
Regards, --Shawn ---------------------------- Shawn Hansen Director, Product Management Cisco Systems _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg From mst at mellanox.co.il Mon Sep 11 12:29:09 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 11 Sep 2006 22:29:09 +0300 Subject: [openib-general] CMA issue: bind selects the same port after close Message-ID: <20060911192909.GA16667@mellanox.co.il> We have encountered an issue in CMA: if I bind to port 0, destroy the id, then bind to port 0 again I often get back the same port from both binds. TCP behaves differently - it seems to assign new port numbers each time. This is an issue for some socket programs that assume that the same port number won't be reused, so that a remote side that connects to the same port after I have closed my socket will get a connection refused message. I also see applications looking for a port number that matches some rule by repeating the create/bind/close cycle. With CMA they always get back the same port number it seems. Is this something that can be fixed in CMA? Thanks, -- MST From rdreier at cisco.com Mon Sep 11 14:06:56 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 11 Sep 2006 14:06:56 -0700 Subject: [openib-general] [PATCH v3] ib_sa: require SA registration In-Reply-To: <000501c6c57b$2594dd00$8698070a@amr.corp.intel.com> (Sean Hefty's message of "Mon, 21 Aug 2006 16:40:12 -0700") References: <000501c6c57b$2594dd00$8698070a@amr.corp.intel.com> Message-ID: OK, I added the following to my for-2.6.19 branch.
The differences from your patch are: - CMA can have a static variable (good to avoid clashes with a global 'sa_client' variable name too) - IPoIB does not use multicast module upstream, fix ipoib_multicast.c too. - Simplify sa_query.c changes a little. I don't like the "deref_client" name for a function, since it sounds too much like dereferencing a pointer rather than dropping a reference. And I also didn't like ib_sa_client_get() having a magic side effect of setting query->client. So I just open-coded more stuff. How does this look? - R. From sean.hefty at intel.com Mon Sep 11 14:21:14 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 11 Sep 2006 14:21:14 -0700 Subject: [openib-general] [PATCH v3] ib_sa: require SA registration In-Reply-To: Message-ID: <000001c6d5e8$3662a040$a4d0180a@amr.corp.intel.com> > - CMA can have a static variable (good to avoid clashes with a global > 'sa_client' variable name too) Sounds good - that's a goof on my part. > - IPoIB does not use multicast module upstream, fix ipoib_multicast.c too. Okay - As an FYI, I will probably submit the multicast module upstream for 2.6.20, along with some sort of support for userspace access. > - Simplify sa_query.c changes a little. I don't like the > "deref_client" name for a function, since it sounds too much like > dereferencing a pointer rather than dropping a reference. And I > also didn't like ib_sa_client_get() having a magic side effect of > setting query->client. So I just open-coded more stuff. Those changes sound fine to me. - Sean From mshefty at ichips.intel.com Mon Sep 11 15:07:56 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 11 Sep 2006 15:07:56 -0700 Subject: [openib-general] CMA issue: bind selects the same port after close In-Reply-To: <20060911192909.GA16667@mellanox.co.il> References: <20060911192909.GA16667@mellanox.co.il> Message-ID: <4505DE3C.7090205@ichips.intel.com> Michael S. 
Tsirkin wrote: > We have encountered an issue in CMA: if > I bind to port 0, destroy the id, then bind to port 0 again > I often get back the same port from both binds. > > TCP behaves differently - it seems to assign new port numbers > each time. > This is an issue for some socket programs that assume that > the same port number won't be reused, so that a remote side that > connects to the same port after I have closed my socket will get > a connection refused message. > I also see applications looking for a port number that matches > some rule by repeating the create/bind/close cycle. > With CMA they always get back the same port number it seems. > > Is this something that can be fixed in CMA? I think we can fix this without a huge impact. Is there anything that states the way bind is supposed to behave wrt this? Is there some delay between releasing a port and it being re-used that needs to be taken into account? The basic problem in the CMA is in cma_alloc_port(). If the port number (passed in as snum) is 0, the first available port starting at sysctl_local_port_range[0] is used. We could instead start our search by adding an increasing counter or a random value to the lower-end of the port range. Then expand the code to handle searching below our starting value if we failed to find one above it. Are the port numbers assigned by TCP sequential or more random? - Sean From mst at mellanox.co.il Mon Sep 11 15:16:33 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Sep 2006 01:16:33 +0300 Subject: [openib-general] CMA issue: bind selects the same port after close In-Reply-To: <4505DE3C.7090205@ichips.intel.com> References: <4505DE3C.7090205@ichips.intel.com> Message-ID: <20060911221633.GB17098@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] CMA issue: bind selects the same port after close > > Michael S.
Tsirkin wrote: > > We have encountered an issue in CMA: if > > I bind to port 0, destroy the id, then bind to port 0 again > > I often get back the same port from both binds. > > > > TCP behaves differently - it seems to assign new port numbers > > each time. > > This is an issue for some socket programs that assume that > > the same port number won't be reused, so that a remote side that > > connects to the same port after I have closed my socket will get > > a connection refused message. > > I also see applications looking for a port number that matches > > some rule by repeating the create/bind/close cycle. > > With CMA they always get back the same port number it seems. > > > > Is this something that can be fixed in CMA? > > I think we can fix this without a huge impact. Is there anything that states > the way bind is supposed to behave wrt this? I don't think so. But since that's how it works on linux and other systems, apps assume this. > Is there some delay between > releasing a port and it being re-used that needs to be taken into account? TCP keeps port busy while in timewait state, unless REUSEADDR is given. I have not yet seen any app rely on this, so it might not be important to emulate this. > The basic problem in the CMA is in cma_alloc_port(). If the port number (passed > in as snum) is 0, the first available port starting at > sysctl_local_port_range[0] is used. We could instead start our search by > adding an increasing counter or a random value to the lower-end of the port > range. Then expand the code to handle searching below our starting value if we > failed to find one above it. Sounds good. > Are the port numbers assigned by TCP sequential or more random? TCP ports seem to be sequential.
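The search strategy discussed above (start from a moving offset inside the local port range, then wrap around to cover the ports below the starting point) can be sketched in plain C. This is only an illustration under stated assumptions, not the actual cma_alloc_port() code: PORT_LOW and PORT_HIGH are assumed values standing in for sysctl_local_port_range, port_in_use stands in for the CMA's bind list, and alloc_any_port and release_port are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed range, standing in for sysctl_local_port_range[0] and [1]. */
#define PORT_LOW  32768u
#define PORT_HIGH 61000u

static bool port_in_use[PORT_HIGH + 1];	/* stand-in for the bind list */
static unsigned int next_offset;	/* increasing counter */

/*
 * Hypothetical allocator for the "bind to port 0" case. The search starts
 * at PORT_LOW + next_offset and wraps, so ports below the starting point
 * are tried last. Returns 0 if every port in the range is taken.
 */
static uint16_t alloc_any_port(void)
{
	unsigned int span = PORT_HIGH - PORT_LOW + 1;
	unsigned int start = next_offset % span;
	unsigned int i;

	for (i = 0; i < span; i++) {
		unsigned int port = PORT_LOW + (start + i) % span;

		if (!port_in_use[port]) {
			port_in_use[port] = true;
			/* Next search begins just past this port. */
			next_offset = (start + i + 1) % span;
			return (uint16_t) port;
		}
	}
	return 0;
}

/* Hypothetical counterpart: mark the port free again on destroy. */
static void release_port(uint16_t port)
{
	port_in_use[port] = false;
}
```

With a counter, a bind/destroy/bind cycle hands out consecutive ports, which matches the sequential behaviour reported for TCP; replacing the counter with a random starting offset would be the other variant mentioned in the thread.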
-- MST From mshefty at ichips.intel.com Mon Sep 11 15:20:32 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 11 Sep 2006 15:20:32 -0700 Subject: [openib-general] [PATCH] IB/cma: add rdma_establish In-Reply-To: <20060907214524.GA14791@mellanox.co.il> References: <20060907214524.GA14791@mellanox.co.il> Message-ID: <4505E130.8010301@ichips.intel.com> Michael S. Tsirkin wrote: > Sean, did we decide what to do for upstream yet? > I would say we need something like the below for 2.6.19 too > (probably just need to update node type check). > And, I like it that this approach leaves all matters of policy > to users (such as whether move QP to RTS after asynchronous event > or after completion event). I will go with a patch similar to this one. It seems the most flexible. > As a side note, reasons for frequent loss of RTU must be investigated. A lost RTU shouldn't be any more likely than a lost REQ or REP. Is the RTU never showing up? I will look into the ib_cm and see if there's an issue that would cause an RTU not to be retried. - Sean From mst at mellanox.co.il Mon Sep 11 15:29:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Sep 2006 01:29:56 +0300 Subject: [openib-general] [PATCH] IB/cma: add rdma_establish In-Reply-To: <4505E130.8010301@ichips.intel.com> References: <20060907214524.GA14791@mellanox.co.il> <4505E130.8010301@ichips.intel.com> Message-ID: <20060911222956.GD17098@mellanox.co.il> Quoting r. Sean Hefty : > > As a side note, reasons for frequent loss of RTU must be investigated. > > A lost RTU shouldn't be any more likely than a lost REQ or REP. Is the RTU > never showing up? Seems like that. I know for sure I do accept after REP but remote side never gets ESTABLISHED. > I will look into the ib_cm and see if there's an issue that > would cause an RTU not to be retried. -- MST From mst at mellanox.co.il Mon Sep 11 15:52:56 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 12 Sep 2006 01:52:56 +0300 Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs (was Fwd: Re: [PATCH ] RFC IB/cm do not track remote QPN in timewait state) In-Reply-To: <20060829130908.GA24322@mellanox.co.il> References: <20060829130908.GA24322@mellanox.co.il> Message-ID: <20060911225256.GE17098@mellanox.co.il> Roland, all, we plan to implement the timewait handling in mthca in time for 2.6.19: For all connected QPs: - upon QP destroy or move from RTS to reset/error, start timer for the duration of packet lifetime - until packet expires, do not reuse this QPN This must be done to prevent stale packets from corrupting the new connection (see 9.7.1). Could you pls let me know if this approach looks sane to you? This approach has a number of advantages over attempting to implement same in CM on top of verbs by not destroying the QP: - Reduce resource usage by freeing the QP (only track QPN+timer) - Applies to all verbs users even if they bypass CM - Solves problem for userspace CM where we can't rely on CM to enforce timewait More detail can be found in thread I'm replying to. Please comment. _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- MST From rdreier at cisco.com Mon Sep 11 16:25:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 11 Sep 2006 16:25:13 -0700 Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs In-Reply-To: <20060911225256.GE17098@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 12 Sep 2006 01:52:56 +0300") References: <20060829130908.GA24322@mellanox.co.il> <20060911225256.GE17098@mellanox.co.il> Message-ID: My gut reaction is that it seems pretty ugly. I guess we'll also need similar patches for ipath and ehca too -- which makes me want to have this in common code somehow.
Also timewait is really only part of the CM spec -- do we want to limit the rate of RC QP creation in general for potential non-CM users that know what they're doing? I'm not sure the following is a real concern (since a hostile user can currently just create a ton of QPs and hold onto them forever), but this also allows someone to create a bunch of QPs with a super-long timeout and prevent any other QPs from being created for a few hours (until the timewait expires). Finally one implementation comment: I think you'll want a list in addition to QPN + timer, to allow the ib_mthca module to be unloaded without having to wait an hour for all timers to expire. This allows timewait to be bypassed by unloading + reloading but that's no different than rebooting really. Another good prophylactic measure would probably be to randomize initial PSNs for RC connections. SRP currently does this. - R. From mst at mellanox.co.il Mon Sep 11 16:37:40 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Sep 2006 02:37:40 +0300 Subject: [openib-general] 4 patches in mst-for-2.6.19 Message-ID: <20060911233740.GA19021@mellanox.co.il> I have put the following patches in my mst-for-2.6.19 tree: $git log --pretty=short origin..mst-for-2.6.19 commit ddfe6867088167b64962399934d21cf3e37c338b Author: Jack Morgenstein [PATCH] IB/mthca: recover from device errors commit 4403ad431b139b03a291263be4686363fd04138b Author: Michael S.
Tsirkin [PATCH] IB/cm: do not track remote QPN in timewait state commit 12f4b3b6fabcccf96ca0fa9911e86c1a6d9fc7a4 Author: Ishai Rabinovitz [PATCH] IB/srp: don't schedule reconnect from srp, scsi does it for us commit a6f9624098dada22825d116d104c92bfd34465b2 Author: Ishai Rabinovitz [PATCH] IB/srp: destroy and re-create QP and CQ on reconnect You can get them here git://www.mellanox.co.il/~git/infiniband mst-for-2.6.19 This is against Roland's for-2.6.19 001c6b9030233a14fa27795ab3e6a6f45f16a317 These patches have been posted on the list previously, but let me know and I'll repost them if needed. Please comment. -- MST From mst at mellanox.co.il Mon Sep 11 16:54:46 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Sep 2006 02:54:46 +0300 Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs In-Reply-To: References: Message-ID: <20060911235446.GB19021@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: RFC: mthca: implement timewait by tracking QPNs > > My gut reaction is that it seems pretty ugly. Hmm. All of it or just some bits? > I guess we'll also need > similar patches for ipath and ehca too -- which makes me want to have > this in common code somehow. Could be a library function in core so that ipath etc can reuse it. But note how there's no dependency between drivers here - no reason to block change in mthca until ipath/ehca implement this functionality, too. > Also timewait is really only part of the CM spec Not entirely correct. Please look at 9.7.1 - search for "stale packets": In addition to duplicate packets and invalid packets, there is a third condition, called a Stale Packet ("TIME WAIT packet"). If a connection to a responder is torn down and a new connection is established while packets are in flight, a packet from the old (stale) connection may arrive at the responder. The responder, in turn, may interpret this stale incoming packet as a valid packet, when in fact it is a remnant of a previous connection.
There are no transport layer mechanisms to guard against this condition; it is the responsibility of connection management to avoid re-using QPs until there is no possibility that a stale packet could arrive at the responder. This is done by placing the requester and responder QPs in a "Time Wait" state long enough to ensure that any stale packets left in the fabric have expired before re-using those QPs. So the spec suggests that timewait be implemented in CM, but timewait is needed to solve a problem that affects the transport layer and that is described in Chapter 9. > -- do we want to > limit the rate of RC QP creation in general for potential non-CM users > that know what they're doing? I don't see how this limits the rate of QP creation. Could you explain? Second, there's no way I can see that a verbs user can check there are no stale packets (AKA TimeWait packets). Is there? So the user only *thinks* he knows what he's doing, meanwhile getting silent data corruption. Correct? > I'm not sure the following is a real concern (since a hostile user can > currently just create a ton of QPs and hold onto them forever), but > this also allows someone to create a bunch of QPs with a super-long > timeout and prevent any other QPs from being created for a few hours > (until the timewait expires). Another reason why this might not be an issue is that the QPN space is reasonably big - 2^24. I guess when we start looking at limiting # of QPs per user, we'll need to limit the max legal packet lifetime too. Might be a good idea anyway. > Finally one implementation comment: I think you'll want a list in > addition to QPN + timer, to allow the ib_mthca module to be unloaded > without having to wait an hour for all timers to expire. This allows > timewait to be bypassed by unloading + reloading but that's no > different than rebooting really. Sure, that's obvious. > Another good prophylactic measure would probably be to randomize initial > PSNs for RC connections. SRP currently does this.
I agree this also helps. -- MST From rdreier at cisco.com Mon Sep 11 18:09:17 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 11 Sep 2006 18:09:17 -0700 Subject: [openib-general] 4 patches in mst-for-2.6.19 In-Reply-To: <20060911233740.GA19021@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 12 Sep 2006 02:37:40 +0300") References: <20060911233740.GA19021@mellanox.co.il> Message-ID: OK, I applied [PATCH] IB/cm: do not track remote QPN in timewait state since Sean has acked that already. I'll review the rest in the next day or two. - R. From rjwalsh at pathscale.com Mon Sep 11 20:08:32 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Mon, 11 Sep 2006 20:08:32 -0700 Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs In-Reply-To: <20060911235446.GB19021@mellanox.co.il> References: <20060911235446.GB19021@mellanox.co.il> Message-ID: <450624B0.3010709@pathscale.com> > Could be a library function in core so that ipath etc can reuse it. > But note how there's no dependency between drivers here - no > reason to block change in mthca until ipath/ehca implement this functionality, > too. True. But FWIW, we (QLogic) could probably spin something like this pretty quickly anyway. Regards, Robert. From zhushisongzhu at yahoo.com Mon Sep 11 20:46:47 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Mon, 11 Sep 2006 20:46:47 -0700 (PDT) Subject: [openib-general] why sdp connections cost so much memory In-Reply-To: <20060911110524.GB11825@mellanox.co.il> Message-ID: <20060912034647.9016.qmail@web36909.mail.mud.yahoo.com> --- "Michael S. Tsirkin" wrote: > Quoting r. zhu shi song : > > Subject: Re: why sdp connections cost so much > memory > > > > > You should not need this change with the scale > patch > > > I posted - after applying > > > this, and setting the scale parameter to 0x1, > each > > > connection should use around > > > 128K for RX. Please confirm. 
> > Just setting the scale parameter to 0x1, memory > > reduction is OK. But there occurred one bug, > > sometimes my kernel crashed. > > Shouldn't happen. Backtrace? > > > So I think PRE POST buf > > size should be changed either. > > zhu > > Hmm. I don't really see how this would help. > Is it true that changing just the RX size fixes the > crashes for you? > If yes I'd like to investigate. > > -- > MST > (1) With RX_SIZE=0x4 and TX_SIZE=0x4, I ran my testbench 30 times and there was no kernel crash. I found sdp worked faster and more stably when I changed the RX and TX sizes. (2) With RX_SIZE=0x40 and TX_SIZE=0x40, I could run my testbench only a few times before the kernel crashed. The results are very different for the two cases. zhu From krkumar2 at in.ibm.com Mon Sep 11 21:27:50 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Tue, 12 Sep 2006 09:57:50 +0530 Subject: [openib-general] CMA issue: bind selects the same port after close In-Reply-To: <20060911221633.GB17098@mellanox.co.il> Message-ID: Hi Michael, > > The basic problem in the CMA is in cma_alloc_port(). If the port number (passed > > in as snum) is 0, the first available port starting at > > sysctl_local_port_range[0] is used. We could instead start our search by > > adding an increasing counter or a random value to the lower-end of the port > > range. Then expand the code to handle searching below our starting value if we > > failed to find one above it. > > Sounds good. > > > Are the port numbers assigned by TCP sequential or more random? > > TCP ports seem to be sequential. Are you getting sequential port numbers? inet_csk_get_port() is actually using a random number to get the *starting* value between sysctl_local_port_range[0] and sysctl_local_port_range[1].
Once it gets this starting number, it goes sequentially all the way to the high limit (sysctl*[1]) and then loops back from low (sysctl*[0]) limit until all the numbers in the middle are looked at. I think we can easily use the same logic. Sean's second option seems to be followed here "> > adding a random value to the lower-end of the port range" Thanks, - KK From krkumar2 at in.ibm.com Mon Sep 11 21:31:50 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Tue, 12 Sep 2006 10:01:50 +0530 Subject: [openib-general] [PATCH] Optimize cma_process_remove() In-Reply-To: <4505B068.8050501@ichips.intel.com> Message-ID: Hi Sean, > I don't think that this will work. The issue is that we need to walk a list of > IDs associated with a particular device to notify the user that the device is > being removed. While we're doing that, the user could try to destroy the ID, > which removes the ID from the device list. > > The original code takes a reference on the ID before removing it from the from > cma_dev's list to ensure that the ID will be valid while we process it. The > remove list ensures that the user is only notified once of a device removal. > (We don't know where the thread calling rdma_destroy_id() is at.) Yes, you are right - I missed the parallel rdma_destroy_id's. 
How about something like this then (it is cleaner than dropping/re-getting locks) :

	mutex_lock(&lock);
	while (!list_empty(&cma_dev->id_list)) {
		id_priv = list_entry(cma_dev->id_list.next,
				     struct rdma_id_private, list);
		if (cma_internal_listen(id_priv)) {
			cma_destroy_listen(id_priv);
		} else {
			atomic_inc(&id_priv->refcount);
			list_del(&id_priv->list);
			list_add_tail(&id_priv->list, &remove_list);
		}
	}
	mutex_unlock(&lock);

	list_for_each_entry_safe(id_priv, tmp, &remove_list, list) {
		ret = cma_remove_id_dev(id_priv);
		cma_deref_id(id_priv);
		if (ret)
			rdma_destroy_id(&id_priv->id);
	}

thanks, - KK From bugzilla-daemon at openib.org Mon Sep 11 21:54:43 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 11 Sep 2006 21:54:43 -0700 (PDT) Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors Message-ID: <20060912045443.27D7C2283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=229 ------- Comment #2 from sweitzen at cisco.com 2006-09-11 21:54 ------- Cisco embedded SM on a switch, thus no SM on a host, only IB drivers. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sweitzen at cisco.com Mon Sep 11 22:55:48 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 11 Sep 2006 22:55:48 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 status Message-ID: When will rc4 be available? I'd also like to suggest we not rush the final build, end of this week seems too soon. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Thursday, September 07, 2006 1:02 PM To: EWG Cc: openib Subject: [openfabrics-ewg] OFED 1.1 status Hi, OFED 1.1 RC4 will be published on Monday 11-Sep.
We are currently working on several showstoppers:
1. 223: mthca.so not properly linked to libibverbs - Vlad & Jack
2. 221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - Roland
3. 219: OFED 1.1rc3 contains prerelease unstable libibverbs code - Vlad & Jack
Thus the final release date will be delayed to the end of next week. Tziporet Koren Software Director Mellanox Technologies mailto: tziporet at mellanox.co.il Tel +972-4-9097200, ext 380 From mst at mellanox.co.il Mon Sep 11 23:01:52 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Sep 2006 09:01:52 +0300 Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors In-Reply-To: <20060912045443.27D7C2283D4@openib.ca.sandia.gov> References: <20060912045443.27D7C2283D4@openib.ca.sandia.gov> Message-ID: <20060912060152.GA14719@mellanox.co.il> Quoting r. bugzilla-daemon at openib.org : > Subject: [Bug 229] heavy CPU load can starve ib_mad thread on latest processors > > http://openib.org/bugzilla/show_bug.cgi?id=229 > > > > > > ------- Comment #2 from sweitzen at cisco.com 2006-09-11 21:54 ------- > Cisco embedded SM on a switch, thus no SM on a host, only IB drivers. Looks like we'll add the workaround for ofed. What renice level are you using? -- MST From sweitzen at cisco.com Mon Sep 11 23:02:33 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 11 Sep 2006 23:02:33 -0700 Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors Message-ID: I only tested with renice -20. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Michael S.
Tsirkin [mailto:mst at mellanox.co.il] > Sent: Monday, September 11, 2006 11:02 PM > To: Scott Weitzenkamp (sweitzen) > Cc: openib-general at openib.org > Subject: Re: [Bug 229] heavy CPU load can starve ib_mad > thread on latest processors > > Quoting r. bugzilla-daemon at openib.org : > > Subject: [Bug 229] heavy CPU load can starve ib_mad thread > on latest processors > > > > http://openib.org/bugzilla/show_bug.cgi?id=229 > > > > > > > > > > > > ------- Comment #2 from sweitzen at cisco.com 2006-09-11 21:54 ------- > > Cisco embedded SM on a switch, thus no SM on a host, only > IB drivers. > > Looks like we'll add the workaround for ofed. > What renice level are you using? > > -- > MST > From mst at mellanox.co.il Mon Sep 11 23:09:14 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Sep 2006 09:09:14 +0300 Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors In-Reply-To: References: Message-ID: <20060912060914.GC14719@mellanox.co.il> Hmm, OK. I'd like to figure out whether this could be something other than a scheduler issue. Could you test on kernel 2.6.18 or 2.6.17 please? If this is a scheduler issue, there's a chance scheduler is more fair there. Quoting r. Scott Weitzenkamp (sweitzen) : > Subject: RE: [Bug 229] heavy CPU load can starve ib_mad thread on latest processors > > I only tested with renice -20. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > > -----Original Message----- > > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > > Sent: Monday, September 11, 2006 11:02 PM > > To: Scott Weitzenkamp (sweitzen) > > Cc: openib-general at openib.org > > Subject: Re: [Bug 229] heavy CPU load can starve ib_mad > > thread on latest processors > > > > Quoting r. 
bugzilla-daemon at openib.org : > > > Subject: [Bug 229] heavy CPU load can starve ib_mad thread > > on latest processors > > > > > > http://openib.org/bugzilla/show_bug.cgi?id=229 > > > > > > > > > > > > > > > > > > ------- Comment #2 from sweitzen at cisco.com 2006-09-11 21:54 ------- > > > Cisco embedded SM on a switch, thus no SM on a host, only > > IB drivers. > > > > Looks like we'll add the workaround for ofed. > > What renice level are you using? > > > > -- > > MST > > > -- MST From bugzilla-daemon at openib.org Mon Sep 11 23:14:17 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 11 Sep 2006 23:14:17 -0700 (PDT) Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors Message-ID: <20060912061417.B9B2F2283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=229 ------- Comment #3 from sweitzen at cisco.com 2006-09-11 23:14 ------- Put email in bugzilla: Hmm, OK. I'd like to figure out whether this could be something other than a scheduler issue. Could you test on kernel 2.6.18 or 2.6.17 please? If this is a scheduler issue, there's a chance scheduler is more fair there. Quoting r. Scott Weitzenkamp (sweitzen) : > Subject: RE: [Bug 229] heavy CPU load can starve ib_mad thread on latest processors > > I only tested with renice -20. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > > -----Original Message----- > > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > > Sent: Monday, September 11, 2006 11:02 PM > > To: Scott Weitzenkamp (sweitzen) > > Cc: openib-general at openib.org > > Subject: Re: [Bug 229] heavy CPU load can starve ib_mad > > thread on latest processors > > > > Quoting r. 
bugzilla-daemon at openib.org : > > > Subject: [Bug 229] heavy CPU load can starve ib_mad thread > > on latest processors > > > > > > http://openib.org/bugzilla/show_bug.cgi?id=229 > > > > > > > > > > > > > > > > > > ------- Comment #2 from sweitzen at cisco.com 2006-09-11 21:54 ------- > > > Cisco embedded SM on a switch, thus no SM on a host, only > > IB drivers. > > > > Looks like we'll add the workaround for ofed. > > What renice level are you using? > > > > -- > > MST > > > ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From k_mahesh85 at yahoo.co.in Mon Sep 11 23:28:50 2006 From: k_mahesh85 at yahoo.co.in (keshetti mahesh) Date: Tue, 12 Sep 2006 07:28:50 +0100 (BST) Subject: [openib-general] reason behind locking the WQs while checking the state in modify_qp? Message-ID: <20060912062851.56875.qmail@web8315.mail.in.yahoo.com> Hello all, I recently went through the discussion of how you decided to split the QP lock into separate WQ locks, and of the locking mechanism: http://openib.org/pipermail/openib-general/2005-February/004491.html That patch mentions that the only place we will be taking the lock is in modify_qp, while checking the state of the QP, but it does not describe why this is required. My question is: why is it required to lock the WQs there? Does the QP state depend in any way on the posting of WRs? -Mahesh -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ogerlitz at voltaire.com Tue Sep 12 00:58:49 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 12 Sep 2006 10:58:49 +0300 (IDT) Subject: [openib-general] [PATCH] RDMA/cma: document error flow of rdma_accept Message-ID: Document the reject sending and modifying qp to error done in rdma_accept Signed-off-by: Or Gerlitz diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index 402c63d..f932c16 100644 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -237,6 +237,10 @@ int rdma_listen(struct rdma_cm_id *id, i * Typically, this routine is only called by the listener to accept a connection * request. It must also be called on the active side of a connection if the * user is performing their own QP transitions. + * + * In the case of error, a reject message is sent to the remote side and the + * state of the qp associated with the id is modified to error, such that any + * previously posted receive buffers would be flushed. */ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); From ogerlitz at voltaire.com Tue Sep 12 01:33:22 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 12 Sep 2006 11:33:22 +0300 Subject: [openib-general] [PATCH] IB/cma: add rdma_establish In-Reply-To: <4505E130.8010301@ichips.intel.com> References: <20060907214524.GA14791@mellanox.co.il> <4505E130.8010301@ichips.intel.com> Message-ID: <450670D2.4040805@voltaire.com> Sean Hefty wrote: > Michael S. Tsirkin wrote: >> Sean, did we decide what to do for upstream yet? >> I would say we need something like the below for 2.6.19 too >> (probably just need to update node type check). >> And, I like it that this approach leaves all matters of policy >> to users (such as whether move QP to RTS after asynchronous event >> or after completion event). > I will go with a patch similar to this one. It seems the most flexible. 
Just to make sure, are you saying that you would merge this patch instead of the one that had the CM track local qp numbers and install a callback for the consumer QP to catch the async event etc? Also I'd like to make sure I follow what would happen: T1) the consumer gets an rx completion on a QP associated with a non established CMA ID [also at some point in time the async handler is called with a COMM_EST async event for this QP] T2) the consumer calls rdma_establish() T3) the consumer cma callback is called with ESTABLISHED event and is now able to post sends to the QP Indeed the **patch** itself is somewhat simpler, but the consumer must get the established event before posting sends to the qp, so they need to either queue RX-es or modify the QP to RTS before sending the REP. As I said before this is fine with our iser target, as we queue the sole possible RX (login request) till getting the established event. Is rdma_establish() --> cm_establish() callable from a non interruptible context? Our target does a context jump once the cq handler is called, so it does the actual processing at thread level, but there may be other consumers attempting to call rdma_establish from the hard-irq cq callback context. Also, does the patch ensure that only one ESTABLISHED event would be delivered for the id, no matter if rdma_establish() and an RTU reception happen in parallel? >> As a side note, reasons for frequent loss of RTU must be investigated. > A lost RTU shouldn't be any more likely than a lost REQ or REP. Is the RTU > never showing up? I will look into the ib_cm and see if there's an issue that > would cause an RTU not to be retried. Indeed, my initial suspicion was that heavy CPU load on the server node prevents the mad/cm threads from being scheduled in, but as REQ messages do appear I also thought we should see if a "retried" REP causes a resend of the RTU.
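[Editor's sketch, not part of the original thread] The last question above, whether only one ESTABLISHED event is delivered when rdma_establish() and an RTU reception happen in parallel, can be illustrated by making the state change a single atomic transition, so that only the path that wins the transition reports the event. The following is a minimal user-space sketch in C11 under that assumption. The names struct conn, conn_init(), try_establish() and demo() are hypothetical and exist only for illustration; the sketch says nothing about how ib_cm or rdma_cm actually serialize these two paths.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, reduced set of connection states (assumption for this sketch). */
enum conn_state { CONN_REP_SENT, CONN_ESTABLISHED };

struct conn {
	atomic_int state;
	int established_events;	/* how many times the consumer was notified */
};

static void conn_init(struct conn *c)
{
	atomic_init(&c->state, CONN_REP_SENT);
	c->established_events = 0;
}

/*
 * Called from both paths: reception of the RTU, and an explicit
 * "establish" request made by the consumer. Only the caller that wins
 * the REP_SENT -> ESTABLISHED transition reports the event.
 */
static bool try_establish(struct conn *c)
{
	int expected = CONN_REP_SENT;

	if (!atomic_compare_exchange_strong(&c->state, &expected,
					    CONN_ESTABLISHED))
		return false;	/* the other path already made the transition */

	c->established_events++;	/* stands in for the ESTABLISHED callback */
	return true;
}

/* Both paths fire, one after the other; returns the number of events seen. */
static int demo(void)
{
	struct conn c;
	bool rx_path, rtu_path;

	conn_init(&c);
	rx_path = try_establish(&c);	/* consumer saw an RX completion first */
	rtu_path = try_establish(&c);	/* the RTU shows up afterwards */
	assert(rx_path && !rtu_path);
	return c.established_events;
}
```

Whichever order the two paths run in, the compare-and-exchange lets exactly one of them succeed, so the consumer callback stand-in runs once. A real implementation would also have to handle the remaining states (REJ, timeout, destroy in progress), which this sketch leaves out.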
From ogerlitz at voltaire.com Tue Sep 12 01:43:18 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 12 Sep 2006 11:43:18 +0300 Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction In-Reply-To: <44FBC374.8040709@voltaire.com> References: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com> <44FBC374.8040709@voltaire.com> Message-ID: <45067326.5070305@voltaire.com> Or Gerlitz wrote: > diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h > index 402c63d..b9e22c8 100644 > --- a/include/rdma/rdma_cm.h > +++ b/include/rdma/rdma_cm.h > @@ -117,6 +117,14 @@ struct rdma_cm_id { > struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, > void *context, enum rdma_port_space ps); > > +/** > + * rdma_destroy_id - Destroys an RDMA identifier. > + * > + * @id: RDMA identifier. > + * > + * Note: calling this function has the effect of canceling in-flight > + * asynchronous operations associated with the id. > + */ > void rdma_destroy_id(struct rdma_cm_id *id); > > /** Hi Sean, Can you queue this for 2.6.19 ? Or. From ogerlitz at voltaire.com Tue Sep 12 01:46:52 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 12 Sep 2006 11:46:52 +0300 Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction In-Reply-To: <45067326.5070305@voltaire.com> References: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com> <44FBC374.8040709@voltaire.com> <45067326.5070305@voltaire.com> Message-ID: <450673FC.3000309@voltaire.com> Or Gerlitz wrote: > Or Gerlitz wrote: >> diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h >> index 402c63d..b9e22c8 100644 >> --- a/include/rdma/rdma_cm.h >> +++ b/include/rdma/rdma_cm.h >> @@ -117,6 +117,14 @@ struct rdma_cm_id { >> struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, >> void *context, enum rdma_port_space ps); >> >> +/** >> + * rdma_destroy_id - Destroys an RDMA identifier. 
>> + * >> + * @id: RDMA identifier. >> + * >> + * Note: calling this function has the effect of canceling in-flight >> + * asynchronous operations associated with the id. >> + */ >> void rdma_destroy_id(struct rdma_cm_id *id); >> >> /** > > Hi Sean, > > Can you queue this for 2.6.19 ? > > Or. > From krkumar2 at in.ibm.com Tue Sep 12 02:33:04 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 12 Sep 2006 15:03:04 +0530 Subject: [openib-general] [RFC] [PATCH] Re: CMA issue : bind selects the same port after close Message-ID: <20060912093304.6648.62748.sendpatchset@K50wks273895wss.in.ibm.com> > The basic problem in the CMA is in cma_alloc_port(). If the port number > (passed in as snum) is 0, the first available port starting at > sysctl_local_port_range[0] is used. We could instead start our search by > adding an increasing counter or a random value to the lower-end of the port > range. Then expand the code to handle searching below our starting value > if we failed to find one above it. Implement the above method where we start search for port# at a random offset from the lower-end of the port range, and on failure search at the lower-end of the port range. (only compile tested) Signed-off-by: Krishna Kumar diff -ruNp org/core/cma.c new/core/cma.c --- org/core/cma.c 2006-09-12 11:25:18.000000000 +0530 +++ new/core/cma.c 2006-09-12 14:28:26.000000000 +0530 @@ -1652,12 +1652,21 @@ static int cma_alloc_port(struct idr *ps { struct rdma_bind_list *bind_list; int port, start, ret; + int out_of_range; bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL); if (!bind_list) return -ENOMEM; - start = snum ? 
snum : sysctl_local_port_range[0]; + if (snum) { + start = snum; + } else { + int low = sysctl_local_port_range[0]; + int high = sysctl_local_port_range[1]; + + get_random_bytes(&start, sizeof start); + start = start % (high - low) + low; + } do { ret = idr_get_new_above(ps, bind_list, start, &port); @@ -1666,8 +1675,21 @@ static int cma_alloc_port(struct idr *ps if (ret) goto err; - if ((snum && port != snum) || - (!snum && port > sysctl_local_port_range[1])) { + out_of_range = 0; + if (!snum && port > sysctl_local_port_range[1]) { + /* + * Couldn't find one from random() off of start, try from + * low. + */ + idr_remove(ps, port); + start = sysctl_local_port_range[0]; + do { + ret = idr_get_new_above(ps, bind_list, start, &port); + } while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL)); + if (port > sysctl_local_port_range[1]) + out_of_range = 1; + } + if ((snum && port != snum) || out_of_range) { idr_remove(ps, port); ret = -EADDRNOTAVAIL; goto err; From mst at mellanox.co.il Tue Sep 12 02:14:44 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Sep 2006 12:14:44 +0300 Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction In-Reply-To: <45067326.5070305@voltaire.com> References: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com> <44FBC374.8040709@voltaire.com> <45067326.5070305@voltaire.com> Message-ID: <20060912091443.GA15301@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: [PATCH] for-2.6.19 cma: protect against adding device during destruction > > Or Gerlitz wrote: > > diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h > > index 402c63d..b9e22c8 100644 > > --- a/include/rdma/rdma_cm.h > > +++ b/include/rdma/rdma_cm.h > > @@ -117,6 +117,14 @@ struct rdma_cm_id { > > struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, > > void *context, enum rdma_port_space ps); > > > > +/** > > + * rdma_destroy_id - Destroys an RDMA identifier. 
> > + * > > + * @id: RDMA identifier. > > + * > > + * Note: calling this function has the effect of canceling in-flight > > + * asynchronous operations associated with the id. > > + */ > > void rdma_destroy_id(struct rdma_cm_id *id); > > > > /** > > Hi Sean, > > Can you queue this for 2.6.19 ? You might want to repost, with proper Signed-off-by line, subject and patch description. Hint: git-applymbox seems to like mail in the following format: Subject: [PATCH] IB/xx: ..... Short description - goes into message Signed-off-by: xxxx --- Long discussion - including requests for inclusion, etc. Will be ignored by git. diff ..... diff --git xxxxx index yyyy --- zzzzzzzzzzzzz +++ zzzzzzzzzzzzz @@ prqs Patch itself Arbitrary discussion - will be ignored by git. -- MST From halr at voltaire.com Tue Sep 12 02:35:15 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2006 05:35:15 -0400 Subject: [openib-general] [PATCH][TRIVIAL] OpenSM: Eliminate unused max_port_profile parameter Message-ID: <1158053698.27427.144058.camel@hal.voltaire.com> OpenSM: Eliminate unused max_port_profile parameter in OpenSM subnet options structure Signed-off-by: Hal Rosenstock Index: include/opensm/osm_subnet.h =================================================================== --- include/opensm/osm_subnet.h (revision 9424) +++ include/opensm/osm_subnet.h (working copy) @@ -269,7 +269,6 @@ typedef struct _osm_subn_opt boolean_t console; cl_map_t port_prof_ignore_guids; boolean_t port_profile_switch_nodes; - uint32_t max_port_profile; osm_pfn_ui_extension_t pfn_ui_pre_lid_assign; void * ui_pre_lid_assign_ctx; osm_pfn_ui_mcast_extension_t pfn_ui_mcast_fdb_assign; @@ -405,10 +404,6 @@ typedef struct _osm_subn_opt * If TRUE will count the number of switch nodes routed through * the link. If FALSE - only CA/RT nodes are counted. * -* max_port_profile -* Prevent routing through a port subscribed with more than this -* number of routes. 
-* * pfn_ui_pre_lid_assign * A UI function to be invoked prior to lid assigment. It should * return 1 if any change was made to any lid or 0 otherwise. Index: include/opensm/osm_switch.h =================================================================== --- include/opensm/osm_switch.h (revision 9347) +++ include/opensm/osm_switch.h (working copy) @@ -1108,7 +1108,6 @@ osm_switch_recommend_path( IN OUT uint16_t *p_num_used_sys, IN OUT uint64_t *remote_node_guids, IN OUT uint16_t *p_num_used_nodes, - IN const uint32_t max_routes_subscribed, IN boolean_t ui_ucast_fdb_assign_func_defined ); /* @@ -1139,12 +1138,6 @@ osm_switch_recommend_path( * p_num_used_nodes * [in out] The number of remote nodes used for routing to the port. * -* max_routes_subscribed -* [in] The maximum allowed number of target lids routed through -* a specific port of the switch. If the port already assigned -* (in the lfdb) this number of target lids - it will not be used -* even if it has the smallest hops count to the target lid. -* * ui_ucast_fdb_assign_func_defined * [in] If TRUE - this means that there is a ui ucast_fdb_assign table * function defined (in pfn_ui_ucast_fdb_assign in subnet opts). 
This Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 9423) +++ opensm/osm_subnet.c (working copy) @@ -483,7 +483,6 @@ osm_subn_set_default_opt( p_opt->no_qos = FALSE; p_opt->accum_log_file = TRUE; p_opt->port_profile_switch_nodes = FALSE; - p_opt->max_port_profile = 0xffffffff; p_opt->pfn_ui_pre_lid_assign = NULL; p_opt->ui_pre_lid_assign_ctx = NULL; p_opt->pfn_ui_mcast_fdb_assign = NULL; Index: opensm/osm_switch.c =================================================================== --- opensm/osm_switch.c (revision 9427) +++ opensm/osm_switch.c (working copy) @@ -233,7 +233,6 @@ osm_switch_recommend_path( IN OUT uint16_t *p_num_used_sys, IN OUT uint64_t *remote_node_guids, IN OUT uint16_t *p_num_used_nodes, - IN const uint32_t max_routes_subscribed, IN boolean_t ui_ucast_fdb_assign_func_defined ) { @@ -425,8 +424,7 @@ osm_switch_recommend_path( /* the count is min but also lower then the max subscribed */ - if( (check_count < least_paths) && - (check_count <= max_routes_subscribed)) + if( check_count < least_paths ) { port_found = TRUE; best_port = port_num; Index: opensm/osm_ucast_mgr.c =================================================================== --- opensm/osm_ucast_mgr.c (revision 9347) +++ opensm/osm_ucast_mgr.c (working copy) @@ -281,7 +281,7 @@ __osm_ucast_mgr_dump_ucast_routes( best_port = osm_switch_recommend_path( p_sw, lid_ho, TRUE, NULL, NULL, NULL, NULL, /* No LMC Optimization */ - 0xffffffff, ui_ucast_fdb_assign_func_defined ); + ui_ucast_fdb_assign_func_defined ); sprintf( line, "No %u hop path possible via port %u!", best_hops, best_port ); strcat( p_mgr->p_report_buf, line ); @@ -752,12 +752,10 @@ __osm_ucast_mgr_process_port( port = osm_switch_recommend_path( p_sw, lid_ho, ignore_existing, remote_sys_guids, &num_used_sys, remote_node_guids, &num_used_nodes, - p_mgr->p_subn->opt.max_port_profile, ui_ucast_fdb_assign_func_defined ); else port = 
osm_switch_recommend_path( p_sw, lid_ho, ignore_existing,
 NULL, NULL, NULL, NULL,
- p_mgr->p_subn->opt.max_port_profile,
 ui_ucast_fdb_assign_func_defined );

 /*

From michael.arndt at informatik.tu-chemnitz.de Tue Sep 12 04:20:58 2006
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Tue, 12 Sep 2006 13:20:58 +0200
Subject: [openib-general] OpenSM Multiple HCA cards on the same host
Message-ID: <002901c6d65d$858004e0$21606d86@one7>

Hi,

in the osm/docs it is mentioned that in the next release multiple HCA cards on the same host will be supported.
Does anybody know when this release comes, or if there is any other implementation which works for multiple HCA cards?
Maybe a pre-version is available?

thanks
Michael Arndt

From tziporet at dev.mellanox.co.il Tue Sep 12 04:32:46 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 12 Sep 2006 14:32:46 +0300
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 status
In-Reply-To:
References:
Message-ID: <45069ADE.3000503@dev.mellanox.co.il>

Scott Weitzenkamp (sweitzen) wrote:
> When will rc4 be available? I'd also like to suggest we not rush the
> final build, end of this week seems too soon.
>
> Scott Weitzenkamp

RC4 will be out today or tomorrow.
Final build is planned for mid-end of next week.

Tziporet

From tziporet at dev.mellanox.co.il Tue Sep 12 04:45:29 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 12 Sep 2006 14:45:29 +0300
Subject: [openib-general] On vacation on Sep-13 till 12-Oct
Message-ID: <45069DD9.5070506@dev.mellanox.co.il>

Hi,

I am going on a month-long vacation starting today.
I will not read emails during the vacation. :-)

During my absence the release coordination will be done by Aviram Gutman and Vlad Sokolovsky.
Michael Tsirkin will be responsible for approving new patches for OFED 1.1.
Jack will be responsible for all release documents.

"Shana Tova"
Tziporet

From halr at voltaire.com Tue Sep 12 04:36:03 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Sep 2006 07:36:03 -0400
Subject: [openib-general] OpenSM Multiple HCA cards on the same host
In-Reply-To: <002901c6d65d$858004e0$21606d86@one7>
References: <002901c6d65d$858004e0$21606d86@one7>
Message-ID: <1158060928.27427.147885.camel@hal.voltaire.com>

Hi Michael,

On Tue, 2006-09-12 at 07:20, Michael Arndt wrote:
> Hi,
>
> in the osm/docs

Which doc ? BTW, what version of OpenSM are you using ?

> is mentioned that at the next release multiple HCA cards on
> the same host will be supported.

If I understand your question correctly, OpenIB OpenSM supports multiple HCA cards. The HCA port is chosen by specifying the port GUID. If one is not specified, the first available port (which is LinkUp) is chosen. Both Mellanox and QLogic HCAs can support running an SM.

> does anybody know when this release comes
> or if there is any other implementation which works for multiple HCA cards.
> Maybe a pre-version is available?

Let me know if your question was different from what I answered above.

-- Hal

> thanks Michael Arndt

From dotanb at dev.mellanox.co.il Tue Sep 12 05:15:00 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Tue, 12 Sep 2006 15:15:00 +0300
Subject: [openib-general] an example to use of multicast messages over the verbs exists in the openib svn
Message-ID: <4506A4C4.9070907@dev.mellanox.co.il>

Hi all.

In https://openib.org/svn/trunk/contrib/mellanox/ibtp/gen2/userspace/useraccess/multicast_test there is an example of using multicast messages over the verbs.

This test (for now) doesn't send any join message to the SA; it only attaches the QP to (and detaches it from) the multicast group.
Dotan From thomas.bub at thomson.net Tue Sep 12 05:23:56 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Tue, 12 Sep 2006 14:23:56 +0200 Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine Message-ID: Just migrated from SLES 9 x86_64 to SLES 10 x86_64 in order to get 32-Bit support. Stumbled over some installation problems. First I tried "All packages" then "Basic install". Both failed to build at different places. Only a "customizied" installation worked. Find the details blow. Thomas Bub An All packages fails at: gcc -Wp,-MD,/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/.i scsi_iser.o.d -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.1.0/include -D__KERNEL__ -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include -Iinclude -Iinclude2 -I/usr/src/linux-2.6.16.21-0.8/include -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -Werror-implicit-function-declaration -fno-strict-aliasing -fno-common -ffreestanding -Os -fomit-frame-pointer -mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement -Wno-pointer-sign -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(iscsi_iser)" -D"KBUILD_MODNAME=KBUILD_STR(ib_iser)" -c -o /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/.tmp_iscsi _iser.o /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser .c /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser .c: In function 'iscsi_iser_set_param': 
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser .c:478: error: implicit declaration of function 'iscsi_set_param' /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser .c: At top level: /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser .c:612: warning: initialization from incompatible pointer type /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser .c:613: error: 'iscsi_conn_get_param' undeclared here (not in a function) /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser .c:614: error: 'iscsi_session_get_param' undeclared here (not in a function) A Basic install fails at: gcc -DHAVE_CONFIG_H -I. -I. -I. -I../libibverbs/include -Wall -D_GNU_SOURCE -g -O2 -MT src_ipathverbs_la-ipathverbs.lo -MD -MP -MF .deps/src_ipathverbs_la-ipathverbs.Tpo -c src/ipathverbs.c -fPIC -DPIC -o .libs/src_ipathverbs_la-ipa thverbs.o In file included from src/ipathverbs.c:45: src/ipathverbs.h: In function 'to_ictx': src/ipathverbs.h:72: warning: implicit declaration of function 'offsetof' src/ipathverbs.h:72: error: expected expression before 'struct'ib_mthca My customized installation that works: ib_verbs kernel-ib kernel-ib-devel libibcm libibcm-devel libibverbs libibverbs-devel libibverbs-utils libmthca libmthca-devel From mst at mellanox.co.il Tue Sep 12 05:54:37 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Sep 2006 15:54:37 +0300 Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine In-Reply-To: References: Message-ID: <20060912125437.GB22369@mellanox.co.il> Quoting r. Bub Thomas : > Subject: Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine > > Just migrated from SLES 9 x86_64 to SLES 10 x86_64 in order to get > 32-Bit support. > Stumbled over some installation problems. > First I tried "All packages" then "Basic install". Both failed to build > at different places. > Only a "customizied" installation worked. 
> Find the details blow. > > Thomas Bub > > An All packages fails at: > > gcc > -Wp,-MD,/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/.i > scsi_iser.o.d -nostdinc -isystem > /usr/lib64/gcc/x86_64-suse-linux/4.1.0/include -D__KERNEL__ > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include > -Iinclude -Iinclude2 -I/usr/src/linux-2.6.16.21-0.8/include > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser -Wall > -Wundef -Wstrict-prototypes -Wno-trigraphs > -Werror-implicit-function-declaration -fno-strict-aliasing -fno-common > -ffreestanding -Os -fomit-frame-pointer -mtune=generic -m64 > -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks > -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time > -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement > -Wno-pointer-sign -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug -DMODULE > -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(iscsi_iser)" > -D"KBUILD_MODNAME=KBUILD_STR(ib_iser)" -c -o > /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/.tmp_iscsi > _iser.o > /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser > .c > /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser > .c: In function 'iscsi_iser_set_param': > /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser > .c:478: error: implicit declaration of function 'iscsi_set_param' > /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser > .c: At top level: > /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser > .c:612: warning: initialization from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser > .c:613: 
error: 'iscsi_conn_get_param' undeclared here (not in a > function) > /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser > .c:614: error: 'iscsi_session_get_param' undeclared here (not in a > function) Or - could you check this please? AFAIK iser should work on this kernel. > A Basic install fails at: > > gcc -DHAVE_CONFIG_H -I. -I. -I. -I../libibverbs/include -Wall > -D_GNU_SOURCE -g -O2 -MT src_ipathverbs_la-ipathverbs.lo -MD -MP -MF > .deps/src_ipathverbs_la-ipathverbs.Tpo -c src/ipathverbs.c -fPIC -DPIC > -o .libs/src_ipathverbs_la-ipa > thverbs.o > In file included from src/ipathverbs.c:45: > src/ipathverbs.h: In function 'to_ictx': > src/ipathverbs.h:72: warning: implicit declaration of function > 'offsetof' > src/ipathverbs.h:72: error: expected expression before 'struct'ib_mthca Looks like ipthverbs.h uses offsetof without including stddef.h Please post fix for trunk and OFED branch. > My customized installation that works: > > ib_verbs > kernel-ib > kernel-ib-devel > libibcm > libibcm-devel > libibverbs > libibverbs-devel > libibverbs-utils > libmthca > libmthca-devel -- MST From Richard.Frank at oracle.com Tue Sep 12 06:24:03 2006 From: Richard.Frank at oracle.com (Richard Frank) Date: Tue, 12 Sep 2006 09:24:03 -0400 Subject: [openib-general] IPOIB failover ? Message-ID: <1158067443.11227.207.camel@localhost.localdomain> Does IPOIB in this stack support transparent fail over between ports and across redundant HCAs using a "virtual IP" ? From thomas.bub at thomson.net Tue Sep 12 06:20:04 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Tue, 12 Sep 2006 15:20:04 +0200 Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine Message-ID: Michael, I don't understand what you mean on the iser trouble. I'm only a "comsumer" and not actively developing in the openIB world. I'm having enough trouble with my own application connecting a PowerPC gen1 from an x86_64 PC gen2 using verbs and cm. 
;-) Thus I haven't installed SVN and can't work on this. I wanted to let the people know that there are some issues. Thanks Thomas
From mst at mellanox.co.il Tue Sep 12 06:47:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Sep 2006 16:47:25 +0300 Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine In-Reply-To: References: Message-ID: <20060912134725.GC22369@mellanox.co.il> Quoting r. Bub Thomas : > Subject: RE: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine > > Michael, > I don't understand what you mean on the iser trouble. Or Gerlitz from Voltaire is the iser maintainer. I Cc him. -- MST From minich at ornl.gov Tue Sep 12 06:56:23 2006 From: minich at ornl.gov (Makia Minich) Date: Tue, 12 Sep 2006 09:56:23 -0400 Subject: [openib-general] RDMA question Message-ID: I'm looking for some information on whether or not you can set a service level for RDMA packets (as a way to start working on a QoS design). So, does anyone: * know if this already works? * have an example of setting it? or * know if this could possibly work? Thanks for your help. -- Makia Minich National Center for Computation Science Oak Ridge National Laboratory From halr at voltaire.com Tue Sep 12 07:45:50 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2006 10:45:50 -0400 Subject: [openib-general] RDMA question In-Reply-To: References: Message-ID: <1158072263.27427.153907.camel@hal.voltaire.com> Hi Makia, On Tue, 2006-09-12 at 09:56, Makia Minich wrote: > I'm looking for some information on whether or not you can set a service > level for RDMA packets What API or ULP are you planning on using ?
Sounds like you are planning on using verbs directly. Is this userspace or kernel ?

> (as a way to start working on a QoS design).

What do you mean by "QoS design" here ?

> So, does anyone:
> * know if this already works?
> * have an example of setting it?
> or
> * know if this could possibly work?

OpenSM (on the trunk or OFED 1.1) supports configuring QoS in a coarse manner.

It looks to me like SL is supported in the AH attribute which can be set for an RC QP, so you should be able to do this from user verbs. Not sure if this has been tried or not.

It can be used with certain ULPs. I've done it with IPoIB. It is possible with others as well.

-- Hal

> Thanks for your help.

From thomas.bub at thomson.net Tue Sep 12 07:55:25 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Tue, 12 Sep 2006 16:55:25 +0200
Subject: [openib-general] cmpost establisehd connections are very fragile!?
Message-ID:

Sean,

got my libibverbs/libibcm code working on SLES9 x86_64 after following all the tricks in cmpost.c.

What I don't understand is why setting the local_cm_response_timeout to 254 instead of 20 can block IBV_WR_SEND from client to server while the opposite direction from server to client works!?

Don't you have a more detailed description of the libibcm parameters?
There are a lot more that I don't understand. ;-)

After getting a gen2 example running I moved to my final distribution, which is SLES 10 x86_64.
I have to do this since I have to use a 32-bit executable for both 32-bit and 64-bit machines, and this is supported in OFED from SLES10 onwards.

Coming back to the fragile connection: I encountered the same issue where the client can't do an IBV_WR_SEND to the server.
This time both your cmpost example and my code failed.
I tried to reduce the local_cm_response_timeout to 10 but this did not help at all.

All of the above was done with 64-bit executables.
Interestingly enough, the 32-bit executable of cmpost and my own build on an x86 SLES9 machine did not have the IBV_WR_SEND trouble.
Thanks in advance for enlightening me. ;-)

Thomas Bub

From vlad at dev.mellanox.co.il Tue Sep 12 08:14:20 2006
From: vlad at dev.mellanox.co.il (vlad at dev.mellanox.co.il)
Date: Tue, 12 Sep 2006 18:14:20 +0300 (IDT)
Subject: [openib-general] OFED-1.1-rc4 is ready
Message-ID: <30291.194.90.237.34.1158074060.squirrel@dev.mellanox.co.il>

Hi,

OFED-1.1-rc4 is available on
https://openib.org/svn/gen2/branches/1.1/ofed/releases/
File: OFED-1.1-rc4.tgz

Please report any issues in bugzilla http://openib.org/bugzilla/

Schedule reminder:
==================
Next milestone: Final release is planned for Sep-20.

Owners - please update release notes for the final release no later than Sep-18.

Tziporet & Vlad

-------------------------------------------------------------------------------------

Release details:
================

Build_id:
OFED-1.1-rc4

openib-1.1 (REV=9435)
# User space
https://openib.org/svn/gen2/branches/1.1/src/userspace
Git:
git://www.mellanox.co.il/~git/infiniband
ref: refs/heads/ofed_1_1
commit 796b6cb83392fd840549e3b6e559dfce022a2c49

# MPI
mpi_osu-0.9.7-mlx2.2.0.tgz
openmpi-1.1.1-1.src.rpm
mpitests-2.0-0.src.rpm

OS support:
===========
Novell:
- SLES 9.0 SP3
- SLES10
Redhat:
- Redhat EL4 up3
- Redhat EL4 up4
kernel.org:
- Kernel 2.6.17

Note: Redhat EL4 up2, Fedora C4 and SuSE Pro 10 were dropped from the list.
We keep the backport patches for these OSes and make sure OFED compiles and loads properly, but will not do a full QA cycle.

Note: Kernel components were updated to 2.6.18-rc6

Systems:
========
* x86_64
* x86
* ia64
* ppc64

Bug fixes from OFED-1.1-rc3:
============================
1. SDP: Data corruption fix
2. libibverbs was reverted to 1.0 version (bug 219)
3. libsdp: TCP_RR fix
4. Compilation on kernel 2.6.18-rcX is failing
5. OSU MPI: fix failure in Intel tests
6. SRP: Kernel oops in case of port down
7. ib_uverbs fails to load on ia64 (bug 222)
8. IPoIB: Spinlock corruption in stress tests
9. Added srp_daemon service, enable from /etc/infiniband/openib.conf
10. mthca.so not properly linked to libibverbs (bug 223)
11. ipath compilation problem on SLES10 (bug 226)
12. problem with MPI: Get_processor_name on MVAPICH (bug 226)
13. Add an option to renice the ib_mad thread to highest priority.
    Enable from /etc/infiniband/openib.conf (workaround for bug 229)
14. Update for ehca driver
15. Update for ipath driver
16. Madeye installation using OPENIB_PARAMS.
    To build madeye run: export OPENIB_PARAMS="--with-madeye-mod"
    (or put it into ofed.conf file for unattended installation) and run install.sh
17. OFED sources: Added kernel include files under: .
    Can be used by kernel modules, and already include the backport patches for each kernel.
18. ibutils - updated with new flags (-P, -pc and -pm)
19. SDP: RTU packet is lost
    Accept call blocks even if client connected.

Limitations and known issues:
=============================
1. SDP: For Mellanox Sinai HCAs one must use latest FW version (1.1.000).
2. SDP: Scalability issue when hundreds of connections are opened
3. ipath driver is not supported on SLES9 SP3
4. ehca driver supports only PPC machines and compiled on kernel 2.6.18
5. OFED installation fails on PPC64 with SLES9.

From mshefty at ichips.intel.com Tue Sep 12 08:22:09 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 08:22:09 -0700
Subject: [openib-general] [PATCH] IB/cma: add rdma_establish
In-Reply-To: <450670D2.4040805@voltaire.com>
References: <20060907214524.GA14791@mellanox.co.il> <4505E130.8010301@ichips.intel.com> <450670D2.4040805@voltaire.com>
Message-ID: <4506D0A1.7060405@ichips.intel.com>

Or Gerlitz wrote:
> Just to make sure, you come to say that you would merge this patch
> instead the one that had the CM track local qp numbers and install a
> callback for the consumer QP to catch the async event etc?
correct > Indeed the **patch** for itself is somehow simpler, but the consumer > must get established event before posting sends to the qp so they need > to either queue RX-es or modify the QP to RTS before sending the REP. The first patch only allows the option of waiting for the established event. > Is rdma_established() --> cm_establish() callable from non interruptible > context? Yes > Also does the patch ensures only one ESTABLISHED event would be called > for the id, no matter if rdma_establish() and an RTU reception happen in > parallel? Yes - Sean From mshefty at ichips.intel.com Tue Sep 12 08:34:22 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 12 Sep 2006 08:34:22 -0700 Subject: [openib-general] [PATCH] Optimize cma_process_remove() In-Reply-To: References: Message-ID: <4506D37E.9040802@ichips.intel.com> Krishna Kumar2 wrote: > mutex_lock(&lock); > while (!list_empty(&cma_dev->id_list)) { > id_priv = list_entry(cma_dev->id_list.next, > struct rdma_id_private, list); > > if (cma_internal_listen(id_priv)) { > cma_destroy_listen(id_priv); > } else { > atomic_inc(&id_priv->refcount); > list_del(&id_priv->list); > list_add_tail(&id_priv->list, &remove_list); > } > } > mutex_unlock(&lock); > > list_for_each_entry_safe(id_priv, tmp, &remove_list, list) { > ret = cma_remove_id_dev(id_priv); > cma_deref_id(id_priv); > if (ret) > rdma_destroy_id(&id_priv->id); > } I believe that this has the same issue. If a user tries to destroy an rdma_cm_id, it will remove itself from the "device list". (This is why the ID's are moved to a new list, so that the removal still works.) In the code above, destroy thread(s) will remove ID(s) from the remove_list while we're trying to walk it. - Sean From mshefty at ichips.intel.com Tue Sep 12 08:41:18 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 12 Sep 2006 08:41:18 -0700 Subject: [openib-general] cmpost establisehd connections are very fragile!? 
In-Reply-To: References: Message-ID: <4506D51E.8050107@ichips.intel.com> Bub Thomas wrote: > What I don’t understand why the local_cm_response_timeout set to 254 > instead of 20 can block IBV_WR_SEND from client to server while the > opposite direction from server to client works!? local_cm_response_timeout is a 5-bit value. It's 4.096 x 2 ^ local_cm_response_timeout micro-seconds if that helps any. > You don’t have a more detailed description to the libibcm parameters? > There are a lot more that I don’t understand. ;-) You will need to refer to the IB spec, sections 12.6 and 12.7 for descriptions. - Sean From mshefty at ichips.intel.com Tue Sep 12 08:44:37 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 12 Sep 2006 08:44:37 -0700 Subject: [openib-general] [PATCH] RDMA/cma: document error flow of rdma_accept In-Reply-To: References: Message-ID: <4506D5E5.2010602@ichips.intel.com> Or Gerlitz wrote: > + * In the case of error, a reject message is sent to the remote side and the > + * state of the qp associated with the id is modified to error, such that any > + * previously posted receive buffers would be flushed. Hmm... this makes me question whether this is what it should be doing. Is there any reason not to reject the connection if accept fails? - Sean From sean.hefty at intel.com Tue Sep 12 09:03:33 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 12 Sep 2006 09:03:33 -0700 Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction In-Reply-To: <450673FC.3000309@voltaire.com> Message-ID: <000101c6d684$ff9f87b0$d8248686@amr.corp.intel.com> >> Can you queue this for 2.6.19 ? Roland, can you pull this patch in for 2.6.19? It's SVN check-in 9273. --- Clarify that rdma_destroy_id cancels outstanding asynchronous operations on the Associated id. 
Signed-off-by: Or Gerlitz Signed-off-by: Sean Hefty Index: rdma_cm.h =================================================================== --- rdma_cm.h (revision 9272) +++ rdma_cm.h (revision 9273) @@ -126,6 +126,14 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, void *context, enum rdma_port_space ps); +/** + * rdma_destroy_id - Destroys an RDMA identifier. + * + * @id: RDMA identifier. + * + * Note: calling this function has the effect of canceling in-flight + * asynchronous operations associated with the id. + */ void rdma_destroy_id(struct rdma_cm_id *id); /** From mshefty at ichips.intel.com Tue Sep 12 09:09:22 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 12 Sep 2006 09:09:22 -0700 Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure cases. In-Reply-To: <20060911175312.GC15556@mellanox.co.il> References: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com> <20060910111145.GA12111@mellanox.co.il> <4505A08E.5000705@ichips.intel.com> <20060911175312.GC15556@mellanox.co.il> Message-ID: <4506DBB2.5020400@ichips.intel.com> Michael S. Tsirkin wrote: >>The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a >>second call is not made to rdma_connect after the first call fails. So we're >>probably safe deferring this until 2.6.19, unless someone has code which calls >>rdma_connect twice. > > SDP can do this I think. To clarify, SDP would need to do something like: ret = rdma_connect(id_7471 ...) if (ret) rdma_connect(id_7471 ...) The same ID would need to be used twice. 
- Sean From rdreier at cisco.com Tue Sep 12 09:13:55 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 12 Sep 2006 09:13:55 -0700 Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction In-Reply-To: <000101c6d684$ff9f87b0$d8248686@amr.corp.intel.com> (Sean Hefty's message of "Tue, 12 Sep 2006 09:03:33 -0700") References: <000101c6d684$ff9f87b0$d8248686@amr.corp.intel.com> Message-ID: Thanks, applied. From sean.hefty at intel.com Tue Sep 12 09:19:36 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 12 Sep 2006 09:19:36 -0700 Subject: [openib-general] an example to use of multicast messages over the verbs exists in the openib svn In-Reply-To: <4506A4C4.9070907@dev.mellanox.co.il> Message-ID: <000201c6d687$3d808640$d8248686@amr.corp.intel.com> >This test (for now) don't send any join message to the SA, it only >attach (and detach) the QP to the multicast group. I posted a simple multicast test program that uses the proposed libibsa interface in: http://openib.org/pipermail/openib-general/2006-August/025433.html (See the program at the bottom of the message.) Combined with the kernel support, this will result in sending join messages to the SA. - Sean From rdreier at cisco.com Tue Sep 12 09:28:44 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 12 Sep 2006 09:28:44 -0700 Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs In-Reply-To: <20060911235446.GB19021@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 12 Sep 2006 02:54:46 +0300") References: <20060911235446.GB19021@mellanox.co.il> Message-ID: Roland> My gut reaction is that it seems pretty ugly. Michael> Hmm. All of it or just some bits? Well, the idea of pushing timewait handling down into the low-level drivers seems strange to me. I don't think any other stack or any other OS does anything like this. Michael> Could be a library function in core so that ipath etc can Michael> reuse it. 
But note how there's no dependency between Michael> drivers here - no reason to block change in mthca until Michael> ipath/ehca implement this functionality, too. I guess the only thing would be that we should implement this for mthca to maximize the amount ipath/ehca can reuse when they implement this. Michael> Not entirely corect. Please look at 9.7.1 - search for Michael> "stale packets": OK, this is somewhat convincing... Michael> I don't see how this limits the rate of QP Michael> creation. Could you explain? Once all QPs are tied up in timewait state, then new QPs can only be created as old QPs leave timewait. Probably there are enough QPs and timewait is short enough that this won't be a problem in practice, but it's the same idea in theory as a busy server running out of fds because of sockets in timewait state. - R. From caitlinb at broadcom.com Tue Sep 12 09:31:13 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Tue, 12 Sep 2006 09:31:13 -0700 Subject: [openib-general] RDMA question In-Reply-To: References: Message-ID: <469958e00609120931n56b58444r86b0473b4bb79651@mail.gmail.com> On 9/12/06, Makia Minich wrote: > I'm looking for some information on whether or not you can set a service > level for RDMA packets (as a way to start working on a QoS design). > Transport independent QoS is not truly feasible. You'll have to apply QoS to the underlying transport (IB or IP) using IB or IP tools and concepts. You *can* identify one or more transport neutral Classes of Service, and then have your application layer select that class of service. But translating the class of service to actual network controls will always be transport specific. So the short answer is that you don't set a service level for RDMA packets, you set a service level for IB or IP packets that happen to carry RDMA. 
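To illustrate the point about transport-neutral classes of service, here is a minimal C sketch, assuming an application-defined class enum that is translated separately for each transport. The enum, the function names, and the IB service level numbers are hypothetical policy choices, not part of any OpenIB API; the DSCP code points used (EF = 46, AF11 = 10) are the standard DiffServ values.

```c
#include <assert.h>
#include <stdint.h>

/* Transport-neutral classes an application might select (hypothetical). */
enum cos {
	COS_BEST_EFFORT,
	COS_BULK,
	COS_LOW_LATENCY
};

/*
 * InfiniBand side: the service level is a 4-bit field (0..15).
 * Which SL a class maps to is site policy; these numbers are made up.
 */
uint8_t cos_to_ib_sl(enum cos c)
{
	switch (c) {
	case COS_LOW_LATENCY:
		return 2;
	case COS_BULK:
		return 1;
	default:
		return 0;
	}
}

/*
 * IP side: the DSCP is a 6-bit field (0..63).
 * EF (46) and AF11 (10) are standard code points; the choice of which
 * class uses which code point is again an assumption.
 */
uint8_t cos_to_ip_dscp(enum cos c)
{
	switch (c) {
	case COS_LOW_LATENCY:
		return 46;	/* EF */
	case COS_BULK:
		return 10;	/* AF11 */
	default:
		return 0;	/* default forwarding */
	}
}

/* The DSCP occupies the upper six bits of the IPv4 TOS byte. */
uint8_t dscp_to_tos(uint8_t dscp)
{
	return (uint8_t)(dscp << 2);
}
```

The application would choose a class once, and the IB path or the IP path would then apply its own value, for example the SL in the address attributes on IB or the TOS byte on an IP socket.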
From sean.hefty at intel.com Tue Sep 12 09:41:12 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 12 Sep 2006 09:41:12 -0700 Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs In-Reply-To: Message-ID: <000301c6d68a$42028490$d8248686@amr.corp.intel.com> >Well, the idea of pushing timewait handling down into the low-level >drivers seems strange to me. I don't think any other stack or any >other OS does anything like this. I think the Windows IB stack may do something similar. The difficulty with doing this at a higher level is that the QP must be destroyed in order for the CQs / PD to be destroyed. The IB CM already tracks timewait, so it could notify drivers when a QP can be re-used, but that requires connections to go through the CM. For stale connection handling, that may be a good thing anyway, but there's nothing enforcing this today. - Sean From jlentini at netapp.com Tue Sep 12 11:15:42 2006 From: jlentini at netapp.com (James Lentini) Date: Tue, 12 Sep 2006 14:15:42 -0400 (EDT) Subject: [openib-general] [PATCH DAPLTEST] - compile failure on FC5/X86_64 In-Reply-To: <1156859264.31129.12.camel@stevo-desktop> References: <1156859264.31129.12.camel@stevo-desktop> Message-ID: On Tue, 29 Aug 2006, Steve Wise wrote: > Dunno if this is the correct fix for all platforms/distros, but it > worked for me on FC5/X86_64... CLK_TCK wasn't getting defined for > this distro... Committed in revision 9442. From mst at mellanox.co.il Tue Sep 12 11:24:42 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Sep 2006 21:24:42 +0300 Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure cases. In-Reply-To: <4506DBB2.5020400@ichips.intel.com> References: <4506DBB2.5020400@ichips.intel.com> Message-ID: <20060912182442.GA23428@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] cma_connect_ib leaks memory in failure cases. > > Michael S.
Tsirkin wrote: > >>The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a > >>second call is not made to rdma_connect after the first call fails. So we're > >>probably safe deferring this until 2.6.19, unless someone has code which calls > >>rdma_connect twice. > > > > SDP can do this I think. > > To clarify, SDP would need to do something like: > > ret = rdma_connect(id_7471 ...) > if (ret) > rdma_connect(id_7471 ...) > > The same ID would need to be used twice. > > - Sean > Sure - if connect on socket fails, application can retry. -- MST From rdreier at cisco.com Tue Sep 12 13:26:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 12 Sep 2006 13:26:21 -0700 Subject: [openib-general] user-mode data strucures In-Reply-To: <1158092339.9296.20.camel@trinity.ogc.int> (Tom Tucker's message of "Tue, 12 Sep 2006 15:18:59 -0500") References: <1158092339.9296.20.camel@trinity.ogc.int> Message-ID: Tom> In working with the Intel compilers recently, however, I've Tom> found that this compiler attempts to align data structures on Tom> boundaries that are native to the data types. So uint64_t's Tom> are aligned on a 64b boundary. This is an issue for Tom> ibv_recv_wr and ibv_send_wr because they are immediately Tom> preceded by a *next ptr which is 32b on 32b architectures. Ugh. How about swapping wr_id and next in the ibv_recv_wr and ibv_send_wr structures? I hate adding __attribute__((packed)) because it ruins things on ia64 et al. - R. From tom at opengridcomputing.com Tue Sep 12 13:18:59 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 12 Sep 2006 15:18:59 -0500 Subject: [openib-general] user-mode data strucures Message-ID: <1158092339.9296.20.camel@trinity.ogc.int> Roland: The user-mode data structures do not include specific alignment instructions to compilers. This all works great provided that the libraries and the applications are built using the same compiler. 
In working with the Intel compilers recently, however, I've found that this compiler attempts to align data structures on boundaries that are native to the data types. So uint64_t's are aligned on a 64b boundary. This is an issue for ibv_recv_wr and ibv_send_wr because they are immediately preceded by a *next ptr which is 32b on 32b architectures. To make a long story short, I've added __attribute__((packed)) to my locally installed header files to make this work, but we should probably either pad these data structures internally or explicitly pack them like I'm doing now. What do people think? Tom From tom at opengridcomputing.com Tue Sep 12 14:45:15 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 12 Sep 2006 16:45:15 -0500 Subject: [openib-general] user-mode data strucures In-Reply-To: References: <1158092339.9296.20.camel@trinity.ogc.int> Message-ID: <1158097515.9296.25.camel@trinity.ogc.int> On Tue, 2006-09-12 at 13:26 -0700, Roland Dreier wrote: > Tom> In working with the Intel compilers recently, however, I've > Tom> found that this compiler attempts to align data structures on > Tom> boundaries that are native to the data types. So uint64_t's > Tom> are aligned on a 64b boundary. This is an issue for > Tom> ibv_recv_wr and ibv_send_wr because they are immediately > Tom> preceded by a *next ptr which is 32b on 32b architectures. > > Ugh. > > How about swapping wr_id and next in the ibv_recv_wr and ibv_send_wr > structures? I hate adding __attribute__((packed)) because it ruins > things on ia64 et al. > I think that just moves the alignment issue to the first word of the sge since it's first element is a uint64_t. I think the only thing that works across the board without packing is to #if __BITS_IN_WORD==32 add a pad word after *next. erf...ugly code. > - R. 
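The layout problem described above can be reproduced with a small C sketch. The structs below are illustrative stand-ins, not the real ibv_recv_wr or ibv_send_wr: the next pointer is modelled as a uint32_t so that the 32-bit case shows up on any host, and __attribute__((packed)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit "pointer" followed by a 64-bit id, layout left to the compiler. */
struct wr_natural {
	uint32_t next;
	uint64_t wr_id;
};

/* Same members, but packed: no padding is inserted before wr_id. */
struct wr_packed {
	uint32_t next;
	uint64_t wr_id;
} __attribute__((packed));

/* Same members with an explicit pad word, so every ABI agrees. */
struct wr_padded {
	uint32_t next;
	uint32_t pad;
	uint64_t wr_id;
};

/*
 * Offset of wr_id in the unannotated struct. This is 4 when the ABI
 * aligns 64-bit integers to 4 bytes inside structs and 8 when it aligns
 * them to 8 bytes, which is exactly the disagreement at issue.
 */
size_t natural_wr_id_offset(void)
{
	return offsetof(struct wr_natural, wr_id);
}

/* Always 4: packing removes the padding. */
size_t packed_wr_id_offset(void)
{
	return offsetof(struct wr_packed, wr_id);
}

/* Always 8: the pad word makes the layout explicit. */
size_t padded_wr_id_offset(void)
{
	return offsetof(struct wr_padded, wr_id);
}
```

Only the unannotated struct depends on the compiler's alignment rule; the packed and padded variants give the same offset everywhere, at the cost of unaligned access in the packed case.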
From Brian.Cain at ge.com Tue Sep 12 14:50:24 2006 From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare)) Date: Tue, 12 Sep 2006 17:50:24 -0400 Subject: [openib-general] [PATCH] leak in *_pingpong.c? Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033E84C59@CINMLVEM11.e2k.ad.ge.com> Be gentle, it's my first patch submission. :) The following is untested, but it looks like it's probably pretty trivial. Index: examples/rc_pingpong.c =================================================================== --- examples/rc_pingpong.c (revision 9442) +++ examples/rc_pingpong.c (working copy) @@ -143,6 +143,7 @@ asprintf(&service, "%d", port); n = getaddrinfo(servername, service, &hints, &res); + free(service); if (n < 0) { fprintf(stderr, "%s for %s:%d\n", gai_strerror(n), servername, port); @@ -209,6 +210,7 @@ asprintf(&service, "%d", port); n = getaddrinfo(NULL, service, &hints, &res); + free(service); if (n < 0) { fprintf(stderr, "%s for port %d\n", gai_strerror(n), port); Index: examples/srq_pingpong.c =================================================================== --- examples/srq_pingpong.c (revision 9442) +++ examples/srq_pingpong.c (working copy) @@ -154,6 +154,7 @@ asprintf(&service, "%d", port); n = getaddrinfo(servername, service, &hints, &res); + free(service); if (n < 0) { fprintf(stderr, "%s for %s:%d\n", gai_strerror(n), servername, port); @@ -233,6 +234,7 @@ asprintf(&service, "%d", port); n = getaddrinfo(NULL, service, &hints, &res); + free(service); if (n < 0) { fprintf(stderr, "%s for port %d\n", gai_strerror(n), port); Index: examples/uc_pingpong.c =================================================================== --- examples/uc_pingpong.c (revision 9442) +++ examples/uc_pingpong.c (working copy) @@ -131,6 +131,7 @@ asprintf(&service, "%d", port); n = getaddrinfo(servername, service, &hints, &res); + free(service); if (n < 0) { fprintf(stderr, "%s for %s:%d\n", gai_strerror(n), servername, port); @@ -197,6 +198,7 @@ asprintf(&service, "%d", port); n = 
getaddrinfo(NULL, service, &hints, &res); + free(service); if (n < 0) { fprintf(stderr, "%s for port %d\n", gai_strerror(n), port); Index: examples/ud_pingpong.c =================================================================== --- examples/ud_pingpong.c (revision 9442) +++ examples/ud_pingpong.c (working copy) @@ -132,6 +132,7 @@ asprintf(&service, "%d", port); n = getaddrinfo(servername, service, &hints, &res); + free(service); if (n < 0) { fprintf(stderr, "%s for %s:%d\n", gai_strerror(n), servername, port); @@ -198,6 +199,7 @@ asprintf(&service, "%d", port); n = getaddrinfo(NULL, service, &hints, &res); + free(service); if (n < 0) { fprintf(stderr, "%s for port %d\n", gai_strerror(n), port); -- -Brian From somenath at veritas.com Tue Sep 12 15:08:22 2006 From: somenath at veritas.com (somenath) Date: Tue, 12 Sep 2006 15:08:22 -0700 Subject: [openib-general] HCAs with and without memory In-Reply-To: <4503B42C.60405@dev.mellanox.co.il> References: <4503B42C.60405@dev.mellanox.co.il> Message-ID: <45072FD6.3090802@veritas.com> is there any performance difference observed between memFree and non-memFree HCAs? thanks, som. Dotan Barak wrote: >Hi john. > >john t wrote: > > >>Hi OpenIB group, >> >>What is the difference between HCAs with memory and without memory. >>How is the on-board memory used by HCAs? Is it that data is first >>copied into this memory and then into physical memory? >> >>Regards, >>John T. >> >> > >If you are asking about Mellanox HCAs i can answer you: > >The difference is the technology which those HCAs are using: >The HCAs without the attached memory are using the memfree technology. > >The main difference between the 2 HCAs is where the context of the >various resources is located: in the host memory or in the attached memory. > >The data itself (during data movement) is not stored in this memory at >any point in the attached memory. 
> >Dotan > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From rdreier at cisco.com Tue Sep 12 15:21:28 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 12 Sep 2006 15:21:28 -0700 Subject: [openib-general] user-mode data strucures In-Reply-To: <1158097515.9296.25.camel@trinity.ogc.int> (Tom Tucker's message of "Tue, 12 Sep 2006 16:45:15 -0500") References: <1158092339.9296.20.camel@trinity.ogc.int> <1158097515.9296.25.camel@trinity.ogc.int> Message-ID: Tom> I think that just moves the alignment issue to the first word Tom> of the sge since it's first element is a uint64_t. I think Tom> the only thing that works across the board without packing is Tom> to #if __BITS_IN_WORD==32 add a pad word after Tom> *next. erf...ugly code. Actually I think we're OK -- the sg_list member is a pointer, which will be 32 bits too. It looks to me like ibv_send_wr has an even number of 32-bit quantities before the union, so everything should pack naturally. From tom at opengridcomputing.com Tue Sep 12 15:27:58 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 12 Sep 2006 17:27:58 -0500 Subject: [openib-general] user-mode data strucures In-Reply-To: References: <1158092339.9296.20.camel@trinity.ogc.int> <1158097515.9296.25.camel@trinity.ogc.int> Message-ID: <1158100078.9296.29.camel@trinity.ogc.int> On Tue, 2006-09-12 at 15:21 -0700, Roland Dreier wrote: > Tom> I think that just moves the alignment issue to the first word > Tom> of the sge since it's first element is a uint64_t. I think > Tom> the only thing that works across the board without packing is > Tom> to #if __BITS_IN_WORD==32 add a pad word after > Tom> *next. erf...ugly code. > > Actually I think we're OK -- the sg_list member is a pointer, which will > be 32 bits too. 
It looks to me like ibv_send_wr has an even number of > 32-bit quantities before the union, so everything should pack naturally. Oops, you're right, I'm dumb... We're done then. From rdreier at cisco.com Tue Sep 12 15:47:29 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 12 Sep 2006 15:47:29 -0700 Subject: [openib-general] user-mode data strucures In-Reply-To: <1158100078.9296.29.camel@trinity.ogc.int> (Tom Tucker's message of "Tue, 12 Sep 2006 17:27:58 -0500") References: <1158092339.9296.20.camel@trinity.ogc.int> <1158097515.9296.25.camel@trinity.ogc.int> <1158100078.9296.29.camel@trinity.ogc.int> Message-ID: OK, I checked the corresponding change into the libibverbs 1.1 devel tree. I'm not sure how to fix this in libibverbs 1.0 without affecting the ABI though... - R. From tom at opengridcomputing.com Tue Sep 12 16:10:36 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 12 Sep 2006 18:10:36 -0500 Subject: [openib-general] user-mode data strucures In-Reply-To: References: <1158092339.9296.20.camel@trinity.ogc.int> <1158097515.9296.25.camel@trinity.ogc.int> <1158100078.9296.29.camel@trinity.ogc.int> Message-ID: <1158102636.9296.34.camel@trinity.ogc.int> I'm OK with a work-around in the near term. BTW, how do we correlate libibverbs 1.x with cat /sys/class/infiniband/verbs/abi_version? On Tue, 2006-09-12 at 15:47 -0700, Roland Dreier wrote: > OK, I checked the corresponding change into the libibverbs 1.1 devel > tree. I'm not sure how to fix this in libibverbs 1.0 without > affecting the ABI though... > > - R. 
From mshefty at ichips.intel.com Tue Sep 12 16:17:11 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 12 Sep 2006 16:17:11 -0700 Subject: [openib-general] [PATCH] IB/cma: add rdma_establish In-Reply-To: <20060911222956.GD17098@mellanox.co.il> References: <20060907214524.GA14791@mellanox.co.il> <4505E130.8010301@ichips.intel.com> <20060911222956.GD17098@mellanox.co.il> Message-ID: <45073FF7.7020506@ichips.intel.com> Michael S. Tsirkin wrote: >>>As a side note, reasons for frequent loss of RTU must be investigated. >> >>A lost RTU shouldn't be any more likely than a lost REQ or REP. Is the RTU >>never showing up? > > > Seems like that. I know for sure I do accept after REP but remote side never > gets ESTABLISHED. I looked at the code, then ran some tests. The REP is retried until an RTU is received, or its number of retries is exhausted. By modifying the IB CM, I was able to force RTU drops. Using madeye, I could see that the REP would be retried, resulting in the RTU being resent. After 4 drops, I had the code receive the RTU, which allowed the test to proceed. A couple of things to look at in OFED would be the setting of max cm retries and the cm timeout. - Sean From ralphc at pathscale.com Tue Sep 12 17:40:10 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Tue, 12 Sep 2006 17:40:10 -0700 Subject: [openib-general] How to support IOMMUs for ipath driver Message-ID: <1158108010.8759.192.camel@brick.pathscale.com> Problem: The IB kernel to IB device driver interface uses dma_map_single() and dma_map_sg() to allocate device bus addresses for HW DMA. These bus addresses are passed to the IB device driver via ib_post_send() and ib_post_recv(). The ib_ipath driver needs kernel virtual addresses in order to be able to copy data to/from the posted work requests since it does not use HW DMA. It currently relies on the mapping being one-to-one and cannot reasonably reverse the mapping when an IOMMU is present.
History: I first proposed modifying the dma_* routines to allow a device driver to interpose on the function calls. This was not well received by the Linux kernel maintainers since it would have too much impact on the current code. I also tried proposing adding a flag to the ib_device structure and modifying the kernel IB code to check the flag and pass either the dma_*() mapped address or a kernel virtual address. This works OK for kmalloc() buffers where dma_map_single() is being called but doesn't work well for SRP which has lists of physical pages and calls dma_map_sg(). It also means that the kernel IB layer needs to explicitly handle two different kinds of addresses. Current Proposal: My current proposal is to provide wrapper routines for the dma_*() routines which only the IB kernel code would use. These ib_dma_*() variants would allow a device driver to interpose on the call and do appropriate code to convert the kernel virtual or physical page addresses to something the device driver can handle. For ib_mthca and ib_ehca, these would result in the corresponding dma_*() routine being called. For ib_ipath, a different implementation would be needed. My expectation is that this would add little overhead, be easy to explain and document, and would be straightforward to convert existing code to the new convention (see sample patch below). I would like to get some consensus that this is an acceptable approach before I spend a bunch of time developing it further. 
Index: ib_verbs.h =================================================================== --- ib_verbs.h (revision 9441) +++ ib_verbs.h (working copy) @@ -43,6 +43,7 @@ #include #include +#include #include #include @@ -984,6 +985,19 @@ struct ib_device { struct ib_grh *in_grh, struct ib_mad *in_mad, struct ib_mad *out_mad); + int (*mapping_error)(dma_addr_t dma_addr); + dma_addr_t (*map_single)(struct device *hwdev, + void *ptr, size_t size, + int direction); + void (*unmap_single)(struct device *dev, + dma_addr_t addr, + size_t size, int direction); + int (*map_sg)(struct device *hwdev, + struct scatterlist *sg, + int nents, int direction); + void (*unmap_sg)(struct device *hwdev, + struct scatterlist *sg, + int nents, int direction); struct module *owner; struct class_device class_dev; @@ -1392,6 +1406,64 @@ static inline int ib_req_ncomp_notif(str struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags); /** + * ib_dma_mapping_error - + */ +static inline int ib_dma_mapping_error(struct ib_device *dev, + dma_addr_t dma_addr) +{ + return dev->mapping_error ? + dev->mapping_error(dma_addr) : dma_mapping_error(dma_addr); +} + +/** + * ib_dma_map_single - + */ +static inline dma_addr_t ib_dma_map_single(struct ib_device *dev, + void *cpu_addr, size_t size, + enum dma_data_direction direction) +{ + return dev->map_single ? + dev->map_single(dev, cpu_addr, size, direction) : + dma_map_single(dev->dma_device, cpu_addr, size, direction); +} + +/** + * ib_dma_unmap_single - + */ +static inline void ib_dma_unmap_single(struct ib_device *dev, + dma_addr_t addr, size_t size, + enum dma_data_direction direction) +{ + dev->unmap_single ? + dev->unmap_single(dev, addr, size, direction) : + dma_unmap_single(dev->dma_device, addr, size, direction); +} + +/** + * ib_dma_map_sg - + */ +static inline dma_addr_t ib_dma_map_sg(struct ib_device *dev, + struct scatterlist *sg, int nents, + enum dma_data_direction direction) +{ + return dev->map_sg ? 
+ dev->map_sg(dev, sg, nents, direction) : + dma_map_sg(dev->dma_device, sg, nents, direction); +} + +/** + * ib_dma_unmap_sg - + */ +static inline void ib_dma_unmap_sg(struct ib_device *dev, + struct scatterlist *sg, int nents, + enum dma_data_direction direction) +{ + dev->unmap_sg ? + dev->unmap_sg(dev, sg, nents, direction) : + dma_unmap_sg(dev->dma_device, sg, nents, direction); +} + +/** * ib_reg_phys_mr - Prepares a virtually addressed memory region for use * by an HCA. * @pd: The protection domain associated assigned to the registered region. From tom at opengridcomputing.com Tue Sep 12 18:10:17 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 12 Sep 2006 20:10:17 -0500 Subject: [openib-general] CMA issue: bind selects the same port after close In-Reply-To: Message-ID: There is a whole array of Linux port management services that perform exactly the logic that you are trying to emulate. Wouldn't our efforts be more productively spent figuring out to use the existing services you are currently trying to emulate? What do you do, for example, when the port allocation policy in the kernel changes? Change your emulation? I completely understand that the existing port management services are not exported, but functionally, they support multiple port spaces, show up in netstat, etc... Can someone please explain to me the reluctance to use these services in favor of replicating them? Sorry if this reads as a rant...but I feel we're on the wrong track... On 9/11/06 11:27 PM, "Krishna Kumar2" wrote: > Hi Michael, > >>> The basic problem in the CMA is in cma_alloc_port(). If the port > number (passed >>> in as snum) is 0, the first available port starting at >>> sysctl_local_port_range[0] is used. We could instead start our search > by >>> adding an increasing counter or a random value to the lower-end of the > port >>> range. Then expand the code to handle searching below our starting > value if we >>> failed to find one above it. >> >> Sounds good. 
>> >>> Are the port numbers assigned by TCP sequential or more random? >> >> TCP ports seem to be sequential. > > Are you getting sequential port numbers? inet_csk_get_port() is actually > using a random > number to get the *starting* value between sysctl_local_port_range[0] and > sysctl_local_port_range[1]. Once it gets this starting number, it goes > sequentially all the > way to the high limit (sysctl*[1]) and then loops back from low > (sysctl*[0]) limit until all > the numbers in the middle are looked at. > > I think we can easily use the same logic. Sean's second option seems to be > followed > here "> > adding a random value to the lower-end of the port range" > > Thanks, > > - KK > From rjwalsh at pathscale.com Tue Sep 12 20:01:54 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 12 Sep 2006 20:01:54 -0700 Subject: [openib-general] ibv_driver_init renamed? Message-ID: <450774A2.8080402@pathscale.com> Somewhere between OFED-1.1-RC3 and -RC4, the ibv_driver_init function was renamed to openib_driver_init. We at QLogic were not aware this change was being made and so now our user verbs support does not work at all in RC4. Why did something like this happen between two release candidates? Regards, Robert.
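The port search Krishna describes for inet_csk_get_port() earlier in the CMA thread (start somewhere inside the local port range, walk up to the high limit, then wrap around to the low limit until every port has been tried) can be sketched in plain C. This is a stand-alone illustration, not kernel or CMA code: pick_port() and the used[] table are hypothetical, and the caller supplies the starting port, which would be chosen at random in the scheme described.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Search the inclusive range [low, high] for a free port, beginning at
 * "start" and wrapping from high back to low.
 *
 * used[i] is true when port (low + i) is already bound; the table is a
 * stand-in for whatever lookup the real port space would use.
 *
 * Returns the first free port found, or -1 if the range is invalid or
 * every port in it is taken.
 */
int pick_port(int low, int high, int start, const bool *used)
{
	int range = high - low + 1;
	int i;

	if (range <= 0 || start < low || start > high)
		return -1;

	for (i = 0; i < range; i++) {
		/* Walk upward from start, wrapping past high to low. */
		int port = low + (start - low + i) % range;

		if (!used[port - low])
			return port;
	}
	return -1;
}
```

Because the starting point moves on each bind, a port that was just closed is unlikely to be the first candidate tried again, which is the behaviour the thread is asking for.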
From jgunthorpe at obsidianresearch.com Tue Sep 12 20:10:54 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 12 Sep 2006 21:10:54 -0600 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <1158108010.8759.192.camel@brick.pathscale.com> References: <1158108010.8759.192.camel@brick.pathscale.com> Message-ID: <20060913031054.GA4464@obsidianresearch.com> On Tue, Sep 12, 2006 at 05:40:10PM -0700, Ralph Campbell wrote: > The ib_ipath driver needs kernel virtual addresses in order to be able > to copy data to/from the posted work requests since it does not > use HW DMA. It currently relies on the mapping being one-to-one > and cannot reasonably reverse the mapping when an IOMMU is present. I'm sure this must have been answered, but given a PCI domain:bus:device:function tuple and a DMA address, shouldn't any effects of an IOMMU be easily duplicated in software to result in a cpu-bus physical address? I.e., on AMD64 it is just a matter of following the GART tables in software - assuming the address in question hits the GART region (which for ipath, I expect, it never would). Jason From rdreier at cisco.com Tue Sep 12 20:15:53 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 12 Sep 2006 20:15:53 -0700 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <1158108010.8759.192.camel@brick.pathscale.com> (Ralph Campbell's message of "Tue, 12 Sep 2006 17:40:10 -0700") References: <1158108010.8759.192.camel@brick.pathscale.com> Message-ID: > My current proposal is to provide wrapper routines for the > dma_*() routines which only the IB kernel code would use. > These ib_dma_*() variants would allow a device driver to interpose > on the call and do appropriate code to convert the kernel virtual > or physical page addresses to something the device driver can handle. > For ib_mthca and ib_ehca, these would result in the corresponding > dma_*() routine being called.
For ib_ipath, a different implementation > would be needed. Seems like the least-bad way forward. A few comments on the proposed implementation: > @@ -984,6 +985,19 @@ struct ib_device { > struct ib_grh *in_grh, > struct ib_mad *in_mad, > struct ib_mad *out_mad); > + int (*mapping_error)(dma_addr_t dma_addr); > + dma_addr_t (*map_single)(struct device *hwdev, > + void *ptr, size_t size, > + int direction); > + void (*unmap_single)(struct device *dev, > + dma_addr_t addr, > + size_t size, int direction); > + int (*map_sg)(struct device *hwdev, > + struct scatterlist *sg, > + int nents, int direction); > + void (*unmap_sg)(struct device *hwdev, > + struct scatterlist *sg, > + int nents, int direction); First of all I would put all this into a "struct ib_dma_ops" or something like that, so struct ib_device can have just a member like struct ib_dma_ops *dma_ops; That keeps the definition of struct ib_device from getting too much more gigantic, and also makes it easy for the core to export a standard dma_ops pointer that devices that use the default implementation can use. Why not make the DMA operations take a struct ib_device * instead of a struct device *? I think that would actually clean up the consumer code, and it would make it easier for ipath -- otherwise you have to find your way back from the struct device *. Also, I think you will need a few more methods. has a definition of DMA operations that might be useful to refer to. But for example SRP uses at least dma_sync_single_for_cpu() and dma_sync_single_for_device(). Actually that might be the only extra method needed for now. - R. From rdreier at cisco.com Tue Sep 12 20:19:22 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 12 Sep 2006 20:19:22 -0700 Subject: [openib-general] ibv_driver_init renamed?
In-Reply-To: <450774A2.8080402@pathscale.com> (Robert Walsh's message of "Tue, 12 Sep 2006 20:01:54 -0700") References: <450774A2.8080402@pathscale.com> Message-ID: I just replied to the other copy of this email: Because OFED 1.1-rc2 and -rc3 inadvertently contained libibverbs code taken from the unstable unreleased libibverbs 1.1 tree. -rc4 reverted back to the stable libibverbs 1.0 code. http://openib.org/bugzilla/show_bug.cgi?id=219 has more details. From rdreier at cisco.com Tue Sep 12 20:21:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 12 Sep 2006 20:21:37 -0700 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <20060913031054.GA4464@obsidianresearch.com> (Jason Gunthorpe's message of "Tue, 12 Sep 2006 21:10:54 -0600") References: <1158108010.8759.192.camel@brick.pathscale.com> <20060913031054.GA4464@obsidianresearch.com> Message-ID: Jason> I'm sure this must have been answered, but given a PCI Jason> domain:bus:device:function tuple and a DMA address, Jason> shouldn't any effects of an IOMMU be easially duplicated in Jason> software to result in a cpu-bus physical address? Ie on Jason> AMD64 it is just a matter of following the GART tables in Jason> software - assuming the address in question hits the GART Jason> region (which for ipath, I expect, it never would) Yes, you could do this. However there's no exported interface for reversing a DMA mapping. And it seems like a lot of unneeded complexity to add -- why not just avoid the DMA mapping in the first place? - R. From mst at mellanox.co.il Tue Sep 12 20:57:20 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 06:57:20 +0300 Subject: [openib-general] CMA issue: bind selects the same port after close In-Reply-To: References: Message-ID: <20060913035720.GA20225@mellanox.co.il> Quoting r. 
Tom Tucker : > Subject: Re: [openib-general] CMA issue: bind selects the same port after close > > > There is a whole array of Linux port management services that perform > exactly the logic that you are trying to emulate. Wouldn't our efforts be > more productively spent figuring out to use the existing services you are > currently trying to emulate? What do you do, for example, when the port > allocation policy in the kernel changes? Change your emulation? > > I completely understand that the existing port management services are not > exported, but functionally, they support multiple port spaces, show up in > netstat, etc... Can someone please explain to me the reluctance to use these > services in favor of replicating them? > > Sorry if this reads as a rant...but I feel we're on the wrong track... Hmm. inet_csk_get_port actually *is* exported, and while it might be hard for CMA to use it (needs struct sock*), maybe it is easy for SDP. So, possibly we should just leave the CMA port allocation as is, and enhance SDP to use inet_csk_get_port. -- MST From mst at mellanox.co.il Tue Sep 12 21:14:38 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 07:14:38 +0300 Subject: [openib-general] [Bug 232] SLES10 PPC64: uverbs_mem.c fails to link due to missing HPAGE_SHIFT In-Reply-To: <20060913040938.734472283D4@openib.ca.sandia.gov> References: <20060913040938.734472283D4@openib.ca.sandia.gov> Message-ID: <20060913041438.GC20225@mellanox.co.il> Probably not exported. Look at ia64 work around. Quoting r. bugzilla-daemon at openib.org : Subject: [Bug 232] SLES10 PPC64: uverbs_mem.c fails to link due to missing HPAGE_SHIFT http://openib.org/bugzilla/show_bug.cgi?id=232 ------- Comment #1 from bos at pathscale.com 2006-09-12 21:09 ------- I see HPAGE_SHIFT in /proc/kallsyms. This gets weirder. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
-- MST From sean.hefty at intel.com Tue Sep 12 21:39:29 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 12 Sep 2006 21:39:29 -0700 Subject: [openib-general] CMA issue: bind selects the same port after close In-Reply-To: <20060913035720.GA20225@mellanox.co.il> Message-ID: <000001c6d6ee$9a47e000$5bd9180a@amr.corp.intel.com> >> I completely understand that the existing port management services are not >> exported, but functionally, they support multiple port spaces, show up in >> netstat, etc... Can someone please explain to me the reluctance to use these >> services in favor of replicating them? My reluctance to use the existing port spaces is that we're not guaranteed to run TCP or IP. I'm happy to map the address spaces, but that's not the same as using those addresses when you're not using that protocol. >inet_csk_get_port actually *is* exported, and while it might be hard for CMA to >use it (needs struct sock*), maybe it is easy for SDP. I did look at this, but the use of struct sock made it extremely difficult for the CMA to use the existing calls. >So, possibly we should just leave the CMA port allocation as is, >and enhance SDP to use inet_csk_get_port. That sounds reasonable. - Sean From bos at pathscale.com Tue Sep 12 21:51:18 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 12 Sep 2006 21:51:18 -0700 Subject: [openib-general] [Bug 232] SLES10 PPC64: uverbs_mem.c fails to link due to missing HPAGE_SHIFT In-Reply-To: <20060913041438.GC20225@mellanox.co.il> References: <20060913040938.734472283D4@openib.ca.sandia.gov> <20060913041438.GC20225@mellanox.co.il> Message-ID: <1158123078.30173.13.camel@sardonyx> On Wed, 2006-09-13 at 07:14 +0300, Michael S. Tsirkin wrote: > Probably not exported. Look at ia64 work around. That's right; it's not exported. I don't see any sign of a possible workaround for powerpc, though; none of the necessary stuff is exported.
I'm inclined to think that this patch and the hpage backport patch should probably be dropped. References: <20060907214524.GA14791@mellanox.co.il> <4505E130.8010301@ichips.intel.com> <20060911222956.GD17098@mellanox.co.il> <45073FF7.7020506@ichips.intel.com> Message-ID: <20060913052341.GD20225@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] IB/cma: add rdma_establish > > Michael S. Tsirkin wrote: > >>>As a side note, reasons for frequent loss of RTU must be investigated. > >> > >>A lost RTU shouldn't be any more likely than a lost REQ or REP. Is the RTU > >>never showing up? > > > > > > Seems like that. I know for sure I do accept after REP but remote side never > > gets ESTABLISHED. > > I looked at the code, then ran some tests. The REP is retried until an RTU is > received, or its number of retries is exhausted. By modifying the IB CM, I was > able to force RTU drops. Using madeye, I could see that the REP would be > retried, resulting in the RTU being resent. After 4 drops, I had the code > receive the RTU, which allowed the test to proceed. > > A couple things to look at in OFED would be the setting of max cm retries and > the cm timeout. > > - Sean OFED uses CMA from upstream kernel. If the default parameters there are inappropriate, maybe we should fix them? BTW, how about the idea of exporting max cm retries in a transport-independent header? -- MST From mst at mellanox.co.il Tue Sep 12 22:24:40 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 08:24:40 +0300 Subject: [openib-general] CMA issue: bind selects the same port after close In-Reply-To: <000001c6d6ee$9a47e000$5bd9180a@amr.corp.intel.com> References: <20060913035720.GA20225@mellanox.co.il> <000001c6d6ee$9a47e000$5bd9180a@amr.corp.intel.com> Message-ID: <20060913052440.GE20225@mellanox.co.il> Quoting r. Sean Hefty : > >So, possibly we should just leave the CMA port allocation as is, > >and enhance SDP to use inet_csk_get_port. > > That sounds reasonable.
OK, so this needs looking into. -- MST From krkumar2 at in.ibm.com Tue Sep 12 22:37:53 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Wed, 13 Sep 2006 11:07:53 +0530 Subject: [openib-general] [PATCH] Optimize cma_process_remove() Message-ID: <20060913053753.5539.76298.sendpatchset@localhost.localdomain> Hi Sean, > I believe that this has the same issue. If a user tries to destroy an > rdma_cm_id, it will remove itself from the "device list". (This is why the ID's > are moved to a new list, so that the removal still works.) In the code above, > destroy thread(s) will remove ID(s) from the remove_list while we're trying to > walk it. Thanks for the explanation. So a list_del_init() would be the best thing to do. Another option is to add a remove_list to rdma_id_private by which this entry could be added to a local remove_list and traversed without holding a lock, but it doesn't make sense to add that for one case. Does the following patch look OK ? Thanks, - KK Signed-off-by: Krishna Kumar diff -ruNp org/core/cma.c new/core/cma.c --- org/core/cma.c 2006-09-13 10:56:56.000000000 +0530 +++ new/core/cma.c 2006-09-13 10:57:20.000000000 +0530 @@ -2332,12 +2332,9 @@ static int cma_remove_id_dev(struct rdma static void cma_process_remove(struct cma_device *cma_dev) { - struct list_head remove_list; struct rdma_id_private *id_priv; int ret; - INIT_LIST_HEAD(&remove_list); - mutex_lock(&lock); while (!list_empty(&cma_dev->id_list)) { id_priv = list_entry(cma_dev->id_list.next, @@ -2348,8 +2345,7 @@ static void cma_process_remove(struct cm continue; } - list_del(&id_priv->list); - list_add_tail(&id_priv->list, &remove_list); + list_del_init(&id_priv->list); atomic_inc(&id_priv->refcount); mutex_unlock(&lock); From mst at mellanox.co.il Tue Sep 12 22:55:29 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 13 Sep 2006 08:55:29 +0300 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <1158108010.8759.192.camel@brick.pathscale.com> References: <1158108010.8759.192.camel@brick.pathscale.com> Message-ID: <20060913055529.GF20225@mellanox.co.il> Quoting r. Ralph Campbell : > Subject: How to support IOMMUs for ipath driver > > Problem: > > The IB kernel to IB device driver interface uses dma_map_single() > and dma_map_sg() to allocate device bus addresses for HW DMA. > These bus addresses are passed to the IB device driver via ib_post_send() > and ib_post_recv(). > > The ib_ipath driver needs kernel virtual addresses in order to be able > to copy data to/from the posted work requests since it does not > use HW DMA. It currently relies on the mapping being one-to-one > and cannot reasonably reverse the mapping when an IOMMU is present. Actually, Mellanox HCAs support DMA, but they *also* have the ability to copy data to the posted work requests - this is the INLINE flag which we only implemented in userspace - but not in kernel, since we could not get at the kernel virtual address. It is actually useful for reducing latency for small messages. I wonder whether ehca can also benefit from this capability. Anyone? Unfortunately, the API you propose is not flexible enough in this respect as it still does not seem to allow this optimization in kernel. If we are changing the API, I would like the new API to be flexible enough to enable this optimization. I am not exactly sure what the best way to do that would be. How about making it possible for ULPs to pass in a kernel virtual address in post send?
-- MST From eitan at mellanox.co.il Tue Sep 12 22:58:56 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 13 Sep 2006 08:58:56 +0300 Subject: [openib-general] [PATCH][TRIVIAL] OpenSM: Eliminate unused max_port_profile parameter In-Reply-To: <1158053698.27427.144058.camel@hal.voltaire.com> References: <1158053698.27427.144058.camel@hal.voltaire.com> Message-ID: <45079E20.303@mellanox.co.il> Hi Hal, Thanks for cleaning this up. Eitan Hal Rosenstock wrote: >OpenSM: Eliminate unused max_port_profile parameter in OpenSM subnet >options structure > >Signed-off-by: Hal Rosenstock > >Index: include/opensm/osm_subnet.h >=================================================================== >--- include/opensm/osm_subnet.h (revision 9424) >+++ include/opensm/osm_subnet.h (working copy) >@@ -269,7 +269,6 @@ typedef struct _osm_subn_opt > boolean_t console; > cl_map_t port_prof_ignore_guids; > boolean_t port_profile_switch_nodes; >- uint32_t max_port_profile; > osm_pfn_ui_extension_t pfn_ui_pre_lid_assign; > void * ui_pre_lid_assign_ctx; > osm_pfn_ui_mcast_extension_t pfn_ui_mcast_fdb_assign; >@@ -405,10 +404,6 @@ typedef struct _osm_subn_opt > * If TRUE will count the number of switch nodes routed through > * the link. If FALSE - only CA/RT nodes are counted. > * >-* max_port_profile >-* Prevent routing through a port subscribed with more than this >-* number of routes. >-* > * pfn_ui_pre_lid_assign > * A UI function to be invoked prior to lid assigment. It should > * return 1 if any change was made to any lid or 0 otherwise.
>Index: include/opensm/osm_switch.h >=================================================================== >--- include/opensm/osm_switch.h (revision 9347) >+++ include/opensm/osm_switch.h (working copy) >@@ -1108,7 +1108,6 @@ osm_switch_recommend_path( > IN OUT uint16_t *p_num_used_sys, > IN OUT uint64_t *remote_node_guids, > IN OUT uint16_t *p_num_used_nodes, >- IN const uint32_t max_routes_subscribed, > IN boolean_t ui_ucast_fdb_assign_func_defined > ); > /* >@@ -1139,12 +1138,6 @@ osm_switch_recommend_path( > * p_num_used_nodes > * [in out] The number of remote nodes used for routing to the port. > * >-* max_routes_subscribed >-* [in] The maximum allowed number of target lids routed through >-* a specific port of the switch. If the port already assigned >-* (in the lfdb) this number of target lids - it will not be used >-* even if it has the smallest hops count to the target lid. >-* > * ui_ucast_fdb_assign_func_defined > * [in] If TRUE - this means that there is a ui ucast_fdb_assign table > * function defined (in pfn_ui_ucast_fdb_assign in subnet opts). 
This >Index: opensm/osm_subnet.c >=================================================================== >--- opensm/osm_subnet.c (revision 9423) >+++ opensm/osm_subnet.c (working copy) >@@ -483,7 +483,6 @@ osm_subn_set_default_opt( > p_opt->no_qos = FALSE; > p_opt->accum_log_file = TRUE; > p_opt->port_profile_switch_nodes = FALSE; >- p_opt->max_port_profile = 0xffffffff; > p_opt->pfn_ui_pre_lid_assign = NULL; > p_opt->ui_pre_lid_assign_ctx = NULL; > p_opt->pfn_ui_mcast_fdb_assign = NULL; >Index: opensm/osm_switch.c >=================================================================== >--- opensm/osm_switch.c (revision 9427) >+++ opensm/osm_switch.c (working copy) >@@ -233,7 +233,6 @@ osm_switch_recommend_path( > IN OUT uint16_t *p_num_used_sys, > IN OUT uint64_t *remote_node_guids, > IN OUT uint16_t *p_num_used_nodes, >- IN const uint32_t max_routes_subscribed, > IN boolean_t ui_ucast_fdb_assign_func_defined > ) > { >@@ -425,8 +424,7 @@ osm_switch_recommend_path( > /* > the count is min but also lower then the max subscribed > */ >- if( (check_count < least_paths) && >- (check_count <= max_routes_subscribed)) >+ if( check_count < least_paths ) > { > port_found = TRUE; > best_port = port_num; >Index: opensm/osm_ucast_mgr.c >=================================================================== >--- opensm/osm_ucast_mgr.c (revision 9347) >+++ opensm/osm_ucast_mgr.c (working copy) >@@ -281,7 +281,7 @@ __osm_ucast_mgr_dump_ucast_routes( > best_port = osm_switch_recommend_path( > p_sw, lid_ho, TRUE, > NULL, NULL, NULL, NULL, /* No LMC Optimization */ >- 0xffffffff, ui_ucast_fdb_assign_func_defined ); >+ ui_ucast_fdb_assign_func_defined ); > sprintf( line, "No %u hop path possible via port %u!", > best_hops, best_port ); > strcat( p_mgr->p_report_buf, line ); >@@ -752,12 +752,10 @@ __osm_ucast_mgr_process_port( > port = osm_switch_recommend_path( p_sw, lid_ho, ignore_existing, > remote_sys_guids, &num_used_sys, > remote_node_guids, &num_used_nodes, >- 
p_mgr->p_subn->opt.max_port_profile, > ui_ucast_fdb_assign_func_defined ); > else > port = osm_switch_recommend_path( p_sw, lid_ho, ignore_existing, > NULL, NULL, NULL, NULL, >- p_mgr->p_subn->opt.max_port_profile, > ui_ucast_fdb_assign_func_defined ); > > /* > > > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From mst at mellanox.co.il Tue Sep 12 23:25:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 09:25:18 +0300 Subject: [openib-general] OFED-1.1-rc4 is ready In-Reply-To: <1158125915.30173.27.camel@sardonyx> References: <1158125915.30173.27.camel@sardonyx> Message-ID: <20060913062518.GL20225@mellanox.co.il> Quoting r. Bryan O'Sullivan : > > the ibv_driver_init function was changed to openib_driver_init. > > By the way, I find it unsettling that the current libibverbs internal > ABI allows silent breakage like this that cannot be detected except at > runtime, and then only when the right hardware is present. > > Mind you, I don't have any better suggestions in mind (at least not at > 10:30pm). > > But I worry about the possibility this leaves open for botched field > upgrades breaking userspace in you-don't-find-out-until-it's-too-late > ways when libibverbs 1.1 starts being used. libipathverbs can simply export both ibv_driver_init and openib_driver_init like libmthca does, that's what we'll do for OFED. Or maybe Doug here can come up with some symbol versioning trick. Doug?
-- MST From erezz at voltaire.com Wed Sep 13 01:20:37 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 13 Sep 2006 11:20:37 +0300 Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine In-Reply-To: <20060912134725.GC22369@mellanox.co.il> References: <20060912134725.GC22369@mellanox.co.il> Message-ID: <4507BF55.8010507@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Bub Thomas : > >> Subject: RE: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine >> >> Michael, >> I don't understand what you mean on the iser trouble. >> > > Or Gerlitz from Voltaire is the iser maintainer. I Cc him. > > It seems that the iSER version in OFED-1.1-rc3 is not compatible with open-iscsi in SLES 10. I will look into it. Michael - I'm taking over responsibility for iSER from Or Gerlitz. Can you cc me on e-mails of this kind in the future? Erez From ogerlitz at voltaire.com Wed Sep 13 01:26:47 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 13 Sep 2006 11:26:47 +0300 Subject: [openib-general] [PATCH] RDMA/cma: document error flow of rdma_accept In-Reply-To: <4506D5E5.2010602@ichips.intel.com> References: <4506D5E5.2010602@ichips.intel.com> Message-ID: <4507C0C7.3020600@voltaire.com> Sean Hefty wrote: > Or Gerlitz wrote: >> + * In the case of error, a reject message is sent to the remote side >> and the >> + * state of the qp associated with the id is modified to error, such >> that any >> + * previously posted receive buffers would be flushed. > > Hmm... this makes me question whether this is what it should be doing. > Is there any reason not to reject the connection if accept fails? I think this (sending REJ, modifying the QP to ERROR) is exactly what it should be doing. Why would someone count on/expect that a REJ would not be sent in this case?
Even if, for some reason which I don't see now, we end up making some change here (e.g. letting the ULP send the REJ), let's have this patch, which documents what we have now, merged for 2.6.19. Or. From ogerlitz at voltaire.com Wed Sep 13 02:00:50 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 13 Sep 2006 12:00:50 +0300 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <1158108010.8759.192.camel@brick.pathscale.com> References: <1158108010.8759.192.camel@brick.pathscale.com> Message-ID: <4507C8C2.6050206@voltaire.com> Ralph Campbell wrote: > Problem: > > The IB kernel to IB device driver interface uses dma_map_single() > and dma_map_sg() to allocate device bus addresses for HW DMA. > These bus addresses are passed to the IB device driver via ib_post_send() > and ib_post_recv(). > > The ib_ipath driver needs kernel virtual addresses in order to be able > to copy data to/from the posted work requests since it does not > use HW DMA. It currently relies on the mapping being one-to-one > and cannot reasonably reverse the mapping when an IOMMU is present. Oops, please note that through the DMA API one can get a DMA address for a page which is currently **not** mapped into the kernel virtual address space (that is, page_address(p) is NULL), so you must add kmap and kunmap to your fast RX/TX code path. Examples of scenarios where this happens that I can think of are Direct I/O and some sort of pre-fetching done by a file system. Some pages present in a kernel SG list which need to be sent/received/RDMA-ed over IB need not be mapped into the kernel virtual address space. As for RDMA, please note that the problem has two faces: first, the remote device, which does the RDMA or which the local device does RDMA from/to, and second, the local device.
Since you need to be able to interoperate between devices that support DMA mappings and ones which do not, how do you suggest managing the addresses for the following schemes (1 stands for a device supporting DMA addresses and 0 for a device which does not): <1,1> <1,0> <0,1> <0,0>? Please assume for the purpose of discussion that each side knows the polarity of the remote side. After writing the section on RDMA, I think I might have gone in the wrong direction, since ipath emulates RDMA in SW; can you shed some light on this? > I also tried proposing adding a flag to the ib_device structure > and modifying the kernel IB code to check the flag and pass > either the dma_*() mapped address or a kernel virtual address. > This works OK for kmalloc() buffers where dma_map_single() is > being called but doesn't work well for SRP which has lists > of physical pages and calls dma_map_sg(). > It also means that the kernel IB layer needs to explicitly handle > two different kinds of addresses. Just a note: it's not just SRP there... it's any ULP which needs to move over IB data present in a bunch of pages (e.g. packed in a kernel SG list), namely iSER, NFSoRDMA, Lustre, an IB native implementation of send_page(), etc. Or. From ogerlitz at voltaire.com Wed Sep 13 02:13:16 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 13 Sep 2006 12:13:16 +0300 Subject: [openib-general] [PATCH] IB/cma: add rdma_establish In-Reply-To: <4506D0A1.7060405@ichips.intel.com> References: <20060907214524.GA14791@mellanox.co.il> <4505E130.8010301@ichips.intel.com> <450670D2.4040805@voltaire.com> <4506D0A1.7060405@ichips.intel.com> Message-ID: <4507CBAC.40008@voltaire.com> Sean Hefty wrote: > Or Gerlitz wrote: >> Just to make sure, you mean to say that you would merge this patch >> instead of the one that had the CM track local qp numbers and install a >> callback for the consumer QP to catch the async event etc?
> > correct > >> Indeed the **patch** by itself is somewhat simpler, but the consumer >> must get the established event before posting sends to the qp so they need >> to either queue RX-es or modify the QP to RTS before sending the REP. > > The first patch only allows the option of waiting for the established > event. > >> Is rdma_establish() --> cm_establish() callable from non >> interruptible context? > > Yes > >> Also does the patch ensure only one ESTABLISHED event would be called >> for the id, no matter if rdma_establish() and an RTU reception happen >> in parallel? > > Yes OK, thanks for all the clarifications. Or. From mst at mellanox.co.il Wed Sep 13 02:31:29 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 12:31:29 +0300 Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine In-Reply-To: <4507BF55.8010507@voltaire.com> References: <20060912134725.GC22369@mellanox.co.il> <4507BF55.8010507@voltaire.com> Message-ID: <20060913093129.GH22222@mellanox.co.il> Quoting r. Erez Zilber : > Subject: Re: Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine > > > Michael S. Tsirkin wrote: > > Quoting r. Bub Thomas : > > > >> Subject: RE: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine > >> > >> Michael, > >> I don't understand what you mean on the iser trouble. > >> > > > > Or Gerlitz from Voltaire is the iser maintainer. I Cc him. > > > > > It seems that the iSER version in OFED-1.1-rc3 is not compatible with > open-iscsi in SLES 10. I will look into it. Please do - note we need a patch today to make it into the (hopefully last) RC. > Michael - I'm taking over responsibility for iSER from Or Gerlitz. Can you cc > me on e-mails of this kind in the future? > > Erez Sure.
-- MST From halr at voltaire.com Wed Sep 13 03:21:12 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 06:21:12 -0400 Subject: [openib-general] [PATCH] libibmad: Add sa_rpc_call API Message-ID: <1158142806.27427.193351.camel@hal.voltaire.com> libibmad: Add sa_rpc_call API Signed-off-by: Hal Rosenstock Index: libibmad/include/infiniband/mad.h =================================================================== --- libibmad/include/infiniband/mad.h (revision 9425) +++ libibmad/include/infiniband/mad.h (working copy) @@ -748,6 +748,8 @@ safe_smp_set(void *rcvbuf, ib_portid_t * /* sa.c */ uint8_t * sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, uint timeout); +uint8_t * sa_rpc_call(void *ibmad_port, void *rcvbuf, ib_portid_t *portid, + ib_sa_call_t *sa, uint timeout); int ib_path_query(ib_gid_t srcgid, ib_gid_t destgid, ib_portid_t *sm_id, void *buf); /* returns lid */ Index: libibmad/src/libibmad.map =================================================================== --- libibmad/src/libibmad.map (revision 9425) +++ libibmad/src/libibmad.map (working copy) @@ -1,4 +1,4 @@ -IBMAD_1.1 { +IBMAD_1.2 { global: _mad_dump; _mad_dump_field; @@ -79,6 +79,7 @@ IBMAD_1.1 { madrpc_unlock; ib_path_query; sa_call; + sa_rpc_call; mad_alloc; mad_free; mad_receive; Index: libibmad/src/sa.c =================================================================== --- libibmad/src/sa.c (revision 9425) +++ libibmad/src/sa.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004,2005 Voltaire Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -50,7 +50,8 @@ #define DEBUG if (ibdebug) IBWARN uint8_t * -sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, uint timeout) +sa_rpc_call(void *ibmad_port, void *rcvbuf, ib_portid_t *portid, + ib_sa_call_t *sa, uint timeout) { ib_rpc_t rpc = {0}; uint8_t *p; @@ -77,7 +78,7 @@ sa_call(void *rcvbuf, ib_portid_t *porti if (!portid->qkey) portid->qkey = IB_DEFAULT_QP1_QKEY; - p = madrpc_rmpp(&rpc, portid, 0/*&sa->rmpp*/, rcvbuf); /* TODO: RMPP */ + p = mad_rpc_rmpp(ibmad_port, &rpc, portid, 0/*&sa->rmpp*/, rcvbuf); /* TODO: RMPP */ sa->recsz = rpc.recsz; Index: libibmad/src/rpc.c =================================================================== --- libibmad/src/rpc.c (revision 9425) +++ libibmad/src/rpc.c (working copy) @@ -386,3 +386,14 @@ mad_rpc_close_port(void *port_id) umad_close_port(p->port_id); free(p); } + +uint8_t * +sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, uint timeout) +{ + struct ibmad_port port; + + port.port_id = mad_portid; + port.class_agents[IB_SA_CLASS] = mad_class_agent(IB_SA_CLASS); + return sa_rpc_call(&port, rcvbuf, portid, sa, timeout); +} + From ogerlitz at voltaire.com Wed Sep 13 04:27:51 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 13 Sep 2006 14:27:51 +0300 Subject: [openib-general] IPOIB failover ? In-Reply-To: <1158067443.11227.207.camel@localhost.localdomain> References: <1158067443.11227.207.camel@localhost.localdomain> Message-ID: <4507EB37.5080702@voltaire.com> Richard Frank wrote: > Does IPOIB in this stack support transparent fail over between ports and > across redundant HCAs using a "virtual IP" ? I am working on a patch to the linux bonding driver which will allow it to enslave also IPoIB devices for the active-backup mode. I will send an RFC to netdev for review next week. Does this meets your needs? 
By virtual IP, do you mean an ***alias address*** assigned at one point in time to one ipoib device and at another point in time (e.g. during fail-over) to a second ipoib device? Does this approach have any advantage over the bonding approach? Or. From mst at mellanox.co.il Wed Sep 13 05:01:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 15:01:54 +0300 Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish In-Reply-To: <45073FF7.7020506@ichips.intel.com> References: <45073FF7.7020506@ichips.intel.com> Message-ID: <20060913120154.GA23890@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] IB/cma: add rdma_establish > > Michael S. Tsirkin wrote: > >>>As a side note, reasons for frequent loss of RTU must be investigated. > >> > >>A lost RTU shouldn't be any more likely than a lost REQ or REP. Is the RTU > >>never showing up? > > > > > > Seems like that. I know for sure I do accept after REP but remote side never > > gets ESTABLISHED. > > I looked at the code, then ran some tests. The REP is retried until an RTU is > received, or its number of retries is exhausted. By modifying the IB CM, I was > able to force RTU drops. Using madeye, I could see that the REP would be > retried, resulting in the RTU being resent. After 4 drops, I had the code > receive the RTU, which allowed the test to proceed. > > A couple things to look at in OFED would be the setting of max cm retries and > the cm timeout. What I think we need for 2.6.18 is the following. Please comment. IB/cma: increase the retry count in CMA from 3 to maximum 15. 3 seems low - we see connections failing under stress - and in any case looks like an arbitrary number. 15 is the max value allowed by spec. Signed-off-by: Michael S.
Tsirkin diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d6f99d5..5d625a8 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -49,7 +49,7 @@ MODULE_DESCRIPTION("Generic RDMA CM Agen MODULE_LICENSE("Dual BSD/GPL"); #define CMA_CM_RESPONSE_TIMEOUT 20 -#define CMA_MAX_CM_RETRIES 3 +#define CMA_MAX_CM_RETRIES 15 static void cma_add_one(struct ib_device *device); static void cma_remove_one(struct ib_device *device); -- MST From Richard.Frank at oracle.com Wed Sep 13 05:12:19 2006 From: Richard.Frank at oracle.com (Richard Frank) Date: Wed, 13 Sep 2006 08:12:19 -0400 Subject: [openib-general] IPOIB failover ? In-Reply-To: <4507EB37.5080702@voltaire.com> References: <1158067443.11227.207.camel@localhost.localdomain> <4507EB37.5080702@voltaire.com> Message-ID: <1158149539.13254.45.camel@localhost.localdomain> Supporting IPOIB fail over with the Bonding driver will work - we currently use this for GE, etc. On Wed, 2006-09-13 at 14:27 +0300, Or Gerlitz wrote: > Richard Frank wrote: > > Does IPOIB in this stack support transparent fail over between ports and > > across redundant HCAs using a "virtual IP" ? > > I am working on a patch to the linux bonding driver which will allow it > to enslave also IPoIB devices for the active-backup mode. I will send an > RFC to netdev for review next week. Does this meets your needs? > > Does by virtual IP you mean an ***alias address*** assigned at one point > of time to one ipoib device and in another point of time (eg during > fail-over) to a second ipoib device? does this approach have any > advantage on the bonding approach? > > Or. 
From johnt1johnt2 at gmail.com Wed Sep 13 06:05:58 2006 From: johnt1johnt2 at gmail.com (john t) Date: Wed, 13 Sep 2006 18:35:58 +0530 Subject: [openib-general] ibis Message-ID: Hi, In OFED there are commands like ibis, ibdmsh and ibmssh; all of these provide a shell prompt and allow some operations like "new_IBFabric", "delete_IBFabric" etc. What are these operations, and how do I use them? Besides, there are many commands which ask for a topology file. How do I generate a topology file? Is it the same as the one produced by "ibnetdiscover" (which is not working in my case)? Regards John T. From thomas.bub at thomson.net Wed Sep 13 06:39:55 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Wed, 13 Sep 2006 15:39:55 +0200 Subject: [openib-general] OFED can't compile against sa.h under SLES10 x86_64 Message-ID: Michael, this is another little issue using OFED under SLES10. In sa.h there is the definition of a struct ibv_sa_path_record that gets re-defined against ib_sa_path_record in the same header file. While the gcc 3.3.3 compile on SLES 9 is OK with this, the gcc 4.1 compile on SLES 10 does not like this. Thomas Bub From Brian.Cain at ge.com Wed Sep 13 06:50:53 2006 From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare)) Date: Wed, 13 Sep 2006 09:50:53 -0400 Subject: [openib-general] IPOIB failover ?
In-Reply-To: <1158149539.13254.45.camel@localhost.localdomain> Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033E84F8F@CINMLVEM11.e2k.ad.ge.com> > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Richard Frank > Sent: Wednesday, September 13, 2006 7:12 AM > To: Or Gerlitz > Cc: openib-general at openib.org > Subject: Re: [openib-general] IPOIB failover ? > > Supporting IPOIB fail over with the Bonding driver will work - we > currently use this for GE, etc. You can also get failover with IPoIB if you're willing to use SCTP as the transport. -Brian From halr at voltaire.com Wed Sep 13 06:47:42 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 09:47:42 -0400 Subject: [openib-general] ibis In-Reply-To: References: Message-ID: <1158155255.13748.171.camel@hal.voltaire.com> Hi, On Wed, 2006-09-13 at 09:05, john t wrote: > Hi, > > In OFED there are commands like ibis, ibdmsh and ibmssh I'm adding Eitan who is the maintainer for these tools. > all of these provide a shell prompt and allow some operations like > "new_IBFabric", "delete_IBFabric" etc. What are these operations and > how to use them. > > Besides there are many commands which ask for a topology file. How do > I generate a topology file. Is it same as produced by "ibnetdiscover" I don't think so. > (which is not working in my case)? What are the symptoms ? Do you have ib_umad module loaded ? Does it have proper permissions ? -- Hal > Regards > John T. 
From thomas.bub at thomson.net Wed Sep 13 07:11:29 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Wed, 13 Sep 2006 16:11:29 +0200 Subject: [openib-general] How to connect gen2 CM to gen1 IBGD CM? Message-ID: Sean, with your patience, the cmpost.c example and the OFED 1.1-rc4 on all machines I finally got a gen2 connection under SLES10 even with a 32-Bit executable on a x86_64 machine. Cool! Now the last part of my journey is still outstanding. It's a gen2 client connecting to a gen1 IBGD server. I have to do this since my gen1 server is running a 2.4 Montavista RT Linux on a PowerPC that I can't upgrade to gen2. :-( BTW: Our application is a high speed film image transfer in the film postproduction industry leveraging the benefits of the high speed IB RDMA transport. While I have gen1 to gen1 and gen2 to gen2 running, the only thing that is missing is the gen2 connecting to gen1. Just tried this with my test-executables but I did not get anything to the gen1 server. The gen1 userspace application does not even receive the IB_CM_REQ. So since your cmpost example did help me a lot on gen2 the question is: Do you have a cmpost for gen1 IBGD I can use to connect from gen2 to gen1? Or is there any other trick to play here? Thanks in advance for your assistance Thomas ............................................................ Thomas Bub Grass Valley Germany GmbH Brunnenweg 9 64331 Weiterstadt, Germany Tel: +49 6150 104 147 Fax: +49 6150 104 656 Email: Thomas.Bub at thomson.net www.GrassValley.com ............................................................
From eitan at mellanox.co.il Wed Sep 13 07:22:57 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 13 Sep 2006 17:22:57 +0300 Subject: [openib-general] ibis In-Reply-To: References: Message-ID: <45081441.80703@mellanox.co.il> Hi John, You should read the man pages of these commands. If you use OFED 1.0 you should have those as part of the main doc directory. But they are all present in the source tree too. Please see https://openib.org/svn/gen2/utils/src/linux-user : ibdm/doc/ibdmtr.1 ibdm/doc/ibdmchk.1 ibdm/doc/ibdmsh.1 ibdm/doc/ibdm-topo-file.1 ibdm/doc/ibdm-ibnl-file.1 ibdm/doc/ibtopodiff.1 ibis/doc/ibis.1 ibmgtsim/doc/IBMgtSim.1 ibmgtsim/doc/RunSimTest.1 ibmgtsim/doc/ibmsquit.1 ibmgtsim/doc/mkSimNodeDir.1 ibmgtsim/doc/ibmssh.1 Regarding the topology file, you should read the man page: ibdm/doc/ibdm-topo-file.1 It is not the one generated by ibnetdiscover. Eitan john t wrote: > Hi, > > In OFED there are commands like ibis, ibdmsh and ibmssh all of these > provide > a shell prompt and allow some operations like "new_IBFabric", > "delete_IBFabric" etc. What are these operations and how to use them. > > Besides there are many commands which ask for a topology file. How do I > generate a topology file. Is it same as produced by "ibnetdiscover" > (which > is not working in my case)? > > Regards > John T. > From mst at mellanox.co.il Wed Sep 13 07:51:22 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 17:51:22 +0300 Subject: [openib-general] OFED can't compile against sa.h under SLES10 x86_64 In-Reply-To: References: Message-ID: <20060913145122.GB24608@mellanox.co.il> Quoting r.
Bub Thomas : > Subject: OFED can't compile against sa.h under SLES10 x86_64 > > Michael, > > this is another little issue using OFED under SLES10. > > In sa.h there is the definition of a struct ibv_sa_path_record that gets re-defined against ib_sa_path_record in the same header file. > > While the gcc 3.3.3 compile of SLES 9 is OK with this the gcc 4.1 compile of SLES 10 does not like this. > > Thomas Bub > I don't see that. What files are affected? What kind of error do you see? -- MST From rdreier at cisco.com Wed Sep 13 08:09:29 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Sep 2006 08:09:29 -0700 Subject: [openib-general] user-mode data strucures References: <1158092339.9296.20.camel@trinity.ogc.int> <1158097515.9296.25.camel@trinity.ogc.int> <1158100078.9296.29.camel@trinity.ogc.int> <1158102636.9296.34.camel@trinity.ogc.int> Message-ID: Tom> I'm OK with a work-around in the near term. BTW, how do we Tom> correlate libibverbs 1.x with cat Tom> /sys/class/infiniband/verbs/abi_version? abi_version is the ABI exported by the kernel -- all up-to-date versions of libibverbs (1.0.x and 1.1.x) should be able to cope with all kernel versions. So there's not really a connection. - R. From mst at mellanox.co.il Wed Sep 13 08:57:26 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 18:57:26 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limit MTU to 1K Message-ID: <20060913155726.GA24954@mellanox.co.il> Tavor systems get better performance with 1K MTU. Since there does not seem to be any way to find out whether the remote system uses Tavor, add an option to limit the MTU globally. Signed-off-by: Michael S. Tsirkin --- Sean, can you ack the following for 2.6.18 please?
Index: linux-2.6.18-rc2-devel/drivers/infiniband/core/cma.c =================================================================== --- linux-2.6.18-rc2-devel.orig/drivers/infiniband/core/cma.c 2006-09-11 16:01:37.000000000 +0300 +++ linux-2.6.18-rc2-devel/drivers/infiniband/core/cma.c 2006-09-13 18:51:45.000000000 +0300 @@ -48,6 +48,10 @@ MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("Generic RDMA CM Agent"); MODULE_LICENSE("Dual BSD/GPL"); +static int tavor_quirk = 0; +module_param_named(tavor_quirk, tavor_quirk, int, 0644); +MODULE_PARM_DESC(tavor_quirk, "Tavor performance quirk: limit MTU to 1K if > 0"); + #define CMA_CM_RESPONSE_TIMEOUT 20 #define CMA_MAX_CM_RETRIES 3 @@ -1123,6 +1127,11 @@ static int cma_query_ib_route(struct rdm path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr)); path_rec.numb_path = 1; + if (tavor_quirk) { + path_rec.mtu_selector = IB_SA_LTE; + path_rec.mtu = IB_MTU_1024; + } + id_priv->query_id = ib_sa_path_rec_get(id_priv->id.device, id_priv->id.port_num, &path_rec, IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | -- MST From halr at voltaire.com Wed Sep 13 09:06:21 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 12:06:21 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limit MTU to 1K In-Reply-To: <20060913155726.GA24954@mellanox.co.il> References: <20060913155726.GA24954@mellanox.co.il> Message-ID: <1158163574.13748.5521.camel@hal.voltaire.com> On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > Tavor systems get better performance with 1K MTU. Since there does > not seem to be any way to find out whether the remote system uses Tavor, > add an option to limit the MTU globally. Can't Tavor be determined locally ? And couldn't the remote end negotiate the MTU down (if Tavor) as well ? > Signed-off-by: Michael S. Tsirkin > > --- > > Sean, can you ack the following for 2.6.18 please? 
> > Index: linux-2.6.18-rc2-devel/drivers/infiniband/core/cma.c > =================================================================== > --- linux-2.6.18-rc2-devel.orig/drivers/infiniband/core/cma.c 2006-09-11 16:01:37.000000000 +0300 > +++ linux-2.6.18-rc2-devel/drivers/infiniband/core/cma.c 2006-09-13 18:51:45.000000000 +0300 > @@ -48,6 +48,10 @@ MODULE_AUTHOR("Sean Hefty"); > MODULE_DESCRIPTION("Generic RDMA CM Agent"); > MODULE_LICENSE("Dual BSD/GPL"); > > +static int tavor_quirk = 0; > +module_param_named(tavor_quirk, tavor_quirk, int, 0644); > +MODULE_PARM_DESC(tavor_quirk, "Tavor performance quirk: limit MTU to 1K if > 0"); > + > #define CMA_CM_RESPONSE_TIMEOUT 20 > #define CMA_MAX_CM_RETRIES 3 > > @@ -1123,6 +1127,11 @@ static int cma_query_ib_route(struct rdm > path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr)); > path_rec.numb_path = 1; > > + if (tavor_quirk) { > + path_rec.mtu_selector = IB_SA_LTE; > + path_rec.mtu = IB_MTU_1024; > + } > + > id_priv->query_id = ib_sa_path_rec_get(id_priv->id.device, > id_priv->id.port_num, &path_rec, > IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | Aren't more component mask bits needed here for MTU selector and MTU ? -- Hal From mst at mellanox.co.il Wed Sep 13 09:22:45 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 19:22:45 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <1158163574.13748.5521.camel@hal.voltaire.com> References: <1158163574.13748.5521.camel@hal.voltaire.com> Message-ID: <20060913162245.GA25666@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > > Tavor systems get better performance with 1K MTU. Since there does > > not seem to be any way to find out whether the remote system uses Tavor, > > add an option to limit the MTU globally. > > Can't Tavor be determined locally ? 
It can, but we need this for remote Tavor as well, anyway. > And couldn't the remote end negotiate the MTU down (if Tavor) as well ? The way to do this would be for the SA to select 1K MTU if it detects Tavor on one side and if this does not conflict with the MTU selector. However: 1. Even opensm does not implement this optimization yet. 2. We need to work with existing SMs too. -- MST From mshefty at ichips.intel.com Wed Sep 13 09:30:26 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 09:30:26 -0700 Subject: [openib-general] [PATCH] RDMA/cma: document error flow of rdma_accept In-Reply-To: References: Message-ID: <45083222.9000005@ichips.intel.com> Committed to svn 9461. Roland, can you also pull into 2.6.19? Signed-off-by: Sean Hefty Or Gerlitz wrote: > Document the reject sending and modifying qp to error done in rdma_accept > > Signed-off-by: Or Gerlitz > > diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h > index 402c63d..f932c16 100644 > --- a/include/rdma/rdma_cm.h > +++ b/include/rdma/rdma_cm.h > @@ -237,6 +237,10 @@ int rdma_listen(struct rdma_cm_id *id, i > * Typically, this routine is only called by the listener to accept a connection > * request. It must also be called on the active side of a connection if the > * user is performing their own QP transitions. > + * > + * In the case of error, a reject message is sent to the remote side and the > + * state of the qp associated with the id is modified to error, such that any > + * previously posted receive buffers would be flushed.
> */ > int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); From halr at voltaire.com Wed Sep 13 09:26:30 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 12:26:30 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913162245.GA25666@mellanox.co.il> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> Message-ID: <1158164787.13748.6289.camel@hal.voltaire.com> On Wed, 2006-09-13 at 12:22, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > > > Tavor systems get better performance with 1K MTU. Since there does > > > not seem to be any way to find out whether the remote system uses Tavor, > > > add an option to limit the MTU globally. > > > > Can't Tavor be determined locally ? > > It can, but we need this for remote tavor as well, anyway. > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ? > > The way to do this is would be for SA to select 1K MTU if it detects Tavor on one side > and if this does not conflict with MTU selector. But it only needs the MTU on each local side (once for the REQ and on the remote side for the REP). It would mean that if the local side were capable of larger MTU and the remote side were Tavor, that the REQ would be REJ with MTU too large and need to be retried at a smaller MTU. > However > 1. Even opensm does not implement this optimization yet What optimization ? I don't understand what you are saying OpenSM doesn't support. > 2. We need to work with existing SMs too Not sure what the SA issue is here. 
-- Hal From ftillier at silverstorm.com Wed Sep 13 09:39:00 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Wed, 13 Sep 2006 09:39:00 -0700 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913162245.GA25666@mellanox.co.il> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> Message-ID: <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> On 9/13/06, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > > > Tavor systems get better performance with 1K MTU. Since there does > > > not seem to be any way to find out whether the remote system uses Tavor, > > > add an option to limit the MTU globally. > > > > Can't Tavor be determined locally ? > > It can, but we need this for remote tavor as well, anyway. > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ? > > The way to do this is would be for SA to select 1K MTU if it detects Tavor on one side > and if this does not conflict with MTU selector. You can't do this because the SA doesn't have a way to tell if a path query is going to be used for RC or UD, and IPoIB needs paths with 2K MTU. Would be nice if the CM REP would allow the MTU to be negotiated down. There is plenty of space in the REP if we were to use up some of the reserved fields. 
- Fab From halr at voltaire.com Wed Sep 13 09:35:58 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 12:35:58 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> Message-ID: <1158165350.13748.6667.camel@hal.voltaire.com> Hi Fab, On Wed, 2006-09-13 at 12:39, Fabian Tillier wrote: > On 9/13/06, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > > > > Tavor systems get better performance with 1K MTU. Since there does > > > > not seem to be any way to find out whether the remote system uses Tavor, > > > > add an option to limit the MTU globally. > > > > > > Can't Tavor be determined locally ? > > > > It can, but we need this for remote tavor as well, anyway. > > > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ? > > > > The way to do this is would be for SA to select 1K MTU if it detects Tavor on one side > > and if this does not conflict with MTU selector. > > You can't do this because the SA doesn't have a way to tell if a path > query is going to be used for RC or UD, and IPoIB needs paths with 2K > MTU. Are you referring to IPoIB-CM ? The patch appears to be for the SA PR request prior to the CM REQ. I don't think it affects IPoIB SA PR requests. -- Hal > Would be nice if the CM REP would allow the MTU to be negotiated down. > There is plenty of space in the REP if we were to use up some of the > reserved fields. 
> > - Fab From mshefty at ichips.intel.com Wed Sep 13 10:13:33 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 10:13:33 -0700 Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish In-Reply-To: <20060913120154.GA23890@mellanox.co.il> References: <45073FF7.7020506@ichips.intel.com> <20060913120154.GA23890@mellanox.co.il> Message-ID: <45083C3D.1000209@ichips.intel.com> Michael S. Tsirkin wrote: > What I think we need for 2.6.18 is the following. Pls comment. > > > IB/cma: increase the retry count in CMA from 3 to maximum 15. > 3 seems low - we see connections failing under stress - and in any case looks > like an arbitrary number. 15 is the max value allowed by spec. > > Signed-off-by: Michael S. Tsirkin Dropping 3 packets in a row seems likely only under stress testing, so I'm not sure that this is worthy of a change to 2.6.18 at this point (we're at rc7). This seems fine for 19 though. Acked-by: Sean Hefty From mshefty at ichips.intel.com Wed Sep 13 10:19:54 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 10:19:54 -0700 Subject: [openib-general] [PATCH] Optimize cma_process_remove() In-Reply-To: <20060913053753.5539.76298.sendpatchset@localhost.localdomain> References: <20060913053753.5539.76298.sendpatchset@localhost.localdomain> Message-ID: <45083DBA.2070809@ichips.intel.com> Krishna Kumar wrote: > Thanks for the explanation. So a list_del_init() would be the best > thing to do. Another option is to add a remove_list to rdma_id_private > by which this entry could be added to a local remove_list and traversed > without holding a lock, but it doesn't make sense to add that for one case. > > Does the following patch look OK ? Thanks - I committed this to svn 9462. 
- Sean From bos at pathscale.com Wed Sep 13 10:25:18 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 13 Sep 2006 10:25:18 -0700 Subject: [openib-general] OFED can't compile against sa.h under SLES10 x86_64 In-Reply-To: References: Message-ID: <1158168318.4503.12.camel@sardonyx> On Wed, 2006-09-13 at 15:39 +0200, Bub Thomas wrote: > While the gcc 3.3.3 compile of SLES 9 is OK with this the gcc 4.1 > compile of SLES 10 does not like this. I haven't seen this happen, and I do a lot of x86_64 SLES10 builds. References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> <1158165350.13748.6667.camel@hal.voltaire.com> Message-ID: <79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com> Hi Hal, On 13 Sep 2006 12:35:58 -0400, Hal Rosenstock wrote: > Hi Fab, > > On Wed, 2006-09-13 at 12:39, Fabian Tillier wrote: > > On 9/13/06, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock : > > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > > > > > Tavor systems get better performance with 1K MTU. Since there does > > > > > not seem to be any way to find out whether the remote system uses Tavor, > > > > > add an option to limit the MTU globally. > > > > > > > > Can't Tavor be determined locally ? > > > > > > It can, but we need this for remote Tavor as well, anyway. > > > > > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ? > > > > > > The way to do this would be for the SA to select 1K MTU if it detects Tavor on one side > > > and if this does not conflict with the MTU selector. > > > > You can't do this because the SA doesn't have a way to tell if a path > > query is going to be used for RC or UD, and IPoIB needs paths with 2K > > MTU. > > Are you referring to IPoIB-CM ?
> > The patch appears to be for the SA PR request prior to the CM REQ. I > don't think it affects IPoIB SA PR requests. I interpreted Michael's comment as suggesting the SA return paths with a 1K MTU when it detects that either endpoint is Tavor. The SA has access to this information based on the vendor ID/device ID in the node record. If I understood Michael's comment properly, this will have the side effect that IPoIB won't work since IPoIB requires 2K MTUs. As far as I know, there is no way to specify whether a path is needed for UD vs. RC in the path query. I like your suggestion to reject with a smaller MTU. Seems like the proper way to handle this, as well as allowing for the retry logic to be put in the CMA itself so clients don't have to deal with it. - Fab From halr at voltaire.com Wed Sep 13 10:21:35 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 13:21:35 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> <1158165350.13748.6667.camel@hal.voltaire.com> <79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com> Message-ID: <1158168091.13748.8242.camel@hal.voltaire.com> Hi Fab, On Wed, 2006-09-13 at 13:23, Fabian Tillier wrote: > Hi Hal, > > On 13 Sep 2006 12:35:58 -0400, Hal Rosenstock wrote: > > Hi Fab, > > > > On Wed, 2006-09-13 at 12:39, Fabian Tillier wrote: > > > On 9/13/06, Michael S. Tsirkin wrote: > > > > Quoting r. Hal Rosenstock : > > > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > > > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > > > > > > Tavor systems get better performance with 1K MTU. 
Since there does > > > > > > not seem to be any way to find out whether the remote system uses Tavor, > > > > > > add an option to limit the MTU globally. > > > > > > > > > > Can't Tavor be determined locally ? > > > > > > > > It can, but we need this for remote tavor as well, anyway. > > > > > > > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ? > > > > > > > > The way to do this is would be for SA to select 1K MTU if it detects Tavor on one side > > > > and if this does not conflict with MTU selector. > > > > > > You can't do this because the SA doesn't have a way to tell if a path > > > query is going to be used for RC or UD, and IPoIB needs paths with 2K > > > MTU. > > > > Are you referring to IPoIB-CM ? > > > > The patch appears to be for the SA PR request prior to the CM REQ. I > > don't think it affects IPoIB SA PR requests. > > I interpreted Michael's comment as suggesting the SA return paths with > a 1K MTU when it detects that either endpoint is Tavor. The SA has > access to this information based on the vendor ID/device ID in the > node record. That's the part I missed. > If I understood Michael's comment properly, this will have the side > effect that IPoIB won't work since IPoIB requires 2K MTUs. As far as > I know, there is no way to specify whether a path is needed for UD vs. > RC in the path query. I don't know how either. I don't think it can be done (at least currently per the standard). > I like your suggestion to reject with a smaller MTU. Seems like the > proper way to handle this, as well as allowing for the retry logic to > be put in the CMA itself so clients don't have to deal with it. But a penalty is paid for connect setup (more connection setup latency) in more round trips until the right MTU is achieved so as most engineering "solutions" it is a tradeoff with pros and cons. 
-- Hal > - Fab From mshefty at ichips.intel.com Wed Sep 13 10:28:17 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 10:28:17 -0700 Subject: [openib-general] How to connect gen2 CM to gen1 IBGD CM? In-Reply-To: References: Message-ID: <45083FB1.4070808@ichips.intel.com> Bub Thomas wrote: > Do you have a cmpost for gen1 IBGD I can use to connect from gen2 to gen1? No - the gen1 code is really the old Topspin code. Topspin is now part of Cisco, so they may have something. > Or is there any other trick to play here? I don't think so. I'm pretty sure that this has been tried before and has worked. Can you try connecting from the gen1 system to the gen2 system and see if the REQ shows up? On the gen2 system, when you send the REQ, what happens? Does the REQ just time out, or does it receive a REJ message back from the gen1 system? - Sean From ralphc at pathscale.com Wed Sep 13 10:33:09 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Wed, 13 Sep 2006 10:33:09 -0700 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <20060913031054.GA4464@obsidianresearch.com> References: <1158108010.8759.192.camel@brick.pathscale.com> <20060913031054.GA4464@obsidianresearch.com> Message-ID: <1158168789.8759.199.camel@brick.pathscale.com> On Tue, 2006-09-12 at 21:10 -0600, Jason Gunthorpe wrote: > On Tue, Sep 12, 2006 at 05:40:10PM -0700, Ralph Campbell wrote: > > > The ib_ipath driver needs kernel virtual addresses in order to be able > > to copy data to/from the posted work requests since it does not > > use HW DMA. It currently relies on the mapping being one-to-one > > and cannot reasonably reverse the mapping when an IOMMU is present. > > I'm sure this must have been answered, but given a PCI > domain:bus:device:function tuple and a DMA address, shouldn't any > effects of an IOMMU be easily duplicated in software to result in a > cpu-bus physical address?
Ie on AMD64 it is just a matter of following > the GART tables in software - assuming the address in question hits > the GART region (which for ipath, I expect, it never would) > > Jason The problem is that this reverse mapping code would either need to be added to every device driver for every possible IOMMU or it would need to be added to the general dma interface as a new architecture dependent interface. Neither of these is acceptable to the kernel community. From ralphc at pathscale.com Wed Sep 13 10:35:02 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Wed, 13 Sep 2006 10:35:02 -0700 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: References: <1158108010.8759.192.camel@brick.pathscale.com> Message-ID: <1158168902.8759.201.camel@brick.pathscale.com> On Tue, 2006-09-12 at 20:15 -0700, Roland Dreier wrote: > > My current proposal is to provide wrapper routines for the > > dma_*() routines which only the IB kernel code would use. > > These ib_dma_*() variants would allow a device driver to interpose > > on the call and do appropriate code to convert the kernel virtual > > or physical page addresses to something the device driver can handle. > > For ib_mthca and ib_ehca, these would result in the corresponding > > dma_*() routine being called. For ib_ipath, a different implementation > > would be needed. > > Seems like the least-bad way forward. 
> > A few comments on the proposed implementation: > > > @@ -984,6 +985,19 @@ struct ib_device { > > struct ib_grh *in_grh, > > struct ib_mad *in_mad, > > struct ib_mad *out_mad); > > + int (*mapping_error)(dma_addr_t dma_addr); > > + dma_addr_t (*map_single)(struct device *hwdev, > > + void *ptr, size_t size, > > + int direction); > > + void (*unmap_single)(struct device *dev, > > + dma_addr_t addr, > > + size_t size, int direction); > > + int (*map_sg)(struct device *hwdev, > > + struct scatterlist *sg, > > + int nents, int direction); > > + void (*unmap_sg)(struct device *hwdev, > > + struct scatterlist *sg, > > + int nents, int direction); > > First of all I would put all this into a "struct ib_dma_ops" or > something like that, so struct ib_device can have just a member like > > struct ib_dma_ops *dma_ops; > > That keeps the definition of struct ib_device from getting too much > more gigantic, and also makes it easy for the core to export a > standard dma_ops pointer that devices that use the default > implementation can use. > > Why not make the DMA operations take a struct ib_device * instead of a > struct device *? I think that would actually clean up the consumer > code, and it would make it easier for ipath -- otherwise you have to > find your way back from the struct device *. > > Also, I think you will need a few more methods. > has a definition of DMA operations that might be useful to refer too. > But for example SRP uses at least dma_sync_single_for_cpu() and > dma_sync_single_for_device(). Actually that might be the only extra > method needed for now. > > - R. These are all good suggestions and I will incorporate them. 
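For illustration only, the shape discussed above (an ops table hung off struct ib_device, with methods taking a struct ib_device * and ib_dma_*() wrappers falling back to the default mapping) might look roughly like the user-space sketch below. Every name in it is an assumption made for the example, including the field layout, sw_dma_ops, default_map_single() and the fake 0x1000 bus offset; it is not the interface that was eventually merged, and the stand-in types only mimic the kernel ones.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's dma_addr_t (assumption). */
typedef uint64_t dma_addr_t;

struct ib_device;

/* Hypothetical ops table: methods take the ib_device, as suggested. */
struct ib_dma_ops {
	int        (*mapping_error)(struct ib_device *dev, dma_addr_t addr);
	dma_addr_t (*map_single)(struct ib_device *dev, void *ptr,
				 size_t size, int direction);
};

/* Trimmed-down stand-in for struct ib_device. */
struct ib_device {
	const char        *name;
	struct ib_dma_ops *dma_ops;	/* NULL means "use the default path" */
};

/* Stand-in for dma_map_single(): pretend the bus address is offset. */
static dma_addr_t default_map_single(void *ptr, size_t size, int direction)
{
	(void)size;
	(void)direction;
	return (dma_addr_t)(uintptr_t)ptr + 0x1000;
}

/* Wrapper the IB core and ULPs would call instead of dma_map_single(). */
static dma_addr_t ib_dma_map_single(struct ib_device *dev, void *ptr,
				    size_t size, int direction)
{
	if (dev->dma_ops && dev->dma_ops->map_single)
		return dev->dma_ops->map_single(dev, ptr, size, direction);
	return default_map_single(ptr, size, direction);
}

/* A software-copy driver keeps the kernel virtual address unchanged. */
static dma_addr_t sw_map_single(struct ib_device *dev, void *ptr,
				size_t size, int direction)
{
	(void)dev;
	(void)size;
	(void)direction;
	return (dma_addr_t)(uintptr_t)ptr;
}

static struct ib_dma_ops sw_dma_ops = {
	.mapping_error = NULL,
	.map_single    = sw_map_single,
};
```

With this layout a device that leaves dma_ops NULL gets the default mapping, while a driver that copies data in software can hand back the virtual address it was given, so callers never need to know which kind of device they are talking to.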
From mshefty at ichips.intel.com Wed Sep 13 11:18:13 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 11:18:13 -0700 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <1158164787.13748.6289.camel@hal.voltaire.com> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <1158164787.13748.6289.camel@hal.voltaire.com> Message-ID: <45084B65.4000007@ichips.intel.com> Hal Rosenstock wrote: > But it only needs the MTU on each local side (once for the REQ and on > the remote side for the REP). It would mean that if the local side were > capable of larger MTU and the remote side were Tavor, that the REQ would > be REJ with MTU too large and need to be retried at a smaller MTU. I agree with this approach. The user should determine the proper MTU based on local information, and either set it to 1k if sending a REQ, or REJ the REQ if the MTU is too large. I'm not sure that this policy should be in the CMA, versus the consumer, but I can go with the CMA. I do think that the MTU could be negotiated down as part of the private data in the REP, but this would need to be done outside of the CMA. - Sean From ralphc at pathscale.com Wed Sep 13 11:30:57 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Wed, 13 Sep 2006 11:30:57 -0700 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <4507C8C2.6050206@voltaire.com> References: <1158108010.8759.192.camel@brick.pathscale.com> <4507C8C2.6050206@voltaire.com> Message-ID: <1158172258.8759.230.camel@brick.pathscale.com> On Wed, 2006-09-13 at 12:00 +0300, Or Gerlitz wrote: > Ralph Campbell wrote: > > Problem: > > > > The IB kernel to IB device driver interface uses dma_map_single() > > and dma_map_sg() to allocate device bus addresses for HW DMA. > > These bus addresses are passed to the IB device driver via ib_post_send() > > and ib_post_recv(). 
> > > > The ib_ipath driver needs kernel virtual addresses in order to be able > > to copy data to/from the posted work requests since it does not > > use HW DMA. It currently relies on the mapping being one-to-one > > and cannot reasonably reverse the mapping when an IOMMU is present. > > Oops, please note that one can get through the DMA api a DMA address for > a page which is currently **not** mapped into the kernel virtual address > space (that is page_address(p) is NULL), so you must add kmap and kunmap > into your fast RX/TX code path. Yes, these are called "high pages". > Examples for scenarios when this happen i can think of are Direct I/O > and some sort of pre-fetching done by File-System. Some pages present in > a kernel SG which needs to be sent/received/RDMA-ed over IB need not be > mapped into the kernel virtual address space. Well, the other parts of the kernel might not need a kernel virtual address but the ib_ipath driver still does. > As for RDMA, please note that the problem has two faces, the remote > device which does the RDMA or the local device does RDMA from/to and > second, the local device. > > Since you need to be able interop between devices that support DMA > mappings to ones which do not, how do you suggest to manage the > addresses for the following schemes (1 stands for device supporting DMA > addresses and 0 for device which does not) > > <1,1> > <1,0> > <0,1> > <0,0> > > Please assume for the purpose of discussion that each side knows the > polarity of the remote side? > > After writing the section on RDMA i think i might went to the wrong > direction since ipath emulates RDMA in SW, can you shed some light on this? I don't understand what you are talking about. There is an IB wire protocol for RDMA, SEND, etc. That doesn't change depending on the HCA. The InfiniPath HCA has a ring buffer of receive buffers and all incoming IB packets are DMA'ed into one of these buffers. 
The ib_ipath software driver examines the packet and copies it to the appropriate address. For a packet received with a RC_RDMA_WRITE_FIRST, the RKEY and IB address are used to convert that into a kernel virtual address and the data is copied. The same happens for RC_SEND_FIRST but the KV address comes from the LKEY and address in the work request posted by ib_post_recv(). Sending data is similar, the driver constructs a packet with the appropriate opcode and writes it to the chip which puts it on the wire. > > I also tried proposing adding a flag to the ib_device structure > > and modifying the kernel IB code to check the flag and pass > > either the dma_*() mapped address or a kernel virtual address. > > This works OK for kmalloc() buffers where dma_map_single() is > > being called but doesn't work well for SRP which has lists > > of physical pages and calls dma_map_sg(). > > It also means that the kernel IB layer needs to explicitly handle > > two different kinds of addresses. > > Just a note, its not just SRP there... its any ulp which needs to move > over IB data present bunch of pages (eg packed in a kernel SG list), > namely iSER, NFSoRDMA, Lustre, IB native imp of send_page(), etc. Sure. In each such case, the code would need to be modified to use the ib_dma_*() routines instead of dma_*() for addresses used with the LKEY/RKEY returned from ibv_get_dma_mr(). From mst at mellanox.co.il Wed Sep 13 12:03:28 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 22:03:28 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> Message-ID: <20060913190328.GB26959@mellanox.co.il> Quoting r. 
Fabian Tillier : > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > On 9/13/06, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > > > > Tavor systems get better performance with 1K MTU. Since there does > > > > not seem to be any way to find out whether the remote system uses Tavor, > > > > add an option to limit the MTU globally. > > > > > > Can't Tavor be determined locally ? > > > > It can, but we need this for remote tavor as well, anyway. > > > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ? > > > > The way to do this is would be for SA to select 1K MTU if it detects Tavor on one side > > and if this does not conflict with MTU selector. > > You can't do this because the SA doesn't have a way to tell if a path > query is going to be used for RC or UD, and IPoIB needs paths with 2K > MTU. I think we can do that without breaking IPoIB. IPoIB needs mtu >= 1K. IPoIB sets mtu selector to >= 2K. I am talking about users that do not set mtu selector. -- MST From mst at mellanox.co.il Wed Sep 13 12:05:39 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 22:05:39 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <1158168091.13748.8242.camel@hal.voltaire.com> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> <1158165350.13748.6667.camel@hal.voltaire.com> <79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com> <1158168091.13748.8242.camel@hal.voltaire.com> Message-ID: <20060913190539.GC26959@mellanox.co.il> Quoting r. Hal Rosenstock : > > If I understood Michael's comment properly, this will have the side > > effect that IPoIB won't work since IPoIB requires 2K MTUs. 
As far as > > I know, there is no way to specify whether a path is needed for UD vs. > > RC in the path query. > > I don't know how either. I don't think it can be done (at least > currently per the standard). We don't really need to know whether path is for RC or UD QP. IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K. In this case SM will return path with MTU >= 2K. CMA will not set mtu selector and then SM will choose MTU for best performance. -- MST From mst at mellanox.co.il Wed Sep 13 12:13:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 22:13:43 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <1158164787.13748.6289.camel@hal.voltaire.com> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <1158164787.13748.6289.camel@hal.voltaire.com> Message-ID: <20060913191343.GD26959@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > On Wed, 2006-09-13 at 12:22, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > > > > Tavor systems get better performance with 1K MTU. Since there does > > > > not seem to be any way to find out whether the remote system uses Tavor, > > > > add an option to limit the MTU globally. > > > > > > Can't Tavor be determined locally ? > > > > It can, but we need this for remote tavor as well, anyway. > > > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ? > > > > The way to do this is would be for SA to select 1K MTU if it detects Tavor on one side > > and if this does not conflict with MTU selector. > > But it only needs the MTU on each local side (once for the REQ and on > the remote side for the REP). 
It would mean that if the local side were > capable of larger MTU and the remote side were Tavor, that the REQ would > be REJ with MTU too large and need to be retried at a smaller MTU. This has 3 implications that make it impractical: . connection rate will suffer greatly . this will need to be done in each ulp, and it's a lot of code . protocols such as sdp explicitly say what to do on rej and do not seem to speak about retries > > However > > 1. Even opensm does not implement this optimization yet > > What optimization ? I don't understand what you are saying OpenSM > doesn't support. > > > 2. We need to work with existing SMs too > > Not sure what the SA issue is here. If path MTU selector in path query allows MTU 1K (e.g. "best MTU") and one of the sides is Tavor, select the best MTU that is 1K and not the largest possible. If path MTU selector requires 2K MTU, return path with 2K MTU. -- MST From mst at mellanox.co.il Wed Sep 13 12:18:41 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 22:18:41 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <45084B65.4000007@ichips.intel.com> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <1158164787.13748.6289.camel@hal.voltaire.com> <45084B65.4000007@ichips.intel.com> Message-ID: <20060913191841.GE26959@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > Hal Rosenstock wrote: > > But it only needs the MTU on each local side (once for the REQ and on > > the remote side for the REP). It would mean that if the local side were > > capable of larger MTU and the remote side were Tavor, that the REQ would > > be REJ with MTU too large and need to be retried at a smaller MTU. > > I agree with this approach. 
The user should determine the proper MTU based on > local information, and either set it to 1k if sending a REQ, or REJ the REQ if > the MTU is too large. I'm not sure that this policy should be in the CMA, > versus the consumer, but I can go with the CMA. > > I do think that the MTU could be negotiated down as part of the private data in > the REP, but this would need to be done outside of the CMA. > > - Sean Putting knowledge about hw quirks in all protocols is really horrible. MTU should be decided by SA as part of path information. If ULPs have specific limitations wrt MTU they should use mtu selector in path record query. -- MST From mshefty at ichips.intel.com Wed Sep 13 12:22:30 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 12:22:30 -0700 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913190328.GB26959@mellanox.co.il> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> <20060913190328.GB26959@mellanox.co.il> Message-ID: <45085A76.8080800@ichips.intel.com> Michael S. Tsirkin wrote: > I think we can do that without breaking IPoIB. > IPoIB needs mtu >= 1K. IPoIB sets mtu selector to >= 2K. > I am talking about users that do not set mtu selector. The ipoib spec requires support for a 2k MTU, but allows support for smaller MTUs. I agree that if the ipoib implementation requires an MTU of 2k, then it should be setting this as part of its query request. - Sean From mst at mellanox.co.il Wed Sep 13 12:30:16 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 13 Sep 2006 22:30:16 +0300 Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish In-Reply-To: <45083C3D.1000209@ichips.intel.com> References: <45073FF7.7020506@ichips.intel.com> <20060913120154.GA23890@mellanox.co.il> <45083C3D.1000209@ichips.intel.com> Message-ID: <20060913193016.GF26959@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish > > Michael S. Tsirkin wrote: > > What I think we need for 2.6.18 is the following. Pls comment. > > > > > > IB/cma: increase the retry count in CMA from 3 to maximum 15. > > 3 seems low - we see connections failing under stress - and in any case looks > > like an arbitrary number. 15 is the max value allowed by spec. > > > > Signed-off-by: Michael S. Tsirkin > > Dropping 3 packets in a row seems likely only under stress testing, so I'm not > sure that this is worthy of a change to 2.6.18 at this point (we're at rc7). I don't really understand. The fix is a one-liner. The problem is observed in practice, under stress. Who *wants* systems that fall apart under stress? It seems that with retry of 3, chances of losing one out of 3 packets would be close to 100% if loss rate is about 10%. Ranking it up to 15, you need loss rate on top of 50% to get close to 100% chance of losing connection request. Losing a DREP is also bad - as it leaves stale connections around munching up resources. So why aren't we fixing this? 
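For reference, the retry arithmetic can be checked against a simple model. The sketch below assumes each transmission is lost independently with probability p, which is an assumption (the thread does not state a loss model), and the helper name is illustrative rather than anything from the CM code. Under that model a request fails only when every attempt is lost, which happens with probability p^n for n attempts.

```c
#include <assert.h>

/*
 * Probability that all `attempts` transmissions of a message are lost,
 * assuming each one is lost independently with probability `loss_rate`.
 * Hypothetical helper for illustration; not part of the IB CM.
 */
static double all_attempts_lost(double loss_rate, unsigned int attempts)
{
	double p = 1.0;
	unsigned int i;

	assert(loss_rate >= 0.0 && loss_rate <= 1.0);

	/* Independent losses multiply: p = loss_rate ^ attempts. */
	for (i = 0; i < attempts; i++)
		p *= loss_rate;

	return p;
}
```

Under this model, a 10% loss rate makes three attempts all fail about once in a thousand tries. Whether a retry count of N means N or N+1 transmissions is left to the caller, since the thread does not say.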
-- MST From halr at voltaire.com Wed Sep 13 12:25:37 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 15:25:37 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913190539.GC26959@mellanox.co.il> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> <1158165350.13748.6667.camel@hal.voltaire.com> <79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com> <1158168091.13748.8242.camel@hal.voltaire.com> <20060913190539.GC26959@mellanox.co.il> Message-ID: <1158175522.13748.12872.camel@hal.voltaire.com> On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > > If I understood Michael's comment properly, this will have the side > > > effect that IPoIB won't work since IPoIB requires 2K MTUs. As far as > > > I know, there is no way to specify whether a path is needed for UD vs. > > > RC in the path query. > > > > I don't know how either. I don't think it can be done (at least > > currently per the standard). > > We don't really need to know whether path is for RC or UD QP. > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K. That's the default and not the minimum MTU (for IPoIB). > In this case SM will return path with MTU >= 2K. > CMA will not set mtu selector and then SM will choose MTU for best performance. From sean.hefty at intel.com Wed Sep 13 12:32:18 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 12:32:18 -0700 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913191841.GE26959@mellanox.co.il> Message-ID: <000701c6d76b$53761d40$ff0da8c0@amr.corp.intel.com> >Putting knowledge about hw quirks in all protocols is really horrible. Agreed. >MTU should be decided by SA as part of path information. 
>If ULPs have specific limitations wrt MTU they should use mtu selector >in path record query. Thinking about this more, the proper place for this does seem to be in the selection of the path record (where you put it), rather than during connection establishment. Although, I don't like the idea of the CMA changing every path to use an MTU of 1k. - Sean From mshefty at ichips.intel.com Wed Sep 13 12:40:38 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 12:40:38 -0700 Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish In-Reply-To: <20060913193016.GF26959@mellanox.co.il> References: <45073FF7.7020506@ichips.intel.com> <20060913120154.GA23890@mellanox.co.il> <45083C3D.1000209@ichips.intel.com> <20060913193016.GF26959@mellanox.co.il> Message-ID: <45085EB6.80107@ichips.intel.com> Michael S. Tsirkin wrote: > I don't really understand. The fix is a one-liner. > The problem is observed in practice, under stress. > Who *wants* systems that fall apart under stress? My view is: is this worth delaying the release of the kernel? And I don't see that it is at this point in the 2.6.18 release cycle. This does not fix a system crash. It only allows a connection to be made if the system is under heavy stress. > It seems that with retry of 3, chances of losing > one out of 3 packets would be close to 100% if loss rate is about 10%. > Ranking it up to 15, you need loss rate on top of 50% to get close to 100% > chance of losing connection request. I'm not quite following the math here. > Losing a DREP is also bad - as it leaves stale connections around > munching up resources. Yes - but retrying the DREQ doesn't end up fixing the issue. The side that sends the DREP often ends up entering and exiting timewait before the DREQ can be retried. This results in the DREQ being lost. Eventually the DREQ will time out, and the connection will be torn down. 
Increasing the number of times that the DREQ is retried ends up increasing how long the connection stays around. - Sean From halr at voltaire.com Wed Sep 13 13:10:15 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 16:10:15 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913190328.GB26959@mellanox.co.il> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> <20060913190328.GB26959@mellanox.co.il> Message-ID: <1158178200.13748.14583.camel@hal.voltaire.com> On Wed, 2006-09-13 at 15:03, Michael S. Tsirkin wrote: > Quoting r. Fabian Tillier : > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > On 9/13/06, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock : > > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > > > > > Tavor systems get better performance with 1K MTU. Since there does > > > > > not seem to be any way to find out whether the remote system uses Tavor, > > > > > add an option to limit the MTU globally. > > > > > > > > Can't Tavor be determined locally ? > > > > > > It can, but we need this for remote tavor as well, anyway. > > > > > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ? > > > > > > The way to do this is would be for SA to select 1K MTU if it detects Tavor on one side > > > and if this does not conflict with MTU selector. > > > > You can't do this because the SA doesn't have a way to tell if a path > > query is going to be used for RC or UD, and IPoIB needs paths with 2K > > MTU. > > I think we can do that without breaking IPoIB. > IPoIB needs mtu >= 1K. Huh ? > IPoIB sets mtu selector to >= 2K. I don't think that's a requirement for IPoIB. > I am talking about users that do not set mtu selector. 
Understood. -- Hal From rdreier at cisco.com Wed Sep 13 13:45:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Sep 2006 13:45:34 -0700 Subject: [openib-general] How to connect gen2 CM to gen1 IBGD CM? In-Reply-To: <45083FB1.4070808@ichips.intel.com> (Sean Hefty's message of "Wed, 13 Sep 2006 10:28:17 -0700") References: <45083FB1.4070808@ichips.intel.com> Message-ID: >> Do you have a cmpost for gen1 IBGD I can use to connect from >> gen2 to gen1? Sean> No - the gen1 code is really the old Topspin code. Topspin Sean> is now part of Cisco, so they may have something. No, no one has bothered to port any of that stuff to the old obsolete stack. >> Or is there any other trick to play here? Sean> I don't think so. I'm pretty sure that this has been tried Sean> before and has worked. Can you try connecting from the gen1 Sean> system to the gen2 system and see if the REQ shows up? Yes, for example Mellanox SRP target code is based on gen1, and the current Linux ("gen2") SRP initiator can connect to it fine. - R. From mst at mellanox.co.il Wed Sep 13 13:54:31 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 23:54:31 +0300 Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector (was Re: [PATCH for-2.6.18] IB/cma: option to limitMTUto 1K) In-Reply-To: <1158178200.13748.14583.camel@hal.voltaire.com> References: <1158178200.13748.14583.camel@hal.voltaire.com> Message-ID: <20060913205430.GA27766@mellanox.co.il> > > IPoIB sets mtu selector to >= 2K. > > I don't think that's a requirement for IPoIB. whatever MTU IPoIB needs, it should set selector appropriately. > > I am talking about users that do not set mtu selector. > > Understood. Roland, would it make sense for this to go upstream? In my opinion, it's important to have this in sooner rather than later since this is a question of interoperability with SM. If we have IPoIB implementations that don't set MTU selector appropriately, we'll need workarounds in SM. 
---- IPoIB in linux needs 2K MTU. Therefore it must set mtu selector in path record query accordingly. Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index cf71d2a..e92c3f8 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -441,9 +441,11 @@ static struct ipoib_path *path_rec_creat INIT_LIST_HEAD(&path->neigh_list); memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid)); - path->pathrec.sgid = priv->local_gid; - path->pathrec.pkey = cpu_to_be16(priv->pkey); - path->pathrec.numb_path = 1; + path->pathrec.sgid = priv->local_gid; + path->pathrec.pkey = cpu_to_be16(priv->pkey); + path->pathrec.numb_path = 1; + path->pathrec.mtu = IB_MTU_2048; + path->pathrec.mtu_selector = IB_SA_GTE; return path; } -- MST From mst at mellanox.co.il Wed Sep 13 13:56:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Sep 2006 23:56:08 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <000701c6d76b$53761d40$ff0da8c0@amr.corp.intel.com> References: <000701c6d76b$53761d40$ff0da8c0@amr.corp.intel.com> Message-ID: <20060913205608.GB27766@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > >Putting knowledge about hw quirks in all protocols is really horrible. > > Agreed. > > >MTU should be decided by SA as part of path information. > >If ULPs have spicific limitations wrt MTU they should use mtu selector > >in path record query. > > Thinking about this more, the proper place for this does seem to be in the > selection of the path record (where you put it), rather than during connection > establishment. > > Although, I don't like the idea of the CMA changing every path to use an MTU of > 1k. Well, that's why it's off by default. So, Ack? 
-- MST From mst at mellanox.co.il Wed Sep 13 14:09:40 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Sep 2006 00:09:40 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <1158175522.13748.12872.camel@hal.voltaire.com> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> <1158165350.13748.6667.camel@hal.voltaire.com> <79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com> <1158168091.13748.8242.camel@hal.voltaire.com> <20060913190539.GC26959@mellanox.co.il> <1158175522.13748.12872.camel@hal.voltaire.com> Message-ID: <20060913210940.GC27766@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > > If I understood Michael's comment properly, this will have the side > > > > effect that IPoIB won't work since IPoIB requires 2K MTUs. As far as > > > > I know, there is no way to specify whether a path is needed for UD vs. > > > > RC in the path query. > > > > > > I don't know how either. I don't think it can be done (at least > > > currently per the standard). > > > > We don't really need to know whether path is for RC or UD QP. > > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K. > > That's the default and not the minimum MTU (for IPoIB). How isn't it? By default, IPoIB reports 2K MTU to linux. So it will get 2K packets, and since IB switches cannot fragment packets, they will simply get dropped. I conclude that IPoIB by default requires minimum mtu of 2K. Right? And it's not a problem since all HCAs support 2K. 
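The SA-side policy proposed in this thread can be sketched roughly as follows. This is an illustration only, not OpenSM code: the enum names, the `pick_path_mtu` function, and the `either_end_is_tavor` flag are all hypothetical, and the sketch assumes the requester either sets no MTU selector or asks for "at least" some MTU.

```c
/* MTU sizes in bytes; hypothetical plain values, not the IB wire encoding. */
enum { MTU_1K = 1024, MTU_2K = 2048 };

/* Hypothetical stand-ins for the path record MTU selector. */
enum mtu_selector {
	SEL_NONE,	/* requester left the choice to the SA */
	SEL_AT_LEAST	/* requester needs an MTU >= requested_mtu */
};

/*
 * Choose the MTU to return in a path record.
 * path_max_mtu:        largest MTU the path supports
 * either_end_is_tavor: nonzero if the SA detected a Tavor HCA at one end
 * Returns the MTU in bytes, or 0 if the request cannot be satisfied.
 */
static int pick_path_mtu(int path_max_mtu, enum mtu_selector sel,
			 int requested_mtu, int either_end_is_tavor)
{
	int mtu = path_max_mtu;

	/* Prefer 1K for Tavor, since the thread reports it performs better. */
	if (either_end_is_tavor && mtu > MTU_1K)
		mtu = MTU_1K;

	if (sel == SEL_AT_LEAST && mtu < requested_mtu) {
		/* The selector wins over the Tavor preference. */
		mtu = path_max_mtu;
		if (mtu < requested_mtu)
			return 0;
	}

	return mtu;
}
```

With this shape, a requester that asks for at least 2K (as proposed for IPoIB) still gets 2K on a Tavor path, while a requester that sets no selector gets 1K.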
-- MST From halr at voltaire.com Wed Sep 13 14:03:57 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 17:03:57 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913191343.GD26959@mellanox.co.il> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <1158164787.13748.6289.camel@hal.voltaire.com> <20060913191343.GD26959@mellanox.co.il> Message-ID: <1158181429.13748.16691.camel@hal.voltaire.com> On Wed, 2006-09-13 at 15:13, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > On Wed, 2006-09-13 at 12:22, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock : > > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote: > > > > > Tavor systems get better performance with 1K MTU. Since there does > > > > > not seem to be any way to find out whether the remote system uses Tavor, > > > > > add an option to limit the MTU globally. > > > > > > > > Can't Tavor be determined locally ? > > > > > > It can, but we need this for remote tavor as well, anyway. > > > > > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ? > > > > > > The way to do this is would be for SA to select 1K MTU if it detects Tavor on one side > > > and if this does not conflict with MTU selector. > > > > But it only needs the MTU on each local side (once for the REQ and on > > the remote side for the REP). It would mean that if the local side were > > capable of larger MTU and the remote side were Tavor, that the REQ would > > be REJ with MTU too large and need to be retried at a smaller MTU. > > This has 3 implications that make it impractical: > . connection rate will suffer greatly > . this will need ot be done in each ulp, and it's a lot of code > . 
protocols such as sdp explicitly say what to do on rej > and do not seem to speak about retries OK. > > > However > > > 1. Even opensm does not implement this optimization yet > > > > What optimization ? I don't understand what you are saying OpenSM > > doesn't support. > > > > > 2. We need to work with existing SMs too > > > > Not sure what the SA issue is here. > > If path MTU selector in path query allows MTU 1K (e.g. "best MTU") > and one of the sides is Tavor, select the best MTU that is 1K > and not the largest possible. How would it be identified if the SA supports this ? > If path MTU selector requires 2K MTU, return path with 2K MTU. Also, I'm not sure that this is the required difference in the SA requests :-( -- Hal From mst at mellanox.co.il Wed Sep 13 14:13:29 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Sep 2006 00:13:29 +0300 Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish In-Reply-To: <45085EB6.80107@ichips.intel.com> References: <45073FF7.7020506@ichips.intel.com> <20060913120154.GA23890@mellanox.co.il> <45083C3D.1000209@ichips.intel.com> <20060913193016.GF26959@mellanox.co.il> <45085EB6.80107@ichips.intel.com> Message-ID: <20060913211328.GD27766@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish > > Michael S. Tsirkin wrote: > > I don't really understand. The fix is a one-liner. > > The problem is observed in practice, under stress. > > Who *wants* systems that fall apart under stress? > > My view is: is this worth delaying the release of the kernel? One line very low risk patch won't delay the release of the kernel. > And I don't see > that it is at this point in the 2.6.18 release cycle. This does not fix a > system crash. It only allows a connection to be made if the system is under > heavy stress. Well, applications happen to need connections to do stuff. If you can't connect, what good is it that it does not crash? No? 
-- MST From mst at mellanox.co.il Wed Sep 13 14:17:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Sep 2006 00:17:54 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <1158181429.13748.16691.camel@hal.voltaire.com> References: <1158181429.13748.16691.camel@hal.voltaire.com> Message-ID: <20060913211754.GE27766@mellanox.co.il> Quoting r. Hal Rosenstock : > > If path MTU selector in path query allows MTU 1K (e.g. "best MTU") > > and one of the sides is Tavor, select the best MTU that is 1K > > and not the largest possible. > > How would it be identified if the SA supports this ? You mean, if SA ignores mtu selector? Then we are not worse off than we were before we set it - we get 2K MTU for tavor and it works a bit slower. > > If path MTU selector requires 2K MTU, return path with 2K MTU. > > Also, I'm not sure that this is the required difference in the SA > requests :-( What do you mean? Its not required, but its legal and it will give us better performance. -- MST From mshefty at ichips.intel.com Wed Sep 13 14:22:35 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 14:22:35 -0700 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913205608.GB27766@mellanox.co.il> References: <000701c6d76b$53761d40$ff0da8c0@amr.corp.intel.com> <20060913205608.GB27766@mellanox.co.il> Message-ID: <4508769B.8070000@ichips.intel.com> Michael S. Tsirkin wrote: >>Although, I don't like the idea of the CMA changing every path to use an MTU of >>1k. > > Well, that's why it's off by default. > So, Ack? I'd like to find a way to support a 1k MTU to tavor HCAs without making the MTU 1k to other HCAs, in case we're dealing with a heterogeneous environment. Is this really the responsibility of the querying node or the SA? 
- Sean From mshefty at ichips.intel.com Wed Sep 13 14:24:34 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 14:24:34 -0700 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913210940.GC27766@mellanox.co.il> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> <1158165350.13748.6667.camel@hal.voltaire.com> <79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com> <1158168091.13748.8242.camel@hal.voltaire.com> <20060913190539.GC26959@mellanox.co.il> <1158175522.13748.12872.camel@hal.voltaire.com> <20060913210940.GC27766@mellanox.co.il> Message-ID: <45087712.3050504@ichips.intel.com> Michael S. Tsirkin wrote: >>That's the default and not the minimum MTU (for IPoIB). > > How isn't it? By default, IPoIB reports 2K MTU to linux. > So it will get 2K packets, and since IB swiches > can not fragment packets, they will simply get dropped. I think this is simply the difference between the spec and the implementation. Given that the implementation requires a 2k MTU, IMO it should request paths with a 2k MTU. - Sean From halr at voltaire.com Wed Sep 13 14:20:15 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 17:20:15 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913211754.GE27766@mellanox.co.il> References: <1158181429.13748.16691.camel@hal.voltaire.com> <20060913211754.GE27766@mellanox.co.il> Message-ID: <1158182401.13748.17301.camel@hal.voltaire.com> On Wed, 2006-09-13 at 17:17, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > > If path MTU selector in path query allows MTU 1K (e.g. "best MTU") > > > and one of the sides is Tavor, select the best MTU that is 1K > > > and not the largest possible. > > > > How would it be identified if the SA supports this ? > > You mean, if SA ignores mtu selector? 
No; I meant detect that one end of the PR request is a Tavor. Wasn't that part of it ? If the SA doesn't support the MTU selector and ignores it, it is not compliant and should be fixed. > Then we are not worse off than we were before we set it - we get 2K MTU for > tavor and it works a bit slower. > > > > If path MTU selector requires 2K MTU, return path with 2K MTU. > > > > Also, I'm not sure that this is the required difference in the SA > > requests :-( > > What do you mean? > Its not required, but its legal and it will give us better performance. I mean that there is no requirement that the IPoIB SA PR request look like what you are using to differentiate it from the PR requests for a connection setup. -- Hal From halr at voltaire.com Wed Sep 13 14:21:28 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 17:21:28 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913210940.GC27766@mellanox.co.il> References: <1158163574.13748.5521.camel@hal.voltaire.com> <20060913162245.GA25666@mellanox.co.il> <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com> <1158165350.13748.6667.camel@hal.voltaire.com> <79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com> <1158168091.13748.8242.camel@hal.voltaire.com> <20060913190539.GC26959@mellanox.co.il> <1158175522.13748.12872.camel@hal.voltaire.com> <20060913210940.GC27766@mellanox.co.il> Message-ID: <1158182416.13748.17303.camel@hal.voltaire.com> On Wed, 2006-09-13 at 17:09, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock : > > > > > If I understood Michael's comment properly, this will have the side > > > > > effect that IPoIB won't work since IPoIB requires 2K MTUs. As far as > > > > > I know, there is no way to specify whether a path is needed for UD vs. 
> > > > > RC in the path query. > > > > > > > > I don't know how either. I don't think it can be done (at least > > > > currently per the standard). > > > > > > We don't really need to know whether path is for RC or UD QP. > > > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K. > > > > That's the default and not the minimum MTU (for IPoIB). > > How isn't it? Look at RFC 4391 as to the requirement. > By default, IPoIB reports 2K MTU to linux. > So it will get 2K packets, and since IB swiches > can not fragment packets, they will simply get dropped. With ifconfig, the MTU can be changed. Fragmentation is at the IP layer in the end station stack, not the IB switches. > I conclude that IPoIB by default requires minimum mtu of 2K. > Right? Not minimum. > And it's not a problem since all HCAs support 2K. or more but it could be less per the RFC. -- Hal From robert.j.woodruff at intel.com Wed Sep 13 14:43:56 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Wed, 13 Sep 2006 14:43:56 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready Message-ID: Robert Walsh wrote, > > [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 > 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | > iters=10000 | duplex=0 | cma=0 | > 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 > VAddr 0x00002a95dd3480 > 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 > VAddr 0x00002a95c85480 > 4730:main: Completion with error at client: > 4730:main: Failed status 9: wr_id 3 > 4730:main: scnt=7584, ccnt=6584 > [woody at rkl-13 bin]$ >Hi Woody, Robert Walsh wrote, >When RC4 is available, there should be a patch in there that will fix >this. Can you let us know if you continue to see problems? >Regards, > Robert. 
I installed RC4 and now get this, [woody at rkl-13 bin]$ ./ib_rdma_bw 9035: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 | libibverbs: Warning: no userspace device-specific driver found for uverbs0 driver search path: /usr/local/ofed/lib64/infiniband 9035:main: No IB devices found I tried getting the latest ofed 1.1 ipathverbs from svn today that I thought would have a fix for this, and I think I got it built ok, although the mellanox build environment is less than intuitive, but it still seems to fail. Guess we will try again with RC5 tomorrow. woody From mst at mellanox.co.il Wed Sep 13 14:43:23 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Sep 2006 00:43:23 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <4508769B.8070000@ichips.intel.com> References: <4508769B.8070000@ichips.intel.com> Message-ID: <20060913214323.GF27766@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > Michael S. Tsirkin wrote: > >>Although, I don't like the idea of the CMA changing every path to use an MTU of > >>1k. > > > > Well, that's why it's off by default. > > So, Ack? > > I'd like to find a way to support a 1k MTU to tavor HCAs without making the MTU > 1k to other HCAs, in case we're dealing with a heterogeneous environment. IMO the clean way is to do it in the SA. > > Is this really the responsibility of the querying node or the SA? IMO it's really the SA's job. But a simple option as a workaround for SAs that don't support it properly would also be nice. -- MST From mst at mellanox.co.il Wed Sep 13 14:45:20 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 14 Sep 2006 00:45:20 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <1158182401.13748.17301.camel@hal.voltaire.com> References: <1158182401.13748.17301.camel@hal.voltaire.com> Message-ID: <20060913214520.GG27766@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > On Wed, 2006-09-13 at 17:17, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > > If path MTU selector in path query allows MTU 1K (e.g. "best MTU") > > > > and one of the sides is Tavor, select the best MTU that is 1K > > > > and not the largest possible. > > > > > > How would it be identified if the SA supports this ? > > > > You mean, if SA ignores mtu selector? > > No; I meant detect that one end of the PR request is a Tavor. Wasn't > that part of it ? SA can easily figure out it's talking to tavor by looking at vendor part id. > If SA doesn't support MTU selector and ignoring MTU selector, it is not > compliant and should be fixed. > > > Then we are not worse off than we were before we set it - we get 2K MTU for > > tavor and it works a bit slower. > > > > > > If path MTU selector requires 2K MTU, return path with 2K MTU. > > > > > > Also, I'm not sure that this is the required difference in the SA > > > requests :-( > > > > What do you mean? > > Its not required, but its legal and it will give us better performance. > > I mean that there is no requirement on what the IPoIB SA PR request > looks like what you are using to differentiate from the PR requests for > a connection setup. Correct. But if IPoIB requires 2K MTU it must use MTU selector, if it does not it's OK to give it a smaller MTU. -- MST From mst at mellanox.co.il Wed Sep 13 14:49:55 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 14 Sep 2006 00:49:55 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <1158182416.13748.17303.camel@hal.voltaire.com> References: <1158182416.13748.17303.camel@hal.voltaire.com> Message-ID: <20060913214955.GH27766@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > On Wed, 2006-09-13 at 17:09, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote: > > > > Quoting r. Hal Rosenstock : > > > > > > If I understood Michael's comment properly, this will have the side > > > > > > effect that IPoIB won't work since IPoIB requires 2K MTUs. As far as > > > > > > I know, there is no way to specify whether a path is needed for UD vs. > > > > > > RC in the path query. > > > > > > > > > > I don't know how either. I don't think it can be done (at least > > > > > currently per the standard). > > > > > > > > We don't really need to know whether path is for RC or UD QP. > > > > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K. > > > > > > That's the default and not the minimum MTU (for IPoIB). > > > > How isn't it? > > Look at RFC 4391 as to the requirement. I'm talking about our implementation not the spec. > > By default, IPoIB reports 2K MTU to linux. > > So it will get 2K packets, and since IB swiches > > can not fragment packets, they will simply get dropped. > > With ifconfig, the MTU can be changed. Fragmentation is at the IP layer > in the end station stack, not the IB switches. AFAIK linux won't fragment packets that do not exceed MTU and MSS. > > I conclude that IPoIB by default requires minimum mtu of 2K. > > Right? > > Not minimum. > > > And it's not a problem since all HCAs support 2K. > > or more but it could be less per the RFC. 
Again, if IPoIB implementation does not need 2K mtu there's no problem to give it 1K in path. If it wants 2K MTU it must set selector accordingly. -- MST From dledford at redhat.com Wed Sep 13 14:52:33 2006 From: dledford at redhat.com (Doug Ledford) Date: Wed, 13 Sep 2006 17:52:33 -0400 Subject: [openib-general] OFED-1.1-rc4 is ready In-Reply-To: <20060913062518.GL20225@mellanox.co.il> References: <1158125915.30173.27.camel@sardonyx> <20060913062518.GL20225@mellanox.co.il> Message-ID: <1158184353.2661.22.camel@fc6.xsintricity.com> On Wed, 2006-09-13 at 09:25 +0300, Michael S. Tsirkin wrote: > Quoting r. Bryan O'Sullivan : > > > the ibv_driver_init function was changed to openib_driver_init. > > > > By the way, I find it unsettling that the current libibverbs internal > > ABI allows silent breakage like this that cannot be detected except at > > runtime, and then only when the right hardware is present. > > > > Mind you, I don't have any better suggestions in mind (at least not at > > 10:30pm). > > > > But I worry about the possibility this leaves open for botched field > > upgrades breaking userspace in you-don't-find-out-until-it's-too-late > > ways when libibverbs 1.1 starts being used. > > libipathverbs can simply export both ibv_driver_init and > openib_driver_init like libmthca does, that's what we'll do for OFED. > > Or maybe Doug here can come up with some symbol versioning trick. > Dough? I don't think you can do symbol versioning here. For symbol versioning to work you have to have a compile time map from the source used to the version you are linking to. For all the drivers, like mthca, they are compiled after libibverbs, and so libibverbs is built blind to the drivers if you will, yet it is the drivers that provide the symbol and therefore the symbol version according to the linker, so libibverbs can never have the automated type symbol versioning. 
-- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From trimmer at silverstorm.com Wed Sep 13 14:54:26 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Wed, 13 Sep 2006 17:54:26 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <4508769B.8070000@ichips.intel.com> Message-ID: > From: Sean Hefty > Sent: Wednesday, September 13, 2006 5:23 PM > To: Michael S. Tsirkin > Cc: openib-general at openib.org > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to > limitMTU to 1K > > Michael S. Tsirkin wrote: > >>Although, I don't like the idea of the CMA changing every path to use an > MTU of > >>1k. > > > > Well, that's why it's off by default. > > So, Ack? > > I'd like to find a way to support a 1k MTU to tavor HCAs without making > the MTU > 1k to other HCAs, in case we're dealing with a heterogeneous environment. > > Is this really the responsibility of the querying node or the SA? > > - Sean > The real issue here is how to handle "optimization" tricks for selected models of HCAs. While Tavor supports a 2K MTU and works with it, it has been found to offer better MPI bandwidth when running 1K MTU. For many other ULPs no difference in performance is observable (because many other ULPs don't stress the HCA the way MPI bandwidth benchmarks do). Another dimension to this problem is that it's not clear what the best optimization will be in heterogeneous environments, such as a Tavor HCA talking to a Sinai, Arbel or other type of TCA based device using a non-MPI protocol (such as a storage target). In those environments a 2K MTU may perform the same (or depending on the storage target, perhaps even better).
At this point I would suggest this is a subtle performance issue specific to MPI and MPI libraries can appropriately provide options to tune the maximum MTU MPI to use or request (which is only one of dozens of MPI tunables needed to fine tune MPI). MPI environments will tend to be more homogeneous which also simplifies the solution. Pushing these types of ULP and source/destination specific issues into the core stack or SM will get very complex very quick. Given the issue on the table (Tavor performance) is specific to an older HCA model, it may not even be that critical since the highest performance customers have long since moved toward PCIe and DDR fabrics, neither of which are supported by Tavor. Todd Rimmer From rdreier at cisco.com Wed Sep 13 14:58:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Sep 2006 14:58:34 -0700 Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector In-Reply-To: <20060913205430.GA27766@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 13 Sep 2006 23:54:31 +0300") References: <1158178200.13748.14583.camel@hal.voltaire.com> <20060913205430.GA27766@mellanox.co.il> Message-ID: Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu Michael> selector in path record query accordingly. Umm -- why does it need a 2K MTU? As far as I know it should work fine with any MTU, assuming the SA sets the MTU of the broadcast multicast group correctly. - R. From mst at mellanox.co.il Wed Sep 13 15:01:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Sep 2006 01:01:03 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: References: Message-ID: <20060913220103.GA28790@mellanox.co.il> Quoting r. Rimmer, Todd : > Subject: RE: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > From: Sean Hefty > > Sent: Wednesday, September 13, 2006 5:23 PM > > To: Michael S. 
Tsirkin > > Cc: openib-general at openib.org > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to > > limitMTU to 1K > > > > Michael S. Tsirkin wrote: > > >>Although, I don't like the idea of the CMA changing every path to > use an > > MTU of > > >>1k. > > > > > > Well, that's why it's off by default. > > > So, Ack? > > > > I'd like to find a way to support a 1k MTU to tavor HCAs without > making > > the MTU > > 1k to other HCAs, in case we're dealing with a heterogeneous > environment. > > > > Is this really the responsibility of the querying node or the SA? > > > > - Sean > > > > The real issue here is how to handle "optimization" tricks for selected > models of HCAs. While Tavor supports a 2K MTU and works with it, it has > been found to offer better MPI bandwidth when running 1K MTU. For many > other ULPs no difference in performance is observable (because many > other ULPs don't stress the HCA the way MPI bandwidth benchmarks do). > > Another dimension to this problem is that its not clear what the best > optimization will be in heterogeneous environments. Such as a Tavor HCA > talking to a Sinai, Arbel or other type of TCA based device using a > non-MPI protocol (such as a storage target). In those environments a 2K > MTU may perform the same (or depending on the storage target, perhaps > even better). If Tavor is involved at either end, 1K MTU is better than 2K MTU. > At this point I would suggest this is a subtle performance issue > specific to MPI This is not specific to MPI. All ULPs experience this issue. > and MPI libraries can appropriately provide options to > tune the maximum MTU MPI to use or request (which is only one of dozens > of MPI tunables needed to fine tune MPI). MPI environments will tend to > be more homogeneous which also simplifies the solution. > > Pushing these types of ULP and source/destination specific issues into > the core stack or SM will get very complex very quick. It's actually relatively simple. 
> Given the issue > on the table (Tavor performance) is specific to an older HCA model, it > may not even be that critical since the highest performance customers > have long since moved toward PCIe and DDR fabrics, neither of which are > supported by Tavor. All the more reason to put the simple logic in one place and not expect all applications to optimize for this hardware. -- MST From halr at voltaire.com Wed Sep 13 14:57:27 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 17:57:27 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913214955.GH27766@mellanox.co.il> References: <1158182416.13748.17303.camel@hal.voltaire.com> <20060913214955.GH27766@mellanox.co.il> Message-ID: <1158184605.13748.18709.camel@hal.voltaire.com> On Wed, 2006-09-13 at 17:49, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > On Wed, 2006-09-13 at 17:09, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock : > > > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > > > On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote: > > > > > Quoting r. Hal Rosenstock : > > > > > > > If I understood Michael's comment properly, this will have the side > > > > > > > effect that IPoIB won't work since IPoIB requires 2K MTUs. As far as > > > > > > > I know, there is no way to specify whether a path is needed for UD vs. > > > > > > > RC in the path query. > > > > > > > > > > > > I don't know how either. I don't think it can be done (at least > > > > > > currently per the standard). > > > > > > > > > > We don't really need to know whether path is for RC or UD QP. > > > > > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K. > > > > > > > > That's the default and not the minimum MTU (for IPoIB). > > > > > > How isn't it? > > > > Look at RFC 4391 as to the requirement.
> > I'm talking about our implementation not the spec. Don't we risk interop issues by relying on things not required in the spec ? > > > By default, IPoIB reports 2K MTU to linux. > > > So it will get 2K packets, and since IB swiches > > > can not fragment packets, they will simply get dropped. > > > > With ifconfig, the MTU can be changed. Fragmentation is at the IP layer > > in the end station stack, not the IB switches. > > AFAIK linux won't fragment packets that do not exceed MTU and MSS. > > > > I conclude that IPoIB by default requires minimum mtu of 2K. > > > Right? > > > > Not minimum. > > > > > And it's not a problem since all HCAs support 2K. > > > > or more but it could be less per the RFC. > > Again, if IPoIB implementation does not need 2K mtu there's > no problem to give it 1K in path. If it wants 2K MTU it must > set selector accordingly. From mst at mellanox.co.il Wed Sep 13 15:08:59 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Sep 2006 01:08:59 +0300 Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector In-Reply-To: References: Message-ID: <20060913220859.GB28790@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector > > Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu > Michael> selector in path record query accordingly. > > Umm -- why does it need a 2K MTU? As far as I know it should work > fine with any MTU, assuming the SA sets the MTU of the broadcast > multicast group correctly. Hmm, you are right, it is just that existing implementations all set that to 2K. But there is a silent assumption that MTU of any path is >= broadcast multicast group MTU, and this is what I want to fix. Like this then? We could look at dev->mtu instead, but that's a couple of extra lines and I'm not sure it's worth the complexity. What do you think? -- IPoIB in linux needs MTU on any path to be >= broadcast mtu. 
Therefore it must set mtu selector in path record query accordingly. Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index cf71d2a..3bc052f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -441,9 +441,11 @@ static struct ipoib_path *path_rec_creat INIT_LIST_HEAD(&path->neigh_list); memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid)); - path->pathrec.sgid = priv->local_gid; - path->pathrec.pkey = cpu_to_be16(priv->pkey); - path->pathrec.numb_path = 1; + path->pathrec.sgid = priv->local_gid; + path->pathrec.pkey = cpu_to_be16(priv->pkey); + path->pathrec.numb_path = 1; + path->pathrec.mtu = priv->broadcast->mcmember.mtu; + path->pathrec.mtu_selector = IB_SA_GTE; return path; } -- MST From mst at mellanox.co.il Wed Sep 13 15:11:40 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Sep 2006 01:11:40 +0300 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <1158184605.13748.18709.camel@hal.voltaire.com> References: <1158184605.13748.18709.camel@hal.voltaire.com> Message-ID: <20060913221140.GC28790@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > On Wed, 2006-09-13 at 17:49, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > On Wed, 2006-09-13 at 17:09, Michael S. Tsirkin wrote: > > > > Quoting r. Hal Rosenstock : > > > > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K > > > > > > > > > > On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote: > > > > > > Quoting r. Hal Rosenstock : > > > > > > > > If I understood Michael's comment properly, this will have the side > > > > > > > > effect that IPoIB won't work since IPoIB requires 2K MTUs. 
As far as > > > > > > > > I know, there is no way to specify whether a path is needed for UD vs. > > > > > > > > RC in the path query. > > > > > > > > > > > > > > I don't know how either. I don't think it can be done (at least > > > > > > > currently per the standard). > > > > > > > > > > > > We don't really need to know whether path is for RC or UD QP. > > > > > > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K. > > > > > > > > > > That's the default and not the minimum MTU (for IPoIB). > > > > > > > > How isn't it? > > > > > > Look at RFC 4391 as to the requirement. > > > > I'm talking about our implementation not the spec. > > Don't we risk interop issues by relying on things not required in the > spec ? You confuse me. IPoIB currently assumes that broadcast group MTU <= path MTU for any path, but does not set MTU selector in SA query so SA could conceivably give it any MTU. This is an assumption that is not in the spec and I think it should be fixed ASAP, by setting path selector. -- MST From mshefty at ichips.intel.com Wed Sep 13 15:10:38 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 15:10:38 -0700 Subject: [openib-general] [PATCH] IB/cma: add rdma_establish In-Reply-To: <20060907214524.GA14791@mellanox.co.il> References: <20060907214524.GA14791@mellanox.co.il> Message-ID: <450881DE.9070806@ichips.intel.com> Michael S. Tsirkin wrote: > IB/cma: add rdma_establish > > Make it possible for ULPs to handle RTU loss by calling > rdma_establish. I've committed this patch to svn 9470. It still requires exporting the rdma_establish call to userspace.
- Sean From jgunthorpe at obsidianresearch.com Wed Sep 13 15:19:40 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 13 Sep 2006 16:19:40 -0600 Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector In-Reply-To: <20060913220859.GB28790@mellanox.co.il> References: <20060913220859.GB28790@mellanox.co.il> Message-ID: <20060913221940.GC31285@obsidianresearch.com> On Thu, Sep 14, 2006 at 01:08:59AM +0300, Michael S. Tsirkin wrote: > > Umm -- why does it need a 2K MTU? As far as I know it should work > > fine with any MTU, assuming the SA sets the MTU of the broadcast > > multicast group correctly. > > Hmm, you are right, it is just that existing implementations all > set that to 2K. IPv6 has a required minimum MTU of 1280 bytes. If IPv6 is to be used over IB then the MTU must be 2k. Jason From trimmer at silverstorm.com Wed Sep 13 15:28:14 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Wed, 13 Sep 2006 18:28:14 -0400 Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K In-Reply-To: <20060913220103.GA28790@mellanox.co.il> Message-ID: > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Wednesday, September 13, 2006 6:01 PM > To: Rimmer, Todd > Cc: Sean Hefty; openib-general at openib.org > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to > limitMTU to 1K > > Quoting r. Rimmer, Todd : > > > > Pushing these types of ULP and source/destination specific issues into > > the core stack or SM will get very complex very quick. > > It's actually relatively simple. So here is how it gets complex. The best MTU needs to be selected for various combos such as: Tavor w/IPoIB Tavor to Storage target with SRP Tavor to eHCA with SDP Tavor to PathScale with MPI Tavor to DDR Arbel with SRP etc etc The answer for many of the above combos may not be 1K MTU runs best. Hence if we try to support this in the SA, it needs to know about all these subtle combinations. 
The IB spec avoids such complex combos by having each Node report its MTU capabilities (as well as others like outstanding RDMA reads, etc). > > > Given the issue > > on the table (Tavor performance) is specific to an older HCA model, it > > may not even be that critical since the highest performance customers > > have long since moved toward PCIe and DDR fabrics, neither of which are > > supported by Tavor. > > All the more reason to put the simple logic in one place > and not expect all applications to optimize for this hardware. All the more reason to invest in more important requirements, such as SDP Z-Copy, especially since most of the performance critical applications (Open MPI, Scali MPI, MVAPICH MPI, etc) have already implemented this optimization. Todd Rimmer From halr at voltaire.com Wed Sep 13 15:37:24 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 18:37:24 -0400 Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector In-Reply-To: <20060913220859.GB28790@mellanox.co.il> References: <20060913220859.GB28790@mellanox.co.il> Message-ID: <1158187004.13748.20243.camel@hal.voltaire.com> On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote: > Quoting r. Roland Dreier : > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector > > > > Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu > > Michael> selector in path record query accordingly. > > > > Umm -- why does it need a 2K MTU? As far as I know it should work > > fine with any MTU, assuming the SA sets the MTU of the broadcast > > multicast group correctly. > > Hmm, you are right, it is just that existing implementations all > set that to 2K. By default yes. It can be configured. > But there is a silent assumption that MTU of any path is >= broadcast > multicast group MTU, and this is what I want to fix. The spec says: "The value (for IB MTU) assigned to the broadcast-GID must not be greater than any physical link MTU spanned by the IPoIB subnet".
so if the broadcast group is improperly setup not to follow this, there will be other issues. It doesn't need to be included in the PR request. -- Hal > Like this then? We could look at dev->mtu instead, but that's > a couple of extra lines and I'm not sure it's worth the complexity. > What do you think? > > -- > > IPoIB in linux needs MTU on any path to be >= broadcast mtu. > Therefore it must set mtu selector in path record query accordingly. > > Signed-off-by: Michael S. Tsirkin > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > index cf71d2a..3bc052f 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > @@ -441,9 +441,11 @@ static struct ipoib_path *path_rec_creat > INIT_LIST_HEAD(&path->neigh_list); > > memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid)); > - path->pathrec.sgid = priv->local_gid; > - path->pathrec.pkey = cpu_to_be16(priv->pkey); > - path->pathrec.numb_path = 1; > + path->pathrec.sgid = priv->local_gid; > + path->pathrec.pkey = cpu_to_be16(priv->pkey); > + path->pathrec.numb_path = 1; > + path->pathrec.mtu = priv->broadcast->mcmember.mtu; > + path->pathrec.mtu_selector = IB_SA_GTE; > > return path; > } From rjwalsh at pathscale.com Wed Sep 13 15:49:25 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 13 Sep 2006 15:49:25 -0700 Subject: [openib-general] [openfabrics-ewg] OFED-1.1-rc4 is ready In-Reply-To: <1158184353.2661.22.camel@fc6.xsintricity.com> References: <1158125915.30173.27.camel@sardonyx> <20060913062518.GL20225@mellanox.co.il> <1158184353.2661.22.camel@fc6.xsintricity.com> Message-ID: <45088AF5.4020806@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > I don't think you can do symbol versioning here. Right - probably being a little more verbose about finding the file but not the symbol would be a good idea, though. 
It took me a bit of gdb work to track the problem down: not really a big deal, but a clearer error might have helped. Another idea (I haven't thought the implications through yet - just throwing it out there) is to dlsym() a "version" symbol that the library is expected to provide and check that it matches what you expect it to: sort of like the way the user verbs stuff checks that the kernel uverbs module matches it. If dlsym() fails to find a symbol, then you're running an older one anyway. None of this helps at compile time, where it would be preferable to spot the problem. Another not-really-sat-down-and-thought-about-it idea is to have something like this in infiniband/driver.h: #define VERBS_LIBRARY_VERSION 2 #ifdef VERBS_DRIVER_VERSION #if VERBS_DRIVER_VERSION != VERBS_LIBRARY_VERSION #error verbs library version doesn't match driver version. #endif #endif The VERBS_LIBRARY_VERSION would be bumped on an API change. VERBS_DRIVER_VERSION would be defined in the driver library (in mthca.h, ipathverbs.h and ehca_uinit.c) and would be updated to match. Just brainstorming. Anyone else got any thoughts or suggestions? Regards, Robert. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRQiK9fzvnpzTd9fxAQLIEAf9G56jSWtGyG3kFwnbV7WpMNVmuA04xvm4 FA/y3BwjsNAckfBGm+13BVUuvqs9idm7UmC82jaXxIvm+cwoDNfBXSUj/4VqJW/y ZHESz0ulcyNXEhEANoIFb2NjmL1Fadl8cWEPW9rDPxyw7eSke/Wd1a8qwkKA+1dq L9L5+Cp72IV+5cKm4EPqV+R+MeO5UjNkd06/g4XVKVuEMYhnTvBhpu9ePt+mZ1zP otwwC/eI5ngvMAk2thBQfi0zEaFkqiLkiEUGP/PofmaJZuN4lcp1R/2FSiP7K2fj 3KY6HLGl+6wDpjJ0PpnIhSp3h3vFkeRFtJHKhNhOr+vM8qiGrQTlHg== =LVqb -----END PGP SIGNATURE----- From rdreier at cisco.com Wed Sep 13 16:07:19 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Sep 2006 16:07:19 -0700 Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector In-Reply-To: <20060913220859.GB28790@mellanox.co.il> (Michael S. 
Tsirkin's message of "Thu, 14 Sep 2006 01:08:59 +0300") References: <20060913220859.GB28790@mellanox.co.il> Message-ID: > + path->pathrec.mtu = priv->broadcast->mcmember.mtu; > + path->pathrec.mtu_selector = IB_SA_GTE; Does this do anything without setting the component mask of the actual request?? - R. From halr at voltaire.com Wed Sep 13 16:37:58 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2006 19:37:58 -0400 Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector In-Reply-To: References: <20060913220859.GB28790@mellanox.co.il> Message-ID: <1158190635.13748.22540.camel@hal.voltaire.com> On Wed, 2006-09-13 at 19:07, Roland Dreier wrote: > > + path->pathrec.mtu = priv->broadcast->mcmember.mtu; > > + path->pathrec.mtu_selector = IB_SA_GTE; > > Does this do anything without setting the component mask of the actual request?? As you imply (if you are asking for verification), SA would ignore these fields without the corresponding CM bits set. -- Hal > - R. From mshefty at ichips.intel.com Wed Sep 13 17:02:14 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 13 Sep 2006 17:02:14 -0700 Subject: [openib-general] [PATCH v3] ib_sa: require SA registration In-Reply-To: References: <000501c6c57b$2594dd00$8698070a@amr.corp.intel.com> Message-ID: <45089C06.4050908@ichips.intel.com> Roland Dreier wrote: > OK, I added the following to my for-2.6.19 branch. The differences > from your patch are: > > - CMA can have a static variable (good to avoid clashes with a global > 'sa_client' variable name too) > - IPoIB does not use multicast module upstream, fix ipoib_multicast.c too. > - Simplify sa_query.c changes a little. I don't like the > "deref_client" name for a function, since it sounds too much like > dereferencing a pointer rather than dropping a reference. And I > also didn't like ib_sa_client_get() having a magic side effect of > setting query->client. So I just open-coded more stuff. > > How does this look? 
I took the changes in your for-2.6.19 branch, modified the original patch to match, and committed that to svn. - Sean From yc_zhou at ncic.ac.cn Wed Sep 13 17:26:18 2006 From: yc_zhou at ncic.ac.cn (Yingchao Zhou) Date: Thu, 14 Sep 2006 08:26:18 +0800 Subject: [openib-general] Problem related to integration of OS/NIC Message-ID: <20060914002634.42842FB046@ncic.ac.cn> The current kernel defines PAGE_COPY without the write bit. This causes intermittently inconsistent data for user-level network drivers such as Infiniband, Quadrics and Myrinet, a problem that is also mentioned by Costin Iancu in the paper "HUNTing the Overlap" (PACT'05). An example of the phenomenon is the following sequence: register a memory region BUFF for receiving messages, receive a message, call mprotect(...PROT_NONE...) and then mprotect(...PROT_READ|PROT_WRITE), write into BUFF, then receive again. The data seen after the second receive may not be the data sent by the peer machine but the data written locally in the 4th step. The reason is: 1) The user-level network driver locks the physical pages when the memory region is registered; 2) The two calls to mprotect change the ptes in the region to PAGE_COPY, so a write to any page in the region causes a page fault; 3) The page fault handler goes to do_wp_page, and there, if the page is locked, a new page is allocated and installed in the pte, which is COW (Copy-On-Write). So the physical page seen by the host is no longer the one seen by the NIC. Adding PAGE_RW to PAGE_COPY would resolve this problem. In my opinion, the reason for the absence of RW is to save memory by mapping all read-only pages to ZERO_PAGE. But are there really programs that do many reads from a memory region without ever initializing it?
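As a rough illustration of the host-side half of the sequence above, a minimal userspace sketch in C might look like the following. The function name mprotect_roundtrip is hypothetical, and the NIC registration and receive steps are only marked by comments because they depend on the driver in use. On its own this code does not reproduce the mismatch; it only shows where the protection changes and the subsequent write fault would occur.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on strict compilers */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: walks through the host-side steps only.
 * Returns 0 if every call succeeds and the buffer holds the
 * expected bytes, -1 otherwise. len should be at least 2.
 */
int mprotect_roundtrip(size_t len)
{
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Touch the pages so they are actually backed by memory. */
    memset(buf, 0xAA, len);

    /* Steps 1-2 (assumed, not shown): register buf with the NIC,
     * which pins the physical pages, and receive a first message. */

    /* Step 3: drop all access, then restore read/write access. */
    if (mprotect(buf, len, PROT_NONE) != 0 ||
        mprotect(buf, len, PROT_READ | PROT_WRITE) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* Step 4: write into the buffer. Per the description above,
     * this is the write that would take the do_wp_page path. */
    buf[0] = 0x55;

    /* Step 5 (assumed, not shown): receive again via the NIC. */

    int ok = (buf[0] == 0x55 && buf[len - 1] == 0xAA);
    munmap(buf, len);
    return ok ? 0 : -1;
}
```

Calling mprotect_roundtrip(4096) from a small test program exercises the sequence; whether the NIC then sees a stale page depends on the kernel and driver, which this sketch does not model.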
___________________________________________________ _ Yingchao Zhou _ _ ICT, CAS _ _ (86)010-62601009 _ ___________________________________________________ From tom at opengridcomputing.com Wed Sep 13 20:33:23 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Wed, 13 Sep 2006 22:33:23 -0500 Subject: [openib-general] CMA issue: bind selects the same port after close In-Reply-To: <000001c6d6ee$9a47e000$5bd9180a@amr.corp.intel.com> Message-ID: On 9/12/06 11:39 PM, "Sean Hefty" wrote: >>> I completely understand that the existing port management services are not >>> exported, but functionally, they support multiple port spaces, show up in >>> netstat, etc... Can someone please explain to me the reluctance to use these >>> services in favor of replicating them? > > My reluctance to use the existing port spaces is that we're not guaranteed to > run TCP or IP. I'm happy to map the address spaces, but that's not the same > as > using those addresses when you're not using that protocol. > >> inet_csk_get_port actually *is* exported, and while it might be hard for CMA >> to >> use it (needs struct sock*), maybe it is easy for SDP. > Yes, I agree. This is the crux of the issue. The sock structure is coupled with inet_csk_get_port, and it is not trivial in size. This service, however, is itself built on lower level port allocation services that are not coupled with struct sock, but are also not exported. So what I think needs to be done is to look at these lower level services and decide a) how to effectively export them, and b) rationalize their export. > I did look at this, but the use of struct sock made it extremely difficult for > the CMA to use the existing calls. > >> So, possibly we should just leave the CMA port allocation as is, >> and enhance SDP to use inet_csk_get_port. > > That sounds reasonable. > Short term, perhaps, but long-term, I think we end up with this same kind of logic being replicated in ULPs all over the place.
> - Sean

From mst at mellanox.co.il Wed Sep 13 21:46:22 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 07:46:22 +0300
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <1158187004.13748.20243.camel@hal.voltaire.com>
References: <1158187004.13748.20243.camel@hal.voltaire.com>
Message-ID: <20060914044622.GA24586@mellanox.co.il>

Quoting r. Hal Rosenstock :
> Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
>
> On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote:
> > Quoting r. Roland Dreier :
> > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > >
> > > Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
> > > Michael> selector in path record query accordingly.
> > >
> > > Umm -- why does it need a 2K MTU? As far as I know it should work
> > > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > > multicast group correctly.
> >
> > Hmm, you are right, it is just that existing implementations all
> > set that to 2K.
>
> By default yes. It can be configured.
>
> > But there is a silent assumption that MTU of any path is >= broadcast
> > multicast group MTU, and this is what I want to fix.
>
> The spec says:
> "The value (for IB MTU) assigned to the broadcast-GID must not be
> greater than any physical link MTU spanned by the IPoIB subnet".
> so if the broadcast group is improperly setup not to follow this, there
> will be other issues.

Correct. IPoIB uses broadcast group MTU to get the value reported to Linux.
If some link has a lower MTU IPoIB can not use it.

> It doesn't need to be included in the PR request.

I disagree here. If you do not set selector, SA is free to return
a path with lower MTU even though physical link allows higher MTU.
Does it say otherwise somewhere?

-- 
MST

From mst at mellanox.co.il Wed Sep 13 22:03:35 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 08:03:35 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limit MTU to 1K
In-Reply-To:
References: <20060913220103.GA28790@mellanox.co.il>
Message-ID: <20060914050334.GD24586@mellanox.co.il>

Quoting r. Rimmer, Todd :
> > All the more reason to put the simple logic in one place
> > and not expect all applications to optimize for this hardware.
>
> All the reason to invest in more important requirements

This is completely orthogonal - Tavor gets better speed with 1K MTU
no matter what you do.

-- 
MST

From mst at mellanox.co.il Wed Sep 13 22:35:09 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 08:35:09 +0300
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To:
References:
Message-ID: <20060914053509.GA24868@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
>
> > +	path->pathrec.mtu = priv->broadcast->mcmember.mtu;
> > +	path->pathrec.mtu_selector = IB_SA_GTE;
>
> Does this do anything without setting the component mask of the actual request??
>

Ugh. Correct of course. The SA I was testing against seems to have a bug.
The following is yet untested - any more obvious gotchas you can see?

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index cf71d2a..c8e8dd3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -441,9 +441,11 @@ static struct ipoib_path *path_rec_creat
 	INIT_LIST_HEAD(&path->neigh_list);
 
 	memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid));
-	path->pathrec.sgid      = priv->local_gid;
-	path->pathrec.pkey      = cpu_to_be16(priv->pkey);
-	path->pathrec.numb_path = 1;
+	path->pathrec.sgid         = priv->local_gid;
+	path->pathrec.pkey         = cpu_to_be16(priv->pkey);
+	path->pathrec.numb_path    = 1;
+	path->pathrec.mtu          = priv->broadcast->mcmember.mtu;
+	path->pathrec.mtu_selector = IB_SA_GTE;
 
 	return path;
 }
@@ -464,7 +466,8 @@ static int path_rec_start(struct net_dev
 				   IB_SA_PATH_REC_DGID		|
 				   IB_SA_PATH_REC_SGID		|
 				   IB_SA_PATH_REC_NUMB_PATH	|
-				   IB_SA_PATH_REC_PKEY,
+				   IB_SA_PATH_REC_PKEY		|
+				   IB_SA_PATH_REC_MTU_SELECTOR,
 				   1000, GFP_ATOMIC,
 				   path_rec_completion,
 				   path, &path->query);

-- 
MST

From mst at mellanox.co.il Wed Sep 13 22:41:05 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 08:41:05 +0300
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To:
References:
Message-ID: <20060914054104.GC24868@mellanox.co.il>

Quoting r. Woodruff, Robert J :
> I tried getting the latest ofed 1.1 ipathverbs from svn today that I
> thought would have a fix for this, and I think I got it built ok,
> although the mellanox build environment is less than intuitive,
> but it still seems to fail.
> Guess we will try again with RC5 tomorrow.

It's actually OFED build environment now :)
So you really should report improvement suggestions on list.

-- 
MST

From thomas.bub at thomson.net Wed Sep 13 23:03:10 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Thu, 14 Sep 2006 08:03:10 +0200
Subject: [openib-general] OFED can't compile against sa.h under SLES10 x86_64
Message-ID:

Michael and Bryan,
please find enclosed the libibcm example cmpost.c that fails to compile.
My compiler output looks like:

cc -ggdb -Wall -O0 -I. -I./usr/include -I./oibfix -I/usr/local/ofed/include -I/usr/src/linux/drivers/infiniband/include -D__x86_64__ -c -o cmpost.o cmpost.c
cmpost.c: In function 'query_for_path':
cmpost.c:658: error: invalid use of undefined type 'struct ibv_sa_path_rec'
cmpost.c:658: error: dereferencing pointer to incomplete type
cmpost.c: In function 'run_client':
cmpost.c:679: warning: assignment from incompatible pointer type
make: *** [cmpost.o] Error 1

It's OK under SLES9 but fails under SLES10.

Thanks
Thomas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cmpost.c
Type: application/octet-stream
Size: 16040 bytes
Desc: cmpost.c
URL:

From ogerlitz at voltaire.com Wed Sep 13 23:36:00 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 14 Sep 2006 09:36:00 +0300
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <1158172258.8759.230.camel@brick.pathscale.com>
References: <1158108010.8759.192.camel@brick.pathscale.com> <4507C8C2.6050206@voltaire.com> <1158172258.8759.230.camel@brick.pathscale.com>
Message-ID: <4508F850.5050804@voltaire.com>

Ralph Campbell wrote:
> On Wed, 2006-09-13 at 12:00 +0300, Or Gerlitz wrote:
>> Ralph Campbell wrote:

> Well, the other parts of the kernel might not need a kernel virtual
> address but the ib_ipath driver still does.

So you agree there is a need to kmap/kunmap pages which the user wants to use
with IB and are not mapped into the kernel virt address space?

> I don't understand what you are talking about. There is an IB
> wire protocol for RDMA, SEND, etc. That doesn't change depending
> on the HCA.
> The InfiniPath HCA has a ring buffer of receive buffers and all
> incoming IB packets are DMA'ed into one of these buffers.
> The ib_ipath software driver examines the packet and
> copies it to the appropriate address. For a packet received with
> a RC_RDMA_WRITE_FIRST, the RKEY and IB address are used to convert
> that into a kernel virtual address and the data is copied.
> The same happens for RC_SEND_FIRST but the KV address comes from
> the LKEY and address in the work request posted by ib_post_recv().

OK, this makes sense. Let's see if I follow: you say that the InfiniPath HCA is
RX DMA-able, but it does RX DMA to the ipath driver's private RX buffers and
then the driver copies from these buffers to the user buffer. My guess is that
you do that to support both recv and rdma read on this QP, since if you only
needed to support recv you could have the HCA DMA straight into the
user-posted RX buffer.

> Sending data is similar, the driver constructs a packet with the
> appropriate opcode and writes it to the chip which puts it on
> the wire.

OK.

From ogerlitz at voltaire.com Thu Sep 14 00:12:36 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 14 Sep 2006 10:12:36 +0300 (IDT)
Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name
Message-ID:

change INFINIBAND_ADDR_TRANS to INFINIBAND_RDMA_CM and add help text clarifying
what the thing does. Adding the help text also has the side effect of the cma
config being visible when one does make menuconfig

Signed-off-by: Or Gerlitz

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 69a53d4..7feea77 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -29,11 +29,16 @@ config INFINIBAND_USER_ACCESS
 	  libibverbs, libibcm and a hardware driver library from
 	  .
 
-config INFINIBAND_ADDR_TRANS
+config INFINIBAND_RDMA_CM
 	bool
 	depends on INFINIBAND && INET
 	default y
-
+	---help---
+	  RDMA transport independent communication management support.
+	  This includes handling of IP to RDMA address resolution (eg IB ARP),
+	  IB route resolution (eg IB SA Path query) and interaction with the
+	  transport communication manager (eg the IB and iWARP CM).
+
 source "drivers/infiniband/hw/mthca/Kconfig"
 
 source "drivers/infiniband/hw/ipath/Kconfig"
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 68e73ec..531b3c4 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -1,4 +1,4 @@
-infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS)	:= ib_addr.o rdma_cm.o
+infiniband-$(CONFIG_INFINIBAND_RDMA_CM)	:= ib_addr.o rdma_cm.o
 
 obj-$(CONFIG_INFINIBAND) +=		ib_core.o ib_mad.o ib_sa.o \
 					ib_cm.o $(infiniband-y)

From erezz at voltaire.com Thu Sep 14 01:28:48 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Thu, 14 Sep 2006 11:28:48 +0300
Subject: [openib-general] fix iSER description and selections in Kconfig
In-Reply-To:
References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com>
Message-ID: <450912C0.8070807@voltaire.com>

Roland Dreier wrote:
> There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
> file. ISER only depends on INFINIBAND && SCSI. However it is easily
> possible to enable INFINIBAND and SCSI without enabling INET (in fact
> they can be enabled without NET as in the original config in this thread).
>
> iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
> depends on, so this alone will result in a broken config. However
> nothing will enable INET (which I think you said iser depends on). So
> something like the below is required, I think. Although it would
> probably be better to make iser depend on INET (as ISCSI_TCP does)
> rather than selecting NET and INET.
>
> Toralf, can you confirm that applying this patch and doing make
> oldconfig and make with your original config works OK?
>
> Thanks,
> Roland
>
> diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
> index fead87d..a122bb4 100644
> --- a/drivers/infiniband/ulp/iser/Kconfig
> +++ b/drivers/infiniband/ulp/iser/Kconfig
> @@ -1,6 +1,8 @@
>  config INFINIBAND_ISER
>  	tristate "ISCSI RDMA Protocol"
>  	depends on INFINIBAND && SCSI
> +	select NET
> +	select INET
>  	select SCSI_ISCSI_ATTRS
>  	---help---
>  	  Support for the ISCSI RDMA Protocol over InfiniBand. This
>

Roland,

I think that the patch below covers all cases. It depends on the patch that Or
sent this morning for the config entry of the CMA.

fix the description of iSER in Kconfig. It is not accurate. Also, iSER uses
the CMA and INET. It depends on SCSI_ISCSI_ATTRS that depends on NET. Selecting
NET, INET & INFINIBAND_RDMA_CM ensures that the config won't break.

Signed-off-by: Erez Zilber

---

 drivers/infiniband/ulp/iser/Kconfig | 11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

3dc4e3bf0716d502a6fd7e62806c4932e8978e6b
diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
index fead87d..c251855 100644
--- a/drivers/infiniband/ulp/iser/Kconfig
+++ b/drivers/infiniband/ulp/iser/Kconfig
@@ -1,11 +1,14 @@
 config INFINIBAND_ISER
-	tristate "ISCSI RDMA Protocol"
+	tristate "iSCSI Extensions for RDMA (iSER)"
 	depends on INFINIBAND && SCSI
+	select NET
+	select INET
+	select INFINIBAND_RDMA_CM
 	select SCSI_ISCSI_ATTRS
 	---help---
-	  Support for the ISCSI RDMA Protocol over InfiniBand. This
-	  allows you to access storage devices that speak ISER/ISCSI
+	  Support for the iSCSI Extensions for RDMA (iSER) Protocol over InfiniBand. This
+	  allows you to access storage devices that speak iSCSI over iSER
 	  over InfiniBand.
 
 	  The ISER protocol is defined by IETF.
-	  See .
+	  See .
-- 
1.2.6

From erezz at voltaire.com Thu Sep 14 01:42:22 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Thu, 14 Sep 2006 11:42:22 +0300
Subject: [openib-general] 2 SLES 10 backport directories
Message-ID: <450915EE.1090705@voltaire.com>

Michael,

I saw that there are 2 SLES 10 backport directories in the svn:

https://openib.org/svn/gen2/branches/backport/sles10/ - this one contains patches that we added for SLES 10
https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/ - this one was added later by you.

Can we unite them? Here's my motivation: I want to be able to install SLES 10,
replace its infiniband dir with infiniband from openib's svn, apply all SLES 10
patches (from a single directory) and then it should work. This should help us
in future OFED releases.

Thanks

-- 
____________________________________________________________
Erez Zilber | 972-9-971-7689
Software Engineer, Storage Team
Voltaire – _The Grid Backbone_
www.voltaire.com

From erezz at voltaire.com Thu Sep 14 02:03:00 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Thu, 14 Sep 2006 12:03:00 +0300
Subject: [openib-general] [PATCH] IB/iser: fix iSER description and selections in Kconfig
In-Reply-To: <450912C0.8070807@voltaire.com>
References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com> <450912C0.8070807@voltaire.com>
Message-ID: <45091AC4.3090005@voltaire.com>

Erez Zilber wrote:
> Roland Dreier wrote:
>
>> There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
>> file. ISER only depends on INFINIBAND && SCSI. However it is easily
>> possible to enable INFINIBAND and SCSI without enabling INET (in fact
>> they can be enabled without NET as in the original config in this thread).
>>
>> iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
>> depends on, so this alone will result in a broken config. However
>> nothing will enable INET (which I think you said iser depends on). So
>> something like the below is required, I think. Although it would
>> probably be better to make iser depend on INET (as ISCSI_TCP does)
>> rather than selecting NET and INET.
>>
>> Toralf, can you confirm that applying this patch and doing make
>> oldconfig and make with your original config works OK?
>>
>> Thanks,
>> Roland
>>
>> diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
>> index fead87d..a122bb4 100644
>> --- a/drivers/infiniband/ulp/iser/Kconfig
>> +++ b/drivers/infiniband/ulp/iser/Kconfig
>> @@ -1,6 +1,8 @@
>>  config INFINIBAND_ISER
>>  	tristate "ISCSI RDMA Protocol"
>>  	depends on INFINIBAND && SCSI
>> +	select NET
>> +	select INET
>>  	select SCSI_ISCSI_ATTRS
>>  	---help---
>>  	  Support for the ISCSI RDMA Protocol over InfiniBand. This
>>
>>
> Roland,
>
> I think that the patch below covers all cases. It depends on the patch
> that Or sent this morning for the config entry of the CMA.
>
>
>

Please ignore the previous message. I didn't format the subject correctly.
Here it is again:

fix the description of iSER in Kconfig. It is not accurate. Also, iSER uses
the CMA and INET. It depends on SCSI_ISCSI_ATTRS that depends on NET. Selecting
NET, INET & INFINIBAND_RDMA_CM ensures that the config won't break.

Signed-off-by: Erez Zilber

---

 drivers/infiniband/ulp/iser/Kconfig | 11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

3dc4e3bf0716d502a6fd7e62806c4932e8978e6b
diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
index fead87d..c251855 100644
--- a/drivers/infiniband/ulp/iser/Kconfig
+++ b/drivers/infiniband/ulp/iser/Kconfig
@@ -1,11 +1,14 @@
 config INFINIBAND_ISER
-	tristate "ISCSI RDMA Protocol"
+	tristate "iSCSI Extensions for RDMA (iSER)"
 	depends on INFINIBAND && SCSI
+	select NET
+	select INET
+	select INFINIBAND_RDMA_CM
 	select SCSI_ISCSI_ATTRS
 	---help---
-	  Support for the ISCSI RDMA Protocol over InfiniBand. This
-	  allows you to access storage devices that speak ISER/ISCSI
+	  Support for the iSCSI Extensions for RDMA (iSER) Protocol over InfiniBand. This
+	  allows you to access storage devices that speak iSCSI over iSER
 	  over InfiniBand.
 
 	  The ISER protocol is defined by IETF.
-	  See .
+	  See .
-- 
1.2.6

From ogerlitz at voltaire.com Thu Sep 14 03:51:20 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 14 Sep 2006 13:51:20 +0300
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <1158108010.8759.192.camel@brick.pathscale.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
Message-ID: <45093428.5010009@voltaire.com>

Ralph Campbell wrote:
> +static inline dma_addr_t ib_dma_map_sg(struct ib_device *dev,
> +				       struct scatterlist *sg, int nents,
> +				       enum dma_data_direction direction)
> +{
> +	return dev->map_sg ?
> +		dev->map_sg(dev, sg, nents, direction) :
> +		dma_map_sg(dev->dma_device, sg, nents, direction);
> +}

As SG dma mapping happens in place and you don't want to change struct
scatterlist for every arch, i think you would need to keep some mapping (hash)
from each struct scatterlist to its ipath buddy...

Also, you would need to implement the sg_dma_address() and sg_dma_len() macros,
which ULP code uses when pages are handed to the IB verbs layer (eg to get an
SG FMR-ed or to send/recv from/into a page), as queries into the ipath
scatterlist buddy.

Or.

From halr at voltaire.com Thu Sep 14 03:54:03 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 06:54:03 -0400
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <20060914044622.GA24586@mellanox.co.il>
References: <1158187004.13748.20243.camel@hal.voltaire.com> <20060914044622.GA24586@mellanox.co.il>
Message-ID: <1158231231.13748.47916.camel@hal.voltaire.com>

On Thu, 2006-09-14 at 00:46, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock :
> > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> >
> > On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote:
> > > Quoting r. Roland Dreier :
> > > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > >
> > > > Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
> > > > Michael> selector in path record query accordingly.
> > > >
> > > > Umm -- why does it need a 2K MTU? As far as I know it should work
> > > > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > > > multicast group correctly.
> > >
> > > Hmm, you are right, it is just that existing implementations all
> > > set that to 2K.
> >
> > By default yes. It can be configured.
> >
> > > But there is a silent assumption that MTU of any path is >= broadcast
> > > multicast group MTU, and this is what I want to fix.
> >
> > The spec says:
> > "The value (for IB MTU) assigned to the broadcast-GID must not be
> > greater than any physical link MTU spanned by the IPoIB subnet".
> > so if the broadcast group is improperly setup not to follow this, there
> > will be other issues.
>
> Correct. IPoIB uses broadcast group MTU to get the value reported to
> Linux. If some link has a lower MTU IPoIB can not use it.
>
> > It doesn't need to be included in the PR request.
>
> I disagree here. If you do not set selector, SA is free to return
> a path with lower MTU even though physical link allows higher MTU.
> Does it say otherwise somewhere?

No but isn't this relying on using PRs in a certain way by IPoIB
implementations (and any other UD application) v. connected apps ?

-- Hal

From jackm at dev.mellanox.co.il Thu Sep 14 04:12:27 2006
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 14 Sep 2006 14:12:27 +0300
Subject: [openib-general] OFED can't compile against sa.h under SLES10 x86_64
Message-ID: <200609141412.27577.jackm@dev.mellanox.co.il>

I was unable to reproduce the problem you describe, under SLES10 x86_64.
Here, your cmpost.c file compiled and linked without any problems.
I used a slightly different gcc command line (given below).

I took the cmpost.c file you provided, placed it under
/usr/local/ofed/src/openib-1.1/src/userspace/libibcm/examples
(under an OFED 1.1-rc5 prerelease candidate installation).
I then did the following:

cd libibcm/examples

gcc -ggdb -Wall -O0 -I/usr/local/ofed/include -D__x86_64__
/usr/local/ofed/lib64/libibcommon.so
/usr/local/ofed/lib64/librdmacm.so
/usr/local/ofed/lib64/libibcm.so -o cmpost cmpost.c

(the above gcc command is broken up into several lines for easy reading)

The compilation was successful. I did not experience any compilation or
linkage problems.
I was able to run the resulting "cmpost" executable file.

gcc version: gcc (GCC) 4.1.0 (SUSE Linux)
Linux distribution: (from file /etc/SuSE-release):
SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10

Kernel version (uname -a):
Linux 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux

I then retried everything using OFED 1.1 RC4, and also succeeded in
compiling and running cmpost.c.

The following is the list of OFED packages that I installed for the above
experiment:
ib_ipoib
ib_mthca
ib_verbs
kernel-ib
kernel-ib-devel
libibcm
libibcm-devel
libibcommon
libibcommon-devel
libibmad
libibmad-devel
libibumad
libibumad-devel
libibverbs
libibverbs-devel
libibverbs-utils
libmthca
libmthca-devel
librdmacm
librdmacm-devel
librdmacm-utils
ofed-scripts

- Jack

From mst at mellanox.co.il Thu Sep 14 04:14:03 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 14:14:03 +0300
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <1158231231.13748.47916.camel@hal.voltaire.com>
References: <1158231231.13748.47916.camel@hal.voltaire.com>
Message-ID: <20060914111403.GA25691@mellanox.co.il>

Quoting r. Hal Rosenstock :
> Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
>
> On Thu, 2006-09-14 at 00:46, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock :
> > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > >
> > > On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote:
> > > > Quoting r. Roland Dreier :
> > > > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > > >
> > > > > Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
> > > > > Michael> selector in path record query accordingly.
> > > > >
> > > > > Umm -- why does it need a 2K MTU? As far as I know it should work
> > > > > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > > > > multicast group correctly.
> > > >
> > > > Hmm, you are right, it is just that existing implementations all
> > > > set that to 2K.
> > >
> > > By default yes. It can be configured.
> > >
> > > > But there is a silent assumption that MTU of any path is >= broadcast
> > > > multicast group MTU, and this is what I want to fix.
> > >
> > > The spec says:
> > > "The value (for IB MTU) assigned to the broadcast-GID must not be
> > > greater than any physical link MTU spanned by the IPoIB subnet".
> > > so if the broadcast group is improperly setup not to follow this, there
> > > will be other issues.
> >
> > Correct. IPoIB uses broadcast group MTU to get the value reported to
> > Linux. If some link has a lower MTU IPoIB can not use it.
> >
> > > It doesn't need to be included in the PR request.
> >
> > I disagree here. If you do not set selector, SA is free to return
> > a path with lower MTU even though physical link allows higher MTU.
> > Does it say otherwise somewhere?
>
> No but isn't this relying on using PRs in a certain way by IPoIB
> implementations (and any other UD application) v. connected apps ?

Not really.

Tavor is faster with 1K MTU than with 2K MTU - it does not matter connected or
not. So, for me, it makes sense for SM to choose 1K if Tavor is involved,
unless application requested otherwise.

If an application (again, no matter connected or UD) needs a specific MTU it
should use mtu selector in path query. If it does not, SM is free to choose any
MTU supported by link, for best performance. If one end is Tavor, this happens to
be 1K and not the maximum MTU.

So what we have here is IPoIB bug - it requires that path mtu >= bcast group
mtu, but does not pass this information in query. This only happens to work
if SM always selects max link MTU for each path query.
Makes sense?

-- 
MST

From halr at voltaire.com Thu Sep 14 04:35:10 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 07:35:10 -0400
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <20060914111403.GA25691@mellanox.co.il>
References: <1158231231.13748.47916.camel@hal.voltaire.com> <20060914111403.GA25691@mellanox.co.il>
Message-ID: <1158233667.13748.49356.camel@hal.voltaire.com>

On Thu, 2006-09-14 at 07:14, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock :
> > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> >
> > On Thu, 2006-09-14 at 00:46, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock :
> > > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > >
> > > > On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote:
> > > > > Quoting r. Roland Dreier :
> > > > > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > > > >
> > > > > > Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
> > > > > > Michael> selector in path record query accordingly.
> > > > > >
> > > > > > Umm -- why does it need a 2K MTU? As far as I know it should work
> > > > > > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > > > > > multicast group correctly.
> > > > >
> > > > > Hmm, you are right, it is just that existing implementations all
> > > > > set that to 2K.
> > > >
> > > > By default yes. It can be configured.
> > > >
> > > > > But there is a silent assumption that MTU of any path is >= broadcast
> > > > > multicast group MTU, and this is what I want to fix.
> > > >
> > > > The spec says:
> > > > "The value (for IB MTU) assigned to the broadcast-GID must not be
> > > > greater than any physical link MTU spanned by the IPoIB subnet".
> > > > so if the broadcast group is improperly setup not to follow this, there
> > > > will be other issues.
> > >
> > > Correct. IPoIB uses broadcast group MTU to get the value reported to
> > > Linux. If some link has a lower MTU IPoIB can not use it.
> > >
> > > > It doesn't need to be included in the PR request.
> > >
> > > I disagree here. If you do not set selector, SA is free to return
> > > a path with lower MTU even though physical link allows higher MTU.
> > > Does it say otherwise somewhere?
> >
> > No but isn't this relying on using PRs in a certain way by IPoIB
> > implementations (and any other UD application) v. connected apps ?
>
> Not really.
>
> Tavor is faster with 1K MTU than with 2K MTU - it does not matter connected or
> not. So, for me, it makes sense for SM to choose 1K if Tavor is involved,
> unless application requested otherwise.
>
> If an application (again, no matter connected or UD) needs a specific MTU it
> should use mtu selector in path query. If it does not, SM is free to choose any
> MTU supported by link, for best performance. If one end is Tavor, this happens to
> be 1K and not the maximum MTU.
>
> So what we have here is IPoIB bug - it requires that path mtu >= bcast group
> mtu, but does not pass this information in query. This only happens to work
> if SM always selects max link MTU for each path query.
> Makes sense?

Understood. As I said in a previous email, if it happens that the path
MTU < broadcast group MTU, I think there would be join issues for some
nodes out there.

-- Hal

From thomas.bub at thomson.net Thu Sep 14 05:07:38 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Thu, 14 Sep 2006 14:07:38 +0200
Subject: [openib-general] OFED can't compile against sa.h under SLES10 x86_64
Message-ID:

Jack et al.,
I have to apologize: my -I. include path pointed to the OFED-1.0.1 includes,
where struct ibv_sa_path_rec was not defined yet.
Done right, it works.
Thanks to all for the sudden support
Thomas (humbling backwards) ;-)

> -----Original Message-----
> From: openib-general-bounces at openib.org [mailto:openib-general-
> bounces at openib.org] On Behalf Of Jack Morgenstein
> Sent: Thursday, September 14, 2006 1:12 PM
> To: Bub Thomas
> Cc: openib-general at openib.org
> Subject: [openib-general] OFED can't compile against sa.h under SLES10
> x86_64
>
> I was unable to reproduce the problem you describe, under SLES10 x86_64.
> Here, your cmpost.c file compiled and linked without any problems.
> I used a slightly different gcc command line (given below).
>
> I took the cmpost.c file you provided, placed it under
> /usr/local/ofed/src/openib-1.1/src/userspace/libibcm/examples
> (under an OFED 1.1-rc5 prerelease candidate installation).
> I then did the following:
>
> cd libibcm/examples
>
> gcc -ggdb -Wall -O0 -I/usr/local/ofed/include -D__x86_64__
> /usr/local/ofed/lib64/libibcommon.so
> /usr/local/ofed/lib64/librdmacm.so
> /usr/local/ofed/lib64/libibcm.so -o cmpost cmpost.c
>
> (the above gcc command is broken up into several lines for easy reading)
>
> The compilation was successful. I did not experience any compilation or
> linkage problems.
> I was able to run the resulting "cmpost" executable file.
>
> gcc version: gcc (GCC) 4.1.0 (SUSE Linux)
> Linux distribution: (from file /etc/SuSE-release):
> SUSE Linux Enterprise Server 10 (x86_64)
> VERSION = 10
>
> Kernel version (uname -a):
> Linux 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64
> x86_64 x86_64 GNU/Linux
>
> I then retried everything using OFED 1.1 RC4, and also succeeded in
> compiling and running cmpost.c.
>
> The following is the list of OFED packages that I installed for the above
> experiment:
> ib_ipoib
> ib_mthca
> ib_verbs
> kernel-ib
> kernel-ib-devel
> libibcm
> libibcm-devel
> libibcommon
> libibcommon-devel
> libibmad
> libibmad-devel
> libibumad
> libibumad-devel
> libibverbs
> libibverbs-devel
> libibverbs-utils
> libmthca
> libmthca-devel
> librdmacm
> librdmacm-devel
> librdmacm-utils
> ofed-scripts
>
> - Jack
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-
> general

From halr at voltaire.com Thu Sep 14 06:12:28 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 09:12:28 -0400
Subject: [openib-general] [PATCH] OpenSM: Change default temp directory location for non Windows platforms
Message-ID: <1158239541.13748.52729.camel@hal.voltaire.com>

OpenSM: Change default temp directory location for non Windows platforms

This patch is intended for both trunk and 1.1.

Signed-off-by: Hal Rosenstock

Index: ../osm/include/opensm/osm_base.h
===================================================================
--- ../osm/include/opensm/osm_base.h	(revision 9347)
+++ ../osm/include/opensm/osm_base.h	(working copy)
@@ -176,16 +176,15 @@ BEGIN_C_DECLS
 *	OSM_DEFAULT_TMP_DIR
 *
 * DESCRIPTION
-*	Specifies the default temporary directory for the log file, subnet.lst
-*	and the other log files (with the exception of osm.log for Linux being
-*	in /var/log).
+*	Specifies the default temporary directory for the log file,
+*	subnet.lst, and other log files.
 *
 * SYNOPSIS
 */
 #ifdef __WIN__
 #define OSM_DEFAULT_TMP_DIR GetOsmTempPath()
 #else
-#define OSM_DEFAULT_TMP_DIR "/tmp/"
+#define OSM_DEFAULT_TMP_DIR "/var/log/"
 #endif
 /***********/
 

From eitan at mellanox.co.il Thu Sep 14 06:39:16 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Thu, 14 Sep 2006 16:39:16 +0300
Subject: [openib-general] [PATCH] OpenSM: Change default temp directory location for non Windows platforms
In-Reply-To: <1158239541.13748.52729.camel@hal.voltaire.com>
References: <1158239541.13748.52729.camel@hal.voltaire.com>
Message-ID: <45095B84.6050900@mellanox.co.il>

Hi Hal,

Looks simple enough to get into the OFED 1.1
I assume you are going to commit it into the branch?

EZ

Hal Rosenstock wrote:

>OpenSM: Change default temp directory location for non Windows platforms
>
>This patch is intended for both trunk and 1.1.
>
>Signed-off-by: Hal Rosenstock
>
>Index: ../osm/include/opensm/osm_base.h
>===================================================================
>--- ../osm/include/opensm/osm_base.h	(revision 9347)
>+++ ../osm/include/opensm/osm_base.h	(working copy)
>@@ -176,16 +176,15 @@ BEGIN_C_DECLS
> *	OSM_DEFAULT_TMP_DIR
> *
> * DESCRIPTION
>-*	Specifies the default temporary directory for the log file, subnet.lst
>-*	and the other log files (with the exception of osm.log for Linux being
>-*	in /var/log).
>+*	Specifies the default temporary directory for the log file,
>+*	subnet.lst, and other log files.
> *
> * SYNOPSIS
> */
> #ifdef __WIN__
> #define OSM_DEFAULT_TMP_DIR GetOsmTempPath()
> #else
>-#define OSM_DEFAULT_TMP_DIR "/tmp/"
>+#define OSM_DEFAULT_TMP_DIR "/var/log/"
> #endif
> /***********/
>

From mst at mellanox.co.il Thu Sep 14 07:06:51 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 17:06:51 +0300
Subject: [openib-general] [PATCH] OpenSM: Change default temp directory location for non Windows platforms
In-Reply-To: <1158239541.13748.52729.camel@hal.voltaire.com>
References: <1158239541.13748.52729.camel@hal.voltaire.com>
Message-ID: <20060914140651.GE25691@mellanox.co.il>

Quoting r. Hal Rosenstock :
> Subject: [PATCH] OpenSM: Change default temp directory location for non Windows platforms
>
> OpenSM: Change default temp directory location for non Windows platforms
>
> This patch is intended for both trunk and 1.1.

Could you please delay the commit till tomorrow so that we can get RC5 out?
This still can get in before final.

-- 
MST

From halr at voltaire.com Thu Sep 14 07:05:58 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 10:05:58 -0400
Subject: [openib-general] [PATCH] OpenSM: Change default temp directory location for non Windows platforms
In-Reply-To: <45095B84.6050900@mellanox.co.il>
References: <1158239541.13748.52729.camel@hal.voltaire.com> <45095B84.6050900@mellanox.co.il>
Message-ID: <1158242732.13748.54281.camel@hal.voltaire.com>

Hi Eitan,

On Thu, 2006-09-14 at 09:39, Eitan Zahavi wrote:
> Hi Hal,
> Looks simple enough to get into the OFED 1.1
> I assume you are going to commit it into the branch?

Done (r9484).

-- Hal > > EZ > > Hal Rosenstock wrote: > > >OpenSM: Change default temp directory location for non Windows platforms > > > >This patch is intended for both trunk and 1.1. > > > >Signed-off-by: Hal Rosenstock > > > >Index: ../osm/include/opensm/osm_base.h > >=================================================================== > >--- ../osm/include/opensm/osm_base.h (revision 9347) > >+++ ../osm/include/opensm/osm_base.h (working copy) > >@@ -176,16 +176,15 @@ BEGIN_C_DECLS > > * OSM_DEFAULT_TMP_DIR > > * > > * DESCRIPTION > >-* Specifies the default temporary directory for the log file, subnet.lst > >-* and the other log files (with the exception of osm.log for Linux being > >-* in /var/log). > >+* Specifies the default temporary directory for the log file, > >+* subnet.lst, and other log files. > > * > > * SYNOPSIS > > */ > > #ifdef __WIN__ > > #define OSM_DEFAULT_TMP_DIR GetOsmTempPath() > > #else > >-#define OSM_DEFAULT_TMP_DIR "/tmp/" > >+#define OSM_DEFAULT_TMP_DIR "/var/log/" > > #endif > > /***********/ > > > > > > > > > > > >_______________________________________________ > >openib-general mailing list > >openib-general at openib.org > >http://openib.org/mailman/listinfo/openib-general > > > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > From mst at mellanox.co.il Thu Sep 14 07:19:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Sep 2006 17:19:01 +0300 Subject: [openib-general] Fwd: IPoIB Multicast Message-ID: <20060914141901.GG25691@mellanox.co.il> Subject: IPoIB Multicast Date: Thu, 14 Sep 2006 17:08:55 +0300 From: "Eitan Zahavi" > > Quoting the > > A node joining an IP multicast group must first construct a MGID > according to the rule described in section 4 above. Once the correct > MGID is calculated, the node must call the SA of the outbound link > to attempt a "FullMember" join of the IB multicast group > corresponding to the MGID. 
If the IB multicast group doesn't already
> exist, one must be created first with the IPoIB link MTU. The MGID
> MUST use the same P_Key, Q_Key, SL, MTU and HopLimit as those used
> in the broadcast-GID. For the rest of attributes too, the values
> used in the broadcast-GID SHOULD be used.

Hmm, IPoIB does not seem to copy anything except the pkey.
Looks like a compliance issue.

Specifically, I'm not sure what "other attributes" are, but I think this
should include the static rate. Right?

--
MST

From thomas.bub at thomson.net Thu Sep 14 07:28:20 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Thu, 14 Sep 2006 16:28:20 +0200
Subject: [openib-general] Different byte order between gen1 CM and gen2 CM ->RE: How to connect gen2 CM to gen1 IBGD CM?
Message-ID:

Sean,
I should have checked this earlier, after you told me last time that the
gen2 CM takes the LID in network order instead of the host order used in
gen1.
This time it was the service_id I stumbled over.
After putting my service_id into network order I could at least get a
REQ_RECEIVED.
The rest should be fine tuning from here onwards.
Do you know of any other Verbs or CM parameter that has a different byte
order between gen1 and gen2?

Thanks
Thomas

P.S.: Maybe someone should put a big "Warning" sign somewhere so that
others don't stumble into that pit again. ;-)

_____________________________________________
From: Bub Thomas
Sent: Wednesday, September 13, 2006 4:11 PM
To: 'Sean Hefty'; 'Thomas.Bub at gmx.net'
Cc: openib-general at openib.org
Subject: How to connect gen2 CM to gen1 IBGD CM?

Sean,
with your patience, the cmpost.c example and OFED 1.1-rc4 on all
machines I finally got a gen2 connection under SLES10, even with a 32-bit
executable on an x86_64 machine.
Cool!

Now the last part of my journey is still outstanding.
It's a gen2 client connecting to a gen1 IBGD server.
I have to do this since my gen1 server is running a 2.4 Montavista RT
Linux on a PowerPC that I can't upgrade to gen2.
:-(

BTW.: Our application is a high speed film image transfer in the film
postproduction industry, leveraging the benefits of the high speed IB
RDMA transport.

While I have gen1 to gen1 and gen2 to gen2 running, the only thing that
is missing is gen2 connecting to gen1.
I just tried this with my test executables, but nothing reached the gen1
server.
The gen1 userspace application does not even receive the IB_CM_REQ.
So, since your cmpost example helped me a lot on gen2, the question is:
Do you have a cmpost for gen1 IBGD that I can use to connect from gen2 to
gen1?
Or is there any other trick to play here?

Thanks in advance for your assistance
Thomas

............................................................
Thomas Bub
Grass Valley Germany GmbH
Brunnenweg 9
64331 Weiterstadt, Germany
Tel: +49 6150 104 147
Fax: +49 6150 104 656
Email: Thomas.Bub at thomson.net
www.GrassValley.com
............................................................

From mshefty at ichips.intel.com Thu Sep 14 08:25:11 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 14 Sep 2006 08:25:11 -0700
Subject: [openib-general] Different byte order between gen1 CM and gen2 CM ->RE: How to connect gen2 CM to gen1 IBGD CM?
In-Reply-To:
References:
Message-ID: <45097457.5020007@ichips.intel.com>

Bub Thomas wrote:
> Do you know of any other Verbs or CM parameter that has a different
> byte order between gen1 and gen2?

I'm not really familiar with the gen1 code.

> P.S.: Maybe someone should put a big “Warning” sign somewhere so that
> others don’t stumble into that pit again. ;-)

The kernel APIs are fairly clear about byte ordering, but that
documentation didn't carry up to userspace everywhere. I will update the
userspace documentation, but it may take me a few weeks to get to this.
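
The byte-order point in this thread can be illustrated with a minimal
sketch (not taken from the gen1 or gen2 sources): a 64-bit service ID held
in host order is converted to network (big-endian) order before being
handed to an interface that expects network order. The helper names
service_id_to_be64 and be64_to_service_id are hypothetical, and the
conversion is written out by hand so it does not rely on any
platform-specific header.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return a value whose in-memory byte layout is the
 * big-endian (network order) encoding of a host-order service ID.
 */
static uint64_t service_id_to_be64(uint64_t host_id)
{
	unsigned char bytes[8];
	uint64_t be;
	int i;

	/* Most significant byte goes first in memory. */
	for (i = 0; i < 8; i++)
		bytes[i] = (unsigned char)(host_id >> (56 - 8 * i));

	memcpy(&be, bytes, sizeof be);
	return be;
}

/* Hypothetical inverse: decode a network-order value back to host order. */
static uint64_t be64_to_service_id(uint64_t be_id)
{
	unsigned char bytes[8];
	uint64_t host = 0;
	int i;

	memcpy(bytes, &be_id, sizeof bytes);
	for (i = 0; i < 8; i++)
		host = (host << 8) | bytes[i];
	return host;
}
```

On a big-endian host both helpers return their argument unchanged; on a
little-endian host they swap the bytes, which is the situation where a
host-order service_id would not match on the wire.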
- Sean From rdreier at cisco.com Thu Sep 14 08:30:29 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 08:30:29 -0700 Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name In-Reply-To: (Or Gerlitz's message of "Thu, 14 Sep 2006 10:12:36 +0300 (IDT)") References: Message-ID: Or> change INFINIBAND_ADDR_TRANS to INFINIBAND_RDMA_CM and add Or> help text clarifying what the thing does. Adding the help text Or> also has the side effect of the cma config being visible when Or> one does make menuconfig Why do we want to make this config option visible? Isn't it better for it to just take the right value automatically? - R. From rdreier at cisco.com Thu Sep 14 08:31:45 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 08:31:45 -0700 Subject: [openib-general] [PATCH] IB/iser: fix iSER description and selections in Kconfig In-Reply-To: <45091AC4.3090005@voltaire.com> (Erez Zilber's message of "Thu, 14 Sep 2006 12:03:00 +0300") References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com> <450912C0.8070807@voltaire.com> <45091AC4.3090005@voltaire.com> Message-ID: Wouldn't it better just to depend on INET the way ISCSI_TCP does? 'select' is more fragile and harder to maintain than 'depends' since you always have to make sure you select the full dependency tree of every option you really need. - R. From mshefty at ichips.intel.com Thu Sep 14 08:35:44 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 14 Sep 2006 08:35:44 -0700 Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name In-Reply-To: References: Message-ID: <450976D0.8020803@ichips.intel.com> Or Gerlitz wrote: > change INFINIBAND_ADDR_TRANS to INFINIBAND_RDMA_CM and add help text > clarifying what the thing does. 
Adding the help text also has the side
> effect of the cma config being visible when one does make menuconfig
>
> Signed-off-by: Or Gerlitz

Acked-by: Sean Hefty

Were you wanting this for 2.6.19?

From eli at dev.mellanox.co.il Thu Sep 14 08:47:54 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 14 Sep 2006 18:47:54 +0300
Subject: [openib-general] ipoib send only failure
Message-ID: <1158248874.18456.9.camel@localhost>

Hi,
when running a test I encountered the following scenario:

the test sends to a multicast address
ipoib issues a send-only join, which fails.
successive joins to this group will not be attempted, since the query
field of the mcast object still holds the old pointer.

From eli at dev.mellanox.co.il Thu Sep 14 08:47:58 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 14 Sep 2006 18:47:58 +0300
Subject: [openib-general] [PATCH] ipoib sendonly join
Message-ID: <1158248878.18456.11.camel@localhost>

When a sendonly join fails, mcast->query must be set to NULL so that
successive joins will be attempted for the group.

Signed-off-by: Eli Cohen
Signed-off-by: Michael S.
Tsirkin

Index: openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- openib-1.1.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2006-09-12 14:28:33.000000000 +0300
+++ openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2006-09-14 17:17:12.000000000 +0300
@@ -326,6 +326,7 @@
 		/* Clear the busy flag so we try again */
 		clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+		mcast->query = NULL;
 	}
 	complete(&mcast->done);

From sean.hefty at intel.com Thu Sep 14 09:33:16 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 14 Sep 2006 09:33:16 -0700
Subject: [openib-general] [PATCH v2] ib_sa: add generic RMPP query interface
In-Reply-To: <000601c6c580$8343eb30$8698070a@amr.corp.intel.com>
Message-ID: <000101c6d81b$7b2f6ca0$97d8180a@amr.corp.intel.com>

Patch updated to svn tip, which includes SA registration.

The following patch adds a generic interface to send MADs to the SA.
The primary motivation for adding these calls is to expand the SA query
interface to include RMPP responses for users wanting more than a single
attribute returned from a query (e.g. multipath record queries), but it
also simplifies a userspace interface.

The implementation of the existing SA query routines was layered on top
of the generic query interface.
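
As a rough userspace illustration of what walking a multi-record RMPP
response involves (a sketch under stated assumptions, not the kernel code
from this patch): records are laid out back to back at a fixed stride
across fixed-size data segments, and a record that straddles two segments
has to be reassembled in a bounce buffer. The names rec_iter,
rec_iter_next and SEG_DATA are hypothetical, and the 200-byte segment
payload is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

#define SEG_DATA 200	/* assumed payload bytes per segment */

/* Hypothetical iterator over fixed-stride records spread across segments. */
struct rec_iter {
	unsigned char (*segs)[SEG_DATA];	/* array of segment payloads */
	size_t total;		/* total payload bytes in all segments */
	size_t pos;		/* logical offset of the next record */
	size_t stride;		/* distance between record starts */
	size_t size;		/* meaningful bytes in each record */
	unsigned char bounce[SEG_DATA];	/* reassembly buffer */
};

/* Return the next record, or NULL when no complete record is left. */
static const void *rec_iter_next(struct rec_iter *it)
{
	size_t seg, off, first;

	/* Reject layouts this sketch cannot walk safely. */
	if (it->size == 0 || it->size > sizeof it->bounce ||
	    it->stride < it->size)
		return NULL;
	if (it->pos + it->stride > it->total)
		return NULL;

	seg = it->pos / SEG_DATA;
	off = it->pos % SEG_DATA;
	it->pos += it->stride;

	/* Record lies entirely inside one segment: return it in place. */
	if (off + it->size <= SEG_DATA)
		return &it->segs[seg][off];

	/* Record straddles two segments: copy both pieces together. */
	first = SEG_DATA - off;
	memcpy(it->bounce, &it->segs[seg][off], first);
	memcpy(it->bounce + first, &it->segs[seg + 1][0], it->size - first);
	return it->bounce;
}
```

A caller would zero the structure, fill in segs, total, stride and size,
and then call rec_iter_next() until it returns NULL. A pointer returned
for a straddling record is only valid until the next call, because the
bounce buffer is reused.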
Signed-off-by: Sean Hefty --- Index: include/rdma/ib_sa.h =================================================================== --- include/rdma/ib_sa.h (revision 9490) +++ include/rdma/ib_sa.h (working copy) @@ -82,6 +82,32 @@ enum { IB_SA_ATTR_INFORM_INFO_REC = 0xf3 }; +/* Length of SA attributes on the wire */ +enum { + IB_SA_ATTR_CLASS_PORTINFO_LEN = 72, + IB_SA_ATTR_NOTICE_LEN = 80, + IB_SA_ATTR_INFORM_INFO_LEN = 36, + IB_SA_ATTR_NODE_REC_LEN = 108, + IB_SA_ATTR_PORT_INFO_REC_LEN = 58, + IB_SA_ATTR_SL2VL_REC_LEN = 16, + IB_SA_ATTR_SWITCH_REC_LEN = 21, + IB_SA_ATTR_LINEAR_FDB_REC_LEN = 72, + IB_SA_ATTR_RANDOM_FDB_REC_LEN = 72, + IB_SA_ATTR_MCAST_FDB_REC_LEN = 72, + IB_SA_ATTR_SM_INFO_REC_LEN = 25, + IB_SA_ATTR_LINK_REC_LEN = 6, + IB_SA_ATTR_GUID_INFO_REC_LEN = 72, + IB_SA_ATTR_SERVICE_REC_LEN = 176, + IB_SA_ATTR_PARTITION_REC_LEN = 72, + IB_SA_ATTR_PATH_REC_LEN = 64, + IB_SA_ATTR_VL_ARB_REC_LEN = 72, + IB_SA_ATTR_MC_MEMBER_REC_LEN = 52, + IB_SA_ATTR_TRACE_REC_LEN = 46, + IB_SA_ATTR_MULTI_PATH_REC_LEN = 56, + IB_SA_ATTR_SERVICE_ASSOC_REC_LEN= 80, + IB_SA_ATTR_INFORM_INFO_REC_LEN = 60 +}; + enum ib_sa_selector { IB_SA_GTE = 0, IB_SA_LTE = 1, @@ -270,10 +296,83 @@ void ib_sa_register_client(struct ib_sa_ */ void ib_sa_unregister_client(struct ib_sa_client *client); +struct ib_sa_iter; + +/** + * ib_sa_iter_create - Create an iterator that may be used to walk through + * a list of returned SA records. + * @mad_recv_wc: A received response from the SA. + * + * This call allocates an iterator that is used to walk through a list of + * SA records. Users must free the iterator by calling ib_sa_iter_free. + */ +struct ib_sa_iter *ib_sa_iter_create(struct ib_mad_recv_wc *mad_recv_wc); + +/** + * ib_sa_iter_free - Release an iterator. + * @iter: The iterator to free. + */ +void ib_sa_iter_free(struct ib_sa_iter *iter); + +/** + * ib_sa_iter_next - Move an iterator to reference the next attribute and + * return the attribute. + * @iter: The iterator to move. 
+ * + * The referenced attribute will be in wire format. The funtion returns NULL + * if there are no more attributes to return. + */ +void *ib_sa_iter_next(struct ib_sa_iter *iter); + +/** + * ib_sa_attr_size - Return the length of an SA attribute on the wire. + * @attr_id: Attribute identifier. + */ +int ib_sa_attr_size(__be16 attr_id); + struct ib_sa_query; void ib_sa_cancel_query(int id, struct ib_sa_query *query); +/** + * ib_sa_send_mad - Send a MAD to the SA. + * @client:SA client + * @device:device to send query on + * @port_num: port number to send query on + * @method:MAD method to use in the send. + * @attr:Reference to attribute in wire format to send in MAD. + * @attr_id:Attribute type identifier. + * @comp_mask:component mask to send in MAD + * @timeout_ms:time to wait for response, if one is expected + * @retries:number of times to retry request + * @gfp_mask:GFP mask to use for internal allocations + * @callback:function called when query completes, times out or is + * canceled + * @context:opaque user context passed to callback + * @sa_query:query context, used to cancel query + * + * Send a message to the SA. If a response is expected (timeout_ms is + * non-zero), the callback function will be called when the query completes. + * Status is 0 for a successful response, -EINTR if the query + * is canceled, -ETIMEDOUT is the query timed out, or -EIO if an error + * occurred sending the query. Mad_recv_wc will reference any returned + * response from the SA. It is the responsibility of the caller to free + * mad_recv_wc by call ib_free_recv_mad() if it is non-NULL. + * + * If the return value of ib_sa_send_mad() is negative, it is an + * error code. Otherwise it is a query ID that can be used to cancel + * the query. 
+ */ +int ib_sa_send_mad(struct ib_sa_client *client, + struct ib_device *device, u8 port_num, + int method, void *attr, __be16 attr_id, + ib_sa_comp_mask comp_mask, + int timeout_ms, int retries, gfp_t gfp_mask, + void (*callback)(int status, + struct ib_mad_recv_wc *mad_recv_wc, + void *context), + void *context, struct ib_sa_query **query); + int ib_sa_path_rec_get(struct ib_sa_client *client, struct ib_device *device, u8 port_num, struct ib_sa_path_rec *rec, Index: core/sa_query.c =================================================================== --- core/sa_query.c (revision 9490) +++ core/sa_query.c (working copy) @@ -73,31 +73,42 @@ struct ib_sa_device { }; struct ib_sa_query { - void (*callback)(struct ib_sa_query *, int, struct ib_sa_mad *); - void (*release)(struct ib_sa_query *); + void (*callback)(int, struct ib_mad_recv_wc *, void *); struct ib_sa_client *client; struct ib_sa_port *port; struct ib_mad_send_buf *mad_buf; struct ib_sa_sm_ah *sm_ah; + void *context; int id; }; struct ib_sa_service_query { void (*callback)(int, struct ib_sa_service_rec *, void *); void *context; - struct ib_sa_query sa_query; + struct ib_sa_query *sa_query; }; struct ib_sa_path_query { void (*callback)(int, struct ib_sa_path_rec *, void *); void *context; - struct ib_sa_query sa_query; + struct ib_sa_query *sa_query; }; struct ib_sa_mcmember_query { void (*callback)(int, struct ib_sa_mcmember_rec *, void *); void *context; - struct ib_sa_query sa_query; + struct ib_sa_query *sa_query; +}; + +struct ib_sa_iter { + struct ib_mad_recv_wc *recv_wc; + struct ib_mad_recv_buf *recv_buf; + int attr_size; + int attr_offset; + int data_offset; + int data_left; + void *attr; + u8 attr_data[0]; }; static void ib_sa_add_one(struct ib_device *device); @@ -532,9 +543,17 @@ EXPORT_SYMBOL(ib_init_ah_from_mcmember); int ib_sa_pack_attr(void *dst, void *src, int attr_id) { switch (attr_id) { + case IB_SA_ATTR_SERVICE_REC: + ib_pack(service_rec_table, ARRAY_SIZE(service_rec_table), + src, 
dst); + break; case IB_SA_ATTR_PATH_REC: ib_pack(path_rec_table, ARRAY_SIZE(path_rec_table), src, dst); break; + case IB_SA_ATTR_MC_MEMBER_REC: + ib_pack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table), + src, dst); + break; default: return -EINVAL; } @@ -545,9 +564,17 @@ EXPORT_SYMBOL(ib_sa_pack_attr); int ib_sa_unpack_attr(void *dst, void *src, int attr_id) { switch (attr_id) { + case IB_SA_ATTR_SERVICE_REC: + ib_unpack(service_rec_table, ARRAY_SIZE(service_rec_table), + src, dst); + break; case IB_SA_ATTR_PATH_REC: ib_unpack(path_rec_table, ARRAY_SIZE(path_rec_table), src, dst); break; + case IB_SA_ATTR_MC_MEMBER_REC: + ib_unpack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table), + src, dst); + break; default: return -EINVAL; } @@ -555,15 +582,100 @@ int ib_sa_unpack_attr(void *dst, void *s } EXPORT_SYMBOL(ib_sa_unpack_attr); -static void init_mad(struct ib_sa_mad *mad, struct ib_mad_agent *agent) +/* Return size of SA attributes on the wire. */ +int ib_sa_attr_size(__be16 attr_id) { - unsigned long flags; + int size; - memset(mad, 0, sizeof *mad); + switch (be16_to_cpu(attr_id)) { + case IB_SA_ATTR_CLASS_PORTINFO: + size = IB_SA_ATTR_CLASS_PORTINFO_LEN; + break; + case IB_SA_ATTR_NOTICE: + size = IB_SA_ATTR_NOTICE_LEN; + break; + case IB_SA_ATTR_INFORM_INFO: + size = IB_SA_ATTR_INFORM_INFO_LEN; + break; + case IB_SA_ATTR_NODE_REC: + size = IB_SA_ATTR_NODE_REC_LEN; + break; + case IB_SA_ATTR_PORT_INFO_REC: + size = IB_SA_ATTR_PORT_INFO_REC_LEN; + break; + case IB_SA_ATTR_SL2VL_REC: + size = IB_SA_ATTR_SL2VL_REC_LEN; + break; + case IB_SA_ATTR_SWITCH_REC: + size = IB_SA_ATTR_SWITCH_REC_LEN; + break; + case IB_SA_ATTR_LINEAR_FDB_REC: + size = IB_SA_ATTR_LINEAR_FDB_REC_LEN; + break; + case IB_SA_ATTR_RANDOM_FDB_REC: + size = IB_SA_ATTR_RANDOM_FDB_REC_LEN; + break; + case IB_SA_ATTR_MCAST_FDB_REC: + size = IB_SA_ATTR_MCAST_FDB_REC_LEN; + break; + case IB_SA_ATTR_SM_INFO_REC: + size = IB_SA_ATTR_SM_INFO_REC_LEN; + break; + case IB_SA_ATTR_LINK_REC: + size = 
IB_SA_ATTR_LINK_REC_LEN; + break; + case IB_SA_ATTR_GUID_INFO_REC: + size = IB_SA_ATTR_GUID_INFO_REC_LEN; + break; + case IB_SA_ATTR_SERVICE_REC: + size = IB_SA_ATTR_SERVICE_REC_LEN; + break; + case IB_SA_ATTR_PARTITION_REC: + size = IB_SA_ATTR_PARTITION_REC_LEN; + break; + case IB_SA_ATTR_PATH_REC: + size = IB_SA_ATTR_PATH_REC_LEN; + break; + case IB_SA_ATTR_VL_ARB_REC: + size = IB_SA_ATTR_VL_ARB_REC_LEN; + break; + case IB_SA_ATTR_MC_MEMBER_REC: + size = IB_SA_ATTR_MC_MEMBER_REC_LEN; + break; + case IB_SA_ATTR_TRACE_REC: + size = IB_SA_ATTR_TRACE_REC_LEN; + break; + case IB_SA_ATTR_MULTI_PATH_REC: + size = IB_SA_ATTR_MULTI_PATH_REC_LEN; + break; + case IB_SA_ATTR_SERVICE_ASSOC_REC: + size = IB_SA_ATTR_SERVICE_ASSOC_REC_LEN; + break; + case IB_SA_ATTR_INFORM_INFO_REC: + size = IB_SA_ATTR_INFORM_INFO_REC_LEN; + break; + default: + size = 0; + break; + } + return size; +} +EXPORT_SYMBOL(ib_sa_attr_size); + +static void init_mad(struct ib_sa_mad *mad, struct ib_mad_agent *agent, + int method, void *attr, __be16 attr_id, + ib_sa_comp_mask comp_mask) +{ + unsigned long flags; mad->mad_hdr.base_version = IB_MGMT_BASE_VERSION; mad->mad_hdr.mgmt_class = IB_MGMT_CLASS_SUBN_ADM; mad->mad_hdr.class_version = IB_SA_CLASS_VERSION; + mad->mad_hdr.method = method; + mad->mad_hdr.attr_id = attr_id; + mad->sa_hdr.comp_mask = comp_mask; + + memcpy(mad->data, attr, ib_sa_attr_size(attr_id)); spin_lock_irqsave(&tid_lock, flags); mad->mad_hdr.tid = @@ -617,31 +729,162 @@ retry: return ret ? 
ret : id; } -static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query, - int status, - struct ib_sa_mad *mad) +struct ib_sa_iter *ib_sa_iter_create(struct ib_mad_recv_wc *mad_recv_wc) +{ + struct ib_sa_iter *iter; + struct ib_sa_mad *mad = (struct ib_sa_mad *) mad_recv_wc->recv_buf.mad; + int attr_size, attr_offset; + + attr_offset = be16_to_cpu(mad->sa_hdr.attr_offset) * 8; + attr_size = ib_sa_attr_size(mad->mad_hdr.attr_id); + if (!attr_size || attr_offset < attr_size) + return ERR_PTR(-EINVAL); + + iter = kzalloc(sizeof *iter + attr_size, GFP_KERNEL); + if (!iter) + return ERR_PTR(-ENOMEM); + + iter->data_left = mad_recv_wc->mad_len - IB_MGMT_SA_HDR; + iter->recv_wc = mad_recv_wc; + iter->recv_buf = &mad_recv_wc->recv_buf; + iter->attr_offset = attr_offset; + iter->attr_size = attr_size; + return iter; +} +EXPORT_SYMBOL(ib_sa_iter_create); + +void ib_sa_iter_free(struct ib_sa_iter *iter) +{ + kfree(iter); +} +EXPORT_SYMBOL(ib_sa_iter_free); + +void *ib_sa_iter_next(struct ib_sa_iter *iter) +{ + struct ib_sa_mad *mad; + int left, offset = 0; + + while (iter->data_left >= iter->attr_offset) { + while (iter->data_offset < IB_MGMT_SA_DATA) { + mad = (struct ib_sa_mad *) iter->recv_buf->mad; + + left = IB_MGMT_SA_DATA - iter->data_offset; + if (left < iter->attr_size) { + /* copy first piece of the attribute */ + iter->attr = &iter->attr_data; + memcpy(iter->attr, + &mad->data[iter->data_offset], left); + offset = left; + break; + } else if (offset) { + /* copy the second piece of the attribute */ + memcpy(iter->attr + offset, &mad->data[0], + iter->attr_size - offset); + iter->data_offset = iter->attr_size - offset; + offset = 0; + } else { + iter->attr = &mad->data[iter->data_offset]; + iter->data_offset += iter->attr_size; + } + + iter->data_left -= iter->attr_offset; + goto out; + } + iter->data_offset = 0; + iter->recv_buf = list_entry(iter->recv_buf->list.next, + struct ib_mad_recv_buf, list); + } + iter->attr = NULL; +out: + return iter->attr; +} 
+EXPORT_SYMBOL(ib_sa_iter_next); + +int ib_sa_send_mad(struct ib_sa_client *client, + struct ib_device *device, u8 port_num, + int method, void *attr, __be16 attr_id, + ib_sa_comp_mask comp_mask, + int timeout_ms, int retries, gfp_t gfp_mask, + void (*callback)(int status, + struct ib_mad_recv_wc *mad_recv_wc, + void *context), + void *context, struct ib_sa_query **query) { - struct ib_sa_path_query *query = - container_of(sa_query, struct ib_sa_path_query, sa_query); + struct ib_sa_query *sa_query; + struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client); + struct ib_sa_port *port; + struct ib_mad_agent *agent; + int ret; + + if (!sa_dev) + return -ENODEV; + + port = &sa_dev->port[port_num - sa_dev->start_port]; + agent = port->agent; + + sa_query = kmalloc(sizeof *sa_query, gfp_mask); + if (!sa_query) + return -ENOMEM; + + sa_query->mad_buf = ib_create_send_mad(agent, 1, 0, + method == IB_SA_METHOD_GET_MULTI, + IB_MGMT_SA_HDR, IB_MGMT_SA_DATA, + gfp_mask); + if (!sa_query->mad_buf) { + ret = -ENOMEM; + goto err1; + } - if (mad) { - struct ib_sa_path_rec rec; + sa_query->port = port; + sa_query->callback = callback; + sa_query->context = context; - ib_unpack(path_rec_table, ARRAY_SIZE(path_rec_table), - mad->data, &rec); - query->callback(status, &rec, query->context); - } else - query->callback(status, NULL, query->context); + init_mad(sa_query->mad_buf->mad, agent, method, attr, attr_id, + comp_mask); + + ib_sa_client_get(client); + sa_query->client = client; + ret = send_mad(sa_query, timeout_ms, retries, gfp_mask); + if (ret < 0) + goto err2; + + *query = sa_query; + return ret; + +err2: + ib_sa_client_put(sa_query->client); + ib_free_send_mad(sa_query->mad_buf); +err1: + kfree(query); + return ret; } +EXPORT_SYMBOL(ib_sa_send_mad); -static void ib_sa_path_rec_release(struct ib_sa_query *sa_query) +static void ib_sa_path_rec_callback(int status, + struct ib_mad_recv_wc *mad_recv_wc, + void *context) { - kfree(container_of(sa_query, struct 
ib_sa_path_query, sa_query)); + struct ib_sa_path_query *query = context; + + if (query->callback) { + if (mad_recv_wc) { + struct ib_sa_mad *mad; + struct ib_sa_path_rec rec; + + mad = (struct ib_sa_mad *) mad_recv_wc->recv_buf.mad; + ib_unpack(path_rec_table, ARRAY_SIZE(path_rec_table), + mad->data, &rec); + query->callback(status, &rec, query->context); + } else + query->callback(status, NULL, query->context); + } + if (mad_recv_wc) + ib_free_recv_mad(mad_recv_wc); + kfree(query); } /** * ib_sa_path_rec_get - Start a Path get query - * @client:SA client * @device:device to send query on * @port_num: port number to send query on * @rec:Path Record to send in query @@ -677,91 +920,54 @@ int ib_sa_path_rec_get(struct ib_sa_clie struct ib_sa_query **sa_query) { struct ib_sa_path_query *query; - struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client); - struct ib_sa_port *port; - struct ib_mad_agent *agent; - struct ib_sa_mad *mad; + u8 path[IB_SA_ATTR_PATH_REC_LEN]; int ret; - if (!sa_dev) - return -ENODEV; - - port = &sa_dev->port[port_num - sa_dev->start_port]; - agent = port->agent; - query = kmalloc(sizeof *query, gfp_mask); if (!query) return -ENOMEM; - query->sa_query.mad_buf = ib_create_send_mad(agent, 1, 0, - 0, IB_MGMT_SA_HDR, - IB_MGMT_SA_DATA, gfp_mask); - if (!query->sa_query.mad_buf) { - ret = -ENOMEM; - goto err1; - } - - ib_sa_client_get(client); - query->sa_query.client = client; query->callback = callback; query->context = context; - mad = query->sa_query.mad_buf->mad; - init_mad(mad, agent); - - query->sa_query.callback = callback ? 
ib_sa_path_rec_callback : NULL; - query->sa_query.release = ib_sa_path_rec_release; - query->sa_query.port = port; - mad->mad_hdr.method = IB_MGMT_METHOD_GET; - mad->mad_hdr.attr_id = cpu_to_be16(IB_SA_ATTR_PATH_REC); - mad->sa_hdr.comp_mask = comp_mask; - - ib_pack(path_rec_table, ARRAY_SIZE(path_rec_table), rec, mad->data); - - *sa_query = &query->sa_query; - - ret = send_mad(&query->sa_query, timeout_ms, retries, gfp_mask); + ib_pack(path_rec_table, ARRAY_SIZE(path_rec_table), rec, path); + ret = ib_sa_send_mad(client, device, port_num, IB_MGMT_METHOD_GET, path, + cpu_to_be16(IB_SA_ATTR_PATH_REC), comp_mask, + timeout_ms, retries, gfp_mask, + ib_sa_path_rec_callback, query, &query->sa_query); if (ret < 0) - goto err2; + kfree(query); return ret; - -err2: - *sa_query = NULL; - ib_sa_client_put(query->sa_query.client); - ib_free_send_mad(query->sa_query.mad_buf); - -err1: - kfree(query); - return ret; } EXPORT_SYMBOL(ib_sa_path_rec_get); -static void ib_sa_service_rec_callback(struct ib_sa_query *sa_query, - int status, - struct ib_sa_mad *mad) +static void ib_sa_service_rec_callback(int status, + struct ib_mad_recv_wc *mad_recv_wc, + void *context) { - struct ib_sa_service_query *query = - container_of(sa_query, struct ib_sa_service_query, sa_query); + struct ib_sa_service_query *query = context; - if (mad) { - struct ib_sa_service_rec rec; - - ib_unpack(service_rec_table, ARRAY_SIZE(service_rec_table), - mad->data, &rec); - query->callback(status, &rec, query->context); - } else - query->callback(status, NULL, query->context); -} - -static void ib_sa_service_rec_release(struct ib_sa_query *sa_query) -{ - kfree(container_of(sa_query, struct ib_sa_service_query, sa_query)); + if (query->callback) { + if (mad_recv_wc) { + struct ib_sa_mad *mad; + struct ib_sa_service_rec rec; + + mad = (struct ib_sa_mad *) mad_recv_wc->recv_buf.mad; + ib_unpack(service_rec_table, + ARRAY_SIZE(service_rec_table), + mad->data, &rec); + query->callback(status, &rec, query->context); + 
} else + query->callback(status, NULL, query->context); + } + if (mad_recv_wc) + ib_free_recv_mad(mad_recv_wc); + kfree(query); } /** * ib_sa_service_rec_query - Start Service Record operation - * @client:SA client * @device:device to send request on * @port_num: port number to send request on * @method:SA method - should be get, set, or delete @@ -799,98 +1005,56 @@ int ib_sa_service_rec_query(struct ib_sa struct ib_sa_query **sa_query) { struct ib_sa_service_query *query; - struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client); - struct ib_sa_port *port; - struct ib_mad_agent *agent; - struct ib_sa_mad *mad; + u8 service[IB_SA_ATTR_SERVICE_REC_LEN]; int ret; - if (!sa_dev) - return -ENODEV; - - port = &sa_dev->port[port_num - sa_dev->start_port]; - agent = port->agent; - - if (method != IB_MGMT_METHOD_GET && - method != IB_MGMT_METHOD_SET && - method != IB_SA_METHOD_DELETE) - return -EINVAL; - query = kmalloc(sizeof *query, gfp_mask); if (!query) return -ENOMEM; - query->sa_query.mad_buf = ib_create_send_mad(agent, 1, 0, - 0, IB_MGMT_SA_HDR, - IB_MGMT_SA_DATA, gfp_mask); - if (!query->sa_query.mad_buf) { - ret = -ENOMEM; - goto err1; - } - - ib_sa_client_get(client); - query->sa_query.client = client; query->callback = callback; query->context = context; - mad = query->sa_query.mad_buf->mad; - init_mad(mad, agent); - - query->sa_query.callback = callback ? 
ib_sa_service_rec_callback : NULL; - query->sa_query.release = ib_sa_service_rec_release; - query->sa_query.port = port; - mad->mad_hdr.method = method; - mad->mad_hdr.attr_id = cpu_to_be16(IB_SA_ATTR_SERVICE_REC); - mad->sa_hdr.comp_mask = comp_mask; - - ib_pack(service_rec_table, ARRAY_SIZE(service_rec_table), - rec, mad->data); - - *sa_query = &query->sa_query; - - ret = send_mad(&query->sa_query, timeout_ms, retries, gfp_mask); + ib_pack(service_rec_table, ARRAY_SIZE(service_rec_table), rec, service); + ret = ib_sa_send_mad(client, device, port_num, method, service, + cpu_to_be16(IB_SA_ATTR_SERVICE_REC), comp_mask, + timeout_ms, retries, gfp_mask, + ib_sa_service_rec_callback, query, + &query->sa_query); if (ret < 0) - goto err2; - - return ret; + kfree(query); -err2: - *sa_query = NULL; - ib_sa_client_put(query->sa_query.client); - ib_free_send_mad(query->sa_query.mad_buf); - -err1: - kfree(query); return ret; } EXPORT_SYMBOL(ib_sa_service_rec_query); -static void ib_sa_mcmember_rec_callback(struct ib_sa_query *sa_query, - int status, - struct ib_sa_mad *mad) +static void ib_sa_mcmember_rec_callback(int status, + struct ib_mad_recv_wc *mad_recv_wc, + void *context) { - struct ib_sa_mcmember_query *query = - container_of(sa_query, struct ib_sa_mcmember_query, sa_query); - - if (mad) { - struct ib_sa_mcmember_rec rec; - - ib_unpack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table), - mad->data, &rec); - query->callback(status, &rec, query->context); - } else - query->callback(status, NULL, query->context); -} + struct ib_sa_mcmember_query *query = context; -static void ib_sa_mcmember_rec_release(struct ib_sa_query *sa_query) -{ - kfree(container_of(sa_query, struct ib_sa_mcmember_query, sa_query)); + if (query->callback) { + if (mad_recv_wc) { + struct ib_sa_mad *mad; + struct ib_sa_mcmember_rec rec; + + mad = (struct ib_sa_mad *) mad_recv_wc->recv_buf.mad; + ib_unpack(mcmember_rec_table, + ARRAY_SIZE(mcmember_rec_table), + mad->data, &rec); + 
query->callback(status, &rec, query->context); + } else + query->callback(status, NULL, query->context); + } + if (mad_recv_wc) + ib_free_recv_mad(mad_recv_wc); + kfree(query); } int ib_sa_mcmember_rec_query(struct ib_sa_client *client, struct ib_device *device, u8 port_num, - u8 method, - struct ib_sa_mcmember_rec *rec, + u8 method, struct ib_sa_mcmember_rec *rec, ib_sa_comp_mask comp_mask, int timeout_ms, int retries, gfp_t gfp_mask, void (*callback)(int status, @@ -900,64 +1064,27 @@ int ib_sa_mcmember_rec_query(struct ib_s struct ib_sa_query **sa_query) { struct ib_sa_mcmember_query *query; - struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client); - struct ib_sa_port *port; - struct ib_mad_agent *agent; - struct ib_sa_mad *mad; + u8 mcmember[IB_SA_ATTR_MC_MEMBER_REC_LEN]; int ret; - if (!sa_dev) - return -ENODEV; - - port = &sa_dev->port[port_num - sa_dev->start_port]; - agent = port->agent; - query = kmalloc(sizeof *query, gfp_mask); if (!query) return -ENOMEM; - query->sa_query.mad_buf = ib_create_send_mad(agent, 1, 0, - 0, IB_MGMT_SA_HDR, - IB_MGMT_SA_DATA, gfp_mask); - if (!query->sa_query.mad_buf) { - ret = -ENOMEM; - goto err1; - } - - ib_sa_client_get(client); - query->sa_query.client = client; query->callback = callback; query->context = context; - mad = query->sa_query.mad_buf->mad; - init_mad(mad, agent); - - query->sa_query.callback = callback ? 
ib_sa_mcmember_rec_callback : NULL; - query->sa_query.release = ib_sa_mcmember_rec_release; - query->sa_query.port = port; - mad->mad_hdr.method = method; - mad->mad_hdr.attr_id = cpu_to_be16(IB_SA_ATTR_MC_MEMBER_REC); - mad->sa_hdr.comp_mask = comp_mask; - ib_pack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table), - rec, mad->data); - - *sa_query = &query->sa_query; - - ret = send_mad(&query->sa_query, timeout_ms, retries, gfp_mask); + rec, mcmember); + ret = ib_sa_send_mad(client, device, port_num, method, mcmember, + cpu_to_be16(IB_SA_ATTR_MC_MEMBER_REC), comp_mask, + timeout_ms, retries, gfp_mask, + ib_sa_mcmember_rec_callback, query, + &query->sa_query); if (ret < 0) - goto err2; + kfree(query); return ret; - -err2: - *sa_query = NULL; - ib_sa_client_put(query->sa_query.client); - ib_free_send_mad(query->sa_query.mad_buf); - -err1: - kfree(query); - return ret; } EXPORT_SYMBOL(ib_sa_mcmember_rec_query); @@ -973,13 +1100,13 @@ static void send_handler(struct ib_mad_a /* No callback -- already got recv */ break; case IB_WC_RESP_TIMEOUT_ERR: - query->callback(query, -ETIMEDOUT, NULL); + query->callback(-ETIMEDOUT, NULL, query->context); break; case IB_WC_WR_FLUSH_ERR: - query->callback(query, -EINTR, NULL); + query->callback(-EINTR, NULL, query->context); break; default: - query->callback(query, -EIO, NULL); + query->callback(-EIO, NULL, query->context); break; } @@ -990,7 +1117,7 @@ static void send_handler(struct ib_mad_a ib_free_send_mad(mad_send_wc->send_buf); kref_put(&query->sm_ah->ref, free_sm_ah); ib_sa_client_put(query->client); - query->release(query); + kfree(query); } static void recv_handler(struct ib_mad_agent *mad_agent, @@ -1002,17 +1129,11 @@ static void recv_handler(struct ib_mad_a mad_buf = (void *) (unsigned long) mad_recv_wc->wc->wr_id; query = mad_buf->context[0]; - if (query->callback) { - if (mad_recv_wc->wc->status == IB_WC_SUCCESS) - query->callback(query, - mad_recv_wc->recv_buf.mad->mad_hdr.status ? 
- -EINVAL : 0, - (struct ib_sa_mad *) mad_recv_wc->recv_buf.mad); - else - query->callback(query, -EIO, NULL); - } - - ib_free_recv_mad(mad_recv_wc); + if (query->callback) + query->callback(mad_recv_wc->recv_buf.mad->mad_hdr.status ? + -EINVAL : 0, mad_recv_wc, query->context); + else + ib_free_recv_mad(mad_recv_wc); } static void ib_sa_add_one(struct ib_device *device) @@ -1046,8 +1167,9 @@ static void ib_sa_add_one(struct ib_devi sa_dev->port[i].agent = ib_register_mad_agent(device, i + s, IB_QPT_GSI, - NULL, 0, send_handler, - recv_handler, sa_dev); + NULL, IB_MGMT_RMPP_VERSION, + send_handler, recv_handler, + sa_dev); if (IS_ERR(sa_dev->port[i].agent)) goto err; From vlad at dev.mellanox.co.il Thu Sep 14 09:39:16 2006 From: vlad at dev.mellanox.co.il (vlad at dev.mellanox.co.il) Date: Thu, 14 Sep 2006 19:39:16 +0300 (IDT) Subject: [openib-general] OFED-1.1-RC5 is ready In-Reply-To: <4507D4E2.90406@dev.mellanox.co.il> References: <4507D4E2.90406@dev.mellanox.co.il> Message-ID: <22607.194.90.237.34.1158251956.squirrel@dev.mellanox.co.il> Hi, OFED-1.1-rc5 is available on https://openib.org/svn/gen2/branches/1.1/ofed/releases/ File: OFED-1.1-rc5.tgz Please report any issues in bugzilla http://openib.org/bugzilla/ Release details: ================ Build_id: OFED-1.1-rc5 openib-1.1 (REV=9485) # User space https://openib.org/svn/gen2/branches/1.1/src/userspace Git: git://www.mellanox.co.il/~git/infinibandref: refs/heads/ofed_1_1 commit 18c1cb87c4b16f1a1577807077bbdcba3f446f09 # MPI mpi_osu-0.9.7-mlx2.2.0.tgz openmpi-1.1.1-1.src.rpm mpitests-2.0-0.src.rpm OS support: =========== Novell: - SLES 9.0 SP3 - SLES10 Redhat: - Redhat EL4 up3 - Redhat EL4 up4 kernel.org: - Kernel 2.6.17 Bug fixes from OFED-1.1-rc4: ========================== 1. ISER compilation fixed on SLES10 2. Fixed build on SLES9 PPC64 3. Updated libehca 4. OpenSM fixes 5. 
Added tavor_quirk option to rdma_cm module (disabled by default): Tavor performance quirk: limit MTU to 1K if > 0 (int) Known issues: ============= libipathverbs compilation fails on SLES10 (Bug:204) OFED-1.1-rc6 (hopefully the last one) planned to be released on Monday or Tuesday. Regards, Vladimir > Hi, > > The plan is to issue OFED RC5 on Thursday 9/14 and final release next > week. I am aware of the following issues: > > > 1) Compilation on SLES9 on PPC - Jack Morgenstein > 2) Huge pages on PPC - Eli Cohen > 3) libipathverbs: - Qlogic > a) libipathverbs ABI issue > b) libipathverbs build on SLES10 > 4) SDP performance on Tavor - Michael Tsirkin > 5) iSER issue on SLES10 - Voltaire > > > In order to meet tomorrow's RC5 release all owners please send your > patches by end of today. > > > Regards, > > Aviram > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > From gallen at arlut.utexas.edu Thu Sep 14 09:50:04 2006 From: gallen at arlut.utexas.edu (Greg Allen) Date: Thu, 14 Sep 2006 11:50:04 -0500 Subject: [openib-general] IB for FC5/x86_64 Message-ID: Is there a set of RPMs or SRPMs for FC5/x86_64? Even better, a yum server with them? I've tried generating them from the svn tree, but I keep getting hung up. I love the way it works in RHEL4, but the SATA controller in my new box is currently unsupported in RHEL4. Thanks, -Greg -- Gregory E. Allen, MSEE Engineering Scientist Applied Research Laboratories: The University of Texas at Austin Please help find my missing daughter: http://FindSabrina.org/ From mst at mellanox.co.il Thu Sep 14 10:15:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Sep 2006 20:15:30 +0300 Subject: [openib-general] IB for FC5/x86_64 In-Reply-To: References: Message-ID: <20060914171530.GB27318@mellanox.co.il> Quoting r. 
Greg Allen : > Subject: IB for FC5/x86_64 > > Is there a set of RPMs or SRPMs for FC5/x86_64? Even better, a yum > server with them? I've tried generating them from the svn tree, but I > keep getting hung up. > > I love the way it works in RHEL4, but the SATA controller in my new > box is currently unsupported in RHEL4. > > Thanks, > -Greg Try OFED 1.1 RC. https://openib.org/svn/gen2/branches/1.1/ofed/releases/ -- MST From ralphc at pathscale.com Thu Sep 14 10:55:47 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Thu, 14 Sep 2006 10:55:47 -0700 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <4508F850.5050804@voltaire.com> References: <1158108010.8759.192.camel@brick.pathscale.com> <4507C8C2.6050206@voltaire.com> <1158172258.8759.230.camel@brick.pathscale.com> <4508F850.5050804@voltaire.com> Message-ID: <1158256547.8759.260.camel@brick.pathscale.com> On Thu, 2006-09-14 at 09:36 +0300, Or Gerlitz wrote: > Ralph Campbell wrote: > > On Wed, 2006-09-13 at 12:00 +0300, Or Gerlitz wrote: > >> Ralph Campbell wrote: > > > Well, the other parts of the kernel might not need a kernel virtual > > address but the ib_ipath driver still does. > > So you agree there is a need to kmap/kunamp pages which the user wants > to use with IB and are not mapped into the kernel virt address space? Yes, I agree for systems which have high memory pages. > > I don't understand what you are talking about. There is an IB > > wire protocol for RDMA, SEND, etc. That doesn't change depending > > on the HCA. > > The InfiniPath HCA has a ring buffer of receive buffers and all > > incoming IB packets are DMA'ed into one of these buffers. > > The ib_ipath software driver examines the packet and > > copies it to the appropriate address. For a packet received with > > a RC_RDMA_WRITE_FIRST, the RKEY and IB address are used to convert > > that into a kernel virtual address and the data is copied. 
> > The same happens for RC_SEND_FIRST but the KV address comes from > > the LKEY and address in the work request posted by ib_post_recv(). > > OK, this make sense. > > Lets see if i follow: you say that the Infinipath HCA is RX DMA-able but > it does RX DMA to the ipath driver private RX buffers and then the > driver copies from these buffers to the user buffer. My guess is that > you do that to support both recv and rdma read on this QP since if you > would only need to support recv you can have the hca dma-ing to the user > posted rx buffer. You mostly understand. The hardware doesn't have separate receive queues for each QP. All packets go into a single (or at most 4 currently) receive queues and the driver figures out which QP, RDMA memory region, etc. to copy them to. From ralphc at pathscale.com Thu Sep 14 12:43:39 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Thu, 14 Sep 2006 12:43:39 -0700 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <45093428.5010009@voltaire.com> References: <1158108010.8759.192.camel@brick.pathscale.com> <45093428.5010009@voltaire.com> Message-ID: <1158263019.8759.324.camel@brick.pathscale.com> On Thu, 2006-09-14 at 13:51 +0300, Or Gerlitz wrote: > Ralph Campbell wrote: > > +static inline dma_addr_t ib_dma_map_sg(struct ib_device *dev, > > + struct scatterlist *sg, int nents, > > + enum dma_data_direction direction) > > +{ > > + return dev->map_sg ? > > + dev->map_sg(dev, sg, nents, direction) : > > + dma_map_sg(dev->dma_device, sg, nents, direction); > > +} > > As SG dma mapping happens in place and you don't want to change struct > scatterlist for every arch, i think you would need to keep some mapping > (hash) from each struct scatterlist to its ipath buddy... 
>
> Also you would need to implement the sg_dma_address() and sg_dma_len()
> macros used by ULP code when page/s is/are to be input-ed for the IB
> verbs layer, e.g. to get an SG FMR-ed or send/recv from/into a page, and use
> queries into the ipath scatterlist buddy.
>
> Or.

Here is my thinking so far:

The driver is passed an LKEY/RKEY plus an address. For ib_get_dma_mr(), the address is currently from dma_map_single(), dma_map_page(), or dma_map_sg(). With the ib_dma_*() routines, I can intercept these calls and return something instead of a bus or IOMMU address. I would like to return a kernel virtual address since that is the simplest and is what I ultimately need.

This is trivial for dma_map_single() and trivial for low memory pages for dma_map_page(). I think I can safely just return an error for architectures with high memory pages since the driver really only works on 64-bit systems (for a variety of reasons which I won't go into) and those systems don't have high memory.

If I did have to support high memory pages, I think the DMA address would have to be the address of some kmalloc'ed structure containing the page pointer and offset (or an index into a table of such data structures). I wouldn't want to make ib_dma_map_single() have to use that, but then I would need a way to distinguish addresses returned from ib_dma_map_single() and dma_map_page()/dma_map_sg().

ib_dma_map_sg() is a bit more complex. The struct scatterlist is defined in the architecture specific headers. The sg_dma_address() and sg_dma_len() macros are the exported interface for accessing the DMA address and length. I would like to minimize the impact to architecture specific code.

Given these constraints, I think the best thing to do is add ib_sg_dma_address() and ib_sg_dma_len() functions which should be used instead of sg_dma_address() and sg_dma_len(). Struct scatterlist has to contain at least page, offset, and length since the SCSI code relies on those.
ib_sg_dma_address would return the page_address() of sg->page but wouldn't be able to rely on other fields which might be in the struct scatterlist. Again, if high page support is needed, ib_sg_dma_address() would have to do something trickier like for ib_dma_map_page(). From robert.j.woodruff at intel.com Thu Sep 14 12:52:08 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 14 Sep 2006 12:52:08 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready Message-ID: Robert Walsh wrote, > > [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 > 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | > iters=10000 | duplex=0 | cma=0 | > 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 > VAddr 0x00002a95dd3480 > 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 > VAddr 0x00002a95c85480 > 4730:main: Completion with error at client: > 4730:main: Failed status 9: wr_id 3 > 4730:main: scnt=7584, ccnt=6584 > [woody at rkl-13 bin]$ >Hi Woody, Robert Walsh wrote, >When RC4 is available, there should be a patch in there that will fix >this. Can you let us know if you continue to see problems? >Regards, > Robert. I installed RC5 and now it just hangs, [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 4702: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | iters=10000 | duplex=0 | cma=0 | 4702: Local address: LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200 VAddr 0x00002a95dc8480 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200 VAddr 0x00002a95c7c480 hangs here and have to cntrl-c the test. Intel MPI also fails with, # Barrier [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with error. status=0x8. 
cookie=0x514ee0 rank 1 in job 4 rkl-13_32779 caused collective abort of all ranks exit status of rank 1: killed by signal 9 woody From rjwalsh at pathscale.com Thu Sep 14 13:24:30 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Thu, 14 Sep 2006 13:24:30 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: References: Message-ID: <4509BA7E.3060906@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > I installed RC5 and now it just hangs, > > [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 > 4702: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | > iters=10000 | duplex=0 | cma=0 | > 4702: Local address: LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200 > VAddr 0x00002a95dc8480 > 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200 > VAddr 0x00002a95c7c480 > hangs here and have to cntrl-c the test. > > > Intel MPI also fails with, > # Barrier > [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with > error. status=0x8. cookie=0x514ee0 > rank 1 in job 4 rkl-13_32779 caused collective abort of all ranks > exit status of rank 1: killed by signal 9 OK - thanks for the report - I'll look into it. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRQm6fvzvnpzTd9fxAQKmiggAhKyznnhzO3ndlYYJx58cSX8XK/R5WNz0 CVhrKxVtjhq+cYaP6HAC9HmwuhMm18vlHGmw8fvoiwrhYP1h7dxaVgiAt9dX2rRz svPd4rZnfIu+L9oZYmy7XBkfawwQR30IZPSUbfQDU1ag2r44HsnyZ6VpKucuHLfL jUFxryC2lmwAU6GhuTKJ8k7XEEQBL3UoczPfL/PTwpFVYvM8CjMgLjwhIfqH++Hv khciAfsl8HgK5Hd6jj1WCOzMyZmL7GBGrpTsia/hgUGOHkpmEC9wy3dSDZeIqCbI 4cs961Y2TIuciNraaLPbF4mhFFgaLJe4nzxSeTLfcbfxXraSqKbn9Q== =pWln -----END PGP SIGNATURE----- From rdreier at cisco.com Thu Sep 14 13:38:56 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 13:38:56 -0700 Subject: [openib-general] [PATCH] RDMA/cma: document error flow of rdma_accept In-Reply-To: <45083222.9000005@ichips.intel.com> (Sean Hefty's message of "Wed, 13 Sep 2006 09:30:26 -0700") References: <45083222.9000005@ichips.intel.com> Message-ID: > Committed to svn 9461. Roland, can you also pull into 2.6.19? Done. I merge > 100 patches every kernel release. If I have to spend an extra 5 minutes creating a patch or pulling it out of svn, then I end up burning an extra day of stupid work. If 20+ people who contribute patches sent me clean patches, then everyone will be happier because I'll be able to merge things quicker and focus on productive work. Thanks, Roland From sean.hefty at intel.com Thu Sep 14 13:43:42 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 14 Sep 2006 13:43:42 -0700 Subject: [openib-general] [PATCH] RDMA/cma: document error flow of rdma_accept In-Reply-To: Message-ID: <000201c6d83e$776c8350$97d8180a@amr.corp.intel.com> >I merge > 100 patches every kernel release. If I have to spend an >extra 5 minutes creating a patch or pulling it out of svn, then I end >up burning an extra day of stupid work. If 20+ people who contribute >patches sent me clean patches, then everyone will be happier because >I'll be able to merge things quicker and focus on productive work. 
Sorry about that. I was assuming that you could use Or's original patch directly. - Sean From rdreier at cisco.com Thu Sep 14 13:45:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 13:45:47 -0700 Subject: [openib-general] [PATCH] RDMA/cma: document error flow of rdma_accept In-Reply-To: <000201c6d83e$776c8350$97d8180a@amr.corp.intel.com> (Sean Hefty's message of "Thu, 14 Sep 2006 13:43:42 -0700") References: <000201c6d83e$776c8350$97d8180a@amr.corp.intel.com> Message-ID: Sean> Sorry about that. I was assuming that you could use Or's Sean> original patch directly. Then I have to track down the original email which isn't always easy either. Anyway don't take it personally, I just created that block for standard use now -- you'll probably see it again in other threads ;) - R. From rdreier at cisco.com Thu Sep 14 13:52:49 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 13:52:49 -0700 Subject: [openib-general] [PATCH] ipoib sendonly join In-Reply-To: <1158248878.18456.11.camel@localhost> (Eli cohen's message of "Thu, 14 Sep 2006 18:47:58 +0300") References: <1158248878.18456.11.camel@localhost> Message-ID: Thanks, applied From rdreier at cisco.com Thu Sep 14 13:59:48 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 13:59:48 -0700 Subject: [openib-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This contains a few last-minute fixes -- a couple of one-liners, and a panic fix that turns out to be pure deletions: Eli Cohen: IPoIB: Retry failed send-only multicast group joins Ishai Rabinovitz: IB/srp: Don't schedule reconnect from srp Michael S. 
Tsirkin: RDMA/cma: Increase the IB CM retry count in CMA drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 1 + drivers/infiniband/ulp/srp/ib_srp.c | 14 -------------- 3 files changed, 2 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d6f99d5..5d625a8 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -49,7 +49,7 @@ MODULE_DESCRIPTION("Generic RDMA CM Agen MODULE_LICENSE("Dual BSD/GPL"); #define CMA_CM_RESPONSE_TIMEOUT 20 -#define CMA_MAX_CM_RETRIES 3 +#define CMA_MAX_CM_RETRIES 15 static void cma_add_one(struct ib_device *device); static void cma_remove_one(struct ib_device *device); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index b5e6a7b..ec356ce 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -326,6 +326,7 @@ ipoib_mcast_sendonly_join_complete(int s /* Clear the busy flag so we try again */ clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); + mcast->query = NULL; } complete(&mcast->done); diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 8257d5a..fd8344c 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -799,13 +799,6 @@ static void srp_process_rsp(struct srp_t spin_unlock_irqrestore(target->scsi_host->host_lock, flags); } -static void srp_reconnect_work(void *target_ptr) -{ - struct srp_target_port *target = target_ptr; - - srp_reconnect_target(target); -} - static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc) { struct srp_iu *iu; @@ -858,7 +851,6 @@ static void srp_completion(struct ib_cq { struct srp_target_port *target = target_ptr; struct ib_wc wc; - unsigned long flags; ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); while (ib_poll_cq(cq, 1, &wc) > 0) { @@ -866,10 +858,6 @@ static void 
srp_completion(struct ib_cq printk(KERN_ERR PFX "failed %s status %d\n", wc.wr_id & SRP_OP_RECV ? "receive" : "send", wc.status); - spin_lock_irqsave(target->scsi_host->host_lock, flags); - if (target->state == SRP_TARGET_LIVE) - schedule_work(&target->work); - spin_unlock_irqrestore(target->scsi_host->host_lock, flags); break; } @@ -1705,8 +1693,6 @@ static ssize_t srp_create_target(struct target->scsi_host = target_host; target->srp_host = host; - INIT_WORK(&target->work, srp_reconnect_work, target); - INIT_LIST_HEAD(&target->free_reqs); INIT_LIST_HEAD(&target->req_queue); for (i = 0; i < SRP_SQ_SIZE; ++i) { From rdreier at cisco.com Thu Sep 14 14:00:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 14:00:47 -0700 Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish In-Reply-To: <20060913120154.GA23890@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 13 Sep 2006 15:01:54 +0300") References: <45073FF7.7020506@ichips.intel.com> <20060913120154.GA23890@mellanox.co.il> Message-ID: OK, I put this in 2.6.18 since I had a few other fixes that I thought should go into 2.6.18 too. It was a close call between merging this now or putting it into 2.6.19 and waiting for 2.6.18.1, but I don't think it matters much either way. - R. From rdreier at cisco.com Thu Sep 14 14:11:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 14:11:32 -0700 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: <20060914141901.GG25691@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 14 Sep 2006 17:19:01 +0300") References: <20060914141901.GG25691@mellanox.co.il> Message-ID: > Hmm, IPoIB does not seem to copy anything except the pkey. > Looks like a compliance issue. Well, the only MUST attributes we seem to be missing are HopLimit and MTU (P_Key, Q_Key and SL are all copied from the broadcast group when creating a new multicast group). 
> Specifically, I'm not sure what "other attributes" are, but I think > this should include the static rate. Right? I guess we should definitely do HopLimit and MTU, since those are MUSTs. The only other attribute we look at is Rate, so I guess we should set that also. - R. From sean.hefty at intel.com Thu Sep 14 16:17:33 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 14 Sep 2006 16:17:33 -0700 Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ Message-ID: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com> Currently a DREP is only sent in response to a DREQ if a connection has been found matching the DREQ, and it is in the proper state. Once a DREP is sent, the local connection moves into timewait. Duplicate DREQs received while in this state result in re-sending the DREP. However, it's likely that the local connection will enter and exit timewait before the remote side times out a lost DREP and resends a DREQ. There are a couple possible solutions to this. One is to increase how long a connection remains in timewait, by multiplying its wait time by max_cm_retries. This can greatly increase the timewait state before a QP can be re-used when CM messages are not lost. An alternative is to send a DREP in response to a DREQ, even if a local connection is not found, which is what this patch does. 
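The alternative described above can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions, not the ib_cm code: `struct dreq`, `struct drep`, `struct conn`, `find_conn()` and `handle_dreq()` are hypothetical names invented for illustration, and the connection table is a plain array. The only idea taken from the description is that a DREQ is answered with a DREP whose communication IDs are swapped, even when no matching local connection exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified message types (not the real CM wire structs). */
struct dreq { uint32_t local_comm_id; uint32_t remote_comm_id; };
struct drep { uint32_t local_comm_id; uint32_t remote_comm_id; };

enum conn_state { CONN_ESTABLISHED, CONN_TIMEWAIT };

/* Hypothetical local connection record. */
struct conn {
    uint32_t local_comm_id;
    uint32_t remote_comm_id;
    enum conn_state state;
};

/* Find the connection a DREQ refers to. The sender's "remote" ID is our
 * local ID, and the sender's "local" ID is our remote ID. */
static struct conn *find_conn(struct conn *tbl, size_t n, const struct dreq *req)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].local_comm_id == req->remote_comm_id &&
            tbl[i].remote_comm_id == req->local_comm_id)
            return &tbl[i];
    return NULL;
}

/* Always build a DREP for the caller to send. Returns 0 when a connection
 * matched (it moves to timewait) and -1 when the DREQ was unmatched. */
static int handle_dreq(struct conn *tbl, size_t n,
                       const struct dreq *req, struct drep *rep)
{
    struct conn *c;

    assert(req != NULL && rep != NULL);

    /* The reply mirrors the request's IDs, with or without a match. */
    rep->remote_comm_id = req->local_comm_id;
    rep->local_comm_id = req->remote_comm_id;

    c = find_conn(tbl, n, req);
    if (c == NULL)
        return -1;   /* unmatched: the DREP is still valid to send */

    c->state = CONN_TIMEWAIT;
    return 0;
}
```

The point of the sketch is that the reply needs nothing from local connection state, so a peer that retries a DREQ after the local side has already left timewait can still be answered.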
Signed-off-by: Sean Hefty --- Index: cm.c =================================================================== --- cm.c (revision 9490) +++ cm.c (working copy) @@ -1900,6 +1900,32 @@ out: spin_unlock_irqrestore(&cm_id_priv- } EXPORT_SYMBOL(ib_send_cm_drep); +static int cm_issue_drep(struct cm_port *port, + struct ib_mad_recv_wc *mad_recv_wc) +{ + struct ib_mad_send_buf *msg = NULL; + struct cm_dreq_msg *dreq_msg; + struct cm_drep_msg *drep_msg; + int ret; + + ret = cm_alloc_response_msg(port, mad_recv_wc, &msg); + if (ret) + return ret; + + dreq_msg = (struct cm_dreq_msg *) mad_recv_wc->recv_buf.mad; + drep_msg = (struct cm_drep_msg *) msg->mad; + + cm_format_mad_hdr(&drep_msg->hdr, CM_DREP_ATTR_ID, dreq_msg->hdr.tid); + drep_msg->remote_comm_id = dreq_msg->local_comm_id; + drep_msg->local_comm_id = dreq_msg->remote_comm_id; + + ret = ib_post_send_mad(msg, NULL); + if (ret) + cm_free_msg(msg); + + return ret; +} + static int cm_dreq_handler(struct cm_work *work) { struct cm_id_private *cm_id_priv; @@ -1911,8 +1937,10 @@ static int cm_dreq_handler(struct cm_wor dreq_msg = (struct cm_dreq_msg *)work->mad_recv_wc->recv_buf.mad; cm_id_priv = cm_acquire_id(dreq_msg->remote_comm_id, dreq_msg->local_comm_id); - if (!cm_id_priv) + if (!cm_id_priv) { + cm_issue_drep(work->port, work->mad_recv_wc); return -EINVAL; + } work->cm_event.private_data = &dreq_msg->private_data; From halr at voltaire.com Thu Sep 14 17:27:17 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2006 20:27:17 -0400 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: References: <20060914141901.GG25691@mellanox.co.il> Message-ID: <1158280030.25157.19154.camel@hal.voltaire.com> On Thu, 2006-09-14 at 17:11, Roland Dreier wrote: > > Hmm, IPoIB does not seem to copy anything except the pkey. > > Looks like a compliance issue. 
> > Well, the only MUST attributes we seem to be missing are HopLimit and MTU > (P_Key, Q_Key and SL are all copied from the broadcast group when > creating a new multicast group). Are HopLimit and MTU needed for join ? I thought it was fine to wildcard those for a join. If they are specified though, they do need to match the group. > > Specifically, I'm not sure what "other attributes" are, but I think > > this should include the static rate. Right? > > I guess we should definitely do HopLimit and MTU, since those are > MUSTs. Are you referring to create or join here ? > The only other attribute we look at is Rate, so I guess we > should set that also. The rate should be set properly (with the rate selector set to exactly) in the response regardless of whether it was set in the request (e.g. wildcarded or not). -- Hal > - R. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Thu Sep 14 17:35:54 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 17:35:54 -0700 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: <1158280030.25157.19154.camel@hal.voltaire.com> (Hal Rosenstock's message of "14 Sep 2006 20:27:17 -0400") References: <20060914141901.GG25691@mellanox.co.il> <1158280030.25157.19154.camel@hal.voltaire.com> Message-ID: Hal> Are you referring to create or join here ? The whole thing is about new groups that IPoIB creates. Currently we don't specify HopLimit, MTU or Rate, and the IPoIB RFC says we should. 
From rjwalsh at pathscale.com Thu Sep 14 17:39:29 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Thu, 14 Sep 2006 17:39:29 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: References: Message-ID: <4509F641.5030302@pathscale.com> > I installed RC5 and now it just hangs, Wow - we can't even get RC5 to build here. What distro are you running? I've tried this on RC4 + a fixed libipathverbs package and it runs OK (although it does take a while, which might explain the hang you were seeing.) But mostly I'm curious how you get RC5 to build at all. We really really really shouldn't be attempting to turn RC's around as fast as RC4 to RC5 went: we basically had about enough time to throw a patch together without being able to do much testing. From halr at voltaire.com Thu Sep 14 17:37:47 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2006 20:37:47 -0400 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: References: <20060914141901.GG25691@mellanox.co.il> <1158280030.25157.19154.camel@hal.voltaire.com> Message-ID: <1158280653.25157.19569.camel@hal.voltaire.com> On Thu, 2006-09-14 at 20:35, Roland Dreier wrote: > Hal> Are you referring to create or join here ? > > The whole thing is about new groups that IPoIB creates. Currently we > don't specify HopLimit, MTU or Rate, and the IPoIB RFC says we should. That indeed is true for create. 
However, send only members can never create a group (only full members can do this). Am I confusing this with a different patch which went by ? -- Hal From rdreier at cisco.com Thu Sep 14 17:52:16 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 17:52:16 -0700 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: <1158280653.25157.19569.camel@hal.voltaire.com> (Hal Rosenstock's message of "14 Sep 2006 20:37:47 -0400") References: <20060914141901.GG25691@mellanox.co.il> <1158280030.25157.19154.camel@hal.voltaire.com> <1158280653.25157.19569.camel@hal.voltaire.com> Message-ID: Hal> That indeed is true for create. However, send only members Hal> can never create a group (only full members can do this). Am Hal> I confusing this with a different patch which went by ? Yes, I think so. Look back to the beginning of this thread for the initial report of the problem. - R. From halr at voltaire.com Thu Sep 14 17:52:31 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2006 20:52:31 -0400 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: References: <20060914141901.GG25691@mellanox.co.il> <1158280030.25157.19154.camel@hal.voltaire.com> <1158280653.25157.19569.camel@hal.voltaire.com> Message-ID: <1158281542.25157.20107.camel@hal.voltaire.com> On Thu, 2006-09-14 at 20:52, Roland Dreier wrote: > Hal> That indeed is true for create. However, send only members > Hal> can never create a group (only full members can do this). Am > Hal> I confusing this with a different patch which went by ? > > Yes, I think so. Look back to the beginning of this thread for the > initial report of the problem. I see now. It does refer to full members. I know this used to be right (with perhaps the exception of rate which was what the email referred to). Is my memory wrong ? 
-- Hal From rjwalsh at pathscale.com Thu Sep 14 18:22:21 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Thu, 14 Sep 2006 18:22:21 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: References: Message-ID: <450A004D.9090804@pathscale.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Woodruff, Robert J wrote: > Robert Walsh wrote, >> [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 >> 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | >> iters=10000 | duplex=0 | cma=0 | >> 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey > 0x2302400 >> VAddr 0x00002a95dd3480 >> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey > 0x2402500 >> VAddr 0x00002a95c85480 >> 4730:main: Completion with error at client: >> 4730:main: Failed status 9: wr_id 3 >> 4730:main: scnt=7584, ccnt=6584 >> [woody at rkl-13 bin]$ > >> Hi Woody, > Robert Walsh wrote, >> When RC4 is available, there should be a patch in there that will fix >> this. Can you let us know if you continue to see problems? > >> Regards, >> Robert. > > I installed RC5 and now it just hangs, > > [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 > 4702: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | > iters=10000 | duplex=0 | cma=0 | > 4702: Local address: LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200 > VAddr 0x00002a95dc8480 > 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200 > VAddr 0x00002a95c7c480 > hangs here and have to cntrl-c the test. > > > Intel MPI also fails with, > # Barrier > [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with > error. status=0x8. 
cookie=0x514ee0 > rank 1 in job 4 rkl-13_32779 caused collective abort of all ranks > exit status of rank 1: killed by signal 9 Hi Woody, So, we built everything using RC5 plus the libipathverbs from subversion and we were successfully able to run ib_rdma_bw (with your arguments above) and Intel MPI (a simple MPI hello world program). I'm going to continue testing with the Intel MPI testsuite and some ISV applications. I'll keep you informed. Regards, Robert. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRQoATfzvnpzTd9fxAQLUKQf9E1ps9XbbXplMm6+5O/XDdlWF0BQws1SC L/aGygh34fZSkpGmCrfze3HhsaOqasu9gUOsJQ89jX6pKNkv4tJAxSJCr+n+bdG3 21Bqr9gcM0MbzrDvOcUDHqvnmC0THlCf0XhikjKg/FJR1e48BIiAOFUzfi0VvI36 G1ZtD8xZXydOfWq7Z4xvyf9Y3qNPIeSKR2JZGJQoGHjxY4+vcteK0UVHfic1Bgpy 9uql47af6tncN+CazYcwf8xnHegiDr34iEEre5wUz//Qy62j8JNPnxhit0W9lXij zFszTkOHQeibxbFWi9ZRyigTmHanxxRUuznW54NL8NIF30jhnmcksQ== =06gu -----END PGP SIGNATURE----- From rdreier at cisco.com Thu Sep 14 19:19:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Sep 2006 19:19:00 -0700 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: <1158281542.25157.20107.camel@hal.voltaire.com> (Hal Rosenstock's message of "14 Sep 2006 20:52:31 -0400") References: <20060914141901.GG25691@mellanox.co.il> <1158280030.25157.19154.camel@hal.voltaire.com> <1158280653.25157.19569.camel@hal.voltaire.com> <1158281542.25157.20107.camel@hal.voltaire.com> Message-ID: Hal> I see now. It does refer to full members. I know this used to Hal> be right (with perhaps the exception of rate which was what Hal> the email referred to). Is my memory wrong ? I think that IPoIB has always used the attributes required by the IBA spec to create a multicast group, but not all the attributes required by the IPoIB spec. - R. 
From sweitzen at cisco.com Thu Sep 14 21:39:59 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 14 Sep 2006 21:39:59 -0700 Subject: [openib-general] [openfabrics-ewg] [PATCH] OFED 1.1-rc3 is ready Message-ID: > > I installed RC5 and now it just hangs, > > Wow - we can't even get RC5 to build here. What distro are > you running? > > I've tried this on RC4 + a fixed libipathverbs package and it runs OK > (although it does take a while, which might explain the hang you were > seeing.) > > But mostly I'm curious how you get RC5 to build at all. > > We really really really shouldn't be attempting to turn RC's around as > fast as RC4 to RC5 went: we basically had about enough time to throw a > patch together without being able to do much testing. I think many of us are in agreement. Before RC6, I propose we only check in critical work on the release branch and take some time to thoroughly test RC5. Non-critical fixes can wait until after 1.1. I personally would like a week to test RC5. I feel like we have forgotten what the E in OFED stands for. If we have to slip the release schedule to make this code really stable, I'm in favor of it. Scott From mst at mellanox.co.il Thu Sep 14 22:03:50 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 15 Sep 2006 08:03:50 +0300 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <450A004D.9090804@pathscale.com> References: <450A004D.9090804@pathscale.com> Message-ID: <20060915050349.GD24221@mellanox.co.il> Well, it looks like the libipathverbs that went into the 1.1 branch was botched. How come? Please note that Mellanox for one is unable to test libipathverbs at all. libipathverbs maintainers, please try to fix this by Sunday. And please test the changes before you commit them. Quoting r.
Robert Walsh : Subject: Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Woodruff, Robert J wrote: > Robert Walsh wrote, >> [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 >> 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | >> iters=10000 | duplex=0 | cma=0 | >> 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey > 0x2302400 >> VAddr 0x00002a95dd3480 >> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey > 0x2402500 >> VAddr 0x00002a95c85480 >> 4730:main: Completion with error at client: >> 4730:main: Failed status 9: wr_id 3 >> 4730:main: scnt=7584, ccnt=6584 >> [woody at rkl-13 bin]$ > >> Hi Woody, > Robert Walsh wrote, >> When RC4 is available, there should be a patch in there that will fix >> this. Can you let us know if you continue to see problems? > >> Regards, >> Robert. > > I installed RC5 and now it just hangs, > > [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12 > 4702: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | > iters=10000 | duplex=0 | cma=0 | > 4702: Local address: LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200 > VAddr 0x00002a95dc8480 > 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200 > VAddr 0x00002a95c7c480 > hangs here and have to cntrl-c the test. > > > Intel MPI also fails with, > # Barrier > [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with > error. status=0x8. cookie=0x514ee0 > rank 1 in job 4 rkl-13_32779 caused collective abort of all ranks > exit status of rank 1: killed by signal 9 Hi Woody, So, we built everything using RC5 plus the libipathverbs from subversion and we were successfully able to run ib_rdma_bw (with your arguments above) and Intel MPI (a simple MPI hello world program). I'm going to continue testing with the Intel MPI testsuite and some applications ISV applications. I'll keep you informed. Regards, Robert. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRQoATfzvnpzTd9fxAQLUKQf9E1ps9XbbXplMm6+5O/XDdlWF0BQws1SC L/aGygh34fZSkpGmCrfze3HhsaOqasu9gUOsJQ89jX6pKNkv4tJAxSJCr+n+bdG3 21Bqr9gcM0MbzrDvOcUDHqvnmC0THlCf0XhikjKg/FJR1e48BIiAOFUzfi0VvI36 G1ZtD8xZXydOfWq7Z4xvyf9Y3qNPIeSKR2JZGJQoGHjxY4+vcteK0UVHfic1Bgpy 9uql47af6tncN+CazYcwf8xnHegiDr34iEEre5wUz//Qy62j8JNPnxhit0W9lXij zFszTkOHQeibxbFWi9ZRyigTmHanxxRUuznW54NL8NIF30jhnmcksQ== =06gu -----END PGP SIGNATURE----- _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg -- MST From mst at mellanox.co.il Thu Sep 14 22:09:53 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 15 Sep 2006 08:09:53 +0300 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <4509F641.5030302@pathscale.com> References: <4509F641.5030302@pathscale.com> Message-ID: <20060915050953.GE24221@mellanox.co.il> Quoting r. Robert Walsh : > Subject: Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > I installed RC5 and now it just hangs, > > Wow - we can't even get RC5 to build here. What distro are you running? > > I've tried this on RC4 + a fixed libipathverbs package and it runs OK > (although it does take a while, which might explain the hang you were > seeing.) > > But mostly I'm curious how you get RC5 to build at all. > > We really really really shouldn't be attempting to turn RC's around as > fast as RC4 to RC5 went: we basically had about enough time to throw a > patch together without being able to do much testing. Changes are expected to be tested before you commit. This is really maintainer's responsibility, please take it seriously. 
-- MST From rjwalsh at pathscale.com Thu Sep 14 23:04:55 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Thu, 14 Sep 2006 23:04:55 -0700 Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready In-Reply-To: <20060915050953.GE24221@mellanox.co.il> References: <4509F641.5030302@pathscale.com> <20060915050953.GE24221@mellanox.co.il> Message-ID: <450A4287.1070309@pathscale.com> > Changes are expected to be tested before you commit. > This is really maintainer's responsibility, please take it seriously. I have to take exception here. It's only possible for us to make a serious attempt at doing something like this if OFED takes a more serious approach to the idea of what a "release candidate" is. Throwing out an RC in one day was not a good idea; nor was changing an API in the middle of the process. We're lucky that's all that broke. Regards, Robert. From thomas.bub at thomson.net Thu Sep 14 23:24:19 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Fri, 15 Sep 2006 08:24:19 +0200 Subject: [openib-general] Any chance to get 32-Bit libraries on SLES9 x86_64? Message-ID: Is there any chance/trick to get 32-bit libraries built and usable on SLES9 x86_64? When I installed OFED-1.1-rc4 I got: WARNING: sysfsutils 32-bit version is required to build 32-bit libibverbs package. WARNING: Skiping build of 32-bit libraries. I googled around and didn't find any 32-bit sysfsutils for SLES9. I know that it is working under SLES10, but our customer base is on SLES9 and very conservative when it comes to using the latest and greatest OS/distribution. Thomas ............................................................ Thomas Bub Grass Valley Germany GmbH Brunnenweg 9 64331 Weiterstadt, Germany Tel: +49 6150 104 147 Fax: +49 6150 104 656 Email: Thomas.Bub at thomson.net www.GrassValley.com ............................................................
From bugzilla-daemon at openib.org Fri Sep 15 00:24:51 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Fri, 15 Sep 2006 00:24:51 -0700 (PDT) Subject: [openib-general] [Bug 222] ib_uverbs fails to load on ia64, OFED 1.1 - rc3 Message-ID: <20060915072451.2C0652283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=222 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED ------- Comment #2 from sweitzen at cisco.com 2006-09-15 00:24 ------- Now loads OK on RHEL4 U3 ia64 using OFED 1.1 rc4. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From thomas.bub at thomson.net Fri Sep 15 02:37:54 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Fri, 15 Sep 2006 11:37:54 +0200 Subject: [openib-general] What can be the reason for VAPI_WR_FLUSH_ERR when sending from gen2 to gen1 Message-ID: This seems to be the very last little bug in my journey migrating from gen1 client and server to gen2 client and gen1 server. While I have overcome all the CM issues I had so far (thanks to Sean Hefty), I'm now in the situation that I have a gen2 client connected to a gen1 server via CM. Unfortunately the first IBV_WR_SEND causes a (syndrome=0xf9=VAPI_WR_FLUSH_ERR, opcode=6=VAPI_CQE_RQ_SEND_DATA) error in the receive completion queue of the server. Doing the CM connection and the first send in the opposite direction, from gen1 client to gen2 server, works. Needless to say, connection and send between gen1<->gen1 and gen2<->gen2 are OK as well. I copied Erez Cohen from Mellanox as well. Maybe someone can explain to me in more detail what the error is about and how to avoid it. Thanks Thomas -------------- next part -------------- An HTML attachment was scrubbed...
URL: From eitan at mellanox.co.il Fri Sep 15 04:45:35 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 15 Sep 2006 14:45:35 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices Message-ID: <86y7sle4kg.fsf@mtl066.yok.mtl.com> Hi Hal The following patch solves an issue with OpenSM preferring largest MTU for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor) devices instead of using a 1K MTU which is best for this device. Since this is a device specific quirk I have added a configuration option named enable_quirks which is FALSE by default to enable this functionality. To summarize the functionality change: 1. Added enable_quirks option 2. If enable_quirks is FALSE do nothing 3. If a specific MTU is requested (either =2K or >1K) do nothing 4. If either source port or destination port is a Tavor device MTU is limited to 1K (can be further reduced by path traversal) Target is both trunk and OFED 1.1 Thanks Eitan Signed-off-by: Eitan Zahavi Index: include/opensm/osm_subnet.h =================================================================== --- include/opensm/osm_subnet.h (revision 9493) +++ include/opensm/osm_subnet.h (working copy) @@ -286,6 +286,7 @@ typedef struct _osm_subn_opt osm_qos_options_t qos_sw0_options; osm_qos_options_t qos_swe_options; osm_qos_options_t qos_rtr_options; + boolean_t enable_quirks; } osm_subn_opt_t; /* * FIELDS @@ -469,6 +470,10 @@ typedef struct _osm_subn_opt * qos_rtr_options * QoS options for router ports * +* enable_quirks +* Enable high risk new features and not fully qualified +* hardware specific work arounds +* * SEE ALSO * Subnet object *********/ Index: include/opensm/osm_base.h =================================================================== --- include/opensm/osm_base.h (revision 9493) +++ include/opensm/osm_base.h (working copy) @@ -778,6 +778,34 @@ typedef enum _osm_mcast_req_type #define MAX_UPDN_GUID_FILE_LINE_LENGTH 120 /**********/ +/****s* OpenSM: 
Base/VendorOUIs +* NAME +* VendorOUIs +* +* DESCRIPTION +* Known device vendor ID and GUID OUIs +* +* SYNOPSIS +*/ +#define OSM_VENDOR_ID_INTEL 0x00D0B7 +#define OSM_VENDOR_ID_MELLANOX 0x0002C9 +#define OSM_VENDOR_ID_REDSWITCH 0x000617 +#define OSM_VENDOR_ID_SILVERSTORM 0x00066A +#define OSM_VENDOR_ID_TOPSPIN 0x0005AD +#define OSM_VENDOR_ID_FUJITSU 0x00E000 +#define OSM_VENDOR_ID_FUJITSU2 0x000B5D +#define OSM_VENDOR_ID_VOLTAIRE 0x0008F1 +#define OSM_VENDOR_ID_YOTTAYOTTA 0x000453 +#define OSM_VENDOR_ID_PATHSCALE 0x001175 +#define OSM_VENDOR_ID_IBM 0x000255 +#define OSM_VENDOR_ID_DIVERGENET 0x00084E +#define OSM_VENDOR_ID_FLEXTRONICS 0x000B8C +#define OSM_VENDOR_ID_AGILENT 0x0030D3 +#define OSM_VENDOR_ID_OBSIDIAN 0x001777 +#define OSM_VENDOR_ID_BAYMICRO 0x000BC1 +#define OSM_VENDOR_ID_LSILOGIC 0x00A0B8 +/**********/ + END_C_DECLS #endif /* _OSM_BASE_H_ */ Index: opensm/osm_sa_multipath_record.c =================================================================== --- opensm/osm_sa_multipath_record.c (revision 9493) +++ opensm/osm_sa_multipath_record.c (working copy) @@ -150,6 +150,75 @@ osm_mpr_rcv_init( /********************************************************************** **********************************************************************/ +static inline boolean_t +__osm_sa_multipath_rec_is_tavor_port( + IN const osm_port_t* const p_port) +{ + osm_node_t const* p_node; + ib_net32_t vend_id; + + p_node = osm_port_get_parent_node( p_port ); + vend_id = ib_node_info_get_vendor_id( &p_node->node_info ); + + return( (p_node->node_info.device_id == CL_HTON16(23108)) && + ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || + (vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || + (vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || + (vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE)))); +} + +/********************************************************************** + **********************************************************************/ +boolean_t + 
__osm_sa_multipath_rec_apply_tavor_mtu_limit( + IN const ib_multipath_rec_t* const p_mpr, + IN const osm_port_t* const p_src_port, + IN const osm_port_t* const p_dest_port, + IN const ib_net64_t comp_mask) +{ + uint8_t required_mtu; + + /* only if one of the ports is a Tavor device */ + if (! __osm_sa_multipath_rec_is_tavor_port(p_src_port) && + ! __osm_sa_multipath_rec_is_tavor_port(p_dest_port) ) + return( FALSE ); + + /* + we can apply the patch if either: + 1. No MTU required + 2. Required MTU < + 3. Required MTU = 1K or 512 or 256 + 4. Required MTU > 256 or 512 + */ + required_mtu = ib_multipath_rec_mtu( p_mpr ); + if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && + ( comp_mask & IB_PR_COMPMASK_MTU ) ) + { + switch( ib_multipath_rec_mtu_sel( p_mpr ) ) + { + case 0: /* must be greater than */ + case 2: /* exact match */ + if( IB_MTU_LEN_1024 < required_mtu ) + return(FALSE); + break; + + case 1: /* must be less than */ + case 3: /* largest available */ + /* can't be disqualified by this one */ + break; + + default: + /* if we're here, there's a bug in ib_path_rec_mtu_sel() */ + CL_ASSERT( FALSE ); + break; + } + } + + return(TRUE); +} + +/********************************************************************** + **********************************************************************/ static ib_api_status_t __osm_mpr_rcv_get_path_parms( IN osm_mpr_rcv_t* const p_rcv, @@ -195,6 +264,23 @@ __osm_mpr_rcv_get_path_parms( mtu = ib_port_info_get_mtu_cap( p_pi ); rate = ib_port_info_compute_rate( p_pi ); + /* + Mellanox Tavor device performance is better using 1K MTU. + If required MTU and MTU selector are such that 1K is OK + and one of the ends of the path is Tavor we override the + port MTU with 1K. 
+ */ + if ( p_rcv->p_subn->opt.enable_quirks && + __osm_sa_multipath_rec_apply_tavor_mtu_limit( + p_mpr, p_src_port, p_dest_port, comp_mask) ) + if (mtu > IB_MTU_LEN_1024) + { + mtu = IB_MTU_LEN_1024; + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_path_parms: " + "Optimized Path MTU to 1K for Mellanox Tavor device\n"); + } + if ( comp_mask & IB_MPR_COMPMASK_RAWTRAFFIC && cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) ) required_pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp ); Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 9493) +++ opensm/osm_subnet.c (working copy) @@ -494,6 +494,7 @@ osm_subn_set_default_opt( p_opt->ucast_dump_file = NULL; p_opt->updn_guid_file = NULL; p_opt->exit_on_fatal = TRUE; + p_opt->enable_quirks = FALSE; subn_set_default_qos_options(&p_opt->qos_options); subn_set_default_qos_options(&p_opt->qos_ca_options); subn_set_default_qos_options(&p_opt->qos_sw0_options); @@ -979,6 +980,10 @@ osm_subn_parse_conf_file( subn_parse_qos_options("qos_rtr", p_key, p_val, &p_opts->qos_rtr_options); + __osm_subn_opts_unpack_boolean( + "enable_quirks", + p_key, p_val, &p_opts->enable_quirks); + } } fclose(opts_file); @@ -1179,11 +1184,15 @@ osm_subn_write_conf_file( "force_log_flush %s\n\n" "# Log file to be used\n" "log_file %s\n\n" + "# Limit the the size of the log file. 
If overrun log is restarted\n" "log_max_size %lu\n\n" + "# If TRUE will accumulate the log over multiple OpenSM sessions\n" "accum_log_file %s\n\n" "# The directory to hold the file OpenSM dumps\n" "dump_files_dir %s\n\n" - "# If TRUE if OpenSM should disable multicast support\n" + "# If TRUE enables new high risk options and hardware specific quirks\n" + "enable_quirks %s\n\n" + "# If TRUE OpenSM should disable multicast support\n" "no_multicast_option %s\n\n" "# No multicast routing is performed if TRUE\n" "disable_multicast %s\n\n" @@ -1195,6 +1204,7 @@ osm_subn_write_conf_file( p_opts->log_max_size, p_opts->accum_log_file ? "TRUE" : "FALSE", p_opts->dump_files_dir, + p_opts->enable_quirks ? "TRUE" : "FALSE", p_opts->no_multicast_option ? "TRUE" : "FALSE", p_opts->disable_multicast ? "TRUE" : "FALSE", p_opts->exit_on_fatal ? "TRUE" : "FALSE" Index: opensm/osm_helper.c =================================================================== --- opensm/osm_helper.c (revision 9493) +++ opensm/osm_helper.c (working copy) @@ -2289,24 +2289,6 @@ osm_get_node_type_str_fixed_width( return( __osm_node_type_str_fixed_width[node_type] ); } -#define OSM_VENDOR_ID_INTEL 0x00D0B7 -#define OSM_VENDOR_ID_MELLANOX 0x0002C9 -#define OSM_VENDOR_ID_REDSWITCH 0x000617 -#define OSM_VENDOR_ID_SILVERSTORM 0x00066A -#define OSM_VENDOR_ID_TOPSPIN 0x0005AD -#define OSM_VENDOR_ID_FUJITSU 0x00E000 -#define OSM_VENDOR_ID_FUJITSU2 0x000B5D -#define OSM_VENDOR_ID_VOLTAIRE 0x0008F1 -#define OSM_VENDOR_ID_YOTTAYOTTA 0x000453 -#define OSM_VENDOR_ID_PATHSCALE 0x001175 -#define OSM_VENDOR_ID_IBM 0x000255 -#define OSM_VENDOR_ID_DIVERGENET 0x00084E -#define OSM_VENDOR_ID_FLEXTRONICS 0x000B8C -#define OSM_VENDOR_ID_AGILENT 0x0030D3 -#define OSM_VENDOR_ID_OBSIDIAN 0x001777 -#define OSM_VENDOR_ID_BAYMICRO 0x000BC1 -#define OSM_VENDOR_ID_LSILOGIC 0x00A0B8 - /********************************************************************** **********************************************************************/ const 
char* Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 9493) +++ opensm/osm_sa_path_record.c (working copy) @@ -57,6 +57,7 @@ #include #include #include +#include #include #include #include @@ -150,6 +151,75 @@ osm_pr_rcv_init( /********************************************************************** **********************************************************************/ +static inline boolean_t +__osm_sa_path_rec_is_tavor_port( + IN const osm_port_t* const p_port) +{ + osm_node_t const* p_node; + ib_net32_t vend_id; + + p_node = osm_port_get_parent_node( p_port ); + vend_id = ib_node_info_get_vendor_id( &p_node->node_info ); + + return( (p_node->node_info.device_id == CL_HTON16(23108)) && + ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || + (vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || + (vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || + (vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE)))); +} + +/********************************************************************** + **********************************************************************/ +static boolean_t + __osm_sa_path_rec_apply_tavor_mtu_limit( + IN const ib_path_rec_t* const p_pr, + IN const osm_port_t* const p_src_port, + IN const osm_port_t* const p_dest_port, + IN const ib_net64_t comp_mask) +{ + uint8_t required_mtu; + + /* only if one of the ports is a Tavor device */ + if (! __osm_sa_path_rec_is_tavor_port(p_src_port) && + ! __osm_sa_path_rec_is_tavor_port(p_dest_port) ) + return( FALSE ); + + /* + we can apply the patch if either: + 1. No MTU required + 2. Required MTU < + 3. Required MTU = 1K or 512 or 256 + 4. 
Required MTU > 256 or 512 + */ + required_mtu = ib_path_rec_mtu( p_pr ); + if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && + ( comp_mask & IB_PR_COMPMASK_MTU ) ) + { + switch( ib_path_rec_mtu_sel( p_pr ) ) + { + case 0: /* must be greater than */ + case 2: /* exact match */ + if( IB_MTU_LEN_1024 < required_mtu ) + return(FALSE); + break; + + case 1: /* must be less than */ + case 3: /* largest available */ + /* can't be disqualified by this one */ + break; + + default: + /* if we're here, there's a bug in ib_path_rec_mtu_sel() */ + CL_ASSERT( FALSE ); + break; + } + } + + return(TRUE); +} + +/********************************************************************** + **********************************************************************/ static ib_api_status_t __osm_pr_rcv_get_path_parms( IN osm_pr_rcv_t* const p_rcv, @@ -191,6 +261,23 @@ __osm_pr_rcv_get_path_parms( mtu = ib_port_info_get_mtu_cap( p_pi ); rate = ib_port_info_compute_rate( p_pi ); + /* + Mellanox Tavor device performance is better using 1K MTU. + If required MTU and MTU selector are such that 1K is OK + and one of the ends of the path is Tavor we override the + port MTU with 1K. + */ + if ( p_rcv->p_subn->opt.enable_quirks && + __osm_sa_path_rec_apply_tavor_mtu_limit( + p_pr, p_src_port, p_dest_port, comp_mask) ) + if (mtu > IB_MTU_LEN_1024) + { + mtu = IB_MTU_LEN_1024; + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_pr_rcv_get_path_parms: " + "Optimized Path MTU to 1K for Mellanox Tavor device\n"); + } + /* Walk the subnet object from source to destination, tracking the most restrictive rate and mtu values along the way... 
@@ -444,10 +531,10 @@ __osm_pr_rcv_get_path_parms( */ /* we silently ignore cases where only the MTU selector is defined */ + required_mtu = ib_path_rec_mtu( p_pr ); if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && ( comp_mask & IB_PR_COMPMASK_MTU ) ) { - required_mtu = ib_path_rec_mtu( p_pr ); switch( ib_path_rec_mtu_sel( p_pr ) ) { case 0: /* must be greater than */ From mst at mellanox.co.il Fri Sep 15 08:29:17 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 15 Sep 2006 18:29:17 +0300 Subject: [openib-general] Any chance to get 32-Bit libraries on SLES9 x86_64? In-Reply-To: References: Message-ID: <20060915152917.GC25880@mellanox.co.il> Quoting r. Bub Thomas : > Subject: Any chance to get 32-Bit libraries on SLES9 x86_64? > > Is there any chance/trick to get 32-Bit Libraries build and usable on SLES9 x86_64? > > When I installed OFED-1.1-rc4 I get: > > > > WARNING: sysfsutils 32-bit version is required to build 32-bit libibverbs package. > > WARNING: Skiping build of 32-bit libraries. > > I googled around and didn't find any sysfsutils 32-bit for SLES9. > > I now that tit is working under SLES10 b ut our customer base is on SLES9 and very conservative when it comes down to using the latest and greates Os/distribution. > > Thomas > Well, you need a 32-bit libsysfs, otherwise nothing will work. -- MST From sashak at voltaire.com Fri Sep 15 15:17:09 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 16 Sep 2006 01:17:09 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <86y7sle4kg.fsf@mtl066.yok.mtl.com> References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> Message-ID: <20060915221709.GB5891@sashak.voltaire.com> Hi Eitan, Some comments about the patch.
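The selector test that the patch adds to both osm_sa_path_record.c and osm_sa_multipath_record.c can be read in isolation. What follows is only a minimal sketch of that decision as posted, not the OpenSM code itself: the function names, the enum names and the boolean parameters are hypothetical stand-ins for the osm_port_t lookups and comp_mask tests in the patch, and the MTU values are assumed to use the IBA encoding (1 = 256 through 5 = 4096) rather than byte counts.

```c
#include <stdbool.h>
#include <stdint.h>

/* IBA-encoded MTU values (assumed to match what IB_MTU_LEN_1024 encodes). */
enum { MTU_256 = 1, MTU_512 = 2, MTU_1024 = 3, MTU_2048 = 4, MTU_4096 = 5 };

/* MTU selector values, labelled as in the comments of the posted patch. */
enum { SEL_GREATER_THAN = 0, SEL_LESS_THAN = 1, SEL_EXACT = 2, SEL_LARGEST = 3 };

/*
 * Hypothetical stand-in for the patch's *_apply_tavor_mtu_limit() helpers.
 * src_is_tavor / dst_is_tavor replace the vendor and device ID checks;
 * mtu_requested replaces "both the MTU and MTU selector comp_mask bits set".
 */
static bool tavor_limit_applies(bool src_is_tavor, bool dst_is_tavor,
                                bool mtu_requested, uint8_t selector,
                                uint8_t required_mtu)
{
    /* Only paths that start or end on a Tavor port are affected. */
    if (!src_is_tavor && !dst_is_tavor)
        return false;

    /* No explicit MTU in the query: nothing can disqualify the 1K cap. */
    if (!mtu_requested)
        return true;

    switch (selector) {
    case SEL_GREATER_THAN:
    case SEL_EXACT:
        /* As posted: only a required MTU above 1K turns the cap off. */
        return required_mtu <= MTU_1024;
    default:
        /* "less than" and "largest available" never disqualify it. */
        return true;
    }
}

/* Hypothetical stand-in for the override placed before the path walk. */
static uint8_t clamp_mtu(uint8_t port_mtu, bool enable_quirks, bool limit_applies)
{
    if (enable_quirks && limit_applies && port_mtu > MTU_1024)
        return MTU_1024;
    return port_mtu;
}
```

Under these assumptions, a query for exactly 2K between a Tavor port and any other port leaves the port MTU alone, while a query with no MTU component is capped at 1K once enable_quirks is TRUE; the path walk that follows in the real code can still lower the value further.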
On 14:45 Fri 15 Sep , Eitan Zahavi wrote: > Hi Hal > > The following patch solves an issue with OpenSM preferring largest MTU > for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor) > devices instead of using a 1K MTU which is best for this device. > > Since this is a device specific quirk I have added a configuration option > named enable_quirks which is FALSE by default to enable this functionality. > > To summarize the functionality change: > 1. Added enable_quirks option > 2. If enable_quirks is FALSE do nothing I see those quirks are SA specific. Then should this option be called 'enable_sa_quirks' instead? > 3. If a specific MTU is requested (either =2K or >1K) do nothing > 4. If either source port or destination port is a Tavor device > MTU is limited to 1K (can be further reduced by path traversal) > > Target is both trunk and OFED 1.1 > > Thanks > > Eitan > > Signed-off-by: Eitan Zahavi > > Index: include/opensm/osm_subnet.h > =================================================================== > --- include/opensm/osm_subnet.h (revision 9493) > +++ include/opensm/osm_subnet.h (working copy) > @@ -286,6 +286,7 @@ typedef struct _osm_subn_opt > osm_qos_options_t qos_sw0_options; > osm_qos_options_t qos_swe_options; > osm_qos_options_t qos_rtr_options; > + boolean_t enable_quirks; > } osm_subn_opt_t; > /* > * FIELDS > @@ -469,6 +470,10 @@ typedef struct _osm_subn_opt > * qos_rtr_options > * QoS options for router ports > * > +* enable_quirks > +* Enable high risk new features and not fully qualified > +* hardware specific work arounds > +* > * SEE ALSO > * Subnet object > *********/ > Index: include/opensm/osm_base.h > =================================================================== > --- include/opensm/osm_base.h (revision 9493) > +++ include/opensm/osm_base.h (working copy) > @@ -778,6 +778,34 @@ typedef enum _osm_mcast_req_type > #define MAX_UPDN_GUID_FILE_LINE_LENGTH 120 > /**********/ > > +/****s* OpenSM: Base/VendorOUIs > +* NAME 
> +* VendorOUIs > +* > +* DESCRIPTION > +* Known device vendor ID and GUID OUIs > +* > +* SYNOPSIS > +*/ > +#define OSM_VENDOR_ID_INTEL 0x00D0B7 > +#define OSM_VENDOR_ID_MELLANOX 0x0002C9 > +#define OSM_VENDOR_ID_REDSWITCH 0x000617 > +#define OSM_VENDOR_ID_SILVERSTORM 0x00066A > +#define OSM_VENDOR_ID_TOPSPIN 0x0005AD > +#define OSM_VENDOR_ID_FUJITSU 0x00E000 > +#define OSM_VENDOR_ID_FUJITSU2 0x000B5D > +#define OSM_VENDOR_ID_VOLTAIRE 0x0008F1 > +#define OSM_VENDOR_ID_YOTTAYOTTA 0x000453 > +#define OSM_VENDOR_ID_PATHSCALE 0x001175 > +#define OSM_VENDOR_ID_IBM 0x000255 > +#define OSM_VENDOR_ID_DIVERGENET 0x00084E > +#define OSM_VENDOR_ID_FLEXTRONICS 0x000B8C > +#define OSM_VENDOR_ID_AGILENT 0x0030D3 > +#define OSM_VENDOR_ID_OBSIDIAN 0x001777 > +#define OSM_VENDOR_ID_BAYMICRO 0x000BC1 > +#define OSM_VENDOR_ID_LSILOGIC 0x00A0B8 > +/**********/ > + > END_C_DECLS > > #endif /* _OSM_BASE_H_ */ > Index: opensm/osm_sa_multipath_record.c > =================================================================== > --- opensm/osm_sa_multipath_record.c (revision 9493) > +++ opensm/osm_sa_multipath_record.c (working copy) > @@ -150,6 +150,75 @@ osm_mpr_rcv_init( > > /********************************************************************** > **********************************************************************/ > +static inline boolean_t > +__osm_sa_multipath_rec_is_tavor_port( > + IN const osm_port_t* const p_port) > +{ > + osm_node_t const* p_node; > + ib_net32_t vend_id; > + > + p_node = osm_port_get_parent_node( p_port ); > + vend_id = ib_node_info_get_vendor_id( &p_node->node_info ); > + > + return( (p_node->node_info.device_id == CL_HTON16(23108)) && > + ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || > + (vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || > + (vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || > + (vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE)))); > +} > + > +/********************************************************************** > + 
**********************************************************************/ > +boolean_t > + __osm_sa_multipath_rec_apply_tavor_mtu_limit( > + IN const ib_multipath_rec_t* const p_mpr, > + IN const osm_port_t* const p_src_port, > + IN const osm_port_t* const p_dest_port, > + IN const ib_net64_t comp_mask) > +{ > + uint8_t required_mtu; > + > + /* only if one of the ports is a Tavor device */ > + if (! __osm_sa_multipath_rec_is_tavor_port(p_src_port) && > + ! __osm_sa_multipath_rec_is_tavor_port(p_dest_port) ) > + return( FALSE ); > + > + /* > + we can apply the patch if either: > + 1. No MTU required > + 2. Required MTU < > + 3. Required MTU = 1K or 512 or 256 > + 4. Required MTU > 256 or 512 > + */ > + required_mtu = ib_multipath_rec_mtu( p_mpr ); > + if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && > + ( comp_mask & IB_PR_COMPMASK_MTU ) ) Should here be IB_MPR_COMPMASK_* instead of IB_PR_COMPMASK_*? > + { > + switch( ib_multipath_rec_mtu_sel( p_mpr ) ) > + { > + case 0: /* must be greater than */ > + case 2: /* exact match */ > + if( IB_MTU_LEN_1024 < required_mtu ) > + return(FALSE); > + break; > + > + case 1: /* must be less than */ > + case 3: /* largest available */ > + /* can't be disqualified by this one */ > + break; > + > + default: > + /* if we're here, there's a bug in ib_path_rec_mtu_sel() */ > + CL_ASSERT( FALSE ); > + break; > + } > + } > + > + return(TRUE); > +} > + > +/********************************************************************** > + **********************************************************************/ > static ib_api_status_t > __osm_mpr_rcv_get_path_parms( > IN osm_mpr_rcv_t* const p_rcv, > @@ -195,6 +264,23 @@ __osm_mpr_rcv_get_path_parms( > mtu = ib_port_info_get_mtu_cap( p_pi ); > rate = ib_port_info_compute_rate( p_pi ); > > + /* > + Mellanox Tavor device performance is better using 1K MTU. > + If required MTU and MTU selector are such that 1K is OK > + and one of the ends of the path is Tavor we override the > + port MTU with 1K. 
> + */ > + if ( p_rcv->p_subn->opt.enable_quirks && > + __osm_sa_multipath_rec_apply_tavor_mtu_limit( > + p_mpr, p_src_port, p_dest_port, comp_mask) ) > + if (mtu > IB_MTU_LEN_1024) > + { > + mtu = IB_MTU_LEN_1024; > + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, > + "__osm_mpr_rcv_get_path_parms: " > + "Optimized Path MTU to 1K for Mellanox Tavor device\n"); > + } > + This part is purely hardcoded, isn't it? Could it at least be isolated in a single call, 'osm_*_do_quirks()' or something like that? > if ( comp_mask & IB_MPR_COMPMASK_RAWTRAFFIC && > cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) ) > required_pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp ); > Index: opensm/osm_subnet.c > =================================================================== > --- opensm/osm_subnet.c (revision 9493) > +++ opensm/osm_subnet.c (working copy) > @@ -494,6 +494,7 @@ osm_subn_set_default_opt( > p_opt->ucast_dump_file = NULL; > p_opt->updn_guid_file = NULL; > p_opt->exit_on_fatal = TRUE; > + p_opt->enable_quirks = FALSE; > subn_set_default_qos_options(&p_opt->qos_options); > subn_set_default_qos_options(&p_opt->qos_ca_options); > subn_set_default_qos_options(&p_opt->qos_sw0_options); > @@ -979,6 +980,10 @@ osm_subn_parse_conf_file( > subn_parse_qos_options("qos_rtr", > p_key, p_val, &p_opts->qos_rtr_options); > > + __osm_subn_opts_unpack_boolean( > + "enable_quirks", > + p_key, p_val, &p_opts->enable_quirks); > + > } > } > fclose(opts_file); > @@ -1179,11 +1184,15 @@ osm_subn_write_conf_file( > "force_log_flush %s\n\n" > "# Log file to be used\n" > "log_file %s\n\n" > + "# Limit the the size of the log file.
If overrun log is restarted\n" > "log_max_size %lu\n\n" > + "# If TRUE will accumulate the log over multiple OpenSM sessions\n" > "accum_log_file %s\n\n" > "# The directory to hold the file OpenSM dumps\n" > "dump_files_dir %s\n\n" > - "# If TRUE if OpenSM should disable multicast support\n" > + "# If TRUE enables new high risk options and hardware specific quirks\n" > + "enable_quirks %s\n\n" > + "# If TRUE OpenSM should disable multicast support\n" > "no_multicast_option %s\n\n" > "# No multicast routing is performed if TRUE\n" > "disable_multicast %s\n\n" > @@ -1195,6 +1204,7 @@ osm_subn_write_conf_file( > p_opts->log_max_size, > p_opts->accum_log_file ? "TRUE" : "FALSE", > p_opts->dump_files_dir, > + p_opts->enable_quirks ? "TRUE" : "FALSE", > p_opts->no_multicast_option ? "TRUE" : "FALSE", > p_opts->disable_multicast ? "TRUE" : "FALSE", > p_opts->exit_on_fatal ? "TRUE" : "FALSE" > Index: opensm/osm_helper.c > =================================================================== > --- opensm/osm_helper.c (revision 9493) > +++ opensm/osm_helper.c (working copy) > @@ -2289,24 +2289,6 @@ osm_get_node_type_str_fixed_width( > return( __osm_node_type_str_fixed_width[node_type] ); > } > > -#define OSM_VENDOR_ID_INTEL 0x00D0B7 > -#define OSM_VENDOR_ID_MELLANOX 0x0002C9 > -#define OSM_VENDOR_ID_REDSWITCH 0x000617 > -#define OSM_VENDOR_ID_SILVERSTORM 0x00066A > -#define OSM_VENDOR_ID_TOPSPIN 0x0005AD > -#define OSM_VENDOR_ID_FUJITSU 0x00E000 > -#define OSM_VENDOR_ID_FUJITSU2 0x000B5D > -#define OSM_VENDOR_ID_VOLTAIRE 0x0008F1 > -#define OSM_VENDOR_ID_YOTTAYOTTA 0x000453 > -#define OSM_VENDOR_ID_PATHSCALE 0x001175 > -#define OSM_VENDOR_ID_IBM 0x000255 > -#define OSM_VENDOR_ID_DIVERGENET 0x00084E > -#define OSM_VENDOR_ID_FLEXTRONICS 0x000B8C > -#define OSM_VENDOR_ID_AGILENT 0x0030D3 > -#define OSM_VENDOR_ID_OBSIDIAN 0x001777 > -#define OSM_VENDOR_ID_BAYMICRO 0x000BC1 > -#define OSM_VENDOR_ID_LSILOGIC 0x00A0B8 > - > 
/********************************************************************** > **********************************************************************/ > const char* > Index: opensm/osm_sa_path_record.c > =================================================================== > --- opensm/osm_sa_path_record.c (revision 9493) > +++ opensm/osm_sa_path_record.c (working copy) > @@ -57,6 +57,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -150,6 +151,75 @@ osm_pr_rcv_init( > > /********************************************************************** > **********************************************************************/ > +static inline boolean_t > +__osm_sa_path_rec_is_tavor_port( > + IN const osm_port_t* const p_port) > +{ > + osm_node_t const* p_node; > + ib_net32_t vend_id; > + > + p_node = osm_port_get_parent_node( p_port ); > + vend_id = ib_node_info_get_vendor_id( &p_node->node_info ); > + > + return( (p_node->node_info.device_id == CL_HTON16(23108)) && > + ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || > + (vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || > + (vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || > + (vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE)))); > +} > + > +/********************************************************************** > + **********************************************************************/ > +static boolean_t > + __osm_sa_path_rec_apply_tavor_mtu_limit( > + IN const ib_path_rec_t* const p_pr, > + IN const osm_port_t* const p_src_port, > + IN const osm_port_t* const p_dest_port, > + IN const ib_net64_t comp_mask) > +{ > + uint8_t required_mtu; > + > + /* only if one of the ports is a Tavor device */ > + if (! __osm_sa_path_rec_is_tavor_port(p_src_port) && > + ! __osm_sa_path_rec_is_tavor_port(p_dest_port) ) > + return( FALSE ); > + > + /* > + we can apply the patch if either: > + 1. No MTU required > + 2. Required MTU < > + 3. Required MTU = 1K or 512 or 256 > + 4. 
Required MTU > 256 or 512 > + */ > + required_mtu = ib_path_rec_mtu( p_pr ); > + if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && > + ( comp_mask & IB_PR_COMPMASK_MTU ) ) > + { > + switch( ib_path_rec_mtu_sel( p_pr ) ) > + { > + case 0: /* must be greater than */ > + case 2: /* exact match */ > + if( IB_MTU_LEN_1024 < required_mtu ) > + return(FALSE); > + break; > + > + case 1: /* must be less than */ > + case 3: /* largest available */ > + /* can't be disqualified by this one */ > + break; > + > + default: > + /* if we're here, there's a bug in ib_path_rec_mtu_sel() */ > + CL_ASSERT( FALSE ); > + break; > + } > + } > + > + return(TRUE); > +} > + > +/********************************************************************** > + **********************************************************************/ > static ib_api_status_t > __osm_pr_rcv_get_path_parms( > IN osm_pr_rcv_t* const p_rcv, > @@ -191,6 +261,23 @@ __osm_pr_rcv_get_path_parms( > mtu = ib_port_info_get_mtu_cap( p_pi ); > rate = ib_port_info_compute_rate( p_pi ); > > + /* > + Mellanox Tavor device performance is better using 1K MTU. > + If required MTU and MTU selector are such that 1K is OK > + and one of the ends of the path is Tavor we override the > + port MTU with 1K. > + */ > + if ( p_rcv->p_subn->opt.enable_quirks && > + __osm_sa_path_rec_apply_tavor_mtu_limit( > + p_pr, p_src_port, p_dest_port, comp_mask) ) > + if (mtu > IB_MTU_LEN_1024) > + { > + mtu = IB_MTU_LEN_1024; > + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, > + "__osm_pr_rcv_get_path_parms: " > + "Optimized Path MTU to 1K for Mellanox Tavor device\n"); > + } > + The same is here (about hardcodes). Also I see that tavor specific functions are pretty similar for PR and MPR cases. Why not to share this in something like osm_sa_quirks.c? Sasha > /* > Walk the subnet object from source to destination, > tracking the most restrictive rate and mtu values along the way... 
> @@ -444,10 +531,10 @@ __osm_pr_rcv_get_path_parms( > */ > > /* we silently ignore cases where only the MTU selector is defined */ > + required_mtu = ib_path_rec_mtu( p_pr ); > if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && > ( comp_mask & IB_PR_COMPMASK_MTU ) ) > { > - required_mtu = ib_path_rec_mtu( p_pr ); > switch( ib_path_rec_mtu_sel( p_pr ) ) > { > case 0: /* must be greater than */ > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From halr at voltaire.com Fri Sep 15 16:06:35 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Sep 2006 19:06:35 -0400 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <20060915221709.GB5891@sashak.voltaire.com> References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> <20060915221709.GB5891@sashak.voltaire.com> Message-ID: <1158361564.25157.67561.camel@hal.voltaire.com> Hi Sasha, On Fri, 2006-09-15 at 18:17, Sasha Khapyorsky wrote: > Hi Eitan, > > Some comments about the patch. > > On 14:45 Fri 15 Sep , Eitan Zahavi wrote: > > Hi Hal > > > > The following patch solves an issue with OpenSM preferring largest MTU > > for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor) > > devices instead of using a 1K MTU which is best for this device. > > > > Since this is a device specific quirk I have added a configuration option > > named enable_quirks which is FALSE by default to enable this functionality. > > > > To summarize the functionality change: > > 1. Added enable_quirks option > > 2. If enable_quirks is FALSE do nothing > > I see those quirks are SA specific. Then should this option be called > 'enable_sa_quirks' instead? Not sure what the right "granularity" is for this. Would all quirks be enabled at once or would this end up being a pick and choose ? > > 3. 
If a specific MTU is requested (either =2K or >1K) do nothing > > 4. If either source port or destination port is a Tavor device > > MTU is limited to 1K (can be further reduced by path traversal) > > > > Target is both trunk and OFED 1.1 > > > > Thanks > > > > Eitan > > > > Signed-off-by: Eitan Zahavi > > > > Index: include/opensm/osm_subnet.h > > =================================================================== > > --- include/opensm/osm_subnet.h (revision 9493) > > +++ include/opensm/osm_subnet.h (working copy) > > @@ -286,6 +286,7 @@ typedef struct _osm_subn_opt > > osm_qos_options_t qos_sw0_options; > > osm_qos_options_t qos_swe_options; > > osm_qos_options_t qos_rtr_options; > > + boolean_t enable_quirks; > > } osm_subn_opt_t; > > /* > > * FIELDS > > @@ -469,6 +470,10 @@ typedef struct _osm_subn_opt > > * qos_rtr_options > > * QoS options for router ports > > * > > +* enable_quirks > > +* Enable high risk new features and not fully qualified > > +* hardware specific work arounds > > +* > > * SEE ALSO > > * Subnet object > > *********/ > > Index: include/opensm/osm_base.h > > =================================================================== > > --- include/opensm/osm_base.h (revision 9493) > > +++ include/opensm/osm_base.h (working copy) > > @@ -778,6 +778,34 @@ typedef enum _osm_mcast_req_type > > #define MAX_UPDN_GUID_FILE_LINE_LENGTH 120 > > /**********/ > > > > +/****s* OpenSM: Base/VendorOUIs > > +* NAME > > +* VendorOUIs > > +* > > +* DESCRIPTION > > +* Known device vendor ID and GUID OUIs > > +* > > +* SYNOPSIS > > +*/ > > +#define OSM_VENDOR_ID_INTEL 0x00D0B7 > > +#define OSM_VENDOR_ID_MELLANOX 0x0002C9 > > +#define OSM_VENDOR_ID_REDSWITCH 0x000617 > > +#define OSM_VENDOR_ID_SILVERSTORM 0x00066A > > +#define OSM_VENDOR_ID_TOPSPIN 0x0005AD > > +#define OSM_VENDOR_ID_FUJITSU 0x00E000 > > +#define OSM_VENDOR_ID_FUJITSU2 0x000B5D > > +#define OSM_VENDOR_ID_VOLTAIRE 0x0008F1 > > +#define OSM_VENDOR_ID_YOTTAYOTTA 0x000453 > > +#define 
OSM_VENDOR_ID_PATHSCALE 0x001175 > > +#define OSM_VENDOR_ID_IBM 0x000255 > > +#define OSM_VENDOR_ID_DIVERGENET 0x00084E > > +#define OSM_VENDOR_ID_FLEXTRONICS 0x000B8C > > +#define OSM_VENDOR_ID_AGILENT 0x0030D3 > > +#define OSM_VENDOR_ID_OBSIDIAN 0x001777 > > +#define OSM_VENDOR_ID_BAYMICRO 0x000BC1 > > +#define OSM_VENDOR_ID_LSILOGIC 0x00A0B8 > > +/**********/ > > + > > END_C_DECLS > > > > #endif /* _OSM_BASE_H_ */ > > Index: opensm/osm_sa_multipath_record.c > > =================================================================== > > --- opensm/osm_sa_multipath_record.c (revision 9493) > > +++ opensm/osm_sa_multipath_record.c (working copy) > > @@ -150,6 +150,75 @@ osm_mpr_rcv_init( > > > > /********************************************************************** > > **********************************************************************/ > > +static inline boolean_t > > +__osm_sa_multipath_rec_is_tavor_port( > > + IN const osm_port_t* const p_port) > > +{ > > + osm_node_t const* p_node; > > + ib_net32_t vend_id; > > + > > + p_node = osm_port_get_parent_node( p_port ); > > + vend_id = ib_node_info_get_vendor_id( &p_node->node_info ); > > + > > + return( (p_node->node_info.device_id == CL_HTON16(23108)) && > > + ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE)))); > > +} > > + > > +/********************************************************************** > > + **********************************************************************/ > > +boolean_t > > + __osm_sa_multipath_rec_apply_tavor_mtu_limit( > > + IN const ib_multipath_rec_t* const p_mpr, > > + IN const osm_port_t* const p_src_port, > > + IN const osm_port_t* const p_dest_port, > > + IN const ib_net64_t comp_mask) > > +{ > > + uint8_t required_mtu; > > + > > + /* only if one of the ports is a Tavor device */ > > + if (! 
__osm_sa_multipath_rec_is_tavor_port(p_src_port) && > > + ! __osm_sa_multipath_rec_is_tavor_port(p_dest_port) ) > > + return( FALSE ); > > + > > + /* > > + we can apply the patch if either: > > + 1. No MTU required > > + 2. Required MTU < > > + 3. Required MTU = 1K or 512 or 256 > > + 4. Required MTU > 256 or 512 > > + */ > > + required_mtu = ib_multipath_rec_mtu( p_mpr ); > > + if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && > > + ( comp_mask & IB_PR_COMPMASK_MTU ) ) > > Should here be IB_MPR_COMPMASK_* instead of IB_PR_COMPMASK_*? Good catch. > > + { > > + switch( ib_multipath_rec_mtu_sel( p_mpr ) ) > > + { > > + case 0: /* must be greater than */ > > + case 2: /* exact match */ > > + if( IB_MTU_LEN_1024 < required_mtu ) > > + return(FALSE); > > + break; > > + > > + case 1: /* must be less than */ > > + case 3: /* largest available */ > > + /* can't be disqualified by this one */ > > + break; > > + > > + default: > > + /* if we're here, there's a bug in ib_path_rec_mtu_sel() */ > > + CL_ASSERT( FALSE ); > > + break; > > + } > > + } > > + > > + return(TRUE); > > +} > > + > > +/********************************************************************** > > + **********************************************************************/ > > static ib_api_status_t > > __osm_mpr_rcv_get_path_parms( > > IN osm_mpr_rcv_t* const p_rcv, > > @@ -195,6 +264,23 @@ __osm_mpr_rcv_get_path_parms( > > mtu = ib_port_info_get_mtu_cap( p_pi ); > > rate = ib_port_info_compute_rate( p_pi ); > > > > + /* > > + Mellanox Tavor device performance is better using 1K MTU. > > + If required MTU and MTU selector are such that 1K is OK > > + and one of the ends of the path is Tavor we override the > > + port MTU with 1K. 
> > + */ > > + if ( p_rcv->p_subn->opt.enable_quirks && > > + __osm_sa_multipath_rec_apply_tavor_mtu_limit( > > + p_mpr, p_src_port, p_dest_port, comp_mask) ) > > + if (mtu > IB_MTU_LEN_1024) > > + { > > + mtu = IB_MTU_LEN_1024; > > + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, > > + "__osm_mpr_rcv_get_path_parms: " > > + "Optimized Path MTU to 1K for Mellanox Tavor device\n"); > > + } > > + > > This part is pure hardcode, isn't it? Could this be at least isolated in > single call 'osm_*_do_quirks()' or like thin? Perhaps. This can be worked on the trunk. > > if ( comp_mask & IB_MPR_COMPMASK_RAWTRAFFIC && > > cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) ) > > required_pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp ); > > Index: opensm/osm_subnet.c > > =================================================================== > > --- opensm/osm_subnet.c (revision 9493) > > +++ opensm/osm_subnet.c (working copy) > > @@ -494,6 +494,7 @@ osm_subn_set_default_opt( > > p_opt->ucast_dump_file = NULL; > > p_opt->updn_guid_file = NULL; > > p_opt->exit_on_fatal = TRUE; > > + p_opt->enable_quirks = FALSE; > > subn_set_default_qos_options(&p_opt->qos_options); > > subn_set_default_qos_options(&p_opt->qos_ca_options); > > subn_set_default_qos_options(&p_opt->qos_sw0_options); > > @@ -979,6 +980,10 @@ osm_subn_parse_conf_file( > > subn_parse_qos_options("qos_rtr", > > p_key, p_val, &p_opts->qos_rtr_options); > > > > + __osm_subn_opts_unpack_boolean( > > + "enable_quirks", > > + p_key, p_val, &p_opts->enable_quirks); > > + > > } > > } > > fclose(opts_file); > > @@ -1179,11 +1184,15 @@ osm_subn_write_conf_file( > > "force_log_flush %s\n\n" > > "# Log file to be used\n" > > "log_file %s\n\n" > > + "# Limit the the size of the log file. 
If overrun log is restarted\n" > > "log_max_size %lu\n\n" > > + "# If TRUE will accumulate the log over multiple OpenSM sessions\n" > > "accum_log_file %s\n\n" > > "# The directory to hold the file OpenSM dumps\n" > > "dump_files_dir %s\n\n" > > - "# If TRUE if OpenSM should disable multicast support\n" > > + "# If TRUE enables new high risk options and hardware specific quirks\n" > > + "enable_quirks %s\n\n" > > + "# If TRUE OpenSM should disable multicast support\n" > > "no_multicast_option %s\n\n" > > "# No multicast routing is performed if TRUE\n" > > "disable_multicast %s\n\n" > > @@ -1195,6 +1204,7 @@ osm_subn_write_conf_file( > > p_opts->log_max_size, > > p_opts->accum_log_file ? "TRUE" : "FALSE", > > p_opts->dump_files_dir, > > + p_opts->enable_quirks ? "TRUE" : "FALSE", > > p_opts->no_multicast_option ? "TRUE" : "FALSE", > > p_opts->disable_multicast ? "TRUE" : "FALSE", > > p_opts->exit_on_fatal ? "TRUE" : "FALSE" > > Index: opensm/osm_helper.c > > =================================================================== > > --- opensm/osm_helper.c (revision 9493) > > +++ opensm/osm_helper.c (working copy) > > @@ -2289,24 +2289,6 @@ osm_get_node_type_str_fixed_width( > > return( __osm_node_type_str_fixed_width[node_type] ); > > } > > > > -#define OSM_VENDOR_ID_INTEL 0x00D0B7 > > -#define OSM_VENDOR_ID_MELLANOX 0x0002C9 > > -#define OSM_VENDOR_ID_REDSWITCH 0x000617 > > -#define OSM_VENDOR_ID_SILVERSTORM 0x00066A > > -#define OSM_VENDOR_ID_TOPSPIN 0x0005AD > > -#define OSM_VENDOR_ID_FUJITSU 0x00E000 > > -#define OSM_VENDOR_ID_FUJITSU2 0x000B5D > > -#define OSM_VENDOR_ID_VOLTAIRE 0x0008F1 > > -#define OSM_VENDOR_ID_YOTTAYOTTA 0x000453 > > -#define OSM_VENDOR_ID_PATHSCALE 0x001175 > > -#define OSM_VENDOR_ID_IBM 0x000255 > > -#define OSM_VENDOR_ID_DIVERGENET 0x00084E > > -#define OSM_VENDOR_ID_FLEXTRONICS 0x000B8C > > -#define OSM_VENDOR_ID_AGILENT 0x0030D3 > > -#define OSM_VENDOR_ID_OBSIDIAN 0x001777 > > -#define OSM_VENDOR_ID_BAYMICRO 0x000BC1 > > -#define 
OSM_VENDOR_ID_LSILOGIC 0x00A0B8 > > - > > /********************************************************************** > > **********************************************************************/ > > const char* > > Index: opensm/osm_sa_path_record.c > > =================================================================== > > --- opensm/osm_sa_path_record.c (revision 9493) > > +++ opensm/osm_sa_path_record.c (working copy) > > @@ -57,6 +57,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -150,6 +151,75 @@ osm_pr_rcv_init( > > > > /********************************************************************** > > **********************************************************************/ > > +static inline boolean_t > > +__osm_sa_path_rec_is_tavor_port( > > + IN const osm_port_t* const p_port) > > +{ > > + osm_node_t const* p_node; > > + ib_net32_t vend_id; > > + > > + p_node = osm_port_get_parent_node( p_port ); > > + vend_id = ib_node_info_get_vendor_id( &p_node->node_info ); > > + > > + return( (p_node->node_info.device_id == CL_HTON16(23108)) && > > + ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE)))); > > +} > > + > > +/********************************************************************** > > + **********************************************************************/ > > +static boolean_t > > + __osm_sa_path_rec_apply_tavor_mtu_limit( > > + IN const ib_path_rec_t* const p_pr, > > + IN const osm_port_t* const p_src_port, > > + IN const osm_port_t* const p_dest_port, > > + IN const ib_net64_t comp_mask) > > +{ > > + uint8_t required_mtu; > > + > > + /* only if one of the ports is a Tavor device */ > > + if (! __osm_sa_path_rec_is_tavor_port(p_src_port) && > > + ! 
__osm_sa_path_rec_is_tavor_port(p_dest_port) ) > > + return( FALSE ); > > + > > + /* > > + we can apply the patch if either: > > + 1. No MTU required > > + 2. Required MTU < > > + 3. Required MTU = 1K or 512 or 256 > > + 4. Required MTU > 256 or 512 > > + */ > > + required_mtu = ib_path_rec_mtu( p_pr ); > > + if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && > > + ( comp_mask & IB_PR_COMPMASK_MTU ) ) > > + { > > + switch( ib_path_rec_mtu_sel( p_pr ) ) > > + { > > + case 0: /* must be greater than */ > > + case 2: /* exact match */ > > + if( IB_MTU_LEN_1024 < required_mtu ) > > + return(FALSE); > > + break; > > + > > + case 1: /* must be less than */ > > + case 3: /* largest available */ > > + /* can't be disqualified by this one */ > > + break; > > + > > + default: > > + /* if we're here, there's a bug in ib_path_rec_mtu_sel() */ > > + CL_ASSERT( FALSE ); > > + break; > > + } > > + } > > + > > + return(TRUE); > > +} > > + > > +/********************************************************************** > > + **********************************************************************/ > > static ib_api_status_t > > __osm_pr_rcv_get_path_parms( > > IN osm_pr_rcv_t* const p_rcv, > > @@ -191,6 +261,23 @@ __osm_pr_rcv_get_path_parms( > > mtu = ib_port_info_get_mtu_cap( p_pi ); > > rate = ib_port_info_compute_rate( p_pi ); > > > > + /* > > + Mellanox Tavor device performance is better using 1K MTU. > > + If required MTU and MTU selector are such that 1K is OK > > + and one of the ends of the path is Tavor we override the > > + port MTU with 1K. > > + */ > > + if ( p_rcv->p_subn->opt.enable_quirks && > > + __osm_sa_path_rec_apply_tavor_mtu_limit( > > + p_pr, p_src_port, p_dest_port, comp_mask) ) > > + if (mtu > IB_MTU_LEN_1024) > > + { > > + mtu = IB_MTU_LEN_1024; > > + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, > > + "__osm_pr_rcv_get_path_parms: " > > + "Optimized Path MTU to 1K for Mellanox Tavor device\n"); > > + } > > + > > The same is here (about hardcodes). 
> > Also I see that tavor specific functions are pretty similar for PR and > MPR cases. Why not to share this in something like osm_sa_quirks.c? I think we can work on this on the trunk and see if there is an OFED 1.1 opening. -- Hal > Sasha > > > /* > > Walk the subnet object from source to destination, > > tracking the most restrictive rate and mtu values along the way... > > @@ -444,10 +531,10 @@ __osm_pr_rcv_get_path_parms( > > */ > > > > /* we silently ignore cases where only the MTU selector is defined */ > > + required_mtu = ib_path_rec_mtu( p_pr ); > > if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && > > ( comp_mask & IB_PR_COMPMASK_MTU ) ) > > { > > - required_mtu = ib_path_rec_mtu( p_pr ); > > switch( ib_path_rec_mtu_sel( p_pr ) ) > > { > > case 0: /* must be greater than */ > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From halr at voltaire.com Fri Sep 15 16:26:00 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Sep 2006 19:26:00 -0400 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <86y7sle4kg.fsf@mtl066.yok.mtl.com> References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> Message-ID: <1158362707.25157.68156.camel@hal.voltaire.com> Hi Eitan, On Fri, 2006-09-15 at 07:45, Eitan Zahavi wrote: > Hi Hal > > The following patch solves an issue with OpenSM preferring largest MTU > for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor) > devices instead of using a 1K MTU which is best for this device. > > Since this is a device specific quirk I have added a configuration option > named enable_quirks which is FALSE by default to enable this functionality. > > To summarize the functionality change: > 1. Added enable_quirks option > 2. If enable_quirks is FALSE do nothing > 3. 
If a specific MTU is requested (either =2K or >1K) do nothing
> 4. If either source port or destination port is a Tavor device
>    MTU is limited to 1K (can be further reduced by path traversal)
>
> Target is both trunk and OFED 1.1

Thanks. Applied to both trunk and 1.1 with the MPR compmask change that
Sasha saw and some other cosmetic changes. Please retest to be sure.

-- Hal

> Thanks
>
> Eitan
>
> Signed-off-by: Eitan Zahavi

From Sudhakar.Dindukurti at Sun.COM Fri Sep 15 17:00:08 2006
From: Sudhakar.Dindukurti at Sun.COM (Sudhakar Dindukurti)
Date: Fri, 15 Sep 2006 17:00:08 -0700
Subject: [openib-general] Some questions on OF interfaces
Message-ID: <450B3E88.3080407@Sun.COM>

Hello,

I am new to OpenFabrics and am trying to understand the OF interfaces. I
would appreciate it if someone could answer the following questions.

1) How and when should the IB_SEND_INLINE feature be used?
2) What are the possible values for struct ib_device_attr -> page_size_cap?

Thanks in advance,
Sudhakar

From sashak at voltaire.com Fri Sep 15 17:04:32 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 16 Sep 2006 03:04:32 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices
In-Reply-To: <1158361564.25157.67561.camel@hal.voltaire.com>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> <20060915221709.GB5891@sashak.voltaire.com> <1158361564.25157.67561.camel@hal.voltaire.com>
Message-ID: <20060916000432.GB8912@sashak.voltaire.com>

On 19:06 Fri 15 Sep , Hal Rosenstock wrote:
> Hi Sasha,
>
> On Fri, 2006-09-15 at 18:17, Sasha Khapyorsky wrote:
> > Hi Eitan,
> >
> > Some comments about the patch.
> >
> > On 14:45 Fri 15 Sep , Eitan Zahavi wrote:
> > > Hi Hal
> > >
> > > The following patch solves an issue with OpenSM preferring largest MTU
> > > for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
> > > devices instead of using a 1K MTU which is best for this device.
> > >
> > > Since this is a device specific quirk I have added a configuration option
> > > named enable_quirks which is FALSE by default to enable this functionality.
> > >
> > > To summarize the functionality change:
> > > 1. Added enable_quirks option
> > > 2. If enable_quirks is FALSE do nothing
> >
> > I see those quirks are SA specific. Then should this option be called
> > 'enable_sa_quirks' instead?
>
> Not sure what the right "granularity" is for this. Would all quirks be
> enabled at once or would this end up being a pick and choose ?

Of course this matters how we define this (so I was asking). Right now I
see that this is used for SA.

Sasha

>
> > > 3. If a specific MTU is requested (either =2K or >1K) do nothing
> > > 4. If either source port or destination port is a Tavor device
> > >    MTU is limited to 1K (can be further reduced by path traversal)
> > >
> > > Target is both trunk and OFED 1.1
> > >
> > > Thanks
> > >
> > > Eitan
> > >
> > > Signed-off-by: Eitan Zahavi
> > >
> > > Index: include/opensm/osm_subnet.h
> > > ===================================================================
> > > --- include/opensm/osm_subnet.h (revision 9493)
> > > +++ include/opensm/osm_subnet.h (working copy)
> > > @@ -286,6 +286,7 @@ typedef struct _osm_subn_opt
> > >  osm_qos_options_t qos_sw0_options;
> > >  osm_qos_options_t qos_swe_options;
> > >  osm_qos_options_t qos_rtr_options;
> > > + boolean_t enable_quirks;
> > >  } osm_subn_opt_t;
> > >  /*
> > >  * FIELDS
> > > @@ -469,6 +470,10 @@ typedef struct _osm_subn_opt
> > >  * qos_rtr_options
> > >  * QoS options for router ports
> > >  *
> > > +* enable_quirks
> > > +* Enable high risk new features and not fully qualified
> > > +* hardware specific work arounds
> > > +*
> > >  * SEE ALSO
> > >  * Subnet object
> > >  *********/
> > > Index: include/opensm/osm_base.h
> > > ===================================================================
> > > --- include/opensm/osm_base.h (revision 9493)
> > > +++ include/opensm/osm_base.h
(working copy) > > > @@ -778,6 +778,34 @@ typedef enum _osm_mcast_req_type > > > #define MAX_UPDN_GUID_FILE_LINE_LENGTH 120 > > > /**********/ > > > > > > +/****s* OpenSM: Base/VendorOUIs > > > +* NAME > > > +* VendorOUIs > > > +* > > > +* DESCRIPTION > > > +* Known device vendor ID and GUID OUIs > > > +* > > > +* SYNOPSIS > > > +*/ > > > +#define OSM_VENDOR_ID_INTEL 0x00D0B7 > > > +#define OSM_VENDOR_ID_MELLANOX 0x0002C9 > > > +#define OSM_VENDOR_ID_REDSWITCH 0x000617 > > > +#define OSM_VENDOR_ID_SILVERSTORM 0x00066A > > > +#define OSM_VENDOR_ID_TOPSPIN 0x0005AD > > > +#define OSM_VENDOR_ID_FUJITSU 0x00E000 > > > +#define OSM_VENDOR_ID_FUJITSU2 0x000B5D > > > +#define OSM_VENDOR_ID_VOLTAIRE 0x0008F1 > > > +#define OSM_VENDOR_ID_YOTTAYOTTA 0x000453 > > > +#define OSM_VENDOR_ID_PATHSCALE 0x001175 > > > +#define OSM_VENDOR_ID_IBM 0x000255 > > > +#define OSM_VENDOR_ID_DIVERGENET 0x00084E > > > +#define OSM_VENDOR_ID_FLEXTRONICS 0x000B8C > > > +#define OSM_VENDOR_ID_AGILENT 0x0030D3 > > > +#define OSM_VENDOR_ID_OBSIDIAN 0x001777 > > > +#define OSM_VENDOR_ID_BAYMICRO 0x000BC1 > > > +#define OSM_VENDOR_ID_LSILOGIC 0x00A0B8 > > > +/**********/ > > > + > > > END_C_DECLS > > > > > > #endif /* _OSM_BASE_H_ */ > > > Index: opensm/osm_sa_multipath_record.c > > > =================================================================== > > > --- opensm/osm_sa_multipath_record.c (revision 9493) > > > +++ opensm/osm_sa_multipath_record.c (working copy) > > > @@ -150,6 +150,75 @@ osm_mpr_rcv_init( > > > > > > /********************************************************************** > > > **********************************************************************/ > > > +static inline boolean_t > > > +__osm_sa_multipath_rec_is_tavor_port( > > > + IN const osm_port_t* const p_port) > > > +{ > > > + osm_node_t const* p_node; > > > + ib_net32_t vend_id; > > > + > > > + p_node = osm_port_get_parent_node( p_port ); > > > + vend_id = ib_node_info_get_vendor_id( &p_node->node_info ); > > > + > > > + 
return( (p_node->node_info.device_id == CL_HTON16(23108)) && > > > + ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || > > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || > > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || > > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE)))); > > > +} > > > + > > > +/********************************************************************** > > > + **********************************************************************/ > > > +boolean_t > > > + __osm_sa_multipath_rec_apply_tavor_mtu_limit( > > > + IN const ib_multipath_rec_t* const p_mpr, > > > + IN const osm_port_t* const p_src_port, > > > + IN const osm_port_t* const p_dest_port, > > > + IN const ib_net64_t comp_mask) > > > +{ > > > + uint8_t required_mtu; > > > + > > > + /* only if one of the ports is a Tavor device */ > > > + if (! __osm_sa_multipath_rec_is_tavor_port(p_src_port) && > > > + ! __osm_sa_multipath_rec_is_tavor_port(p_dest_port) ) > > > + return( FALSE ); > > > + > > > + /* > > > + we can apply the patch if either: > > > + 1. No MTU required > > > + 2. Required MTU < > > > + 3. Required MTU = 1K or 512 or 256 > > > + 4. Required MTU > 256 or 512 > > > + */ > > > + required_mtu = ib_multipath_rec_mtu( p_mpr ); > > > + if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && > > > + ( comp_mask & IB_PR_COMPMASK_MTU ) ) > > > > Should here be IB_MPR_COMPMASK_* instead of IB_PR_COMPMASK_*? > > Good catch. 
> > > > + { > > > + switch( ib_multipath_rec_mtu_sel( p_mpr ) ) > > > + { > > > + case 0: /* must be greater than */ > > > + case 2: /* exact match */ > > > + if( IB_MTU_LEN_1024 < required_mtu ) > > > + return(FALSE); > > > + break; > > > + > > > + case 1: /* must be less than */ > > > + case 3: /* largest available */ > > > + /* can't be disqualified by this one */ > > > + break; > > > + > > > + default: > > > + /* if we're here, there's a bug in ib_path_rec_mtu_sel() */ > > > + CL_ASSERT( FALSE ); > > > + break; > > > + } > > > + } > > > + > > > + return(TRUE); > > > +} > > > + > > > +/********************************************************************** > > > + **********************************************************************/ > > > static ib_api_status_t > > > __osm_mpr_rcv_get_path_parms( > > > IN osm_mpr_rcv_t* const p_rcv, > > > @@ -195,6 +264,23 @@ __osm_mpr_rcv_get_path_parms( > > > mtu = ib_port_info_get_mtu_cap( p_pi ); > > > rate = ib_port_info_compute_rate( p_pi ); > > > > > > + /* > > > + Mellanox Tavor device performance is better using 1K MTU. > > > + If required MTU and MTU selector are such that 1K is OK > > > + and one of the ends of the path is Tavor we override the > > > + port MTU with 1K. > > > + */ > > > + if ( p_rcv->p_subn->opt.enable_quirks && > > > + __osm_sa_multipath_rec_apply_tavor_mtu_limit( > > > + p_mpr, p_src_port, p_dest_port, comp_mask) ) > > > + if (mtu > IB_MTU_LEN_1024) > > > + { > > > + mtu = IB_MTU_LEN_1024; > > > + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, > > > + "__osm_mpr_rcv_get_path_parms: " > > > + "Optimized Path MTU to 1K for Mellanox Tavor device\n"); > > > + } > > > + > > > > This part is pure hardcode, isn't it? Could this be at least isolated in > > single call 'osm_*_do_quirks()' or like thin? > > Perhaps. This can be worked on the trunk. 
> > > > if ( comp_mask & IB_MPR_COMPMASK_RAWTRAFFIC && > > > cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) ) > > > required_pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp ); > > > Index: opensm/osm_subnet.c > > > =================================================================== > > > --- opensm/osm_subnet.c (revision 9493) > > > +++ opensm/osm_subnet.c (working copy) > > > @@ -494,6 +494,7 @@ osm_subn_set_default_opt( > > > p_opt->ucast_dump_file = NULL; > > > p_opt->updn_guid_file = NULL; > > > p_opt->exit_on_fatal = TRUE; > > > + p_opt->enable_quirks = FALSE; > > > subn_set_default_qos_options(&p_opt->qos_options); > > > subn_set_default_qos_options(&p_opt->qos_ca_options); > > > subn_set_default_qos_options(&p_opt->qos_sw0_options); > > > @@ -979,6 +980,10 @@ osm_subn_parse_conf_file( > > > subn_parse_qos_options("qos_rtr", > > > p_key, p_val, &p_opts->qos_rtr_options); > > > > > > + __osm_subn_opts_unpack_boolean( > > > + "enable_quirks", > > > + p_key, p_val, &p_opts->enable_quirks); > > > + > > > } > > > } > > > fclose(opts_file); > > > @@ -1179,11 +1184,15 @@ osm_subn_write_conf_file( > > > "force_log_flush %s\n\n" > > > "# Log file to be used\n" > > > "log_file %s\n\n" > > > + "# Limit the the size of the log file. If overrun log is restarted\n" > > > "log_max_size %lu\n\n" > > > + "# If TRUE will accumulate the log over multiple OpenSM sessions\n" > > > "accum_log_file %s\n\n" > > > "# The directory to hold the file OpenSM dumps\n" > > > "dump_files_dir %s\n\n" > > > - "# If TRUE if OpenSM should disable multicast support\n" > > > + "# If TRUE enables new high risk options and hardware specific quirks\n" > > > + "enable_quirks %s\n\n" > > > + "# If TRUE OpenSM should disable multicast support\n" > > > "no_multicast_option %s\n\n" > > > "# No multicast routing is performed if TRUE\n" > > > "disable_multicast %s\n\n" > > > @@ -1195,6 +1204,7 @@ osm_subn_write_conf_file( > > > p_opts->log_max_size, > > > p_opts->accum_log_file ? 
"TRUE" : "FALSE", > > > p_opts->dump_files_dir, > > > + p_opts->enable_quirks ? "TRUE" : "FALSE", > > > p_opts->no_multicast_option ? "TRUE" : "FALSE", > > > p_opts->disable_multicast ? "TRUE" : "FALSE", > > > p_opts->exit_on_fatal ? "TRUE" : "FALSE" > > > Index: opensm/osm_helper.c > > > =================================================================== > > > --- opensm/osm_helper.c (revision 9493) > > > +++ opensm/osm_helper.c (working copy) > > > @@ -2289,24 +2289,6 @@ osm_get_node_type_str_fixed_width( > > > return( __osm_node_type_str_fixed_width[node_type] ); > > > } > > > > > > -#define OSM_VENDOR_ID_INTEL 0x00D0B7 > > > -#define OSM_VENDOR_ID_MELLANOX 0x0002C9 > > > -#define OSM_VENDOR_ID_REDSWITCH 0x000617 > > > -#define OSM_VENDOR_ID_SILVERSTORM 0x00066A > > > -#define OSM_VENDOR_ID_TOPSPIN 0x0005AD > > > -#define OSM_VENDOR_ID_FUJITSU 0x00E000 > > > -#define OSM_VENDOR_ID_FUJITSU2 0x000B5D > > > -#define OSM_VENDOR_ID_VOLTAIRE 0x0008F1 > > > -#define OSM_VENDOR_ID_YOTTAYOTTA 0x000453 > > > -#define OSM_VENDOR_ID_PATHSCALE 0x001175 > > > -#define OSM_VENDOR_ID_IBM 0x000255 > > > -#define OSM_VENDOR_ID_DIVERGENET 0x00084E > > > -#define OSM_VENDOR_ID_FLEXTRONICS 0x000B8C > > > -#define OSM_VENDOR_ID_AGILENT 0x0030D3 > > > -#define OSM_VENDOR_ID_OBSIDIAN 0x001777 > > > -#define OSM_VENDOR_ID_BAYMICRO 0x000BC1 > > > -#define OSM_VENDOR_ID_LSILOGIC 0x00A0B8 > > > - > > > /********************************************************************** > > > **********************************************************************/ > > > const char* > > > Index: opensm/osm_sa_path_record.c > > > =================================================================== > > > --- opensm/osm_sa_path_record.c (revision 9493) > > > +++ opensm/osm_sa_path_record.c (working copy) > > > @@ -57,6 +57,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > #include > > > #include > > > #include > > > @@ -150,6 +151,75 @@ osm_pr_rcv_init( > > > > > > 
/********************************************************************** > > > **********************************************************************/ > > > +static inline boolean_t > > > +__osm_sa_path_rec_is_tavor_port( > > > + IN const osm_port_t* const p_port) > > > +{ > > > + osm_node_t const* p_node; > > > + ib_net32_t vend_id; > > > + > > > + p_node = osm_port_get_parent_node( p_port ); > > > + vend_id = ib_node_info_get_vendor_id( &p_node->node_info ); > > > + > > > + return( (p_node->node_info.device_id == CL_HTON16(23108)) && > > > + ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || > > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || > > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || > > > + (vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE)))); > > > +} > > > + > > > +/********************************************************************** > > > + **********************************************************************/ > > > +static boolean_t > > > + __osm_sa_path_rec_apply_tavor_mtu_limit( > > > + IN const ib_path_rec_t* const p_pr, > > > + IN const osm_port_t* const p_src_port, > > > + IN const osm_port_t* const p_dest_port, > > > + IN const ib_net64_t comp_mask) > > > +{ > > > + uint8_t required_mtu; > > > + > > > + /* only if one of the ports is a Tavor device */ > > > + if (! __osm_sa_path_rec_is_tavor_port(p_src_port) && > > > + ! __osm_sa_path_rec_is_tavor_port(p_dest_port) ) > > > + return( FALSE ); > > > + > > > + /* > > > + we can apply the patch if either: > > > + 1. No MTU required > > > + 2. Required MTU < > > > + 3. Required MTU = 1K or 512 or 256 > > > + 4. 
Required MTU > 256 or 512 > > > + */ > > > + required_mtu = ib_path_rec_mtu( p_pr ); > > > + if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && > > > + ( comp_mask & IB_PR_COMPMASK_MTU ) ) > > > + { > > > + switch( ib_path_rec_mtu_sel( p_pr ) ) > > > + { > > > + case 0: /* must be greater than */ > > > + case 2: /* exact match */ > > > + if( IB_MTU_LEN_1024 < required_mtu ) > > > + return(FALSE); > > > + break; > > > + > > > + case 1: /* must be less than */ > > > + case 3: /* largest available */ > > > + /* can't be disqualified by this one */ > > > + break; > > > + > > > + default: > > > + /* if we're here, there's a bug in ib_path_rec_mtu_sel() */ > > > + CL_ASSERT( FALSE ); > > > + break; > > > + } > > > + } > > > + > > > + return(TRUE); > > > +} > > > + > > > +/********************************************************************** > > > + **********************************************************************/ > > > static ib_api_status_t > > > __osm_pr_rcv_get_path_parms( > > > IN osm_pr_rcv_t* const p_rcv, > > > @@ -191,6 +261,23 @@ __osm_pr_rcv_get_path_parms( > > > mtu = ib_port_info_get_mtu_cap( p_pi ); > > > rate = ib_port_info_compute_rate( p_pi ); > > > > > > + /* > > > + Mellanox Tavor device performance is better using 1K MTU. > > > + If required MTU and MTU selector are such that 1K is OK > > > + and one of the ends of the path is Tavor we override the > > > + port MTU with 1K. > > > + */ > > > + if ( p_rcv->p_subn->opt.enable_quirks && > > > + __osm_sa_path_rec_apply_tavor_mtu_limit( > > > + p_pr, p_src_port, p_dest_port, comp_mask) ) > > > + if (mtu > IB_MTU_LEN_1024) > > > + { > > > + mtu = IB_MTU_LEN_1024; > > > + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, > > > + "__osm_pr_rcv_get_path_parms: " > > > + "Optimized Path MTU to 1K for Mellanox Tavor device\n"); > > > + } > > > + > > > > The same is here (about hardcodes). > > > > Also I see that tavor specific functions are pretty similar for PR and > > MPR cases. 
Why not to share this in something like osm_sa_quirks.c? > > I think we can work on this on the trunk and see if there is an OFED 1.1 > opening. > > -- Hal > > > Sasha > > > > > /* > > > Walk the subnet object from source to destination, > > > tracking the most restrictive rate and mtu values along the way... > > > @@ -444,10 +531,10 @@ __osm_pr_rcv_get_path_parms( > > > */ > > > > > > /* we silently ignore cases where only the MTU selector is defined */ > > > + required_mtu = ib_path_rec_mtu( p_pr ); > > > if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) && > > > ( comp_mask & IB_PR_COMPMASK_MTU ) ) > > > { > > > - required_mtu = ib_path_rec_mtu( p_pr ); > > > switch( ib_path_rec_mtu_sel( p_pr ) ) > > > { > > > case 0: /* must be greater than */ > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > From halr at voltaire.com Fri Sep 15 17:12:21 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Sep 2006 20:12:21 -0400 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <20060916000432.GB8912@sashak.voltaire.com> References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> <20060915221709.GB5891@sashak.voltaire.com> <1158361564.25157.67561.camel@hal.voltaire.com> <20060916000432.GB8912@sashak.voltaire.com> Message-ID: <1158365510.25157.69612.camel@hal.voltaire.com> On Fri, 2006-09-15 at 20:04, Sasha Khapyorsky wrote: [snip...] > > > > To summarize the functionality change: > > > > 1. Added enable_quirks option > > > > 2. If enable_quirks is FALSE do nothing > > > > > > I see those quirks are SA specific. Then should this option be called > > > 'enable_sa_quirks' instead? > > > > Not sure what the right "granularity" is for this. 
Would all quirks be > > enabled at once or would this end up being a pick and choose ? > > Of course this matters how we define this (so I was asking). Right now I > see that this is used for SA. Would there be other SA quirks ? Would there be SM quirks ? Are they more hardware related and then tavor_quirks would make more sense ? Would all the quirks be enabled together or is it a mix and match ? It's unclear to me which is the best way to go on these aspects. -- Hal From bgreen at nas.nasa.gov Sat Sep 16 00:29:15 2006 From: bgreen at nas.nasa.gov (Bryan Green) Date: Sat, 16 Sep 2006 00:29:15 -0700 Subject: [openib-general] patch trouble Message-ID: <200609160729.k8G7TFFl020478@ece06.nas.nasa.gov> Hello, Many of the patches in subversion fail to have an effect when I apply them to a kernel, because they create headers in 'drivers/infiniband/include' which depend on being included before the like-named headers in the toplevel 'include'. Is there a step I am missing to make the headers in 'drivers/infiniband/include' get chosen for inclusion first? 
Here is an example of such a patch that creates a header file that never gets included: https://openib.org/svn/gen2/branches/backport/2.6.12/gfp_6138_to_2_6_13.patch Index: linux-2.6.9/drivers/infiniband/include/linux/types.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.9/drivers/infiniband/include/linux/types.h 2006-04-02 14:40:14.000000000 +0200 @@ -0,0 +1,10 @@ +#ifndef LINUX_TYPES_BACKPORT_H +#define LINUX_TYPES_BACKPORT_H + +#include_next + +#ifdef __KERNEL__ +typedef unsigned int gfp_t; +#endif + +#endif From eitan at mellanox.co.il Sat Sep 16 05:19:45 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 16 Sep 2006 15:19:45 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <1158365510.25157.69612.camel@hal.voltaire.com> References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> <20060915221709.GB5891@sashak.voltaire.com> <1158361564.25157.67561.camel@hal.voltaire.com> <20060916000432.GB8912@sashak.voltaire.com> <1158365510.25157.69612.camel@hal.voltaire.com> Message-ID: <450BEBE1.1060506@mellanox.co.il> Hi Shasha, Hal, I'm back online. Hal Rosenstock wrote: >On Fri, 2006-09-15 at 20:04, Sasha Khapyorsky wrote: >[snip...] > > >>>>>To summarize the functionality change: >>>>>1. Added enable_quirks option >>>>>2. If enable_quirks is FALSE do nothing >>>>> >>>>> >>>>I see those quirks are SA specific. Then should this option be called >>>>'enable_sa_quirks' instead? >>>> >>>> >>>Not sure what the right "granularity" is for this. Would all quirks be >>>enabled at once or would this end up being a pick and choose ? >>> >>> >>Of course this matters how we define this (so I was asking). Right now I >>see that this is used for SA. >> >> > >Would there be other SA quirks ? Would there be SM quirks ? >Are they more hardware related and then tavor_quirks would make more >sense ? > > Who knows? 
>Would all the quirks be enabled together or is it a mix and match ? > > I think that since we currently have only one quirk, we can defer this until later, when we have a few. >It's unclear to me which is the best way to go on these aspects. > >-- Hal > > From eitan at mellanox.co.il Sat Sep 16 05:20:18 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 16 Sep 2006 15:20:18 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <1158362707.25157.68156.camel@hal.voltaire.com> References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> <1158362707.25157.68156.camel@hal.voltaire.com> Message-ID: <450BEC02.6010805@mellanox.co.il> Hi Hal, Many thanks! Eitan Hal Rosenstock wrote: >Hi Eitan, > >On Fri, 2006-09-15 at 07:45, Eitan Zahavi wrote: > > >>Hi Hal >> >>The following patch solves an issue with OpenSM preferring largest MTU >>for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor) >>devices instead of using a 1K MTU which is best for this device. >> >>Since this is a device specific quirk I have added a configuration option >>named enable_quirks which is FALSE by default to enable this functionality. >> >>To summarize the functionality change: >>1. Added enable_quirks option >>2. If enable_quirks is FALSE do nothing >>3. If a specific MTU is requested (either =2K or >1K) do nothing >>4. If either source port or destination port is a Tavor device >> MTU is limited to 1K (can be further reduced by path traversal) >> >>Target is both trunk and OFED 1.1 >> >> > >Thanks. Applied to both trunk and 1.1 with the MPR compmask change that >Sasha saw and some other cosmetic changes. Please retest to be sure. 
> >-- Hal > > > >>Thanks >> >>Eitan >> >>Signed-off-by: Eitan Zahavi >> >> > > From mst at mellanox.co.il Sat Sep 16 10:56:28 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 16 Sep 2006 20:56:28 +0300 Subject: [openib-general] patch trouble In-Reply-To: <200609160729.k8G7TFFl020478@ece06.nas.nasa.gov> References: <200609160729.k8G7TFFl020478@ece06.nas.nasa.gov> Message-ID: <20060916175628.GB22267@mellanox.co.il> Quoting r. Bryan Green : > Subject: patch trouble > > Hello, > Many of the patches in subversion fail to have an effect when I apply them to a kernel, > because they create headers in 'drivers/infiniband/include' which depend on being included > before the like-named headers in the toplevel 'include'. Is there a step I am missing to > make the headers in 'drivers/infiniband/include' get chosen for inclusion first? Note that backport patches are intended to be applied in an out-of-kernel fashion - they do not change the kernel at all. So you build as an out-of-tree driver, and add something like this to the make command line: LINUXINCLUDE='-I$(CWD)/include \ -I$(CWD)/drivers/infiniband/include \ -Iinclude \ $$(if $$(KBUILD_SRC),-Iinclude2 -I$$(srctree)/include) \ -include include/linux/autoconf.h \ -include $(CWD)/include/linux/autoconf.h \ ' \ You can find an example here https://openib.org/svn/gen2/trunk/ofed/openib/scripts/Makefile BTW, Mellanox is not actively supporting backport patches on the svn trunk. If you want code that works on something other than 2.6.17, I suggest you pull backports for the ofed branch (forked from 2.6.18-rc6) from ofed_1_1 tree by pulling git://www.mellanox.co.il/~git/infiniband ofed_1_1 and looking in ofed_scripts directory. 
-- MST From mst at mellanox.co.il Sat Sep 16 21:46:26 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 17 Sep 2006 07:46:26 +0300 Subject: [openib-general] 2 SLES 10 backport directories In-Reply-To: <450915EE.1090705@voltaire.com> References: <450915EE.1090705@voltaire.com> Message-ID: <20060917044626.GA26054@mellanox.co.il> Quoting r. Erez Zilber : > Subject: 2 SLES 10 backport directories > > Michael, > > I saw that there are 2 SLES 10 backport directories in the svn: > > https://openib.org/svn/gen2/branches/backport/sles10/ - this one > contains patches that we added for SLES 10 > > https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/ - this one > was added later by you. > > Can we unite them? > > Here's my motivation: I want to be able to install SLES 10, replace its > infiniband dir with infiniband from openib's svn, apply all SLES 10 > patches (from a single directory) and then it should work. > > This should help us in future OFED releases. I'd like that too, but there's a difficulty here. The rest of the backport patches make it possible to build IB support out of kernel, without patching the kernel code itself. This is an explicit requirement of some users, so we made an effort to preserve this ability, and so far it works with the rest of the IB stack - assuming that the user has built infiniband support as a module or disabled it - but that's what most people currently have, anyway. Unfortunately sles10 patches for iser that you mention violate this rule - they patch the iscsi support that is already there as part of the kernel. So unless this can be fixed somehow, we need the iscsi stuff separate, so that 1. we know to apply it in kernel source directory, not where we unpacked IB code 2. 
it can be applied conditionally when the user has enabled iser, so that others still have the ability not to touch their kernel -- MST From erezz at voltaire.com Sun Sep 17 02:05:15 2006 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 17 Sep 2006 12:05:15 +0300 Subject: [openib-general] [PATCH] IB/iser: fix iSER description and selections in Kconfig In-Reply-To: References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com> <450912C0.8070807@voltaire.com> <45091AC4.3090005@voltaire.com> Message-ID: <450D0FCB.1000401@voltaire.com> Roland Dreier wrote: > Wouldn't it better just to depend on INET the way ISCSI_TCP does? > 'select' is more fragile and harder to maintain than 'depends' since > you always have to make sure you select the full dependency tree of > every option you really need. > > - R. > There are 3 additional required config entries: NET, INET & INFINIBAND_RDMA_CM. Do you suggest to 'depend' on all of them, or to 'depend' on some of them and 'select' the rest? Also, since I'm not familiar enough with 'make randconfig', here's a question: if iSER 'depends' on INET, is it possible that 'make randconfig' will enable iSER without enabling INET? Thanks Erez From ogerlitz at voltaire.com Sun Sep 17 02:57:30 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 17 Sep 2006 12:57:30 +0300 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <1158263019.8759.324.camel@brick.pathscale.com> References: <1158108010.8759.192.camel@brick.pathscale.com> <45093428.5010009@voltaire.com> <1158263019.8759.324.camel@brick.pathscale.com> Message-ID: <450D1C0A.90906@voltaire.com> Ralph Campbell wrote: > Here is my thinking so far: > > The driver is passed an LKEY/RKEY plus an address. > For ib_get_dma_mr(), the address is currently from > dma_map_single(), dma_map_page(), or dma_map_sg(). 
> With the ib_dma_*() routines, I can intercept these calls > and return something instead of a bus or IOMMU address. > I would like to return a kernel virtual address since that > is the simplest and is what I ultimately need. This is > trivial for dma_map_single() and trivial for low memory > pages for dma_map_page(). > > I think I can safely just return error for architectures > with high memory pages since the driver really only works > on 64-bit systems (for a variety of reasons which I won't > go into) and those systems don't have high memory. Again (and please go and check me), pages you need to DMA (i.e. move over IB) need **not** be mapped into the kernel virtual address space, and this happens **not** only under the ia32 high-memory scheme; please see my other email for two examples (direct I/O etc) > ib_sg_dma_address would return the page_address() of sg->page > but wouldn't be able to rely on other fields which might be in > the struct scatterlist. Your design seems to rely on three fields: page, offset and length, so: ib_sg_map_sg(scat) kmaps whatever pages are not currently mapped into kvirt; ib_dma_unmap_sg(scat) kunmaps the pages you mapped before (you might need an aux data structure to track which ones need kunmap); ib_sg_dma_address(scat) is page_address(scat->page) + scat->offset; ib_sg_dma_len(scat) is scat->length. Or. From ogerlitz at voltaire.com Sun Sep 17 04:52:09 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 17 Sep 2006 14:52:09 +0300 Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name In-Reply-To: References: Message-ID: <450D36E9.1000502@voltaire.com> Roland Dreier wrote: > Or> change INFINIBAND_ADDR_TRANS to INFINIBAND_RDMA_CM and add > Or> help text clarifying what the thing does. Adding the help text > Or> also has the side effect of the cma config being visible when > Or> one does make menuconfig > > Why do we want to make this config option visible? 
Isn't it better > for it to just take the right value automatically? I want it to be visible so that if some other config **depends** on it, the user can **see** this config and select it. Also, given the importance of the rdma cm within the IB stack (along with the ib verbs, it is the second access point for ULP coders), seeing its config and documenting it is important. Or. From moshek at voltaire.com Sun Sep 17 06:15:33 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Sun, 17 Sep 2006 16:15:33 +0300 Subject: [openib-general] Mstflint - not working on ppc64 and when driver is not loaded on AMD Message-ID: Michael, The attached patch was received from Frank (IBM). Frank changed the mmap in the mopen function and now it works OK on my IBM JS21 ppc64 (sles9 sp3, sles10) and IBM HS21 (EM64T) sles9 sp3. All these computers use PCI-Ex HCA cards. I tested this fix on an AMD computer (PCI-X) and found that it did not fix the problem initially reported by Or Gerlitz in the attached message. Also, I suspect that it doesn't work on MAC ppc64 G5 with PCI-X (I have to repeat this test). I suspect that this is a PCI-X vs. PCI-Ex issue. Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Moshe Kazir Sent: Thursday, September 07, 2006 12:32 PM To: 'Michael S. Tsirkin' Cc: Or Gerlitz; Roland Dreier; openib-general at openib.org; Yiftah Shahar; Tseng-hui Lin Subject: RE: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question Let's assume that the HCA has a wrong FWR and/or some other reason causes the driver load to fail. We have to check what's going on in this case. -> mstflint is one of our tools. Moshe. ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S. 
Tsirkin [mailto:mst at mellanox.co.il] Sent: Wednesday, September 06, 2006 4:25 PM To: Moshe Kazir Cc: Or Gerlitz; Roland Dreier; openib-general at openib.org; Yiftah Shahar; Tseng-hui Lin Subject: Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question Quoting r. Moshe Kazir : > Is it time to create a work arround that opens /proc/bus/pci/ .... > And always work ? But why isn't the driver loaded? -- MST -------------- next part -------------- A non-text attachment was scrubbed... Name: mstflint.patch Type: application/octet-stream Size: 8720 bytes Desc: mstflint.patch URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mstflint.from.frank.tar.gz Type: application/x-gzip Size: 46672 bytes Desc: mstflint.from.frank.tar.gz URL: -------------- next part -------------- An embedded message was scrubbed... From: "Michael S. Tsirkin" Subject: Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question Date: Tue, 5 Sep 2006 16:36:50 +0300 Size: 5018 URL: From mst at mellanox.co.il Sun Sep 17 06:34:49 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 17 Sep 2006 16:34:49 +0300 Subject: [openib-general] Mstflint - not working on ppc64 and when driver is not loaded on AMD In-Reply-To: References: Message-ID: <20060917133449.GA28318@mellanox.co.il> Quoting r. Moshe Kazir : > Subject: Mstflint - not working on ppc64 and when driver is not loaded on AMD > > > Michael, > > The attached patch was received from Frank (IBM) . Wow, that's one big patch, I can't see what it's doing at all. Can just the relevant fix be isolated? > Frank change the mmap in the mopen function and now it is working o.k. > on my IBM JS21 ppc64 (sles9 sp3 sles10) and IBM HS21 (EM64T) sles9 sp3 > all the computer uses PCI-Ex HCA cards > I tested this fix on AMD computer (PCI-X) and found that it did not fix > the problem initially reported by Or Gerlitz in the attached message. That is, if it is even relevant? 
> Also, I suspect that it doesn't work on MAC ppc64 G5 with PCI-X . (I > have to repeated this test) . > > I'm suspect that this this is a PCI-X to PCI-EX issue . > Hmm. From what I can understand of the patch, it attempts to use sysfs resource0, which is only implemented on kernels newer than 2.6.12 or 2.6.13, so that's probably your issue. Can you try passing the following to mstflint (my version): -d /sys/bus/pci/devices/0000\:08\:00.0/resource0 q where 0000\:08\:00.0 is the appropriate device? Does this work with the driver not loaded? On which OS-es? -- MST From kliteyn at dev.mellanox.co.il Sun Sep 17 07:20:32 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 17 Sep 2006 17:20:32 +0300 Subject: [openib-general] [PATCH] osm: bug in __osmv_send_sa_req Message-ID: <1158502832.8516.9.camel@kliteynik.yok.mtl.com> Hi Hal This patch fixes a bug in __osmv_send_sa_req in libvendor. After sending a MAD, the status of the response was ignored. Yevgeny Signed-off-by: Yevgeny Kliteynik Index: libvendor/osm_vendor_ibumad_sa.c =================================================================== --- libvendor/osm_vendor_ibumad_sa.c (revision 9500) +++ libvendor/osm_vendor_ibumad_sa.c (working copy) @@ -606,6 +606,7 @@ __osmv_send_sa_req( "Waiting for async event\n" ); cl_event_wait_on( &p_bind->sync_event, EVENT_NO_TIMEOUT, FALSE ); cl_event_reset(&p_bind->sync_event); + status = p_madw->status; } Exit: From moshek at voltaire.com Sun Sep 17 07:41:15 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Sun, 17 Sep 2006 17:41:15 +0300 Subject: [openib-general] OpenSm on sles10 ppc64 Message-ID: /etc/init.d/opensm start produces an error on my JS21 ppc64 SLES10 OFED 1.0 . Should ppc64 SLES10 OFED 1.0 work ? Anyone tried it ? 
Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of vlad at dev.mellanox.co.il Sent: Thursday, September 14, 2006 7:39 PM To: openfabrics-ewg at openib.org Cc: openib-general at openib.org Subject: [openfabrics-ewg] OFED-1.1-RC5 is ready Hi, OFED-1.1-rc5 is available on https://openib.org/svn/gen2/branches/1.1/ofed/releases/ File: OFED-1.1-rc5.tgz Please report any issues in bugzilla http://openib.org/bugzilla/ Release details: ================ Build_id: OFED-1.1-rc5 openib-1.1 (REV=9485) # User space https://openib.org/svn/gen2/branches/1.1/src/userspace Git: git://www.mellanox.co.il/~git/infinibandref: refs/heads/ofed_1_1 commit 18c1cb87c4b16f1a1577807077bbdcba3f446f09 # MPI mpi_osu-0.9.7-mlx2.2.0.tgz openmpi-1.1.1-1.src.rpm mpitests-2.0-0.src.rpm OS support: =========== Novell: - SLES 9.0 SP3 - SLES10 Redhat: - Redhat EL4 up3 - Redhat EL4 up4 kernel.org: - Kernel 2.6.17 Bug fixes from OFED-1.1-rc4: ========================== 1. ISER compilation fixed on SLES10 2. Fixed build on SLES9 PPC64 3. Updated libehca 4. OpenSM fixes 5. Added tavor_quirk option to rdma_cm module (disabled by default): Tavor performance quirk: limit MTU to 1K if > 0 (int) Known issues: ============= libipathverbs compilation fails on SLES10 (Bug:204) OFED-1.1-rc6 (hopefully the last one) planned to be released on Monday or Tuesday. Regards, Vladimir > Hi, > > The plan is to issue OFED RC5 on Thursday 9/14 and final release next > week. 
I am aware of the following issues: > > > 1) Compilation on SLES9 on PPC - Jack Morgenstein > 2) Huge pages on PPC - Eli Cohen > 3) libipathverbs: - Qlogic > a) libipathverbs ABI issue > b) libipathverbs build on SLES10 > 4) SDP performance on Tavor - Michael Tsirkin > 5) iSER issue on SLES10 - Voltaire > > > In order to meet tomorrow's RC5 release all owners please send your > patches by end of today. > > > Regards, > > Aviram > > _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg From eitan at mellanox.co.il Sun Sep 17 07:52:49 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 17:52:49 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 - not for MTU Sel=3 Message-ID: <864pv6mtoe.fsf@mtl066.yok.mtl.com> Hi Hal We have reviewed the patch for the above and figured out there is an issue with it: Currently when MTU_SEL=3 the quirk applies. We think this is wrong behavior, as MTU_SEL=3 means "max possible MTU" by the IBTA spec. So if an application/ULP would like to get the max MTU possible, the correct answer is 2K for Tavor by the spec. So this patch fixes the quirk: when MTU_SEL=3, it does not apply the MTU limit for Tavor devices. 
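[Editor's sketch, not part of the posted patch.] The selector handling described above can be illustrated with a small standalone C function. This is only a minimal sketch under stated assumptions: the name tavor_mtu_limit_allowed and the enum names are hypothetical (they are not OpenSM identifiers), the MTU values follow the usual IB encoding (1=256 ... 5=4096 bytes), and the single boolean argument stands in for the check that both the MTU and MTU selector bits are set in the component mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IB MTU encodings as carried in a PathRecord (assumed: 1=256 ... 5=4096). */
enum { MTU_256 = 1, MTU_512 = 2, MTU_1024 = 3, MTU_2048 = 4, MTU_4096 = 5 };

/* PathRecord MTU selector values, as listed in the patch comments. */
enum {
    MTU_SEL_GREATER = 0, /* must be greater than */
    MTU_SEL_LESS    = 1, /* must be less than */
    MTU_SEL_EXACT   = 2, /* exact match */
    MTU_SEL_LARGEST = 3  /* largest available */
};

/*
 * Hypothetical helper (not an OpenSM function): returns true when the
 * 1K Tavor MTU limit may be applied to a request, false when the request
 * must be left alone. mtu_and_sel_in_mask stands in for the test that both
 * the MTU and MTU selector component-mask bits are set.
 */
static bool tavor_mtu_limit_allowed(bool mtu_and_sel_in_mask,
                                    uint8_t mtu_sel,
                                    uint8_t required_mtu)
{
    if (!mtu_and_sel_in_mask)
        return true; /* no MTU constraint in the request */

    switch (mtu_sel) {
    case MTU_SEL_GREATER:
    case MTU_SEL_EXACT:
        /* same comparison as the posted patch: above 1K disqualifies */
        if (MTU_1024 < required_mtu)
            return false;
        break;
    case MTU_SEL_LESS:
        /* can't be disqualified by this one */
        break;
    case MTU_SEL_LARGEST:
        /* the ULP intentionally asked for the largest MTU possible */
        return false;
    default:
        assert(0 && "invalid MTU selector");
        break;
    }
    return true;
}
```

With this sketch, a request carrying MTU_SEL=3 is never limited to 1K, while a request with no MTU components in the mask still is; the comparison for selectors 0 and 2 simply mirrors the one in the patch as posted.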
Thanks Eitan Signed-off-by: Eitan Zahavi Index: 1.1/src/userspace/management/osm/opensm/osm_sa_multipath_record.c =================================================================== --- 1.1/src/userspace/management/osm/opensm/osm_sa_multipath_record.c (revision 9500) +++ 1.1/src/userspace/management/osm/opensm/osm_sa_multipath_record.c (working copy) @@ -203,9 +203,13 @@ boolean_t break; case 1: /* must be less than */ - case 3: /* largest available */ /* can't be disqualified by this one */ break; + case 3: /* largest available */ + /* the ULP intentionally requested */ + /* the largest MTU possible */ + return(FALSE); + break; default: /* if we're here, there's a bug in ib_multipath_rec_mtu_sel() */ Index: opensm/osm_sa_path_record.c =================================================================== --- 1.1/src/userspace/management/osm/opensm/osm_sa_path_record.c (revision 9500) +++ 1.1/src/userspace/management/osm/opensm/osm_sa_path_record.c (working copy) @@ -204,9 +204,13 @@ static boolean_t break; case 1: /* must be less than */ - case 3: /* largest available */ /* can't be disqualified by this one */ break; + case 3: /* largest available */ + /* the ULP intentionally requested */ + /* the largest MTU possible */ + return(FALSE); + break; default: /* if we're here, there's a bug in ib_path_rec_mtu_sel() */ From halr at voltaire.com Sun Sep 17 07:49:39 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Sep 2006 10:49:39 -0400 Subject: [openib-general] [openfabrics-ewg] OpenSm on sles10 ppc64 In-Reply-To: References: Message-ID: <1158504477.25157.143740.camel@hal.voltaire.com> Hi Moshe, On Sun, 2006-09-17 at 10:41, Moshe Kazir wrote: > /etc/init.d/opensm start produce an error on my JS21 ppc64 SLES10 OFED > 1.0 . What error ? > Should ppc64 SLES10 OFED 1.0 work ? I don't think so. > Anyone tried it ? OFED 1.0 OpenSM release notes say: * PPC support: No PPC QA was performed. There was an issue with PPC64 that Sasha fixed post OFED 1.0. 
It's in OFED 1.1 and could easily be retrofitted to OFED 1.0 if needed. Contact Sasha or me if you are interested in doing this. -- Hal > > Moshe > > ____________________________________________________________ > Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) > > Voltaire - The Grid Backbone > > www.voltaire.com > > > > > -----Original Message----- > From: openfabrics-ewg-bounces at openib.org > [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of > vlad at dev.mellanox.co.il > Sent: Thursday, September 14, 2006 7:39 PM > To: openfabrics-ewg at openib.org > Cc: openib-general at openib.org > Subject: [openfabrics-ewg] OFED-1.1-RC5 is ready > > > Hi, > > OFED-1.1-rc5 is available on > https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > File: OFED-1.1-rc5.tgz > Please report any issues in bugzilla http://openib.org/bugzilla/ > > > Release details: > ================ > Build_id: > > OFED-1.1-rc5 > > openib-1.1 (REV=9485) > # User space https://openib.org/svn/gen2/branches/1.1/src/userspace > Git: git://www.mellanox.co.il/~git/infinibandref: refs/heads/ofed_1_1 > commit 18c1cb87c4b16f1a1577807077bbdcba3f446f09 > > # MPI > mpi_osu-0.9.7-mlx2.2.0.tgz > openmpi-1.1.1-1.src.rpm > mpitests-2.0-0.src.rpm > > OS support: > =========== > Novell: > - SLES 9.0 SP3 > - SLES10 > Redhat: > - Redhat EL4 up3 > > - Redhat EL4 up4 > kernel.org: > - Kernel 2.6.17 > > > Bug fixes from OFED-1.1-rc4: > ========================== > 1. ISER compilation fixed on SLES10 > 2. Fixed build on SLES9 PPC64 > 3. Updated libehca > 4. OpenSM fixes > 5. Added tavor_quirk option to rdma_cm module (disabled by default): > Tavor performance quirk: limit MTU to 1K if > 0 (int) > > Known issues: > ============= > libipathverbs compilation fails on SLES10 (Bug:204) > > > OFED-1.1-rc6 (hopefully the last one) planned to be released on Monday > or Tuesday. > > > Regards, > Vladimir > > > > Hi, > > > > The plan is to issue OFED RC5 on Thursday 9/14 and final release next > > week. 
I am aware of the following issues: > > > > > > 1) Compilation on SLES9 on PPC - Jack Morgenstein > > 2) Huge pages on PPC - Eli Cohen > > 3) libipathverbs: - Qlogic > > a) libipathverbs ABI issue > > b) libipathverbs build on SLES10 > > 4) SDP performance on Tavor - Michael Tsirkin > > 5) iSER issue on SLES10 - Voltaire > > > > > > In order to meet tomorrow's RC5 release all owners please send your > > patches by end of today. > > > > > > Regards, > > > > Aviram > > > > _______________________________________________ > > openfabrics-ewg mailing list > > openfabrics-ewg at openib.org > > http://openib.org/mailman/listinfo/openfabrics-ewg > > > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > From jackm at dev.mellanox.co.il Sun Sep 17 08:01:37 2006 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 17 Sep 2006 18:01:37 +0300 Subject: [openib-general] What can be the reason for VAPI_WR_FLUSH_ERR when sending from gen2 to gen1 In-Reply-To: References: Message-ID: <200609171801.37678.jackm@dev.mellanox.co.il> On Friday 15 September 2006 12:37, Bub Thomas wrote: > I'm now in the situation that I have a gen2 client connected to a gen1 > server via CM. > Unfortunately the first IBV_WR_SEND causes a: > (syndrome=0xf9=VAPI_WR_FLUSH_ERR , opcode=6=VAPI_CQE_RQ_SEND_DATA) > error in the receive completion queue of the server. > Its not at all clear what the error could be. The Gen1 and Gen2 stacks are implemented with totally different code. Some suggestions (together with dotan at mellanox.co.il): 1. Connect a CATC/analyzer to the wire and capture the detailed traffic. Examine the CM messages exchanged to see that they are correct. 2. 
It sounds like the server QP is already in an error state when the first send is performed. Query the QP on the server side before performing the first server send to verify that it is in the RTS state. 3. Examine /var/log/messages on the server side to see if there were any CQ overruns (which would cause the associated QP to enter an error state). PLEASE NOTE: The opcode field is NOT valid in a completion-with-error. The only valid fields upon error completion are the status and work-request-id fields (all other completion fields are undefined). Therefore, you cannot depend on the opcode value! You need to save work request information keyed to the transaction ID to know what really happened. Another question: is the send you are talking about on the client side? Is it a regular send, or an rdma operation? - Jack From eitan at mellanox.co.il Sun Sep 17 08:57:50 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 18:57:50 +0300 Subject: [openib-general] [PATCH 1/13] osm: port to WinIB stack : include/opensm/osm_base.h Message-ID: <863baqmqo1.fsf@mtl066.yok.mtl.com> Hi Hal osm_base.h uses cache dir for osm-partitions.conf. 
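For illustration only, a minimal sketch of composing the Windows default path into a caller-supplied buffer. strcat() appends into its first argument, so the macro form in the patch relies on the path helper returning a writable buffer with enough room. get_cache_dir() below is a hypothetical stand-in for GetOsmCachePath(), whose real signature and buffer ownership are not shown in this thread, and the directory it returns is made up.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for GetOsmCachePath(); the real helper is not shown here. */
static const char *get_cache_dir(void)
{
	return "C:\\osm-cache\\";	/* made-up directory, for the example only */
}

/*
 * Compose "<cache dir>osm-partitions.conf" into buf.
 * Returns 0 on success, -1 if buf is too small (or on an output error).
 */
static int build_partition_conf_path(char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s%s", get_cache_dir(), "osm-partitions.conf");

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass a local array and check the return value before opening the file.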
Thanks Eitan Signed-off-by: Eitan Zahavi Index: include/opensm/osm_base.h =================================================================== --- include/opensm/osm_base.h (revision 9502) +++ include/opensm/osm_base.h (working copy) @@ -231,7 +231,7 @@ BEGIN_C_DECLS * SYNOPSIS */ #ifdef __WIN__ -#define OSM_DEFAULT_PARTITION_CONFIG_FILE strcat(GetOsmPath(), "osm-partitions.conf") +#define OSM_DEFAULT_PARTITION_CONFIG_FILE strcat(GetOsmCachePath(), "osm-partitions.conf") #else #define OSM_DEFAULT_PARTITION_CONFIG_FILE "/etc/osm-partitions.conf" #endif From eitan at mellanox.co.il Sun Sep 17 08:58:33 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 18:58:33 +0300 Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack : include/opensm/osm_pkey.h Message-ID: <86zmcylc2e.fsf@mtl066.yok.mtl.com> Hi Hal Partition tables blocks are always 16 bits. This resolves the need to later cast back and forth.
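A rough sketch of why matching widths removes the casts: when the struct field and the lookup's out-parameter are both uint16_t, the field's address can be passed directly. The struct and both functions below are illustrative stand-ins loosely modelled on the names in the patch, not the OpenSM definitions. The 32-entries-per-block figure is the P_Key table block size.

```c
#include <stdint.h>

#define PKEYS_PER_BLOCK 32	/* a P_Key table block carries 32 entries */

/* Illustrative struct, loosely modelled on osm_pending_pkey_t. */
typedef struct pending_pkey {
	uint16_t pkey;
	uint16_t block;		/* same width as the lookup's out-parameter */
	uint8_t index;
} pending_pkey_t;

/* Hypothetical lookup: split a flat table position into block and index. */
static void get_block_and_idx(uint16_t flat_pos, uint16_t *block_idx,
			      uint8_t *pkey_index)
{
	*block_idx = (uint16_t)(flat_pos / PKEYS_PER_BLOCK);	/* at most 2047 */
	*pkey_index = (uint8_t)(flat_pos % PKEYS_PER_BLOCK);	/* 0..31 */
}

static void fill_pending(pending_pkey_t *p, uint16_t pkey, uint16_t flat_pos)
{
	p->pkey = pkey;
	/* Field and out-parameter types match: no temporary and no cast here. */
	get_block_and_idx(flat_pos, &p->block, &p->index);
}
```

With a uint32_t field, the same call would need a uint16_t temporary that is copied into the field afterwards.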
Thanks Eitan Signed-off-by: Eitan Zahavi Index: include/opensm/osm_pkey.h =================================================================== --- include/opensm/osm_pkey.h (revision 9502) +++ include/opensm/osm_pkey.h (working copy) @@ -143,7 +143,7 @@ typedef struct _osm_pkey_tbl typedef struct _osm_pending_pkey { cl_list_item_t list_item; uint16_t pkey; - uint32_t block; + uint16_t block; uint8_t index; boolean_t is_new; } osm_pending_pkey_t; @@ -396,7 +396,7 @@ ib_api_status_t osm_pkey_tbl_get_block_and_idx( IN osm_pkey_tbl_t *p_pkey_tbl, IN uint16_t *p_pkey, - OUT uint32_t *block_idx, + OUT uint16_t *block_idx, OUT uint8_t *pkey_index); /* * p_pkey_tbl From eitan at mellanox.co.il Sun Sep 17 08:58:51 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 18:58:51 +0300 Subject: [openib-general] [PATCH 3/13] osm: port to WinIB stack : include/iba/ib_types.h Message-ID: <86y7silc1w.fsf@mtl066.yok.mtl.com> Hi Hal Most are just adding OSM_API for function declarations. Some minor indentations.
Thanks Eitan Signed-off-by: Eitan Zahavi Index: include/iba/ib_types.h =================================================================== --- include/iba/ib_types.h (revision 9502) +++ include/iba/ib_types.h (working copy) @@ -52,6 +52,19 @@ BEGIN_C_DECLS +#if defined( WIN32 ) || defined( _WIN64 ) + #if defined( EXPORT_AL_SYMBOLS ) + #define OSM_EXPORT __declspec(dllexport) + #else + #define OSM_EXPORT __declspec(dllimport) + #endif + #define OSM_API __stdcall +#else + #define OSM_EXPORT extern + #define OSM_API + #define __ptr64 +#endif + /****h* IBA Base/Constants * NAME * Constants @@ -573,7 +586,7 @@ BEGIN_C_DECLS * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_class_is_vendor_specific_low( IN const uint8_t class_code ) { @@ -605,7 +618,7 @@ ib_class_is_vendor_specific_low( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_class_is_vendor_specific_high( IN const uint8_t class_code ) { @@ -637,7 +650,7 @@ ib_class_is_vendor_specific_high( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_class_is_vendor_specific( IN const uint8_t class_code ) { @@ -668,7 +681,7 @@ ib_class_is_vendor_specific( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_class_is_rmpp( IN const uint8_t class_code ) { @@ -1297,6 +1310,7 @@ ib_class_is_rmpp( * IB_MAD_ATTR_SLVL_RECORD * * DESCRIPTION +* VSLtoL Map Table attribute (15.2.5) * SLtoVL Mapping Table Record attribute (15.2.5) * * SOURCE @@ -1680,7 +1694,7 @@ ib_class_is_rmpp( * IB_PATH_REC_BASE_MASK * * DESCRIPTION -* Mask for the base value field for path record MTU, rate, +* Mask for the base value field for path record MTU, rate * and packet lifetime. 
* * SOURCE @@ -1768,7 +1782,7 @@ typedef ib_net64_t ib_gid_prefix_t; */ #define IB_LINK_NO_CHANGE 0 #define IB_LINK_DOWN 1 -#define IB_LINK_INIT 2 +#define IB_LINK_INIT 2 #define IB_LINK_ARMED 3 #define IB_LINK_ACTIVE 4 #define IB_LINK_ACT_DEFER 5 @@ -1792,7 +1806,7 @@ static const char* const __ib_node_type_ * * SYNOPSIS */ -static inline const char* +static inline const char* OSM_API ib_get_node_type_str( IN uint32_t node_type ) { @@ -1834,7 +1848,7 @@ static const char* const __ib_port_state * * SYNOPSIS */ -static inline const char* +static inline const char* OSM_API ib_get_port_state_str( IN uint8_t port_state ) { @@ -1865,7 +1879,7 @@ ib_get_port_state_str( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_get_port_state_from_str( IN char* p_port_state_str ) { @@ -1920,7 +1934,7 @@ ib_get_port_state_from_str( * * SYNOPSIS */ -static inline ib_net16_t +static inline ib_net16_t OSM_API ib_pkey_get_base( IN const ib_net16_t pkey ) { @@ -1947,7 +1961,7 @@ ib_pkey_get_base( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_pkey_is_full_member( IN const ib_net16_t pkey ) { @@ -1979,7 +1993,7 @@ ib_pkey_is_full_member( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_pkey_is_invalid( IN const ib_net16_t pkey ) { @@ -2044,7 +2058,7 @@ typedef union _ib_gid * SEE ALSO *********/ -static inline boolean_t +static inline boolean_t OSM_API ib_gid_is_multicast( IN const ib_gid_t* p_gid ) { @@ -2060,7 +2074,7 @@ ib_gid_is_multicast( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_gid_set_default( IN ib_gid_t* const p_gid, IN const ib_net64_t interface_id ) @@ -2093,7 +2107,7 @@ ib_gid_set_default( * * SYNOPSIS */ -static inline ib_net64_t +static inline ib_net64_t OSM_API ib_gid_get_subnet_prefix( IN const ib_gid_t* const p_gid ) { @@ -2122,7 +2136,7 @@ ib_gid_get_subnet_prefix( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_gid_is_link_local( IN 
const ib_gid_t* const p_gid ) { @@ -2152,7 +2166,7 @@ ib_gid_is_link_local( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_gid_is_site_local( IN const ib_gid_t* const p_gid ) { @@ -2182,7 +2196,7 @@ ib_gid_is_site_local( * * SYNOPSIS */ -static inline ib_net64_t +static inline ib_net64_t OSM_API ib_gid_get_guid( IN const ib_gid_t* const p_gid ) { @@ -2539,7 +2553,7 @@ typedef struct _ib_path_rec * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_path_rec_init_local( IN ib_path_rec_t* const p_rec, IN ib_gid_t* const p_dgid, @@ -2649,7 +2663,7 @@ ib_path_rec_init_local( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_path_rec_num_path( IN const ib_path_rec_t* const p_rec ) { @@ -2674,11 +2688,11 @@ ib_path_rec_num_path( * ib_path_rec_sl * * DESCRIPTION -* Get service level. +* Get path service level. * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_path_rec_sl( IN const ib_path_rec_t* const p_rec ) { @@ -2707,7 +2721,7 @@ ib_path_rec_sl( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_path_rec_mtu( IN const ib_path_rec_t* const p_rec ) { @@ -2742,7 +2756,7 @@ ib_path_rec_mtu( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_path_rec_mtu_sel( IN const ib_path_rec_t* const p_rec ) { @@ -2775,7 +2789,7 @@ ib_path_rec_mtu_sel( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_path_rec_rate( IN const ib_path_rec_t* const p_rec ) { @@ -2814,7 +2828,7 @@ ib_path_rec_rate( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_path_rec_rate_sel( IN const ib_path_rec_t* const p_rec ) { @@ -2847,7 +2861,7 @@ ib_path_rec_rate_sel( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_path_rec_pkt_life( IN const ib_path_rec_t* const p_rec ) { @@ -2876,7 +2890,7 @@ ib_path_rec_pkt_life( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API 
ib_path_rec_pkt_life_sel( IN const ib_path_rec_t* const p_rec ) { @@ -2909,7 +2923,7 @@ ib_path_rec_pkt_life_sel( * * SYNOPSIS */ -static inline uint32_t +static inline uint32_t OSM_API ib_path_rec_flow_lbl( IN const ib_path_rec_t* const p_rec ) { @@ -2938,7 +2952,7 @@ ib_path_rec_flow_lbl( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_path_rec_hop_limit( IN const ib_path_rec_t* const p_rec ) { @@ -3141,7 +3155,7 @@ typedef struct _ib_sm_info * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_sminfo_get_priority( IN const ib_sm_info_t* const p_smi ) { @@ -3169,7 +3183,7 @@ ib_sminfo_get_priority( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_sminfo_get_state( IN const ib_sm_info_t* const p_smi ) { @@ -3287,7 +3301,7 @@ typedef struct _ib_rmpp_mad * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_mad_init_new( IN ib_mad_t* const p_mad, IN const uint8_t mgmt_class, @@ -3350,7 +3364,7 @@ ib_mad_init_new( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_mad_init_response( IN const ib_mad_t* const p_req_mad, IN ib_mad_t* const p_mad, @@ -3395,7 +3409,7 @@ ib_mad_init_response( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_mad_is_response( IN const ib_mad_t* const p_mad ) { @@ -3452,7 +3466,7 @@ ib_mad_is_response( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_rmpp_is_flag_set( IN const ib_rmpp_mad_t* const p_rmpp_mad, IN const uint8_t flag ) @@ -3477,7 +3491,7 @@ ib_rmpp_is_flag_set( * ib_mad_t, ib_rmpp_mad_t *********/ -static inline void +static inline void OSM_API ib_rmpp_set_resp_time( IN ib_rmpp_mad_t* const p_rmpp_mad, IN const uint8_t resp_time ) @@ -3487,7 +3501,7 @@ ib_rmpp_set_resp_time( } -static inline uint8_t +static inline uint8_t OSM_API ib_rmpp_get_resp_time( IN const ib_rmpp_mad_t* const p_rmpp_mad ) { @@ -3624,7 +3638,7 @@ typedef struct _ib_smp * * SYNOPSIS */ -static inline 
ib_net16_t +static inline ib_net16_t OSM_API ib_smp_get_status( IN const ib_smp_t* const p_smp ) { @@ -3653,7 +3667,7 @@ ib_smp_get_status( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_smp_is_response( IN const ib_smp_t* const p_smp ) { @@ -3681,7 +3695,7 @@ ib_smp_is_response( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_smp_is_d( IN const ib_smp_t* const p_smp ) { @@ -3714,7 +3728,7 @@ ib_smp_is_d( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_smp_init_new( IN ib_smp_t* const p_smp, IN const uint8_t method, @@ -3800,7 +3814,7 @@ ib_smp_init_new( * * SYNOPSIS */ -static inline void* +static inline void* OSM_API ib_smp_get_payload_ptr( IN const ib_smp_t* const p_smp ) { @@ -3894,14 +3908,14 @@ typedef struct _ib_sa_mad /**********/ #define IB_SA_MAD_HDR_SIZE (sizeof(ib_sa_mad_t) - IB_SA_DATA_SIZE) -static inline uint32_t +static inline uint32_t OSM_API ib_get_attr_size( IN const ib_net16_t attr_offset ) { return( ((uint32_t)cl_ntoh16( attr_offset )) << 3 ); } -static inline ib_net16_t +static inline ib_net16_t OSM_API ib_get_attr_offset( IN const uint32_t attr_size ) { @@ -3917,7 +3931,7 @@ ib_get_attr_offset( * * SYNOPSIS */ -static inline void* +static inline void* OSM_API ib_sa_mad_get_payload_ptr( IN const ib_sa_mad_t* const p_sa_mad ) { @@ -3954,7 +3968,7 @@ ib_sa_mad_get_payload_ptr( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_node_info_get_local_port_num( IN const ib_node_info_t* const p_ni ) { @@ -3985,7 +3999,7 @@ ib_node_info_get_local_port_num( * * SYNOPSIS */ -static inline ib_net32_t +static inline ib_net32_t OSM_API ib_node_info_get_vendor_id( IN const ib_node_info_t* const p_ni ) { @@ -4134,7 +4148,7 @@ typedef struct _ib_port_info * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_port_state( IN const ib_port_info_t* const p_pi ) { @@ -4162,7 +4176,7 @@ ib_port_info_get_port_state( * * SYNOPSIS */ 
-static inline void +static inline void OSM_API ib_port_info_set_port_state( IN ib_port_info_t* const p_pi, IN const uint8_t port_state ) @@ -4194,7 +4208,7 @@ ib_port_info_set_port_state( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_vl_cap( IN const ib_port_info_t* const p_pi) { @@ -4222,7 +4236,7 @@ ib_port_info_get_vl_cap( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_init_type( IN const ib_port_info_t* const p_pi) { @@ -4250,7 +4264,7 @@ ib_port_info_get_init_type( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_op_vls( IN const ib_port_info_t* const p_pi) { @@ -4278,7 +4292,7 @@ ib_port_info_get_op_vls( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_op_vls( IN ib_port_info_t* const p_pi, IN const uint8_t op_vls ) @@ -4310,7 +4324,7 @@ ib_port_info_set_op_vls( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_state_no_change( IN ib_port_info_t* const p_pi ) { @@ -4339,7 +4353,7 @@ ib_port_info_set_state_no_change( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_link_speed_sup( IN const ib_port_info_t* const p_pi ) { @@ -4370,7 +4384,7 @@ ib_port_info_get_link_speed_sup( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_link_speed_sup( IN uint8_t const speed, IN ib_port_info_t* p_pi ) @@ -4405,7 +4419,7 @@ ib_port_info_set_link_speed_sup( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_port_phys_state( IN const ib_port_info_t* const p_pi ) { @@ -4436,7 +4450,7 @@ ib_port_info_get_port_phys_state( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_port_phys_state( IN uint8_t const phys_state, IN ib_port_info_t* p_pi ) @@ -4471,7 +4485,7 @@ ib_port_info_set_port_phys_state( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API 
ib_port_info_get_link_down_def_state( IN const ib_port_info_t* const p_pi ) { @@ -4499,7 +4513,7 @@ ib_port_info_get_link_down_def_state( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_link_down_def_state( IN ib_port_info_t* const p_pi, IN const uint8_t link_dwn_state ) @@ -4531,7 +4545,7 @@ ib_port_info_set_link_down_def_state( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_link_speed_active( IN const ib_port_info_t* const p_pi ) { @@ -4583,7 +4597,7 @@ ib_port_info_get_link_speed_active( * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_compute_rate( IN const ib_port_info_t* const p_pi ) { @@ -4680,7 +4694,7 @@ ib_port_info_compute_rate( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_path_get_ipd( IN uint8_t local_link_width_supported, IN uint8_t path_rec_rate ) @@ -4751,7 +4765,7 @@ ib_path_get_ipd( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_mtu_cap( IN const ib_port_info_t* const p_pi ) { @@ -4778,7 +4792,7 @@ ib_port_info_get_mtu_cap( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_neighbor_mtu( IN const ib_port_info_t* const p_pi ) { @@ -4805,7 +4819,7 @@ ib_port_info_get_neighbor_mtu( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_neighbor_mtu( IN ib_port_info_t* const p_pi, IN const uint8_t mtu ) @@ -4839,7 +4853,7 @@ ib_port_info_set_neighbor_mtu( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_master_smsl( IN const ib_port_info_t* const p_pi ) { @@ -4866,7 +4880,7 @@ ib_port_info_get_master_smsl( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_master_smsl( IN ib_port_info_t* const p_pi, IN const uint8_t smsl ) @@ -4898,7 +4912,7 @@ ib_port_info_set_master_smsl( * * SYNOPSIS */ -static inline void +static inline void OSM_API 
ib_port_info_set_timeout( IN ib_port_info_t* const p_pi, IN const uint8_t timeout ) @@ -4933,7 +4947,7 @@ ib_port_info_set_timeout( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_client_rereg( IN ib_port_info_t* const p_pi, IN const uint8_t client_rereg ) @@ -4968,7 +4982,7 @@ ib_port_info_set_client_rereg( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_timeout( IN ib_port_info_t const* p_pi ) { @@ -4996,7 +5010,7 @@ ib_port_info_get_timeout( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_client_rereg( IN ib_port_info_t const* p_pi ) { @@ -5025,7 +5039,7 @@ ib_port_info_get_client_rereg( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_hoq_lifetime( IN ib_port_info_t* const p_pi, IN const uint8_t hoq_life ) @@ -5059,7 +5073,7 @@ ib_port_info_set_hoq_lifetime( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_hoq_lifetime( IN const ib_port_info_t* const p_pi ) { @@ -5089,7 +5103,7 @@ ib_port_info_get_hoq_lifetime( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_vl_stall_count( IN ib_port_info_t* const p_pi, IN const uint8_t vl_stall_count ) @@ -5123,7 +5137,7 @@ ib_port_info_set_vl_stall_count( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_vl_stall_count( IN const ib_port_info_t* const p_pi ) { @@ -5152,7 +5166,7 @@ ib_port_info_get_vl_stall_count( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_lmc( IN const ib_port_info_t* const p_pi ) { @@ -5180,7 +5194,7 @@ ib_port_info_get_lmc( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_lmc( IN ib_port_info_t* const p_pi, IN const uint8_t lmc ) @@ -5213,7 +5227,7 @@ ib_port_info_set_lmc( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_link_speed_enabled( IN 
const ib_port_info_t* const p_pi ) { @@ -5240,7 +5254,7 @@ ib_port_info_get_link_speed_enabled( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_link_speed_enabled( IN ib_port_info_t* const p_pi, IN const uint8_t link_speed_enabled ) @@ -5272,7 +5286,7 @@ ib_port_info_set_link_speed_enabled( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_mpb( IN const ib_port_info_t* const p_pi ) { @@ -5301,7 +5315,7 @@ ib_port_info_get_mpb( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_mpb( IN ib_port_info_t* p_pi, IN uint8_t mpb ) @@ -5332,7 +5346,7 @@ ib_port_info_set_mpb( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_local_phy_err_thd( IN const ib_port_info_t* const p_pi ) { @@ -5359,7 +5373,7 @@ ib_port_info_get_local_phy_err_thd( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_port_info_get_overrun_err_thd( IN const ib_port_info_t* const p_pi ) { @@ -5387,7 +5401,7 @@ ib_port_info_get_overrun_err_thd( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_port_info_set_phy_and_overrun_err_thd( IN ib_port_info_t* const p_pi, IN uint8_t phy_threshold, @@ -5540,7 +5554,7 @@ typedef struct _ib_switch_info_record * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_switch_info_get_state_change( IN const ib_switch_info_t* const p_si ) { @@ -5568,7 +5582,7 @@ ib_switch_info_get_state_change( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_switch_info_clear_state_change( IN ib_switch_info_t* const p_si ) { @@ -5599,7 +5613,7 @@ ib_switch_info_clear_state_change( * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_switch_info_is_enhanced_port0( IN const ib_switch_info_t* const p_si ) { @@ -5714,7 +5728,7 @@ typedef struct _ib_multipath_rec_t * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_multipath_rec_num_path( 
IN const ib_multipath_rec_t* const p_rec ) { @@ -5743,7 +5757,7 @@ ib_multipath_rec_num_path( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_multipath_rec_sl( IN const ib_multipath_rec_t* const p_rec ) { @@ -5772,7 +5786,7 @@ ib_multipath_rec_sl( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_multipath_rec_mtu( IN const ib_multipath_rec_t* const p_rec ) { @@ -5807,7 +5821,7 @@ ib_multipath_rec_mtu( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_multipath_rec_mtu_sel( IN const ib_multipath_rec_t* const p_rec ) { @@ -5840,7 +5854,7 @@ ib_multipath_rec_mtu_sel( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_multipath_rec_rate( IN const ib_multipath_rec_t* const p_rec ) { @@ -5873,7 +5887,7 @@ ib_multipath_rec_rate( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_multipath_rec_rate_sel( IN const ib_multipath_rec_t* const p_rec ) { @@ -5906,7 +5920,7 @@ ib_multipath_rec_rate_sel( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_multipath_rec_pkt_life( IN const ib_multipath_rec_t* const p_rec ) { @@ -5935,7 +5949,7 @@ ib_multipath_rec_pkt_life( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_multipath_rec_pkt_life_sel( IN const ib_multipath_rec_t* const p_rec ) { @@ -6052,7 +6066,7 @@ typedef struct _ib_slvl_table_record * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_slvl_table_set( IN ib_slvl_table_t* p_slvl_tbl, IN uint8_t sl_index, @@ -6102,7 +6116,7 @@ ib_slvl_table_set( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_slvl_table_get( IN const ib_slvl_table_t* p_slvl_tbl, IN uint8_t sl_index ) @@ -6223,7 +6237,7 @@ typedef struct _ib_grh * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_grh_get_ver_class_flow( IN const ib_net32_t ver_class_flow, OUT uint8_t* const p_ver, @@ -6275,7 +6289,7 @@ ib_grh_get_ver_class_flow( * * SYNOPSIS 
*/ -static inline ib_net32_t +static inline ib_net32_t OSM_API ib_grh_set_ver_class_flow( IN const uint8_t ver, IN const uint8_t tclass, @@ -6391,7 +6405,7 @@ typedef struct _ib_member_rec * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_member_get_sl_flow_hop( IN const ib_net32_t sl_flow_hop, OUT uint8_t* const p_sl, @@ -6442,7 +6456,7 @@ ib_member_get_sl_flow_hop( * * SYNOPSIS */ -static inline ib_net32_t +static inline ib_net32_t OSM_API ib_member_set_sl_flow_hop( IN const uint8_t sl, IN const uint32_t flow_label, @@ -6483,7 +6497,7 @@ ib_member_set_sl_flow_hop( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_member_get_scope_state( IN const uint8_t scope_state, OUT uint8_t* const p_scope, @@ -6527,7 +6541,7 @@ ib_member_get_scope_state( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_member_set_scope_state( IN const uint8_t scope, IN const uint8_t state ) @@ -6566,7 +6580,7 @@ ib_member_set_scope_state( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_member_set_join_state( IN OUT ib_member_rec_t *p_mc_rec, IN const uint8_t state ) @@ -6730,7 +6744,7 @@ typedef struct _ib_mad_notice_attr // * * SYNOPSIS */ -static inline boolean_t +static inline boolean_t OSM_API ib_notice_is_generic( IN const ib_mad_notice_attr_t *p_ntc ) { @@ -6757,7 +6771,7 @@ ib_notice_is_generic( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_notice_get_type( IN const ib_mad_notice_attr_t *p_ntc ) { @@ -6784,7 +6798,7 @@ ib_notice_get_type( * * SYNOPSIS */ -static inline ib_net32_t +static inline ib_net32_t OSM_API ib_notice_get_prod_type( IN const ib_mad_notice_attr_t *p_ntc ) { @@ -6815,7 +6829,7 @@ ib_notice_get_prod_type( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_notice_set_prod_type( IN ib_mad_notice_attr_t *p_ntc, IN ib_net32_t prod_type_val ) @@ -6848,7 +6862,7 @@ ib_notice_set_prod_type( * * SYNOPSIS */ -static inline void +static inline void OSM_API 
ib_notice_set_prod_type_ho( IN ib_mad_notice_attr_t *p_ntc, IN uint32_t prod_type_val_ho ) @@ -6882,7 +6896,7 @@ ib_notice_set_prod_type_ho( * * SYNOPSIS */ -static inline ib_net32_t +static inline ib_net32_t OSM_API ib_notice_get_vend_id( IN const ib_mad_notice_attr_t *p_ntc ) { @@ -6913,7 +6927,7 @@ ib_notice_get_vend_id( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_notice_set_vend_id( IN ib_mad_notice_attr_t *p_ntc, IN ib_net32_t vend_id ) @@ -6946,7 +6960,7 @@ ib_notice_set_vend_id( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_notice_set_vend_id_ho( IN ib_mad_notice_attr_t *p_ntc, IN uint32_t vend_id_ho ) @@ -6974,12 +6988,12 @@ ib_notice_set_vend_id_ho( #include typedef struct _ib_inform_info { - ib_gid_t gid; + ib_gid_t gid; ib_net16_t lid_range_begin; ib_net16_t lid_range_end; ib_net16_t reserved1; - uint8_t is_generic; - uint8_t subscribe; + uint8_t is_generic; + uint8_t subscribe; ib_net16_t trap_type; union _inform_g_or_v { @@ -7015,7 +7029,7 @@ typedef struct _ib_inform_info * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_inform_info_get_qpn_resp_time( IN const ib_net32_t qpn_resp_time_val, OUT ib_net32_t* const p_qpn, @@ -7056,7 +7070,7 @@ ib_inform_info_get_qpn_resp_time( * * SYNOPSIS */ -static inline void +static inline void OSM_API ib_inform_info_set_qpn( IN ib_inform_info_t *p_ii, IN ib_net32_t const qpn) @@ -7087,7 +7101,7 @@ ib_inform_info_set_qpn( * * SYNOPSIS */ -static inline ib_net32_t +static inline ib_net32_t OSM_API ib_inform_info_get_node_type( IN const ib_inform_info_t *p_inf) { @@ -7120,7 +7134,7 @@ ib_inform_info_get_node_type( * * SYNOPSIS */ -static inline ib_net32_t +static inline ib_net32_t OSM_API ib_inform_info_get_vend_id( IN const ib_inform_info_t *p_inf) { @@ -7271,7 +7285,7 @@ typedef struct _ib_iou_info * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_iou_info_diag_dev_id( IN const ib_iou_info_t* const p_iou_info ) { @@ -7300,7 +7314,7 @@ 
ib_iou_info_diag_dev_id( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ib_iou_info_option_rom( IN const ib_iou_info_t* const p_iou_info ) { @@ -7329,7 +7343,7 @@ ib_iou_info_option_rom( * * SYNOPSIS */ -static inline uint8_t +static inline uint8_t OSM_API ioc_at_slot( IN const ib_iou_info_t* const p_iou_info, IN uint8_t slot ) @@ -7476,7 +7490,7 @@ typedef struct _ib_ioc_profile *********/ -static inline uint32_t +static inline uint32_t OSM_API ib_ioc_profile_get_vend_id( IN const ib_ioc_profile_t* const p_ioc_profile ) { @@ -7484,7 +7498,7 @@ ib_ioc_profile_get_vend_id( } -static inline void +static inline void OSM_API ib_ioc_profile_set_vend_id( IN ib_ioc_profile_t* const p_ioc_profile, IN const uint32_t vend_id ) @@ -7552,7 +7566,7 @@ typedef struct _ib_svc_entries *********/ -static inline void +static inline void OSM_API ib_dm_get_slot_lo_hi( IN const ib_net32_t slot_lo_hi, OUT uint8_t *const p_slot, @@ -7580,7 +7594,7 @@ typedef struct _ib_ioc_info { ib_net64_t module_guid; ib_net64_t iou_guid; - ib_ioc_profile_t ioc_profile; + ib_ioc_profile_t ioc_profile; ib_net64_t access_key; uint16_t initiators_conf; uint8_t resv[38]; @@ -7621,8 +7635,8 @@ typedef struct _ib_ioc_info #define IB_SIDR_REQ_PDATA_SIZE_VER1 216 #define IB_SIDR_REP_PDATA_SIZE_VER1 140 -#define IB_ARI_SIZE 72 // redefine -#define IB_APR_INFO_SIZE 72 +#define IB_ARI_SIZE 72 // redefine +#define IB_APR_INFO_SIZE 72 /****d* Access Layer/ib_rej_status_t @@ -7748,17 +7762,22 @@ typedef uint16_t ib_sidr_status_t; * The following definitions are shared between the Access Layer and VPD */ -typedef struct _ib_ca *ib_ca_handle_t; -typedef struct _ib_pd *ib_pd_handle_t; -typedef struct _ib_rdd *ib_rdd_handle_t; -typedef struct _ib_mr *ib_mr_handle_t; -typedef struct _ib_mw *ib_mw_handle_t; -typedef struct _ib_qp *ib_qp_handle_t; -typedef struct _ib_eec *ib_eec_handle_t; -typedef struct _ib_cq *ib_cq_handle_t; -typedef struct _ib_av *ib_av_handle_t; -typedef struct _ib_mcast 
*ib_mcast_handle_t; +typedef struct _ib_ca* __ptr64 ib_ca_handle_t; +typedef struct _ib_pd* __ptr64 ib_pd_handle_t; +typedef struct _ib_rdd* __ptr64 ib_rdd_handle_t; +typedef struct _ib_mr* __ptr64 ib_mr_handle_t; +typedef struct _ib_mw* __ptr64 ib_mw_handle_t; +typedef struct _ib_qp* __ptr64 ib_qp_handle_t; +typedef struct _ib_eec* __ptr64 ib_eec_handle_t; +typedef struct _ib_cq* __ptr64 ib_cq_handle_t; +typedef struct _ib_av* __ptr64 ib_av_handle_t; +typedef struct _ib_mcast* __ptr64 ib_mcast_handle_t; + +/* Currently for windows branch we use the extended version of ib special verbs struct + in order to be compliant with Infinicon ib_types , later we'll change it to support + OpenSM ib_types.h */ +#ifndef WIN32 /****d* Access Layer/ib_api_status_t * NAME @@ -7832,7 +7851,7 @@ typedef enum _ib_api_status_t } ib_api_status_t; /*****/ -extern const char* ib_error_str[]; +OSM_EXPORT const char* ib_error_str[]; /****f* IBA Base: Types/ib_get_err_str * NAME @@ -7843,7 +7862,7 @@ extern const char* ib_error_str[]; * * SYNOPSIS */ -static inline const char* +static inline const char* OSM_API ib_get_err_str( IN ib_api_status_t status ) { @@ -8020,7 +8039,7 @@ typedef enum _ib_async_event_t * *****/ -extern const char* ib_async_event_str[]; +OSM_EXPORT const char* ib_async_event_str[]; /****f* IBA Base: Types/ib_get_async_event_str * NAME @@ -8031,7 +8050,7 @@ extern const char* ib_async_event_str[]; * * SYNOPSIS */ -static inline const char* +static inline const char* OSM_API ib_get_async_event_str( IN ib_async_event_t event ) { @@ -8311,6 +8330,7 @@ typedef struct _ib_ca_attr uint32_t vend_id; uint16_t dev_id; uint16_t revision; + uint64_t fw_ver; /* * Total size of the ca attributes in bytes @@ -8353,6 +8373,8 @@ typedef struct _ib_ca_attr uint32_t max_mcast_grps; uint32_t max_mcast_qps; uint32_t max_qps_per_mcast_grp; + uint32_t max_fmr; + uint32_t max_map_per_fmr; /* * local_ack_delay: @@ -8400,6 +8422,9 @@ typedef struct _ib_ca_attr * revision * Revision ID of this 
adapter * +* Fw_ver +* Device Firmware version. +* * size * Total size in bytes for the HCA attributes. This size includes total * size required for all the variable members of the structure. If a @@ -9633,7 +9658,7 @@ typedef enum _ib_wc_status_t * The completed work request was canceled by the user. *****/ -extern const char* ib_wc_status_str[]; +OSM_EXPORT const char* ib_wc_status_str[]; /****f* IBA Base: Types/ib_get_wc_status_str * NAME @@ -9644,7 +9669,7 @@ extern const char* ib_wc_status_str[]; * * SYNOPSIS */ -static inline const char* +static inline const char* OSM_API ib_get_wc_status_str( IN ib_wc_status_t wc_status ) { @@ -10300,4 +10325,9 @@ typedef struct _ib_ci_op END_C_DECLS -#endif // __IB_TYPES_H__ +#endif /* ndef WIN */ +#if defined( __WIN__ ) + #include +#endif + +#endif /* __IB_TYPES_H__ */ From eitan at mellanox.co.il Sun Sep 17 08:59:02 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 18:59:02 +0300 Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c Message-ID: <86wt82lc1l.fsf@mtl066.yok.mtl.com> Hi Hal Explicit cast required for the win compiler to handle this... 
Thanks Eitan Signed-off-by: Eitan Zahavi Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 9502) +++ osmtest/osmtest.c (working copy) @@ -3281,7 +3281,7 @@ osmtest_validate_path_data( IN osmtest_t else { /* Also, this doesn't detect fewer than the correct number of paths being returned */ - if ( p_path->count >= ( 1 << lmc ) * ( 1 << lmc ) ) + if ( p_path->count >= (uint32_t)( 1 << (2*lmc)) ) { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_validate_path_data: ERR 0052: " From eitan at mellanox.co.il Sun Sep 17 08:59:13 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 18:59:13 +0300 Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack : opensm/osm_subnet.c Message-ID: <86venmlc1a.fsf@mtl066.yok.mtl.com> Hi Hal No need for stdio.h but do need stdlib.h ... Also map snprintf to _snprintf in windows case Thanks Eitan Signed-off-by: Eitan Zahavi Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 9502) +++ opensm/osm_subnet.c (working copy) @@ -53,6 +53,7 @@ #include #include +#include #include #include #include @@ -65,7 +66,6 @@ #include #include #include -#include /********************************************************************** **********************************************************************/ @@ -659,6 +659,9 @@ __osm_subn_opts_unpack_charp( } } +#ifdef WIN32 +#define snprintf _snprintf +#endif /********************************************************************** **********************************************************************/ static void From eitan at mellanox.co.il Sun Sep 17 08:59:22 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 18:59:22 +0300 Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack : opensm/osm_prtn_config.c Message-ID: <86u036lc11.fsf@mtl066.yok.mtl.com> Hi Hal 1. Avoid varargs macros not supported by win 2. 
Some explicit casting required Thanks Eitan Signed-off-by: Eitan Zahavi Index: opensm/osm_prtn_config.c =================================================================== --- opensm/osm_prtn_config.c (revision 9502) +++ opensm/osm_prtn_config.c (working copy) @@ -66,17 +66,6 @@ #define STRTO_IB_NET64(str, end, base) strtoull(str, end, base) #endif -#define PARSERR(log, lnum, fmt, arg...) { \ - osm_log(log, OSM_LOG_ERROR, \ - "PARSE ERROR: line %d: " fmt , (lnum), ##arg ); \ - fprintf(stderr, \ - "\nPARSE ERROR: line %d: " fmt "\n", (lnum), ##arg ); \ -} - -#define PARSEWARN(log, lnum, fmt, arg...) \ - osm_log(log, OSM_LOG_VERBOSE, \ - "PARSE WARN: line %d: " fmt , (lnum), ##arg ) - /* */ struct part_conf { @@ -112,7 +101,7 @@ static int partition_create(unsigned lin if (id) { char *end; - pkey = strtoul(id, &end, 0); + pkey = (uint16_t)strtoul(id, &end, 0); if (end == id || *end) return -1; } else @@ -131,11 +120,11 @@ static int partition_create(unsigned lin conf->sl = OSM_DEFAULT_SL; } } - conf->p_prtn->sl = conf->sl; + conf->p_prtn->sl = (uint8_t)conf->sl; if (conf->is_ipoib) osm_prtn_add_mcgroup(conf->p_log, conf->p_subn, conf->p_prtn, - conf->is_ipoib, conf->rate, conf->mtu); + conf->is_ipoib, (uint8_t)conf->rate, (uint8_t)conf->mtu); return 0; } @@ -148,29 +137,33 @@ static int partition_add_flag(unsigned l conf->is_ipoib = 1; } else if (!strncmp(flag, "mtu", len)) { if (!val || (conf->mtu = strtoul(val, NULL, 0)) == 0) - PARSEWARN(conf->p_log, lineno, - "flag \'mtu\' requires valid value" - " - skipped.\n"); + osm_log(conf->p_log, OSM_LOG_VERBOSE, + "PARSE WARN: line %d: " + "flag \'mtu\' requires valid value" + " - skipped.\n", lineno); } else if (!strncmp(flag, "rate", len)) { if (!val || (conf->rate = strtoul(val, NULL, 0)) == 0) - PARSEWARN(conf->p_log, lineno, - "flag \'rate\' requires valid value" - " - skipped.\n"); + osm_log(conf->p_log, OSM_LOG_VERBOSE, + "PARSE WARN: line %d: " + "flag \'rate\' requires valid value" + " - skipped.\n", lineno); } 
else if (!strncmp(flag, "sl", len)) { unsigned sl; char *end; if (!val || !*val || (sl = strtoul(val, &end, 0)) > 15 || (*end && !isspace(*end))) - PARSEWARN(conf->p_log, lineno, - "flag \'sl\' requires valid value" - " - skipped.\n"); + osm_log(conf->p_log, OSM_LOG_VERBOSE, + "PARSE WARN: line %d: " + "flag \'sl\' requires valid value" + " - skipped.\n", lineno); else conf->sl = sl; } else { - PARSEWARN(conf->p_log, lineno, - "unrecognized partition flag \'%s\'" - " - ignored.\n", flag); + osm_log(conf->p_log, OSM_LOG_VERBOSE, + "PARSE WARN: line %d: " + "unrecognized partition flag \'%s\'" + " - ignored.\n", lineno, flag); } return 0; } @@ -189,9 +182,10 @@ static int partition_add_port(unsigned l if (!strncmp(flag, "full", strlen(flag))) full = TRUE; else if (strncmp(flag, "limited", strlen(flag))) { - PARSEWARN(conf->p_log, lineno, - "unrecognized port flag \'%s\'." - " Assume \'limited\'\n", flag); + osm_log(conf->p_log, OSM_LOG_VERBOSE, + "PARSE WARN: line %d: " + "unrecognized port flag \'%s\'." 
+ " Assume \'limited\'\n", lineno, flag); } } @@ -305,8 +299,9 @@ static int parse_part_conf(struct part_c q = strchr(p, ':'); if (!q) { - PARSERR(conf->p_log, lineno, - "no partition definition found\n"); + osm_log(conf->p_log, OSM_LOG_ERROR, + "PARSE ERROR: line %d: " + "no partition definition found\n", lineno); return -1; } @@ -330,8 +325,9 @@ static int parse_part_conf(struct part_c *q++ = '\0'; ret = parse_name_token(p, &flag, &flval); if (!flag) { - PARSERR(conf->p_log, lineno, - "bad partition flags\n"); + osm_log(conf->p_log, OSM_LOG_ERROR, + "PARSE ERROR: line %d: " + "bad partition flags\n", lineno); return -1; } p += ret; @@ -341,8 +337,9 @@ static int parse_part_conf(struct part_c if (p != str || (partition_create(lineno, conf, name, id, flag, flval) < 0)) { - PARSERR(conf->p_log, lineno, - "bad partition definition\n"); + osm_log(conf->p_log, OSM_LOG_ERROR, + "PARSE ERROR: line %d: " + "bad partition definition\n", lineno); return -1; } @@ -354,8 +351,9 @@ static int parse_part_conf(struct part_c *q++ = '\0'; ret = parse_name_token(p, &name, &flag); if (partition_add_port(lineno, conf, name, flag) < 0) { - PARSERR(conf->p_log, lineno, - "bad PortGUID\n"); + osm_log(conf->p_log, OSM_LOG_ERROR, + "PARSE ERROR: line %d: " + "bad PortGUID\n", lineno); return -1; } p += ret; @@ -404,8 +402,9 @@ int osm_prtn_config_parse_file(osm_log_t if (!conf && !(conf = new_part_conf(p_log, p_subn))) { - PARSERR(p_log, lineno, - "internal: cannot create config.\n"); + osm_log(p_log, OSM_LOG_ERROR, + "PARSE ERROR: line %d: " + "internal: cannot create config.\n", lineno); break; } From eitan at mellanox.co.il Sun Sep 17 08:59:32 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 18:59:32 +0300 Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c Message-ID: <86sliqlc0r.fsf@mtl066.yok.mtl.com> Hi Hal 1. Avoid varargs macros not supported by win 2. Some explicit casting required 3. 
Use strtoull and not strtoll Thanks Eitan Signed-off-by: Eitan Zahavi Index: opensm/osm_ucast_file.c =================================================================== --- opensm/osm_ucast_file.c (revision 9502) +++ opensm/osm_ucast_file.c (working copy) @@ -52,18 +52,11 @@ #include #include +#include #include #include #include -#define PARSEERR(log, file_name, lineno, fmt, arg...) \ - osm_log(log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " fmt , \ - file_name, lineno, ##arg ) - -#define PARSEWARN(log, file_name, lineno, fmt, arg...) \ - osm_log(log, OSM_LOG_VERBOSE, "PARSE WARN: %s:%u: " fmt , \ - file_name, lineno, ##arg ) - static uint16_t remap_lid(osm_opensm_t *p_osm, uint16_t lid, ib_net64_t guid) { osm_port_t *p_port; @@ -72,10 +65,11 @@ static uint16_t remap_lid(osm_opensm_t * p_port = (osm_port_t *)cl_qmap_get(&p_osm->subn.port_guid_tbl, guid); if (!p_port || - p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl)) { + p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl)) + { osm_log(&p_osm->log, OSM_LOG_VERBOSE, - "remap_lid: cannot find port guid 0x%016" PRIx64 - " , will use the same lid\n", cl_ntoh64(guid)); + "remap_lid: cannot find port guid 0x%016" PRIx64 + " , will use the same lid\n", cl_ntoh64(guid)); return lid; } @@ -182,19 +176,21 @@ static int do_ucast_file_load(void *cont "skipping parsing. 
Using default routing algorithm\n"); } + else if (!strncmp(p, "Unicast lids", 12)) { q = strstr(p, " guid 0x"); if (!q) { - PARSEERR(&p_osm->log, file_name, lineno, - "cannot parse switch definition\n"); + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:" + " cannot parse switch definition\n", + file_name, lineno); return -1; } p = q + 6; - sw_guid = strtoll(p, &q, 16); + sw_guid = strtoull(p, &q, 16); if (q && !isspace(*q)) { - PARSEERR(&p_osm->log, file_name, lineno, - "cannot parse switch guid: \'%s\'\n", - p); + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:" + "cannot parse switch guid: \'%s\'\n", + file_name, lineno, p); return -1; } sw_guid = cl_hton64(sw_guid); @@ -212,40 +208,39 @@ static int do_ucast_file_load(void *cont } } else if (p_sw && !strncmp(p, "0x", 2)) { - lid = strtoul(p, &q, 16); + lid = (uint16_t)strtoul(p, &q, 16); if (q && !isspace(*q)) { - PARSEERR(&p_osm->log, file_name, lineno, - "cannot parse lid: \'%s\'\n", p); + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:" + "cannot parse lid: \'%s\'\n", file_name, lineno, p); return -1; } p = q; while (isspace(*p)) p++; - port_num = strtoul(p, &q, 10); + port_num = (uint8_t)strtoul(p, &q, 10); if (q && !isspace(*q)) { - PARSEERR(&p_osm->log, file_name, lineno, - "cannot parse port: \'%s\'\n", p); + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:" + "cannot parse port: \'%s\'\n", file_name, lineno, p); return -1; } p = q; /* additionally try to exract guid */ q = strstr(p, " portguid 0x"); if (!q) { - PARSEWARN(&p_osm->log, file_name, lineno, - "cannot find port guid " - "(maybe broken dump): \'%s\'\n", p); + osm_log(&p_osm->log, OSM_LOG_VERBOSE, "PARSE WARNING: %s:%u:" + "cannot find port guid " + "(maybe broken dump): \'%s\'\n", file_name, lineno, p); port_guid = 0; } else { p = q + 10; - port_guid = strtoll(p, &q, 16); + port_guid = strtoull(p, &q, 16); if (!q && !isspace(*q) && *q != ':') { - PARSEWARN(&p_osm->log, file_name, - lineno, - "cannot parse port 
guid " - "(maybe broken dump): " - "\'%s\'\n", p); + osm_log(&p_osm->log, OSM_LOG_VERBOSE, "PARSE WARNING: %s:%u:" + "cannot parse port guid " + "(maybe broken dump): " + "\'%s\'\n", file_name, lineno, p); port_guid = 0; } } From eitan at mellanox.co.il Sun Sep 17 08:59:40 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 18:59:40 +0300 Subject: [openib-general] [PATCH 8/13] osm: port to WinIB stack : opensm/osm_opensm.c Message-ID: <86r6yalc0j.fsf@mtl066.yok.mtl.com> Hi Hal Explicit NULL in empty array initializer Thanks Eitan Signed-off-by: Eitan Zahavi Index: opensm/osm_opensm.c =================================================================== --- opensm/osm_opensm.c (revision 9502) +++ opensm/osm_opensm.c (working copy) @@ -80,7 +80,7 @@ const static struct routing_engine_modul {"null", NULL}, {"updn", osm_ucast_updn_setup }, {"file", osm_ucast_file_setup }, - {} + {NULL, NULL} }; static int setup_routing_engine(osm_opensm_t *p_osm, const char *name) From eitan at mellanox.co.il Sun Sep 17 08:59:48 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 18:59:48 +0300 Subject: [openib-general] [PATCH 9/13] osm: port to WinIB stack : opensm/osm_prtn.c Message-ID: <86psdulc0b.fsf@mtl066.yok.mtl.com> Hi Hal Required cl_debug.h for PRIx64 Also map snprintf to _snprintf and stat to _stat Thanks Eitan Signed-off-by: Eitan Zahavi Index: opensm/osm_prtn.c =================================================================== --- opensm/osm_prtn.c (revision 9502) +++ opensm/osm_prtn.c (working copy) @@ -53,7 +53,7 @@ #include #include #include - +#include #include #include #include @@ -61,6 +61,10 @@ #include #include +#ifdef WIN32 +#define snprintf _snprintf +#define stat _stat +#endif extern int osm_prtn_config_parse_file(osm_log_t * const p_log, osm_subn_t * const p_subn, From eitan at mellanox.co.il Sun Sep 17 09:00:01 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 19:00:01 +0300 Subject: [openib-general] [PATCH 
10/13] osm: port to WinIB stack : opensm/osm_pkey.c Message-ID: <86odtelbzy.fsf@mtl066.yok.mtl.com> Hi Hal Some explicit casting required and also pkey blocks are only uint16_t . Thanks Eitan Signed-off-by: Eitan Zahavi Index: opensm/osm_pkey.c =================================================================== --- opensm/osm_pkey.c (revision 9502) +++ opensm/osm_pkey.c (working copy) @@ -116,7 +116,7 @@ void osm_pkey_tbl_init_new_blocks( IN const osm_pkey_tbl_t *p_pkey_tbl) { ib_pkey_table_t *p_block; - int16_t b, num_blocks = cl_ptr_vector_get_size(&p_pkey_tbl->new_blocks); + size_t b, num_blocks = cl_ptr_vector_get_size(&p_pkey_tbl->new_blocks); for (b = 0; b < num_blocks; b++) if ((p_block = cl_ptr_vector_get(&p_pkey_tbl->new_blocks, b))) @@ -279,17 +279,17 @@ ib_api_status_t osm_pkey_tbl_get_block_and_idx( IN osm_pkey_tbl_t *p_pkey_tbl, IN uint16_t *p_pkey, - OUT uint32_t *p_block_idx, + OUT uint16_t *p_block_idx, OUT uint8_t *p_pkey_idx) { - uint32_t num_of_blocks; - uint32_t block_index; + uint16_t num_of_blocks; + uint16_t block_index; ib_pkey_table_t *block; CL_ASSERT( p_block_idx != NULL ); CL_ASSERT( p_pkey_idx != NULL ); - num_of_blocks = cl_ptr_vector_get_size( &p_pkey_tbl->blocks); + num_of_blocks = (uint16_t)cl_ptr_vector_get_size( &p_pkey_tbl->blocks); for (block_index = 0; block_index < num_of_blocks; block_index++) { block = osm_pkey_tbl_block_get(p_pkey_tbl, block_index); From eitan at mellanox.co.il Sun Sep 17 09:00:12 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 19:00:12 +0300 Subject: [openib-general] [PATCH 11/13] osm: port to WinIB stack : opensm/osm_log.c Message-ID: <86mz8ylbzn.fsf@mtl066.yok.mtl.com> Hi Hal 1. function mappings for stat, fstat and fileno 2. 
Currently no imp for log file truncation Thanks Eitan Signed-off-by: Eitan Zahavi Index: opensm/osm_log.c =================================================================== --- opensm/osm_log.c (revision 9502) +++ opensm/osm_log.c (working copy) @@ -60,6 +60,8 @@ #include #include +static int log_exit_count = 0; + #ifndef WIN32 #include #include @@ -79,9 +81,6 @@ static char *month_str[] = { "Nov", "Dec" }; -#endif /* ndef WIN32 */ - -static int log_exit_count = 0; static void truncate_log_file(osm_log_t* const p_log) { @@ -95,6 +94,19 @@ static void truncate_log_file(osm_log_t* p_log->count = 0; } +#else /* Windows */ + +#define fstat _fstat +#define stat _stat +#define fileno _fileno +static void truncate_log_file(osm_log_t* const p_log) +{ + fprintf(stderr, "truncate_log_file: cannot truncate on windows system (yet)\n"); +} + +#endif /* ndef WIN32 */ + + void osm_log( IN osm_log_t* const p_log, From eitan at mellanox.co.il Sun Sep 17 09:00:33 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 19:00:33 +0300 Subject: [openib-general] [PATCH 12/13] osm: port to WinIB stack : opensm/osm_qos.c Message-ID: <86lkoilbz2.fsf@mtl066.yok.mtl.com> Hi Hal Port num is uint8_t (avoid casting by using correct size field). 
Added some explicit casts Thanks Eitan Signed-off-by: Eitan Zahavi Index: opensm/osm_qos.c =================================================================== --- opensm/osm_qos.c (revision 9502) +++ opensm/osm_qos.c (working copy) @@ -70,7 +70,7 @@ static void qos_build_config(struct qos_ */ static ib_api_status_t vlarb_update_table_block(osm_req_t * p_req, osm_physp_t * p, - unsigned port_num, + uint8_t port_num, const ib_vl_arb_table_t *table_block, unsigned block_length, unsigned block_num) @@ -80,7 +80,7 @@ static ib_api_status_t vlarb_update_tabl uint32_t attr_mod; ib_port_info_t *p_pi; unsigned vl_mask; - int i; + unsigned int i; if (!(p_pi = osm_physp_get_port_info_ptr(p))) return IB_ERROR; @@ -110,7 +110,7 @@ static ib_api_status_t vlarb_update_tabl } static ib_api_status_t vlarb_update(osm_req_t * p_req, - osm_physp_t * p, unsigned port_num, + osm_physp_t * p, uint8_t port_num, const struct qos_config *qcfg) { ib_api_status_t status = IB_SUCCESS; @@ -198,11 +198,11 @@ static ib_api_status_t sl2vl_update_tabl } static ib_api_status_t sl2vl_update(osm_req_t * p_req, osm_port_t * p_port, - osm_physp_t * p, unsigned port_num, + osm_physp_t * p, uint8_t port_num, const struct qos_config *qcfg) { ib_api_status_t status; - unsigned i, num_ports; + uint8_t i, num_ports; ib_port_info_t *p_pi = osm_physp_get_port_info_ptr(p); osm_physp_t *p_physp; @@ -273,7 +273,7 @@ static ib_api_status_t vl_high_limit_upd static ib_api_status_t qos_physp_setup(osm_log_t * p_log, osm_req_t * p_req, osm_port_t * p_port, osm_physp_t * p, - unsigned port_num, + uint8_t port_num, const struct qos_config *qcfg) { ib_api_status_t status; @@ -329,7 +329,7 @@ osm_signal_t osm_qos_setup(osm_opensm_t osm_physp_t *p_physp; uint8_t node_type; ib_api_status_t status; - uint32_t i; + uint8_t i; if (p_osm->subn.opt.no_qos) return OSM_SIGNAL_DONE; @@ -411,7 +411,7 @@ static int parse_vlarb_entry(char *str, p += parse_one_unsigned(p, ':', &val); e->vl = val % 15; p += parse_one_unsigned(p, ',', 
&val); - e->weight = val; + e->weight = (uint8_t)val; return p - str; } @@ -434,7 +434,7 @@ static void qos_build_config(struct qos_ memset(cfg, 0, sizeof(*cfg)); cfg->max_vls = opt->max_vls > 0 ? opt->max_vls : dflt->max_vls; - cfg->vl_high_limit = opt->high_limit; + cfg->vl_high_limit = (uint8_t)opt->high_limit; p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) { From eitan at mellanox.co.il Sun Sep 17 09:00:49 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 19:00:49 +0300 Subject: [openib-general] [PATCH 13/13] osm: port to WinIB stack : opensm/osm_pkey_mgr.c Message-ID: <86k642lbym.fsf@mtl066.yok.mtl.com> Hi Hal Avoid using array initialization statements which do not compile on win. Thanks Eitan Signed-off-by: Eitan Zahavi Index: opensm/osm_pkey_mgr.c =================================================================== --- opensm/osm_pkey_mgr.c (revision 9502) +++ opensm/osm_pkey_mgr.c (working copy) @@ -67,7 +67,7 @@ a different place for switch external ports (SwitchInfo) and the rest of the ports (NodeInfo). 
*/ -static int +static uint16_t pkey_mgr_get_physp_max_blocks( IN const osm_subn_t *p_subn, IN const osm_physp_t *p_physp ) @@ -132,8 +132,8 @@ pkey_mgr_process_physical_port( CL_ASSERT( ib_pkey_get_base( *p_orig_pkey ) == ib_pkey_get_base( pkey ) ); p_pending->is_new = FALSE; if (osm_pkey_tbl_get_block_and_idx( - p_pkey_tbl, p_orig_pkey, - &p_pending->block, &p_pending->index ) != IB_SUCCESS) + p_pkey_tbl, p_orig_pkey, + &p_pending->block, &p_pending->index ) != IB_SUCCESS) { osm_log( p_log, OSM_LOG_ERROR, "pkey_mgr_process_physical_port: ERR 0503: " @@ -276,7 +276,8 @@ static boolean_t pkey_mgr_update_port( boolean_t ret_val = FALSE; osm_pending_pkey_t *p_pending; boolean_t found; - ib_pkey_table_t empty_block = {.pkey_entry = {0}, }; + ib_pkey_table_t empty_block; + memset(&empty_block, 0, sizeof(ib_pkey_table_t)); p_physp = osm_port_get_default_phys_ptr( p_port ); if ( !osm_physp_is_valid( p_physp ) ) @@ -403,7 +404,8 @@ pkey_mgr_update_peer_port( uint16_t peer_max_blocks; ib_api_status_t status = IB_SUCCESS; boolean_t ret_val = FALSE; - ib_pkey_table_t empty_block = {.pkey_entry = {0}, }; + ib_pkey_table_t empty_block; + memset(&empty_block, 0, sizeof(ib_pkey_table_t)); p_physp = osm_port_get_default_phys_ptr( p_port ); if (!osm_physp_is_valid( p_physp )) From eitan at mellanox.co.il Sun Sep 17 09:22:33 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 17 Sep 2006 19:22:33 +0300 Subject: [openib-general] [PATCH 0/13] osm: port to WinIB stack Message-ID: <863baq5upi.fsf@mtl066.yok.mtl.com> Hi Hal The following series of 13 patches is required for porting the trunk OpenSM code (based on 9502) to WinIB. I have intentionally broken the patch up by file to ease the review. Most changes are: 1. casting from some int into exact uintXX_t 2. Avoiding macros with varargs (Windows does not support that) 3. Mapping snprintf to _snprintf, stat to _stat, etc 4. Missing include for cl_debug required for PRIx64 def 5. 
The osm_log changes for truncating are not supported yet 6. ib_types: add a macro for OSM_API required for windows to declare the API as __cdecl These patches are for the trunk only. Thanks Eitan From mst at mellanox.co.il Sun Sep 17 10:30:28 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 17 Sep 2006 20:30:28 +0300 Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c In-Reply-To: <86wt82lc1l.fsf@mtl066.yok.mtl.com> References: <86wt82lc1l.fsf@mtl066.yok.mtl.com> Message-ID: <20060917173028.GA32526@mellanox.co.il> Quoting r. Eitan Zahavi : > Subject: [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c > > Hi Hal > > Explicit cast required for the win compiler to handle this... > > Thanks > > Eitan > > Signed-off-by: Eitan Zahavi > > Index: osmtest/osmtest.c > =================================================================== > --- osmtest/osmtest.c (revision 9502) > +++ osmtest/osmtest.c (working copy) > @@ -3281,7 +3281,7 @@ osmtest_validate_path_data( IN osmtest_t > else > { > /* Also, this doesn't detect fewer than the correct number of paths being returned */ > - if ( p_path->count >= ( 1 << lmc ) * ( 1 << lmc ) ) > + if ( p_path->count >= (uint32_t)( 1 << (2*lmc)) ) > { > osm_log( &p_osmt->log, OSM_LOG_ERROR, > "osmtest_validate_path_data: ERR 0052: " > Integer casts are ugly, and can mask real errors. All you want is for the math result to be unsigned, so 1u << (2*lmc) would be cleaner with the same effect, I think. -- MST From mst at mellanox.co.il Sun Sep 17 10:34:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 17 Sep 2006 20:34:08 +0300 Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack : opensm/osm_prtn_config.c In-Reply-To: <86u036lc11.fsf@mtl066.yok.mtl.com> References: <86u036lc11.fsf@mtl066.yok.mtl.com> Message-ID: <20060917173408.GB32526@mellanox.co.il> Quoting r. 
Eitan Zahavi : > @@ -112,7 +101,7 @@ static int partition_create(unsigned lin > > if (id) { > char *end; > - pkey = strtoul(id, &end, 0); > + pkey = (uint16_t)strtoul(id, &end, 0); > if (end == id || *end) > return -1; > } else Would it make sense to range-check the value before casting it? -- MST From mst at mellanox.co.il Sun Sep 17 10:35:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 17 Sep 2006 20:35:18 +0300 Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: <86sliqlc0r.fsf@mtl066.yok.mtl.com> References: <86sliqlc0r.fsf@mtl066.yok.mtl.com> Message-ID: <20060917173518.GC32526@mellanox.co.il> Quoting r. Eitan Zahavi : > p++; > - port_num = strtoul(p, &q, 10); > + port_num = (uint8_t)strtoul(p, &q, 10); > if (q && !isspace(*q)) { Would it make sense to range-check the value before casting it away? -- MST From eitan at mellanox.co.il Sun Sep 17 11:50:21 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 17 Sep 2006 21:50:21 +0300 Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c In-Reply-To: <20060917173028.GA32526@mellanox.co.il> References: <86wt82lc1l.fsf@mtl066.yok.mtl.com> <20060917173028.GA32526@mellanox.co.il> Message-ID: <450D98ED.3000707@mellanox.co.il> Hi Michael, In general I agree we could make the code a little safer by checking casts. But in many of the cases (not the ones with user input - strtoul/strtoull) it is not required as the values are limited by the IB arch. Anyway, the patch I am sending is for WinIB migration. Just doing the explicit cast does not make things any worse. We could take the task of cleaning these integer casts (like I did in osm_pkey.c/h) but this is another patch. EZ Michael S. Tsirkin wrote: >Quoting r. Eitan Zahavi : > > >>Subject: [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c >> >>Hi Hal >> >>Explicit cast required for the win compiler to handle this... 
>> >>Thanks >> >>Eitan >> >>Signed-off-by: Eitan Zahavi >> >>Index: osmtest/osmtest.c >>=================================================================== >>--- osmtest/osmtest.c (revision 9502) >>+++ osmtest/osmtest.c (working copy) >>@@ -3281,7 +3281,7 @@ osmtest_validate_path_data( IN osmtest_t >> else >> { >> /* Also, this doesn't detect fewer than the correct number of paths being returned */ >>- if ( p_path->count >= ( 1 << lmc ) * ( 1 << lmc ) ) >>+ if ( p_path->count >= (uint32_t)( 1 << (2*lmc)) ) >> { >> osm_log( &p_osmt->log, OSM_LOG_ERROR, >> "osmtest_validate_path_data: ERR 0052: " >> >> >> > >Integer casts are ugly, and can mask real errors. >All you want is for the math result to be unsigned, so > 1u << (2*lmc) >would be cleaner with the same effect, I think. > > > From mst at mellanox.co.il Sun Sep 17 11:55:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 17 Sep 2006 21:55:43 +0300 Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c In-Reply-To: <450D98ED.3000707@mellanox.co.il> References: <450D98ED.3000707@mellanox.co.il> Message-ID: <20060917185543.GD32526@mellanox.co.il> Quoting r. Eitan Zahavi : > Subject: Re: [openib-general] [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c > > Hi Michael, > > In general I agree we could make the code a little safer by checking > casts. > But in many of the cases (not the ones with user input - > strtoul/strtoull) it is not required as the values are limited by the IB > arch. > > Anyway, the patch I am sending is for WinIB migration. Just doing the > explicit cast does not make things any worse. > We could take the task of cleaning these integer casts (like I did in > osm_pkey.c/h) but this is another patch. > > EZ I agree with that. My point was that VC++ was catching some potential errors here, so we need to be careful not to throw that away. 
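A minimal sketch of the two suggestions raised in this thread, not code from the posted patches: range-check a strtoul() result before narrowing it, and use an unsigned literal for the shift instead of casting a signed result. The helper names parse_u16 and max_paths are hypothetical and are not OpenSM functions; the sketch also assumes a C99 <stdint.h> is available and that lmc stays within the 3-bit LMC range (0..7).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * parse_u16: hypothetical helper, not part of OpenSM.
 * Parses an unsigned integer with strtoul() and stores it in *out only if
 * the whole string was consumed and the value fits in 16 bits.
 * Returns 0 on success, -1 on any parse or range error.
 */
static int parse_u16(const char *str, int base, uint16_t *out)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(str, &end, base);
	if (end == str || *end != '\0')		/* no digits, or trailing junk */
		return -1;
	if (errno == ERANGE || val > UINT16_MAX)	/* a cast would truncate */
		return -1;
	*out = (uint16_t)val;			/* narrowing is now lossless */
	return 0;
}

/*
 * max_paths: hypothetical helper, not part of OpenSM.
 * Computes (1 << lmc) * (1 << lmc) with an unsigned literal, so the result
 * is unsigned without an explicit cast. Assumes lmc <= 7.
 */
static uint32_t max_paths(unsigned lmc)
{
	return 1u << (2 * lmc);
}
```

With a helper of this kind, a call site such as the pkey parsing would reject an out-of-range value instead of silently truncating it, which keeps the compiler warning meaningful rather than hiding it behind a cast.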
-- MST From bgreen at nas.nasa.gov Sun Sep 17 12:59:42 2006 From: bgreen at nas.nasa.gov (Bryan Green) Date: Sun, 17 Sep 2006 12:59:42 -0700 Subject: [openib-general] patch trouble In-Reply-To: Your message of "Sat, 16 Sep 2006 20:56:28 +0300." <20060916175628.GB22267@mellanox.co.il> Message-ID: <200609171959.k8HJxgUT005451@ece06.nas.nasa.gov> "Michael S. Tsirkin" writes: > Quoting r. Bryan Green : > > Subject: patch trouble > > > > Hello, > > Many of the patches in subversion fail to have an effect when I apply them to a kernel, > > because they create headers in 'drivers/infiniband/include' which depend on being included > > before the like-named headers in the toplevel 'include'. Is there a step I am missing to > > make the headers in 'drivers/infiniband/include' get chosen for inclusion first? > > Note that backport patches are intended to be applied in an out-of-kernel > fashion - they are not changing the kernel at all. > > So you build as an out-of-tree driver, and add something like this to the make > command line: > > LINUXINCLUDE='-I$(CWD)/include \ > -I$(CWD)/drivers/infiniband/include \ > -Iinclude \ > $$(if $$(KBUILD_SRC),-Iinclude2 -I$$(srctree)/include) \ > -include include/linux/autoconf.h \ > -include $(CWD)/include/linux/autoconf.h \ > ' \ > > You can find an example here > https://openib.org/svn/gen2/trunk/ofed/openib/scripts/Makefile > > BTW, Mellanox is not actively supporting backport patches on the svn trunk. > If you want code that works on something other than 2.6.17, > I suggest you pull backports for the ofed branch (forked from > 2.6.18-rc6) from ofed_1_1 tree by pulling > git://www.mellanox.co.il/~git/infiniband ofed_1_1 > and looking in ofed_scripts directory. > Thanks. I am looking at the git repository, and I see a number of patches in 'kernel_patches/fixes' which are apparently applied before the kernel patches under 'kernel_patches/backport'. I also see the discrepancies between the patches in git and svn. 
I am currently putting together a gentoo overlay (a series of gentoo installation scripts) for openib. Since there are no source tar files available for download, I am downloading the code from subversion - I have already done this for the 1.0 subversion branch, and mvapich2 from the 1.1 branch. My interest in the 2.6.12 kernel comes from a need to evaluate the lustre filesystem (production version), which has support for the 2.6.12 vanilla kernel. Is there a great discrepancy between the git repository and the svn repository? If I am downloading the kernel modules from subversion, should I still use the patchset from the git repository? What about putting a source tar file for openib up for download? There is currently only a source tarball for libibverbs, while ofed is too RPM-centric. Thanks, -bryan From mst at mellanox.co.il Sun Sep 17 13:31:53 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 17 Sep 2006 23:31:53 +0300 Subject: [openib-general] patch trouble In-Reply-To: <200609171959.k8HJxgUT005451@ece06.nas.nasa.gov> References: <200609171959.k8HJxgUT005451@ece06.nas.nasa.gov> Message-ID: <20060917203153.GG32526@mellanox.co.il> Quoting r. Bryan Green : > Is there a great discrepancy between the git repository and the svn > repository? If I am downloading the kernel modules from subversion, should I > still use the patchset from the git repository? *Please* do not use svn trunk code for production. You want either kernel.org code or the OFED git repository for everything. Kernel code in subversion is being deprecated. > What about putting a source > tar file for openib up for download? Putting anything up for download on the openib site is very hard - we mostly stick binary files in svn. > There is currently only a source tarball > for libibverbs, while ofed is too RPM-centric. Not really. Please try the following: Get the ofed tarball here https://openib.org/svn/gen2/branches/1.1/ofed/releases/ and unpack it. 
Take this file: SOURCES/openib-1.1.tgz That's all of subversion + git all nicely packed up. You can run configure and make there and it mostly works as expected. There's also an install.sh script that wraps these two and also adds some convenient softlinks and such goodies. Let me know how it goes. BTW, which distro are you using? -- MST From mst at mellanox.co.il Sun Sep 17 13:35:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 17 Sep 2006 23:35:58 +0300 Subject: [openib-general] patch trouble In-Reply-To: <200609171959.k8HJxgUT005451@ece06.nas.nasa.gov> References: <200609171959.k8HJxgUT005451@ece06.nas.nasa.gov> Message-ID: <20060917203558.GH32526@mellanox.co.il> Quoting r. Bryan Green : > Subject: Re: patch trouble > > "Michael S. Tsirkin" writes: > > Quoting r. Bryan Green : > > > Subject: patch trouble > > > > > > Hello, > > > Many of the patches in subversion fail to have an effect when I apply them to a kernel, > > > because they create headers in 'drivers/infiniband/include' which depend on being included > > > before the like-named headers in the toplevel 'include'. Is there a step I am missing to > > > make the headers in 'drivers/infiniband/include' get chosen for inclusion first? > > > > Note that backport patches are intended to be applied in an out-of-kernel > > fashion - they are not changing the kernel at all. > > > > So you build as an out-of-tree driver, and add something like this to the make > > command line: > > > > LINUXINCLUDE='-I$(CWD)/include \ > > -I$(CWD)/drivers/infiniband/include \ > > -Iinclude \ > > $$(if $$(KBUILD_SRC),-Iinclude2 -I$$(srctree)/include) \ > > -include include/linux/autoconf.h \ > > -include $(CWD)/include/linux/autoconf.h \ > > ' \ > > > > You can find an example here > > https://openib.org/svn/gen2/trunk/ofed/openib/scripts/Makefile > > > > BTW, Mellanox is not actively supporting backport patches on the svn trunk. 
> > If you want code that works on something other than 2.6.17, > > I suggest you pull backports for the ofed branch (forked from > > 2.6.18-rc6) from ofed_1_1 tree by pulling > > git://www.mellanox.co.il/~git/infiniband ofed_1_1 > > and looking in ofed_scripts directory. > > > > Thanks. I am looking at the git repository, and I see a number of patches in > 'kernel_patches/fixes' which are apparently applied before the kernel patches under > 'kernel_patches/backport'. Right. These are things that will be going into 2.6.19 but that we decided should be in OFED. > I also see the discrepancies between the patches in git and > svn. kernel code in svn trunk is deprecated - kernel code needs to sync with linus and doing that from svn adds too much overhead. In particular we stopped updating the backport patches for svn. -- MST From moshek at voltaire.com Sun Sep 17 23:09:07 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Mon, 18 Sep 2006 09:09:07 +0300 Subject: [openib-general] Any chance to get 32-Bit libraries on SLES9 x86_64? Message-ID: I had the other problem (trying to find the 64-bit rpm) In sles9 sysfsutils is part of the udev rpm. Therefore I think that you may try udev...rpm for sysfsutils 32-bit version and udev-64bit...rpm for sysfsutils 64-bit version after install the 32 bit libraries are located on /usr/lib and the 64 bit libraries are located under /usr/lib64 Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Bub Thomas Sent: Friday, September 15, 2006 9:24 AM To: openib-general at openib.org; Bub Thomas Subject: [openib-general] Any chance to get 32-Bit libraries on SLES9 x86_64? Is there any chance/trick to get 32-Bit Libraries build and usable on SLES9 x86_64? 
When I installed OFED-1.1-rc4 I get:

WARNING: sysfsutils 32-bit version is required to build 32-bit libibverbs package.
WARNING: Skiping build of 32-bit libraries.

I googled around and didn't find any sysfsutils 32-bit for SLES9. I know that it is working under SLES10, but our customer base is on SLES9 and very conservative when it comes down to using the latest and greatest OS/distribution.

Thomas

............................................................
Thomas Bub
Grass Valley Germany GmbH
Brunnenweg 9
64331 Weiterstadt, Germany
Tel: +49 6150 104 147
Fax: +49 6150 104 656
Email: Thomas.Bub at thomson.net
www.GrassValley.com
............................................................

From moshek at voltaire.com Mon Sep 18 00:13:40 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Mon, 18 Sep 2006 10:13:40 +0300
Subject: [openib-general] [openfabrics-ewg] OpenSm on sles10 ppc64 OFED 1.0 - bug ?
Message-ID:

See attached file.

Moshe

____________________________________________________________
Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m)
Voltaire - The Grid Backbone
www.voltaire.com

-----Original Message-----
From: Hal Rosenstock [mailto:halr at voltaire.com]
Sent: Sunday, September 17, 2006 5:50 PM
To: Moshe Kazir
Cc: openib-general at openib.org; OpenFabricsEWG; Sasha Khapyorsky
Subject: Re: [openfabrics-ewg] OpenSm on sles10 ppc64

Hi Moshe,

On Sun, 2006-09-17 at 10:41, Moshe Kazir wrote:
> /etc/init.d/opensm start produce an error on my JS21 ppc64 SLES10
> OFED 1.0 .

What error ?

> Should ppc64 SLES10 OFED 1.0 work ?

I don't think so.

> Anyone tried it ?

OFED 1.0 OpenSM release notes say:
* PPC support: No PPC QA was performed.

There was an issue with PPC64 that Sasha fixed post OFED 1.0. It's in OFED 1.1 and could easily be retrofitted to OFED 1.0 if needed. Contact Sasha or me if you are interested in doing this.
-- Hal > > Moshe > > ____________________________________________________________ > Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) > > Voltaire - The Grid Backbone > > www.voltaire.com > > > > > -----Original Message----- > From: openfabrics-ewg-bounces at openib.org > [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of > vlad at dev.mellanox.co.il > Sent: Thursday, September 14, 2006 7:39 PM > To: openfabrics-ewg at openib.org > Cc: openib-general at openib.org > Subject: [openfabrics-ewg] OFED-1.1-RC5 is ready > > > Hi, > > OFED-1.1-rc5 is available on > https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > File: OFED-1.1-rc5.tgz > Please report any issues in bugzilla http://openib.org/bugzilla/ > > > Release details: > ================ > Build_id: > > OFED-1.1-rc5 > > openib-1.1 (REV=9485) > # User space https://openib.org/svn/gen2/branches/1.1/src/userspace > Git: git://www.mellanox.co.il/~git/infinibandref: refs/heads/ofed_1_1 > commit 18c1cb87c4b16f1a1577807077bbdcba3f446f09 > > # MPI > mpi_osu-0.9.7-mlx2.2.0.tgz > openmpi-1.1.1-1.src.rpm > mpitests-2.0-0.src.rpm > > OS support: > =========== > Novell: > - SLES 9.0 SP3 > - SLES10 > Redhat: > - Redhat EL4 up3 > > - Redhat EL4 up4 > kernel.org: > - Kernel 2.6.17 > > > Bug fixes from OFED-1.1-rc4: > ========================== > 1. ISER compilation fixed on SLES10 > 2. Fixed build on SLES9 PPC64 > 3. Updated libehca > 4. OpenSM fixes > 5. Added tavor_quirk option to rdma_cm module (disabled by default): > Tavor performance quirk: limit MTU to 1K if > 0 (int) > > Known issues: > ============= > libipathverbs compilation fails on SLES10 (Bug:204) > > > OFED-1.1-rc6 (hopefully the last one) planned to be released on Monday > or Tuesday. > > > Regards, > Vladimir > > > > Hi, > > > > The plan is to issue OFED RC5 on Thursday 9/14 and final release > > next > > week. 
I am aware of the following issues: > > > > > > 1) Compilation on SLES9 on PPC - Jack Morgenstein > > 2) Huge pages on PPC - Eli Cohen > > 3) libipathverbs: - Qlogic > > a) libipathverbs ABI issue > > b) libipathverbs build on SLES10 > > 4) SDP performance on Tavor - Michael Tsirkin > > 5) iSER issue on SLES10 - Voltaire > > > > > > In order to meet tomorrow's RC5 release all owners please send your > > patches by end of today. > > > > > > Regards, > > > > Aviram > > > > _______________________________________________ > > openfabrics-ewg mailing list > > openfabrics-ewg at openib.org > > http://openib.org/mailman/listinfo/openfabrics-ewg > > > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: opensm.error.txt URL: From krkumar2 at in.ibm.com Mon Sep 18 00:35:45 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 18 Sep 2006 13:05:45 +0530 Subject: [openib-general] [PATCH] Fix freed mem deref race in cma_process_remove/cma_req_handler Message-ID: <20060918073545.26067.41763.sendpatchset@localhost.localdomain> The race is as follows : A process : cma_process_remove() calls cma_remove_id_dev(), which sets id state to CMA_DEVICE_REMOVAL and calls wait_event(dev_remove). B process : cma_req_handler() had incremented dev_remove, and calls cma_acquire_ib_dev() and on failure calls cma_release_remove(), which does a wake_up of cma_process_remove(). 
Then cma_req_handler() calls rdma_destroy_id().

A process : cma_remove_id_dev() gets woken and checks the state of id, and since it is still (wrongly) CMA_DEVICE_REMOVAL, it calls notify_user(id) and if that fails, the caller - cma_process_remove() - calls rdma_destroy_id(id).

Two processes can call rdma_destroy_id(), resulting in one dereferencing the kfreed id_priv. The fix is for process B to set CMA_DESTROYING in cma_req_handler() so that process A will return instead of doing a rdma_destroy_id().

Signed-off-by: Krishna Kumar

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c 2006-09-14 15:41:01.000000000 +0530
+++ new/core/cma.c 2006-09-18 11:52:52.000000000 +0530
@@ -1023,6 +1023,7 @@ static int cma_req_handler(struct ib_cm_
 	mutex_unlock(&lock);
 	if (ret) {
 		ret = -ENODEV;
+		cma_exch(conn_id, CMA_DESTROYING);
 		cma_release_remove(conn_id);
 		rdma_destroy_id(&conn_id->id);
 		goto out;

From halr at voltaire.com Mon Sep 18 01:53:28 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 04:53:28 -0400
Subject: [openib-general] [PATCH] osm: bug in __osmv_send_sa_req
In-Reply-To: <1158502832.8516.9.camel@kliteynik.yok.mtl.com>
References: <1158502832.8516.9.camel@kliteynik.yok.mtl.com>
Message-ID: <1158569544.25157.180348.camel@hal.voltaire.com>

Hi Yevgeny,

On Sun, 2006-09-17 at 10:20, Yevgeny Kliteynik wrote:
> Hi Hal
>
> This patch fixes a bug in __osmv_send_sa_req in libvendor.
> After sending a MAD, the status of the response was ignored.
>
> Yevgeny
>
> Signed-off-by: Yevgeny Kliteynik

Thanks. Applied to trunk and 1.1.
-- Hal From ogerlitz at voltaire.com Mon Sep 18 01:56:31 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 18 Sep 2006 11:56:31 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <86y7sle4kg.fsf@mtl066.yok.mtl.com> References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> Message-ID: <450E5F3F.2090203@voltaire.com> Eitan Zahavi wrote: > The following patch solves an issue with OpenSM preferring largest MTU > for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor) > devices instead of using a 1K MTU which is best for this device. Eitan, Isn't the 2K MTU issue with Tavor comes into play only under RC QP? more over, doing TAVOR/UD/2K MTU is very common, eg IPoIB. So does your patch relies on a somehow completing quirk in the host side for UD based ULPs to add some mtu selector which will prevent the SM side quirk to take action? Or. From halr at voltaire.com Mon Sep 18 02:16:17 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Sep 2006 05:16:17 -0400 Subject: [openib-general] [openfabrics-ewg] OpenSm on sles10 ppc64 OFED 1.0 - bug ? In-Reply-To: References: Message-ID: <1158570905.25157.180934.camel@hal.voltaire.com> On Mon, 2006-09-18 at 03:13, Moshe Kazir wrote: > See attached file. That was the problem that Sasha found and fixed. -- Hal
From mst at mellanox.co.il Mon Sep 18 02:35:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 18 Sep 2006 12:35:06 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <450E5F3F.2090203@voltaire.com> References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> <450E5F3F.2090203@voltaire.com> Message-ID: <20060918093506.GC29055@mellanox.co.il> Quoting r.
Or Gerlitz : > Subject: Re: [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices > > Eitan Zahavi wrote: > > The following patch solves an issue with OpenSM preferring largest MTU > > for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor) > > devices instead of using a 1K MTU which is best for this device. > > Eitan, > > Isn't the 2K MTU issue with Tavor comes into play only under RC QP? I don't think so, no. Tavor supports 2K MTU, but it has better performance with 1K MTU than 2K MTU. QP type should not matter. > more over, doing TAVOR/UD/2K MTU is very common, eg IPoIB. Correct. And it works with existing SMs. But ULPs that have specific MTU requirements must set MTU selector accordingly, otherwise SM is free to select any MTU. > So does your patch relies on a somehow completing quirk in the host side > for UD based ULPs to add some mtu selector which will prevent the SM > side quirk to take action? It's more a bugfix than a quirk. IPoIB currently has specific MTU requirements but does not set MTU selector at all, relying on specific SM behaviour. I consider it a bug in IPoIB and am testing a patch fixing this. -- MST From ogerlitz at voltaire.com Mon Sep 18 02:45:22 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 18 Sep 2006 12:45:22 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <20060918093506.GC29055@mellanox.co.il> References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> <450E5F3F.2090203@voltaire.com> <20060918093506.GC29055@mellanox.co.il> Message-ID: <450E6AB2.70505@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >> Eitan Zahavi wrote: >>> The following patch solves an issue with OpenSM preferring largest MTU >>> for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor) >>> devices instead of using a 1K MTU which is best for this device. >> Isn't the 2K MTU issue with Tavor comes into play only under RC QP? > I don't think so, no. 
Tavor supports 2K MTU, but it has better performance with
> 1K MTU than 2K MTU. QP type should not matter.

Can you double check that, please? As far as I know, there is something like a 40-50% BW drop with Tavor/RC/2048 vs Tavor/RC/1024, but the BW with Tavor/UD/2048 is **no less** than Tavor/UD/1024.

So it's very common for IPoIB net device implementations to expose a 2044 or 1500 byte MTU to the OS, e.g. to cope with Ethernet and reduce IP fragmentation/reassembly of UDP/TCP traffic.

Or.

From halr at voltaire.com Mon Sep 18 02:53:56 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 05:53:56 -0400
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 - not for MTU Sel=3
In-Reply-To: <864pv6mtoe.fsf@mtl066.yok.mtl.com>
References: <864pv6mtoe.fsf@mtl066.yok.mtl.com>
Message-ID: <1158573172.25157.182192.camel@hal.voltaire.com>

Hi Eitan,

On Sun, 2006-09-17 at 10:52, Eitan Zahavi wrote:
> Hi Hal
>
> We have reviewed the patch for the above and figured out there is an
> issue with it:
> Currently when MTU_SEL=3 the quirk applies.
> We think this is wrong behavior as MTU_SEL=3 means "max possible MTU" by
> the IBTA spec. So if an application/ULP would like to get the max MTU possible
> the correct answer is 2K for Tavor by the spec.
> So this patch fixes the quirk and when MTU_SEL=3 it does not apply the MTU
> limit quirk for Tavor devices.

Good catch. So compliance over performance is preferred for this case.

Thanks. Applied to both trunk and 1.1.

-- Hal

From mst at mellanox.co.il Mon Sep 18 02:54:23 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 12:54:23 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices
In-Reply-To: <450E6AB2.70505@voltaire.com>
References: <450E6AB2.70505@voltaire.com>
Message-ID: <20060918095423.GG29055@mellanox.co.il>

Quoting r. Or Gerlitz :
> Subject: Re: [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices
>
> Michael S. Tsirkin wrote:
> > Quoting r.
Or Gerlitz : > > >> Eitan Zahavi wrote: > >>> The following patch solves an issue with OpenSM preferring largest MTU > >>> for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor) > >>> devices instead of using a 1K MTU which is best for this device. > > >> Isn't the 2K MTU issue with Tavor comes into play only under RC QP? > > > I don't think so, no. Tavor supports 2K MTU, but it has better performance with > > 1K MTU than 2K MTU. QP type should not matter. > > Can you double check that please, as far as i know there is something > like BW 40-50% drop with Tavor/RC/2048 vs Tavor/RC/1024 but the BW with > Tavor/UD/2048 is **no less** then Tavor/UD/1024. The property of Tavor to work better with 1K MTU is not transport-specific. But, BW depends on the ULP. I guess UD top BW is simply lower (smaller messages) so you do not see the drop there. This just means ULP should use MTU selector to give SM hints about the MTU it wants. If it wants the highest MTU available it should set the selector to 3, not wildcard it. > So its very common for IPoIB net devices impl. to expose 2044 or 1500 > bytes MTU to the OS eg to cope with Ethernet and reduce IP > fragmentation/reassembly of UDP/TCP traffic. I expect IPoIB to get better performance with higher MTU - TCP fragmentation likely has bigger effect than hardware speed quirks. But this is just another reason to set the mtu selector in IPoIB appropriately. -- MST From halr at voltaire.com Mon Sep 18 03:44:22 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Sep 2006 06:44:22 -0400 Subject: [openib-general] [PATCH 1/13] osm: port to WinIB stack : include/opensm/osm_base.h In-Reply-To: <861wqamqnl.fsf@mtl066.yok.mtl.com> References: <861wqamqnl.fsf@mtl066.yok.mtl.com> Message-ID: <1158576183.18842.636.camel@hal.voltaire.com> On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote: > Hi Hal > > osm_base.h uses cache dir for osm-partitions.conf. > > Thanks > > Eitan > > Signed-off-by: Eitan Zahavi Thanks. 
Applied to trunk only.

-- Hal

From halr at voltaire.com Mon Sep 18 03:53:51 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 06:53:51 -0400
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack : include/opensm/osm_pkey.h
In-Reply-To: <86zmcylc2e.fsf@mtl066.yok.mtl.com>
References: <86zmcylc2e.fsf@mtl066.yok.mtl.com>
Message-ID: <1158576813.18842.935.camel@hal.voltaire.com>

Hi Eitan,

On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote:
> Hi Hal
>
> Partition tables blocks are always 16 bits.
> This resolves the need to later cast back and forth.
>
> Thanks
>
> Eitan
>
> Signed-off-by: Eitan Zahavi
>
> Index: include/opensm/osm_pkey.h
> ===================================================================
> --- include/opensm/osm_pkey.h (revision 9502)
> +++ include/opensm/osm_pkey.h (working copy)
> @@ -143,7 +143,7 @@ typedef struct _osm_pkey_tbl
> typedef struct _osm_pending_pkey {
> cl_list_item_t list_item;
> uint16_t pkey;
> - uint32_t block;
> + uint16_t block;
> uint8_t index;
> boolean_t is_new;
> } osm_pending_pkey_t;
> @@ -396,7 +396,7 @@ ib_api_status_t
> osm_pkey_tbl_get_block_and_idx(
> IN osm_pkey_tbl_t *p_pkey_tbl,
> IN uint16_t *p_pkey,
> - OUT uint32_t *block_idx,
> + OUT uint16_t *block_idx,
> OUT uint8_t *pkey_index);
> /*
> * p_pkey_tbl

Doesn't this require at least a similar change to opensm/osm_pkey.c:osm_pkey_tbl_get_block_and_idx ? Anything else ?

-- Hal

From mirko.benz at xiranet.com Mon Sep 18 03:59:26 2006
From: mirko.benz at xiranet.com (Mirko Benz)
Date: Mon, 18 Sep 2006 12:59:26 +0200
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
Message-ID: <450E7C0E.3020001@xiranet.com>

Hello,

We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone). Some IB diagnostics tools e.g. ibhosts and ibswitches (located under .../ofed/bin/) do not work with a normal user account -- no output given. It works as root though.
Regards,
Mirko

From halr at voltaire.com Mon Sep 18 04:10:18 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 07:10:18 -0400
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <450E7C0E.3020001@xiranet.com>
References: <450E7C0E.3020001@xiranet.com>
Message-ID: <1158577816.18842.1501.camel@hal.voltaire.com>

Hi Mirko,

On Mon, 2006-09-18 at 06:59, Mirko Benz wrote:
> Hello,
>
> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone).
> Some IB diagnostics tools e.g. ibhosts and ibswitches (located under
> .../ofed/bin/)
> do not work with a normal user account -- no output given. It works as
> root though.

It depends on how you have udev access for umad set up. With the default setup for IB, root is required as these diagnostics send SMPs which require umad access which is limited to root.

-- Hal

> Regards,
> Mirko
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From mirko.benz at xiranet.com Mon Sep 18 04:20:57 2006
From: mirko.benz at xiranet.com (Mirko Benz)
Date: Mon, 18 Sep 2006 13:20:57 +0200
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <1158577816.18842.1501.camel@hal.voltaire.com>
References: <450E7C0E.3020001@xiranet.com> <1158577816.18842.1501.camel@hal.voltaire.com>
Message-ID: <450E8119.4060405@xiranet.com>

Hi Hal,

This was a default/build all OFED install. Either we should place these tools under ../ofed/sbin or make it work for everybody. At least an error message that umad access failed would be required.

Regards,
Mirko

Hal Rosenstock wrote:
> Hi Mirko,
>
> On Mon, 2006-09-18 at 06:59, Mirko Benz wrote:
>
>> Hello,
>>
>> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone).
>> Some IB diagnostics tools e.g.
ibhosts and ibswitches (located under >> .../ofed/bin/) >> do not work with a normal user account -- no output given. It works as >> root though. >> > > It depends on how you have udev access for umad setup. With the default > setup for IB, root is required as these diagnostics send SMPs which > require umad access which is limited to root. > > -- Hal > > >> Regards, >> Mirko >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> >> From mst at mellanox.co.il Mon Sep 18 04:40:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 18 Sep 2006 14:40:18 +0300 Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5) In-Reply-To: <450E8119.4060405@xiranet.com> References: <450E7C0E.3020001@xiranet.com> <1158577816.18842.1501.camel@hal.voltaire.com> <450E8119.4060405@xiranet.com> Message-ID: <20060918114018.GJ29055@mellanox.co.il> Quoting r. Mirko Benz : > Subject: Re: IB diagnostics problems (OFED-1.1-rc5) > > Hi Hal, > > This was a default/build all OFED install. Either we should place these > tools under ../ofed/sbin or make it work for every body. At least a > error message that umad access failed would be required. I don't think opening umad for regular user by default is a good idea. And isn't sbin for static binaries? With regards to diagnostics - I think proper exit status is reported, so I expect if you set up shell accordingly you'll get the diagnostic printout. Printing stuff on stderr/stdout might interfere with activating these from scripts, so I'm not sure it's a good idea. Hal? 
-- MST From erezz at voltaire.com Mon Sep 18 04:57:05 2006 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 18 Sep 2006 14:57:05 +0300 Subject: [openib-general] Negotiation of Rsponder resource & Initiator depth Message-ID: <450E8991.5080603@voltaire.com> Sean, In the IB spec it says in 12.7.29: The recipient of the REQ message shall choose a local Initiator Depth that does not exceed the Responder Resources offered in the REQ. If the recipient of the REQ message is unwilling or unable to do so, it shall send a REJ message to discontinue the connection establishment. From reading the CMA code, I see that it does not negotiate these values (responder resources & initiator depth). It expects the ULP to negotiate it. Why? Shouldn't it be done by the CMA? Thanks -- ____________________________________________________________ Erez Zilber | 972-9-971-7689 Software Engineer, Storage Team Voltaire – _The Grid Backbone_ __ www.voltaire.com From glebn at voltaire.com Mon Sep 18 05:06:37 2006 From: glebn at voltaire.com (glebn at voltaire.com) Date: Mon, 18 Sep 2006 15:06:37 +0300 Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5) In-Reply-To: <20060918114018.GJ29055@mellanox.co.il> References: <450E7C0E.3020001@xiranet.com> <1158577816.18842.1501.camel@hal.voltaire.com> <450E8119.4060405@xiranet.com> <20060918114018.GJ29055@mellanox.co.il> Message-ID: <20060918120637.GB29931@minantech.com> On Mon, Sep 18, 2006 at 02:40:18PM +0300, Michael S. Tsirkin wrote: > Quoting r. Mirko Benz : > > Subject: Re: IB diagnostics problems (OFED-1.1-rc5) > > > > Hi Hal, > > > > This was a default/build all OFED install. Either we should place these > > tools under ../ofed/sbin or make it work for every body. At least a > > error message that umad access failed would be required. > > I don't think opening umad for regular user by default is a good idea. > And isn't sbin for static binaries? > It isn't. sbin is for System BINaries. -- Gleb. 
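A footnote to the diagnostics thread above: the silent failure could be turned into an explicit message with an up-front permission check. The following is only a minimal sketch in C, not code from any OFED tool. check_umad_access() is a hypothetical helper, and the device path a caller would pass (for example /dev/infiniband/umad0) is an assumption that depends on the local udev setup.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the calling user may read and write
 * `path`; otherwise prints a diagnostic to stderr and returns the errno
 * value reported by access(2).
 */
int check_umad_access(const char *path)
{
    if (access(path, R_OK | W_OK) == 0)
        return 0;

    int err = errno;
    fprintf(stderr, "cannot open %s for read/write: %s\n",
            path, strerror(err));
    return err;
}
```

A wrapper script or tool could call this once at startup and exit with a non-zero status when it fails, so that a non-root user sees why no output was produced. Writing to stderr keeps stdout clean for scripts that parse it.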
From halr at voltaire.com Mon Sep 18 05:39:33 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Sep 2006 08:39:33 -0400 Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5) In-Reply-To: <450E8119.4060405@xiranet.com> References: <450E7C0E.3020001@xiranet.com> <1158577816.18842.1501.camel@hal.voltaire.com> <450E8119.4060405@xiranet.com> Message-ID: <1158583167.18842.4632.camel@hal.voltaire.com> Hi again Mirko, On Mon, 2006-09-18 at 07:20, Mirko Benz wrote: > Hi Hal, > > This was a default/build all OFED install. Either we should place these > tools under ../ofed/sbin or make it work for every body. The issue with making it work for everyone is that there's a chicken and egg problem in that when the tools are built and installed, one doesn't know how udev will be configured for umad. I agree that since the default is to run as root, these should be in sbin rather than bin. Can you file a bugzilla report for this (or do you want me to do it on your behalf) ? Is this critical for OFED 1.1 ? > At least a error message that umad access failed would be required. Those are scripts and the errors are being returned from the lower level programs invoked but not by the scripts. Would you please file a bug for this as well (or let me know whether I should do this) ? Thanks. -- Hal > Regards, > Mirko > > Hal Rosenstock schrieb: > > Hi Mirko, > > > > On Mon, 2006-09-18 at 06:59, Mirko Benz wrote: > > > >> Hello, > >> > >> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone). > >> Some IB diagnostics tools e.g. ibhosts and ibswitches (located under > >> .../ofed/bin/) > >> do not work with a normal user account -- no output given. It works as > >> root though. > >> > > > > It depends on how you have udev access for umad setup. With the default > > setup for IB, root is required as these diagnostics send SMPs which > > require umad access which is limited to root. 
> > > > -- Hal > > > > > >> Regards, > >> Mirko > >> > >> _______________________________________________ > >> openib-general mailing list > >> openib-general at openib.org > >> http://openib.org/mailman/listinfo/openib-general > >> > >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > >> > >> > From halr at voltaire.com Mon Sep 18 05:42:06 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Sep 2006 08:42:06 -0400 Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5) In-Reply-To: <20060918114018.GJ29055@mellanox.co.il> References: <450E7C0E.3020001@xiranet.com> <1158577816.18842.1501.camel@hal.voltaire.com> <450E8119.4060405@xiranet.com> <20060918114018.GJ29055@mellanox.co.il> Message-ID: <1158583325.18842.4712.camel@hal.voltaire.com> On Mon, 2006-09-18 at 07:40, Michael S. Tsirkin wrote: > Quoting r. Mirko Benz : > > Subject: Re: IB diagnostics problems (OFED-1.1-rc5) > > > > Hi Hal, > > > > This was a default/build all OFED install. Either we should place these > > tools under ../ofed/sbin or make it work for every body. At least a > > error message that umad access failed would be required. > > I don't think opening umad for regular user by default is a good idea. > And isn't sbin for static binaries? > > With regards to diagnostics - I think proper exit status is reported, > so I expect if you set up shell accordingly you'll get the diagnostic printout. I don't think so for this case. > Printing stuff on stderr/stdout might interfere with activating these from > scripts, so I'm not sure it's a good idea. Hal? The ones Mirko cited currently are scripts rather than binaries. -- Hal From michael.arndt at informatik.tu-chemnitz.de Mon Sep 18 05:47:04 2006 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Mon, 18 Sep 2006 14:47:04 +0200 Subject: [openib-general] What does --process_mad-- exactly? 
Message-ID: <001a01c6db20$8b306980$21606d86@one7> Hi, the function ib_mad_recv_done_handler, which is called if a dr_smp packet was received, calls "port_priv->device->process_mad". I analyze that function (recursive) for mthca, but I never found the point where a set or get method is applied. Does anybody knows, what exactly process_mad does. Thanks, Michael From mirko.benz at xiranet.com Mon Sep 18 05:56:13 2006 From: mirko.benz at xiranet.com (Mirko Benz) Date: Mon, 18 Sep 2006 14:56:13 +0200 Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5) In-Reply-To: <1158583167.18842.4632.camel@hal.voltaire.com> References: <450E7C0E.3020001@xiranet.com> <1158577816.18842.1501.camel@hal.voltaire.com> <450E8119.4060405@xiranet.com> <1158583167.18842.4632.camel@hal.voltaire.com> Message-ID: <450E976D.3070802@xiranet.com> Hi Hal, Please prepare the bugzilla entry. It is not critical -- I just think it is not convenient for an end user. Regards, Mirko Hal Rosenstock schrieb: > Hi again Mirko, > > On Mon, 2006-09-18 at 07:20, Mirko Benz wrote: > >> Hi Hal, >> >> This was a default/build all OFED install. Either we should place these >> tools under ../ofed/sbin or make it work for every body. >> > > The issue with making it work for everyone is that there's a chicken and > egg problem in that when the tools are built and installed, one doesn't > know how udev will be configured for umad. I agree that since the > default is to run as root, these should be in sbin rather than bin. Can > you file a bugzilla report for this (or do you want me to do it on your > behalf) ? Is this critical for OFED 1.1 ? > > >> At least a error message that umad access failed would be required. >> > Those are scripts and the errors are being returned from the lower level > programs invoked but not by the scripts. > > Would you please file a bug for this as well (or let me know whether I > should do this) ? > > Thanks. 
> > -- Hal > > >> Regards, >> Mirko >> >> Hal Rosenstock schrieb: >> >>> Hi Mirko, >>> >>> On Mon, 2006-09-18 at 06:59, Mirko Benz wrote: >>> >>> >>>> Hello, >>>> >>>> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone). >>>> Some IB diagnostics tools e.g. ibhosts and ibswitches (located under >>>> .../ofed/bin/) >>>> do not work with a normal user account -- no output given. It works as >>>> root though. >>>> >>>> >>> It depends on how you have udev access for umad setup. With the default >>> setup for IB, root is required as these diagnostics send SMPs which >>> require umad access which is limited to root. >>> >>> -- Hal >>> >>> >>> >>>> Regards, >>>> Mirko >>>> >>>> _______________________________________________ >>>> openib-general mailing list >>>> openib-general at openib.org >>>> http://openib.org/mailman/listinfo/openib-general >>>> >>>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >>>> >>>> >>>> From halr at voltaire.com Mon Sep 18 06:05:13 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Sep 2006 09:05:13 -0400 Subject: [openib-general] What does --process_mad-- exactly? In-Reply-To: <001a01c6db20$8b306980$21606d86@one7> References: <001a01c6db20$8b306980$21606d86@one7> Message-ID: <1158584696.18842.5432.camel@hal.voltaire.com> Hi Michael, On Mon, 2006-09-18 at 08:47, Michael Arndt wrote: > Hi, > > the function ib_mad_recv_done_handler, which is called if a dr_smp packet > was received, calls "port_priv->device->process_mad". I analyze that > function (recursive) for mthca, but I never found the point where a set or > get method is applied. > > Does anybody knows, what exactly process_mad does. process_mad hands the incoming MAD to the driver (mthca, ipath, eHCA) if it has defined this routine. In the case of mthca, it is used to hand incoming SMA and PMA packets down the firmware (see hw/mthca/mthca_mad.c:mthca_process_mad). 
-- Hal > Thanks, Michael
From johnt1johnt2 at gmail.com Mon Sep 18 06:43:59 2006 From: johnt1johnt2 at gmail.com (john t) Date: Mon, 18 Sep 2006 19:13:59 +0530 Subject: [openib-general] Reuse pd amd mr Message-ID: Hi I have two HCA cards each having one port. I want to use same memory buffer to store packets arriving on the two ports. Can I do this, meaning can I use same pd (protection domain) and mr (memory registration) for the two QPs (one QP on each port), though the context (i.e. ib device) for each QP is different? Regards, John T
From dotanb at dev.mellanox.co.il Mon Sep 18 06:58:20 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Mon, 18 Sep 2006 16:58:20 +0300 Subject: [openib-general] Reuse pd amd mr In-Reply-To: References: Message-ID: <450EA5FC.7040003@dev.mellanox.co.il> Hi john. john t wrote: > Hi > > I have two HCA cards each having one port. I want to use same memory > buffer to store packets arriving on the two ports. Can I do this, > meaning can I use same pd (protection domain) and mr (memory > registration) for the two QPs (one QP on each port), though the > context ( i.e. ib device) for each QP is different? if the context is different how can you create 2 QPs using the same PD? The context is a driver abstraction and the HCA is not aware of it ... anyway, if you have 2 QPs and 1 MR which are in the same PD, the QPs can listen/send the packets on any port and write to the same MR (in different address of course, the order of the packet arrival in those QPs is "random" ...)
Dotan From sashak at voltaire.com Mon Sep 18 06:46:04 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 18 Sep 2006 16:46:04 +0300 Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5) In-Reply-To: <20060918114018.GJ29055@mellanox.co.il> References: <450E7C0E.3020001@xiranet.com> <1158577816.18842.1501.camel@hal.voltaire.com> <450E8119.4060405@xiranet.com> <20060918114018.GJ29055@mellanox.co.il> Message-ID: <1158587165.9877.3.camel@localhost> On Mon, 2006-09-18 at 14:40 +0300, Michael S. Tsirkin wrote: > Quoting r. Mirko Benz : > > Subject: Re: IB diagnostics problems (OFED-1.1-rc5) > > > > Hi Hal, > > > > This was a default/build all OFED install. Either we should place these > > tools under ../ofed/sbin or make it work for every body. At least a > > error message that umad access failed would be required. > > I don't think opening umad for regular user by default is a good idea. Yes, but this can be limited for predefined group (something like ib, umad, ibumad...) and then umad permitted users permitted will be a members of this supplementary group. Sasha > And isn't sbin for static binaries? > > With regards to diagnostics - I think proper exit status is reported, so I > expect if you set up shell accordingly you'll get the diagnostic printout. > Printing stuff on stderr/stdout might interfere with activating these from > scripts, so I'm not sure it's a good idea. Hal? >
From trimmer at silverstorm.com Mon Sep 18 07:09:00 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 18 Sep 2006 10:09:00 -0400 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <450E6AB2.70505@voltaire.com> Message-ID: > From: Or Gerlitz > Sent: Monday, September 18, 2006 5:45 AM > To: Michael S. Tsirkin > Cc: OPENIB > Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for > MT23108 devices > > Michael S. Tsirkin wrote: > > Quoting r. Or Gerlitz : > > >> Eitan Zahavi wrote: > >>> The following patch solves an issue with OpenSM preferring largest MTU > >>> for PathRecord/MultiPathRecord for paths going to or from MT23108 > (Tavor) > >>> devices instead of using a 1K MTU which is best for this device. > > >> Isn't the 2K MTU issue with Tavor comes into play only under RC QP? > > > I don't think so, no. Tavor supports 2K MTU, but it has better > performance with > > 1K MTU than 2K MTU. QP type should not matter.
> > Can you double check that please, as far as i know there is something
> like BW 40-50% drop with Tavor/RC/2048 vs Tavor/RC/1024 but the BW with
> Tavor/UD/2048 is **no less** then Tavor/UD/1024.
>
> So its very common for IPoIB net devices impl. to expose 2044 or 1500
> bytes MTU to the OS eg to cope with Ethernet and reduce IP
> fragmentation/reassembly of UDP/TCP traffic.
>

Putting this in the SM alone and making it a fabric wide setting is inappropriate. The performance difference depends on application message size. Application message size can vary per ULP and/or per application itself. For example one MPI application may send mostly large messages while another may send mostly small messages. The same could be true of applications for other ULPs such as uDAPL and SDP, etc.

The root issue is the Tavor HCA has 1 too few credits to truly double buffer at 2K MTU. However at message sizes > 1K but < 2K the 2K MTU performs better.

Here are some MPI bandwidth results:

MPI message size    Tavor w/ 2K MTU    Tavor w/ 1K MTU
             512         140.394173         140.261964
            1024         310.553002         300.789425
            1500         407.003858         379.746835
            1800         435.538752         416.726957
            2048         392.831026         425.227096
            4096         417.592991         501.442289

Note that message sizes shown on left do not include MPI headers. Hence actual IB message size is approx 50 bytes larger.

So we see at IB message sizes < 1024 (MPI 512 message), performance is the same.
At IB message sizes > 1024 < 2048 (MPI 1024-1800 messages), performance is best with 2K MTU.
At IB message sizes > 2048 (MPI 2048-4096 messages above), performance is best with 1K MTU.
At larger IB message sizes (MPI 4096 message), performance starts to take off and ultimately at 128K message size (not shown) the 50% difference between 1K and 2K MTU reaches its peak.
Todd Rimmer From rdreier at cisco.com Mon Sep 18 07:10:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 07:10:13 -0700 Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name References: <450D36E9.1000502@voltaire.com> Message-ID: Or> I want it to be visible so if some other config **depends** on Or> it the use can **see** this config and select it. Or> Also as of the importance of the rdma cm within the IB stack Or> being along with the ib verbs the second access point to ULP Or> coders, seeing its config and documenting it is important. I don't buy this. The only thing making this config option visible does is make it more likely (far more likely) that someone will disable it. Right now the RDMA CM is built as long as INFINIBAND and INET are enabled. No one is going to turn off INET on any normal system so effectively the RDMA CM is always built whenever INFINIBAND is enabled. As far as making a config symbol to depend on, I think INET makes as much sense or more: something using IP addressing naturally depends on having IP networking. - R. From rdreier at cisco.com Mon Sep 18 07:15:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 07:15:13 -0700 Subject: [openib-general] [PATCH] IB/iser: fix iSER description and selections in Kconfig References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com> <450912C0.8070807@voltaire.com> <45091AC4.3090005@voltaire.com> <450D0FCB.1000401@voltaire.com> Message-ID: Erez> There are 3 additional required config entries: NET, INET & Erez> INFINIBAND_RDMA_CM. Do you suggest to 'depned' on them or Erez> 'depned' on some of them and 'select' the rest? INET depends on NET, and INFINIBAND_RDMA_CM doesn't exist. So depending on INET is sufficient. 
That's the reason 'depend' is better than 'select' -- you don't have to worry about recreating the full dependency tree of things you depend on. Erez> Also, since I'm not familiar enough with 'make rndconfig', Erez> here's a question: if iSER 'depends' on INET, is it possible Erez> that 'make rndconfig' will enable iSER without enabling Erez> INET? No, of course not. The whole point of make randconfig is to make a random but valid configuration. Anyway, rather than waste more time going back and forth on this, I added the following to my for-2.6.19 tree as the obvious fix: Author: Roland Dreier Date: Sun Sep 17 22:58:27 2006 -0700 IB/iser: INFINIBAND_ISER depends on INET iSER won't build without CONFIG_INET enabled, so make Kconfig reflect that. Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig index fead87d..365a1b5 100644 --- a/drivers/infiniband/ulp/iser/Kconfig +++ b/drivers/infiniband/ulp/iser/Kconfig @@ -1,6 +1,6 @@ config INFINIBAND_ISER tristate "ISCSI RDMA Protocol" - depends on INFINIBAND && SCSI + depends on INFINIBAND && SCSI && INET select SCSI_ISCSI_ATTRS ---help--- Support for the ISCSI RDMA Protocol over InfiniBand. This From thlin at us.ibm.com Mon Sep 18 07:24:50 2006 From: thlin at us.ibm.com (Tseng-Hui (Frank) Lin) Date: Mon, 18 Sep 2006 09:24:50 -0500 Subject: [openib-general] Mstflint - not working on ppc64 and when driver is not loaded on AMD In-Reply-To: <20060917133449.GA28318@mellanox.co.il> References: <20060917133449.GA28318@mellanox.co.il> Message-ID: <1158589490.21249.19.camel@flin.austin.ibm.com> Michael: You are right. The idea was to use sysfs resource0 whenever it is available and fall back to config space when it is not. This would make both new and old kernels happy. This re-structured mopen() and make the patch look big. I dug into the ppc64 a little bit. The device driver does IO remap. ioremap is needed in IBM pSeries machines. 
I suspect that's why resource0 (and other mmap) only works when the device driver is loaded. I have not figured out a way to do ioremap from user space. In addition to open and mmap, maybe I should try to read a few bytes and fall back to config space if the read failed. I suspect x86_64 suffers from the same problem. I am getting an AMD blade to find what exact the problem is. You mentioned "your version" of mstflint. Is that a different one from the one in OFED-1.0? If it is, would you mind sending me a copy of your version so that I can play with it as well? Thanks. On Sun, 2006-09-17 at 16:34 +0300, Michael S. Tsirkin wrote: > Quoting r. Moshe Kazir : > > Subject: Mstflint - not working on ppc64 and when driver is not loaded on AMD > > > > > > Michael, > > > > The attached patch was received from Frank (IBM) . > > Wow, that's one big patch, I can't see what it's doing at all. > Can just the relevant fix be isolated? > > > Frank change the mmap in the mopen function and now it is working o.k. > > on my IBM JS21 ppc64 (sles9 sp3 sles10) and IBM HS21 (EM64T) sles9 sp3 > > all the computer uses PCI-Ex HCA cards > > > I tested this fix on AMD computer (PCI-X) and found that it did not fix > > the problem initially reported by Or Gerlitz in the attached message. > > That is, if it is even relevant? > > > Also, I suspect that it doesn't work on MAC ppc64 G5 with PCI-X . (I > > have to repeated this test) . > > > > I'm suspect that this this is a PCI-X to PCI-EX issue . > > > > Hmm. > What I can understand of the patch, it attempts using sysfs resource0 > which is only implemented on kernels > 2.6.12 or 2.6.13, so > that's probably your issue. > > Can you try passing the following to mstflint (my version): > -d /sys/bus/pci/devices/0000\:08\:00.0/resource0 q > where 0000\:08\:00.0 is the appropriate device? > > Does this work with driver not loaded? On which OS-es? 
> From johnt1johnt2 at gmail.com Mon Sep 18 07:36:33 2006 From: johnt1johnt2 at gmail.com (john t) Date: Mon, 18 Sep 2006 20:06:33 +0530 Subject: [openib-general] Reuse pd amd mr In-Reply-To: <450EA5FC.7040003@dev.mellanox.co.il> References: <450EA5FC.7040003@dev.mellanox.co.il> Message-ID: Hi Dotan, This may be a very basic question. When u said "QPs can listen/send the packets on any port and write to the same MR", does it mean QPs can listen/send packets to any port on the same HCA or to also ports on different HCA ? Regards, John T On 9/18/06, Dotan Barak wrote: > > Hi john. > > john t wrote: > > Hi > > > > I have two HCA cards each having one port. I want to use same memory > > buffer to store packets arriving on the two ports. Can I do this, > > meaning can I use same pd (protection domain) and mr (memory > > registration) for the two QPs (one QP on each port), though the > > context ( i.e. ib device) for each QP is different? > > if the context is different how can you create 2 QPs using the same PD? > The context is a driver abstraction and the HCA is not aware of it ... > > anyway, if you have 2 QPs and 1 MR which are in the same PD, the QPs can > listen/send the packets on any port and write to the same MR > (in different address of course, the order of the packet arrival in > those QPs is "random" ...) > > Dotan >
From rdreier at cisco.com Mon Sep 18 07:49:28 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 07:49:28 -0700 Subject: [openib-general] Reuse pd amd mr In-Reply-To: (john t.'s message of "Mon, 18 Sep 2006 19:13:59 +0530") References: Message-ID: john> Hi I have two HCA cards each having one port. I want to use john> same memory buffer to store packets arriving on the two john> ports.
Can I do this, meaning can I use same pd (protection john> domain) and mr (memory registration) for the two QPs (one QP john> on each port), though the context (i.e. ib device) for each john> QP is different? No, a PD belongs to a specific device. However nothing prevents you from creating one PD for each device, and two MRs (one for each device, each using one of those two PDs) that cover the same memory. - R. From dotanb at dev.mellanox.co.il Mon Sep 18 07:50:09 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Mon, 18 Sep 2006 17:50:09 +0300 Subject: [openib-general] Reuse pd amd mr In-Reply-To: References: <450EA5FC.7040003@dev.mellanox.co.il> Message-ID: <450EB221.3000708@dev.mellanox.co.il> john t wrote: > Hi Dotan, > > This may be a very basic question. When u said "QPs can listen/send > the packets on any port and write to the same MR", does it mean QPs > can listen/send packets to any port on the same HCA or to also ports > on different HCA ? You are right, i wasn't clear enough (sorry)... What i meant was that you can work with more than one QP in an HCA, each QP can send/recv messages on a different port (every QP is working with only one port). A QP is a resource in the HCA, hence a QP in HCA1 cannot listen/send packets to the port of HCA2. If you wish, your SW can handle all of the HCAs: you can open a QP for every port in every HCA( total QPs: #HCAs * #Ports ). i hope i was more clear this time ... Dotan > > > On 9/18/06, *Dotan Barak* > wrote: > > Hi john. > > john t wrote: > > Hi > > > > I have two HCA cards each having one port. I want to use same > memory > > buffer to store packets arriving on the two ports. Can I do this, > > meaning can I use same pd (protection domain) and mr (memory > > registration) for the two QPs (one QP on each port), though the > > context ( i.e. ib device) for each QP is different? > > if the context is different how can you create 2 QPs using the > same PD? 
> The context is a driver abstraction and the HCA is not aware of it ... > > anyway, if you have 2 QPs and 1 MR which are in the same PD, the > QPs can > listen/send the packets on any port and write to the same MR > (in different address of course, the order of the packet arrival in > those QPs is "random" ...) > > Dotan > > From eitan at mellanox.co.il Mon Sep 18 08:20:07 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 18 Sep 2006 18:20:07 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: References: Message-ID: <450EB927.2020903@mellanox.co.il> Hi Todd, Seems like your knowledge about the specific MTU best for the application (MPI) you are running is good enough such that you will be able to include the MTU in the PathRecord request and thus the patch describe in here will not affect your MPI at all. The patch only applies if your request does not provide any MTU & MTU SEL comp_mask EZ Rimmer, Todd wrote: >>From: Or Gerlitz >>Sent: Monday, September 18, 2006 5:45 AM >>To: Michael S. Tsirkin >>Cc: OPENIB >>Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU >> >> >for > > >>MT23108 devices >> >>Michael S. Tsirkin wrote: >> >> >>>Quoting r. Or Gerlitz : >>> >>> >>>>Eitan Zahavi wrote: >>>> >>>> >>>>>The following patch solves an issue with OpenSM preferring largest >>>>> >>>>> >MTU > > >>>>>for PathRecord/MultiPathRecord for paths going to or from MT23108 >>>>> >>>>> >>(Tavor) >> >> >>>>>devices instead of using a 1K MTU which is best for this device. >>>>> >>>>> >>>>Isn't the 2K MTU issue with Tavor comes into play only under RC QP? >>>> >>>> >>>I don't think so, no. Tavor supports 2K MTU, but it has better >>> >>> >>performance with >> >> >>>1K MTU than 2K MTU. QP type should not matter. 
>>> >>> >>Can you double check that please, as far as i know there is something >>like BW 40-50% drop with Tavor/RC/2048 vs Tavor/RC/1024 but the BW >> >> >with > > >>Tavor/UD/2048 is **no less** then Tavor/UD/1024. >> >>So its very common for IPoIB net devices impl. to expose 2044 or 1500 >>bytes MTU to the OS eg to cope with Ethernet and reduce IP >>fragmentation/reassembly of UDP/TCP traffic. >> >> >> > >Putting this in the SM alone and making it a fabric wide setting is >inappropriate. The performance difference depends on application >message size. Application message size can vary per ULP and/or per >application itself. For example one MPI application may send mostly >large messages while another may send mostly small messages. The same >could be true of applications for other ULPs such as uDAPL and SDP, etc. > >The root issue is the Tavor HCA has 1 too few credits to truly double >buffer at 2K MTU. However at message sizes > 1K but < 2K the 2K MTU >performs better. > >Here are some MPI bandwidth results: >Tavor w/ 2K MTU: >512 140.394173 >1024 310.553002 >1500 407.003858 >1800 435.538752 >2048 392.831026 >4096 417.592991 > >Tavor w/ 1K MTU: >512 140.261964 >1024 300.789425 >1500 379.746835 >1800 416.726957 >2048 425.227096 >4096 501.442289 > >Note that message sizes shown on left do not include MPI headers. Hence >actual IB message size is approx 50 bytes larger. > >So we see at IB message sizes < 1024 (MPI 512 message), performance is >the same. >At IB message sizes > 1024 < 2048 (MPI 1024-1800 messages), performance >is best with 2K MTU. >At IB message sizes > 2048 (MPI 2048-4096 messages above), performance >is best with 1K MTU. >At larger IB message sizes (MPI 4096 message), performance starts to >take off and ultimately at 128K message size (not shown) the 50% >difference between 1K and 2K MTU reaches its peak. 
> >Todd Rimmer > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From trimmer at silverstorm.com Mon Sep 18 08:52:18 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 18 Sep 2006 11:52:18 -0400 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <450EB927.2020903@mellanox.co.il> Message-ID: > From: Eitan Zahavi [mailto:eitan at mellanox.co.il] > Sent: Monday, September 18, 2006 11:20 AM > To: Rimmer, Todd > Cc: Or Gerlitz; Michael S. Tsirkin; OPENIB > Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for > MT23108 devices > > Hi Todd, > > Seems like your knowledge about the specific MTU best for the > application (MPI) you are running is good > enough such that you will be able to include the MTU in the PathRecord > request and thus the patch describe in here will not affect your MPI at > all. > The patch only applies if your request does not provide any MTU & MTU > SEL comp_mask Eitan, The question is not about "our MPI", rather its to ensure the Open Fabrics and OFED included MPIs and ULPs are capable of being tuned for optimal performance. When a fabric runs more than 1 application, its necessary to be able to tune this at the MPI, SDP, etc level, not at the SM level. This patch turns on a non-standard behaviour in the SM for the entire fabric such that some applications will have better performance while others will suffer. In order to be complete, this patch would need to include ULP level tunability in all the relevant ULPs (MPI, SDP, uDAPL, etc) to select the "MAX MTU" to use or to request. This then begs the question, if proper tuning requires all the ULPs to have a configurable MAX MTU, why should the SA need to implement the quirk at all? 
Todd Rimmer > > > >Putting this in the SM alone and making it a fabric wide setting is > >inappropriate. The performance difference depends on application > >message size. Application message size can vary per ULP and/or per > >application itself. For example one MPI application may send mostly > >large messages while another may send mostly small messages. The same > >could be true of applications for other ULPs such as uDAPL and SDP, etc. > > > >The root issue is the Tavor HCA has 1 too few credits to truly double > >buffer at 2K MTU. However at message sizes > 1K but < 2K the 2K MTU > >performs better. > > > >Here are some MPI bandwidth results: > >Tavor w/ 2K MTU: > >512 140.394173 > >1024 310.553002 > >1500 407.003858 > >1800 435.538752 > >2048 392.831026 > >4096 417.592991 > > > >Tavor w/ 1K MTU: > >512 140.261964 > >1024 300.789425 > >1500 379.746835 > >1800 416.726957 > >2048 425.227096 > >4096 501.442289 > > > >Note that message sizes shown on left do not include MPI headers. Hence > >actual IB message size is approx 50 bytes larger. > > > >So we see at IB message sizes < 1024 (MPI 512 message), performance is > >the same. > >At IB message sizes > 1024 < 2048 (MPI 1024-1800 messages), performance > >is best with 2K MTU. > >At IB message sizes > 2048 (MPI 2048-4096 messages above), performance > >is best with 1K MTU. > >At larger IB message sizes (MPI 4096 message), performance starts to > >take off and ultimately at 128K message size (not shown) the 50% > >difference between 1K and 2K MTU reaches its peak. 
> > > >Todd Rimmer > > > >_______________________________________________ > >openib-general mailing list > >openib-general at openib.org > >http://openib.org/mailman/listinfo/openib-general > > > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general > > > > From changquing.tang at hp.com Mon Sep 18 09:30:57 2006 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 18 Sep 2006 11:30:57 -0500 Subject: [openib-general] Reuse pd amd mr In-Reply-To: Message-ID: > >No, a PD belongs to a specific device. However nothing >prevents you from creating one PD for each device, and two MRs >(one for each device, each using one of those two PDs) that >cover the same memory. Roland: I did exactly what you said with two cards on a node, however, if I use the two physical channels for Message striping, 99% of the test passed, but for some condition, I got IBV_WC_RETRY_EXC_ERR, or the code Just hangs there with no sending completion(ibv_poll_cq returns 0). Do you think this is a firware issue, Or the driver issue ? Thanks. --CQ > > - R. > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general > > From rdreier at cisco.com Mon Sep 18 09:46:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 09:46:00 -0700 Subject: [openib-general] Reuse pd amd mr In-Reply-To: (Changqing Tang's message of "Mon, 18 Sep 2006 11:30:57 -0500") References: Message-ID: Changqing> Roland: I did exactly what you said with two cards on a Changqing> node, however, if I use the two physical channels for Changqing> Message striping, 99% of the test passed, but for some Changqing> condition, I got IBV_WC_RETRY_EXC_ERR, or the code Just Changqing> hangs there with no sending completion(ibv_poll_cq Changqing> returns 0). 
Do you think this is a firware issue, Or Changqing> the driver issue ? 'retries exceeded' means that the transport retry count was exceeded, so most likely your timeout is set too low. Without seeing your code, I couldn't begin to say why you don't see a send completion. If you are absolutely positive that you post a send and you never see a completion for that send, then I guess it is a firmware or hardware problem. - R. From changquing.tang at hp.com Mon Sep 18 09:58:56 2006 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 18 Sep 2006 11:58:56 -0500 Subject: [openib-general] Reuse pd amd mr In-Reply-To: Message-ID: > >'retries exceeded' means that the transport retry count was >exceeded, so most likely your timeout is set too low. Is there a common recommended value for this timeout ? I use 18, which represents 1 second. > >Without seeing your code, I couldn't begin to say why you >don't see a send completion. If you are absolutely positive >that you post a send and you never see a completion for that >send, then I guess it is a firmware or hardware problem. It is very hard to reproduce this error with standalone code. I use HP-Mpi and need 8 ranks, at least 4 nodes with 2 cards on each node, and just one of our hundred test code can catch this error, and it is on MPI_Scatterv Operation. --CQ > > - R. > From rdreier at cisco.com Mon Sep 18 10:02:42 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 10:02:42 -0700 Subject: [openib-general] Reuse pd amd mr In-Reply-To: (Changqing Tang's message of "Mon, 18 Sep 2006 11:58:56 -0500") References: Message-ID: Changqing> Is there a common recommended value for this timeout ? Changqing> I use 18, which represents 1 second. 18 should be OK I guess, unless you have congestion in your fabric, in which case you have other problems anyway. Changqing> It is very hard to reproduce this error with standalone Changqing> code. 
I use HP-Mpi and need 8 ranks, at least 4 nodes Changqing> with 2 cards on each node, and just one of our hundred Changqing> test code can catch this error, and it is on Changqing> MPI_Scatterv Operation. Unless you can narrow down a way to reproduce this, I don't think it's going to be possible for anyone to help debug it. - R. From changquing.tang at hp.com Mon Sep 18 10:14:35 2006 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 18 Sep 2006 12:14:35 -0500 Subject: [openib-general] Reuse pd amd mr In-Reply-To: Message-ID: > > Changqing> Is there a common recommended value for this timeout ? > Changqing> I use 18, which represents 1 second. > >18 should be OK I guess, unless you have congestion in your >fabric, in which case you have other problems anyway. > > Changqing> It is very hard to reproduce this error with standalone > Changqing> code. I use HP-Mpi and need 8 ranks, at least 4 nodes > Changqing> with 2 cards on each node, and just one of our hundred > Changqing> test code can catch this error, and it is on > Changqing> MPI_Scatterv Operation. > >Unless you can narrow down a way to reproduce this, I don't >think it's going to be possible for anyone to help debug it. OK, I forget to mention, if I use rdma on both channels, it is hard to reprocude the hang, If I create SRQ on one of the channel, then it hangs the other channel even on the first Rdma operation, I will write a standlone code for you driver guys to debug. --CQ > > - R. > From mst at mellanox.co.il Mon Sep 18 10:22:05 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 18 Sep 2006 20:22:05 +0300 Subject: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD In-Reply-To: <1158589490.21249.19.camel@flin.austin.ibm.com> References: <1158589490.21249.19.camel@flin.austin.ibm.com> Message-ID: <20060918172205.GA1371@mellanox.co.il> Quoting r. Tseng-Hui (Frank) Lin : > You mentioned "your version" of mstflint. 
Is that a different one > from the one in OFED-1.0? If it is, would you mind sending me a copy of > your version so that I can play with it as well? Thanks. Just the one in svn trunk/OFED 1.1 RC. -- MST From ralphc at pathscale.com Mon Sep 18 10:27:59 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Mon, 18 Sep 2006 10:27:59 -0700 Subject: [openib-general] How to support IOMMUs for ipath driver In-Reply-To: <450D1C0A.90906@voltaire.com> References: <1158108010.8759.192.camel@brick.pathscale.com> <45093428.5010009@voltaire.com> <1158263019.8759.324.camel@brick.pathscale.com> <450D1C0A.90906@voltaire.com> Message-ID: <1158600479.2592.9.camel@brick.pathscale.com> On Sun, 2006-09-17 at 12:57 +0300, Or Gerlitz wrote: > Ralph Campbell wrote: > > Here is my thinking so far: > > > > The driver is passed an LKEY/RKEY plus an address. > > For ib_get_dma_mr(), the address is currently from > > dma_map_single(), dma_map_page(), or dma_map_sg(). > > With the ib_dma_*() routines, I can intercept these calls > > and return something instead of a bus or IOMMU address. > > I would like to return a kernel virtual address since that > > is the simplest and is what I ultimately need. This is > > trivial for dma_map_single() and trivial for low memory > > pages for dma_map_page(). > > > > I think I can safely just return error for architectures > > with high memory pages since the driver really only works > > on 64-bit systems (for a variety of reasons which I won't > > go into) and those systems don't have high memory. > > Again (and please go and check me), pages you need to DMA (ie move over > IB) need **not** be mapped into the kernel virtual address space and > this happens **not** only under ia32 high-memory scheme, please see my > other email for two examples (direct I/O etc) > > > > ib_sg_dma_address would return the page_address() of sg->page > > but wouldn't be able to rely on other fields which might be in > > the struct scatterlist.
> > your design seems to reply on three fields: page, offset and length, so > > ib_sg_map_sg(scat) is kmap-ping whatever pages which are not mapped now > into kvirt > > ib_dma_unmap_sg(scat) is kunmap-ping those pages you were mapping before > (you might need an aux data structure to keep which need kunmap) > > ib_sg_dma_address(scat) is page_address(scat->page) + scat->offset > > ib_sg_dma_len(scat) is scat->length > > Or. Correct. From mst at mellanox.co.il Mon Sep 18 11:06:11 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 18 Sep 2006 21:06:11 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: References: Message-ID: <20060918180611.GB1371@mellanox.co.il> Quoting r. Rimmer, Todd : > Subject: RE: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices > > > From: Eitan Zahavi [mailto:eitan at mellanox.co.il] > > Sent: Monday, September 18, 2006 11:20 AM > > To: Rimmer, Todd > > Cc: Or Gerlitz; Michael S. Tsirkin; OPENIB > > Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU > for > > MT23108 devices > > > > Hi Todd, > > > > Seems like your knowledge about the specific MTU best for the > > application (MPI) you are running is good > > enough such that you will be able to include the MTU in the PathRecord > > request and thus the patch describe in here will not affect your MPI > at > > all. > > The patch only applies if your request does not provide any MTU & MTU > > SEL comp_mask > > Eitan, > > The question is not about "our MPI", rather its to ensure the Open > Fabrics and OFED included MPIs and ULPs are capable of being tuned for > optimal performance. When a fabric runs more than 1 application, its > necessary to be able to tune this at the MPI, SDP, etc level, not at the > SM level. We did not remove this ability at all. So it's there. 
> This patch turns on a non-standard behaviour in the SM for the entire > fabric such that some applications will have better performance while > others will suffer. I disagree. The behaviour is perfectly standards compliant. > In order to be complete, this patch would need to > include ULP level tunability in all the relevant ULPs (MPI, SDP, uDAPL, > etc) to select the "MAX MTU" to use or to request. This tunability is already there - that's what MTU selector in path queries does. > This then begs the question, if proper tuning requires all the ULPs to > have a configurable MAX MTU, why should the SA need to implement the > quirk at all? > > Todd Rimmer If ULP wants MAX MTU, it must set MTU selector to 3 in path query. If MTU selector is disabled in the query, SM will guess which MTU is best to select. SM used a specific heuristic to perform that guess. All we did is, provide an option to use a different heuristic. This is useful because, SM has data on the whole fabric as opposed to ULPs which often only have data on the endnode. -- MST From bgreen at nas.nasa.gov Mon Sep 18 11:07:32 2006 From: bgreen at nas.nasa.gov (Bryan Green) Date: Mon, 18 Sep 2006 11:07:32 -0700 Subject: [openib-general] patch trouble In-Reply-To: Your message of "Sun, 17 Sep 2006 23:31:53 +0300." <20060917203153.GG32526@mellanox.co.il> Message-ID: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov> "Michael S. Tsirkin" writes: > Quoting r. Bryan Green : > > Is there a great discrepancy between the git repository and the svn > > repository? If I am downloading the kernel modules from subversion, should I > > still use the patchset from the git repository? > > *Please* do not use svn trunk code for production. > You want either kernel.org code or the OFED git repository for everything. > kernel code in subversion is being deprecated. Okay, thanks. Thats good to know. What about the userspace code in the 1.0/1.1 svn branch? Is that userspace code equivalent to what's in the OFED distribution? 
I can forego using kernel code from subversion, but it is convenient for the userspace stuff (as explained below). > > > There is currently only a source tarball > > for libibverbs, while ofed is too RPM-centric. > > Not really. Please try the following: > > Get the ofed tarball here > https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > and unpack it. > Take this file: SOURCES/openib-1.1.tgz > That's all of subversion + git all nicely packed up. The problem with that tar file is that it contains far more than the '.tgz' file. It also contains some large source rpms. The whole thing is 47 Megs, of which only about 12 Megs is of interest to me. > Let me know how it goes. BTW, which distro are you using? I am using Gentoo (www.gentoo.org). I am actually writing the ebuild (http://en.wikipedia.org/wiki/Ebuild) scripts for adding openib to the gentoo science overlay (http://svn.cryos.net/projects/gentoo-sci-overlay) The key feature of Gentoo package management is that packages are downloaded, built, and installed from source, all in an automated fashion. Downloading the entire 47 Meg files to build openib is prohibitive. Especially if all one wants to do is build, say, libibverbs, libmthca, and the performance tools. That is why I am downloading the userspace code from svn. If there was a single downloadable openib.tgz file, I could build the kernel modules as well as the userspace tools from that. In the meantime, I'd like to continue getting userspace code from svn. As for the kernel modules, I will stick with whats in the kernel for now, though I look forward to SDP being added to the main line, as I'm getting some rather nice performance from it. -bryan From trimmer at silverstorm.com Mon Sep 18 11:20:46 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 18 Sep 2006 14:20:46 -0400 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <20060918180611.GB1371@mellanox.co.il> Message-ID: > From: Michael S. 
Tsirkin [mailto:mst at mellanox.co.il] > Sent: Monday, September 18, 2006 2:06 PM > To: Rimmer, Todd > Cc: Eitan Zahavi; Or Gerlitz; OPENIB > Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for > MT23108 devices > > Quoting r. Rimmer, Todd : > > Subject: RE: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for > MT23108 devices > > > > > From: Eitan Zahavi [mailto:eitan at mellanox.co.il] > > > Sent: Monday, September 18, 2006 11:20 AM > > > To: Rimmer, Todd > > > Cc: Or Gerlitz; Michael S. Tsirkin; OPENIB > > > Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU > > for > > > MT23108 devices > > > > > > Hi Todd, > > > > > > Seems like your knowledge about the specific MTU best for the > > > application (MPI) you are running is good > > > enough such that you will be able to include the MTU in the PathRecord > > > request and thus the patch describe in here will not affect your MPI > > at > > > all. > > > The patch only applies if your request does not provide any MTU & MTU > > > SEL comp_mask > > > > Eitan, > > > > The question is not about "our MPI", rather its to ensure the Open > > Fabrics and OFED included MPIs and ULPs are capable of being tuned for > > optimal performance. When a fabric runs more than 1 application, its > > necessary to be able to tune this at the MPI, SDP, etc level, not at the > > SM level. > > We did not remove this ability at all. So it's there. > > > In order to be complete, this patch would need to > > include ULP level tunability in all the relevant ULPs (MPI, SDP, uDAPL, > > etc) to select the "MAX MTU" to use or to request. > > This tunability is already there - that's what MTU selector in path > queries > does. > > > This then begs the question, if proper tuning requires all the ULPs to > > have a configurable MAX MTU, why should the SA need to implement the > > quirk at all? > > > If ULP wants MAX MTU, it must set MTU selector to 3 in path query. 
> > If MTU selector is disabled in the query, SM will guess which MTU is best > to > select. SM used a specific heuristic to perform that guess. All we did > is, > provide an option to use a different heuristic. > > This is useful because, SM has data on the whole fabric as opposed to ULPs > which often only have data on the endnode. The patch you submitted only modified Open SM. So please show me the patch where MVAPICH, Open MPI, SDP, SRP and other ULPs allow this to be tuned by the user or application? Lacking that patch, all the "if a ULP wants" statements above are mute. The goal is for OFED to provide a high performance standard solution. If end users must modify the ULPs source code to achieve that goal, OFED misses the mark. Todd Rimmer From rdreier at cisco.com Mon Sep 18 11:21:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 11:21:23 -0700 Subject: [openib-general] patch trouble In-Reply-To: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov> (Bryan Green's message of "Mon, 18 Sep 2006 11:07:32 -0700") References: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov> Message-ID: Bryan> I am using Gentoo (www.gentoo.org). I am actually writing Bryan> the ebuild (http://en.wikipedia.org/wiki/Ebuild) scripts Bryan> for adding openib to the gentoo science overlay Bryan> (http://svn.cryos.net/projects/gentoo-sci-overlay) Cool, glad to hear it. Bryan> Downloading the entire 47 Meg files to build openib is Bryan> prohibitive. Especially if all one wants to do is build, Bryan> say, libibverbs, libmthca, and the performance tools. That Bryan> is why I am downloading the userspace code from svn. If Bryan> there was a single downloadable openib.tgz file, I could Bryan> build the kernel modules as well as the userspace tools Bryan> from that. In the meantime, I'd like to continue getting Bryan> userspace code from svn. For libibverbs and libmthca at least, I am careful to keep the releases on http://openib.org/downloads/ up to date. 
For example, you can find http://openib.org/downloads/libibverbs-1.0.3.tar.gz http://openib.org/downloads/libmthca-1.0.2.tar.gz there, which are the latest stable releases as of now. None of the other package maintainers seems to have gotten serious about publishing releases of their packages -- and I agree with you that just relying on OFED leaves a serious gap. Anyway, for whatever reason, I seem to be the only openib person who really cares about distro inclusion of stuff, but I'm happy to do things that make your job as a packager easier, at least for my userspace packages (libibverbs and libmthca). Just let me know. I've already gotten those packages into mainline Debian, Ubuntu and Fedora Extras repositories, and I'd be happy to see them in Gentoo as well. - R. From mst at mellanox.co.il Mon Sep 18 11:22:41 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 18 Sep 2006 21:22:41 +0300 Subject: [openib-general] patch trouble In-Reply-To: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov> References: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov> Message-ID: <20060918182241.GE1371@mellanox.co.il> Quoting r. Bryan Green : > Subject: Re: patch trouble > > "Michael S. Tsirkin" writes: > > Quoting r. Bryan Green : > > > Is there a great discrepancy between the git repository and the svn > > > repository? If I am downloading the kernel modules from subversion, should I > > > still use the patchset from the git repository? > > > > *Please* do not use svn trunk code for production. > > You want either kernel.org code or the OFED git repository for everything. > > kernel code in subversion is being deprecated. > > Okay, thanks. Thats good to know. What about the userspace code in the > 1.0/1.1 svn branch? Is that userspace code equivalent to what's in the > OFED distribution? I can forego using kernel code from subversion, but > it is convenient for the userspace stuff (as explained below). Yes, that 's what OFED uses for userspace. 
> > > > > There is currently only a source tarball > > > for libibverbs, while ofed is too RPM-centric. > > > > Not really. Please try the following: > > > > Get the ofed tarball here > > https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > > and unpack it. > > Take this file: SOURCES/openib-1.1.tgz > > That's all of subversion + git all nicely packed up. > > The problem with that tar file is that it contains far more than the > '.tgz' file. It also contains some large source rpms. The whole thing is > 47 Megs, of which only about 12 Megs is of interest to me. So, I guess you want to just remove the rest of the stuff? > > Let me know how it goes. BTW, which distro are you using? > > I am using Gentoo (www.gentoo.org). I am actually writing the ebuild > (http://en.wikipedia.org/wiki/Ebuild) scripts for adding openib to the > gentoo science overlay (http://svn.cryos.net/projects/gentoo-sci-overlay) > > The key feature of Gentoo package management is that packages are > downloaded, built, and installed from source, all in an automated fashion. > > Downloading the entire 47 Meg files to build openib is prohibitive. > Especially if all one wants to do is build, say, libibverbs, libmthca, and > the performance tools. That is why I am downloading the userspace code > from svn. If there was a single downloadable openib.tgz file, I could > build the kernel modules as well as the userspace tools from that. We can try looking into that. So what do you want it to include? We currently only target RPM based distros. Are you willing to maintan the gentoo support? Maybe after each release candidate you can prepare the tarball for gentoo and upload it? > In the meantime, I'd like to continue getting userspace code from svn. That's fine too, I think. > As for the kernel modules, I will stick with whats in the kernel for now, > though I look forward to SDP being added to the main line, as I'm getting > some rather nice performance from it. Hmm. Which kernel do you run? 
If you have 2.6.18, it's easy to add just SDP as an out of kernel module. -- MST From rdreier at cisco.com Mon Sep 18 11:39:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 11:39:37 -0700 Subject: [openib-general] patch trouble In-Reply-To: <20060918182241.GE1371@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 18 Sep 2006 21:22:41 +0300") References: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov> <20060918182241.GE1371@mellanox.co.il> Message-ID: Michael> Hmm. Which kernel do you run? If you have 2.6.18, it's Michael> easy to add just SDP as an out of kernel module. Lots of things would be easy if you have kernel 2.6.18, because then you also have a time machine... - R. From bgreen at nas.nasa.gov Mon Sep 18 11:41:41 2006 From: bgreen at nas.nasa.gov (Bryan Green) Date: Mon, 18 Sep 2006 11:41:41 -0700 Subject: [openib-general] patch trouble In-Reply-To: Your message of "Mon, 18 Sep 2006 21:22:41 +0300." <20060918182241.GE1371@mellanox.co.il> Message-ID: <200609181841.k8IIffnc013148@ece06.nas.nasa.gov> "Michael S. Tsirkin" writes: > Quoting r. Bryan Green : > > Subject: Re: patch trouble > > > > The problem with that tar file is that it contains far more than the > > '.tgz' file. It also contains some large source rpms. The whole thing > > is 47 Megs, of which only about 12 Megs is of interest to me. > > So, I guess you want to just remove the rest of the stuff? Well, ideally, it would be possible to download a tarball that contained the contents of https://openfabrics.org/svn/gen2/branches/1.1/src/userspace/, minus the mvapich subdirs, which could perhaps be a separate tgz as they are very large on their own. > > > If there was a single downloadable openib.tgz file, I could > > build the kernel modules as well as the userspace tools from that. > > We can try looking into that. So what do you want it to include? > We currently only target RPM based distros. > Are you willing to maintan the gentoo support? 
> Maybe after each release candidate you can prepare the tarball > for gentoo and upload it? I'm willing the maintain the gentoo support for the gentoo science overlay. That isn't a problem. I have already constructed gentoo-ified versions of the OFED scripts found in '1.0/ofed/openib/scripts'. I could potentially upload tarballs as you suggest. For now, I'm going to stick with subversion until my ebuilds are commited to the science overlay, and there is potential interest in it from others. We will have to see. If there are others in the gentoo community who would benefit from it, I'd be happy to produce and upload the tarfiles as my schedule permits, if openfabrics.org would host them. > > As for the kernel modules, I will stick with whats in the kernel for now, > > though I look forward to SDP being added to the main line, as I'm getting > > some rather nice performance from it. > > Hmm. Which kernel do you run? > If you have 2.6.18, it's easy to add just SDP as an out of kernel module. Usually our cluster has a very up-to-date kernel. Right now I'm being forced to downgrade it temporarily to 2.6.12 in order to evaluate Lustre. In the future, I plan to do as you suggest, and add SDP as an out of kernel module. -bryan From bgreen at nas.nasa.gov Mon Sep 18 11:52:40 2006 From: bgreen at nas.nasa.gov (Bryan Green) Date: Mon, 18 Sep 2006 11:52:40 -0700 Subject: [openib-general] patch trouble In-Reply-To: Your message of "Mon, 18 Sep 2006 21:22:41 +0300." <20060918182241.GE1371@mellanox.co.il> Message-ID: <200609181852.k8IIqeMH013506@ece06.nas.nasa.gov> "Michael S. Tsirkin" writes: > Quoting r. Bryan Green : > > Subject: Re: patch trouble > > > > > > The problem with that tar file is that it contains far more than the > > '.tgz' file. It also contains some large source rpms. The whole thing i > s > > 47 Megs, of which only about 12 Megs is of interest to me. > > So, I guess you want to just remove the rest of the stuff? 
I forgot to mention in the previous email - a separate tar file with kernel module code and patches would be nice too. Just to add to my previous wish list. ;) -bryan From bgreen at nas.nasa.gov Mon Sep 18 11:56:58 2006 From: bgreen at nas.nasa.gov (Bryan Green) Date: Mon, 18 Sep 2006 11:56:58 -0700 Subject: [openib-general] patch trouble In-Reply-To: Your message of "Mon, 18 Sep 2006 11:21:23 PDT." Message-ID: <200609181856.k8IIuwC9013626@ece06.nas.nasa.gov> Roland Dreier writes: > > For libibverbs and libmthca at least, I am careful to keep the > releases on http://openib.org/downloads/ up to date. For example, you > can find > > http://openib.org/downloads/libibverbs-1.0.3.tar.gz > http://openib.org/downloads/libmthca-1.0.2.tar.gz > > there, which are the latest stable releases as of now. Yes, my first experimental ebuild was based on your libibverbs package. :) But then I wanted more (namely libsdp and mvapich2) and moved on to svn. Thanks for the packages. I was wondering why those were the only ones. > > Anyway, for whatever reason, I seem to be the only openib person who > really cares about distro inclusion of stuff, but I'm happy to do > things that make your job as a packager easier, at least for my > userspace packages (libibverbs and libmthca). Just let me know. I've > already gotten those packages into mainline Debian, Ubuntu and Fedora > Extras repositories, and I'd be happy to see them in Gentoo as well. Cool, thanks for the support. :) -bryan From kliteyn at dev.mellanox.co.il Mon Sep 18 11:52:09 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 18 Sep 2006 21:52:09 +0300 Subject: [openib-general] [PATCH] osm: bug in OpenSM on broken fabrics Message-ID: <450EEAD9.1000503@dev.mellanox.co.il> Hi Hal, This patch fixes a bug in opensm that was discovered on 'broken' fabrics when opensm was executed with --stay_on_fatal. It replaces the assert with a real check.
Yevgeny

Signed-off-by: Yevgeny Kliteynik

Index: opensm/osm_node_info_rcv.c
===================================================================
--- opensm/osm_node_info_rcv.c (revision 9527)
+++ opensm/osm_node_info_rcv.c (working copy)
@@ -543,7 +543,13 @@ __osm_ni_rcv_process_ca_port(
   p_physp = osm_node_get_physp_ptr( p_node, port_num );
   CL_ASSERT( p_physp );
-  CL_ASSERT( osm_physp_is_valid( p_physp ) );
+  if (!osm_physp_is_valid( p_physp ))
+  {
+    osm_log( p_rcv->p_log, OSM_LOG_ERROR,
+             "__osm_ni_rcv_process_ca_port: ERR 0D19: "
+             "Invalid physical port found. Aborting discovery.\n");
+    goto Exit;
+  }
   /*
     Update the DR Path to the port,

From mst at mellanox.co.il Mon Sep 18 12:05:05 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 22:05:05 +0300
Subject: [openib-general] patch trouble
In-Reply-To: <200609181856.k8IIuwC9013626@ece06.nas.nasa.gov>
References: <200609181856.k8IIuwC9013626@ece06.nas.nasa.gov>
Message-ID: <20060918190505.GG1371@mellanox.co.il>

Quoting r. Bryan Green :
> But then I wanted more (namely libsdp and mvapich2) and moved on the svn.
> Thanks for the packages. I was wondering why those were the only ones.

libsdp (as well as SDP itself) is still in beta, that's why we didn't
publish any releases yet.

-- MST

From mst at mellanox.co.il Mon Sep 18 12:13:53 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 22:13:53 +0300
Subject: [openib-general] patch trouble
In-Reply-To: <200609181852.k8IIqeMH013506@ece06.nas.nasa.gov>
References: <20060918182241.GE1371@mellanox.co.il> <200609181852.k8IIqeMH013506@ece06.nas.nasa.gov>
Message-ID: <20060918191353.GH1371@mellanox.co.il>

Quoting r. Bryan Green :
> Subject: Re: patch trouble
>
> "Michael S. Tsirkin" writes:
> > Quoting r. Bryan Green :
> > > Subject: Re: patch trouble
> > >
> > >
> > > The problem with that tar file is that it contains far more than the
> > > '.tgz' file. It also contains some large source rpms.
The whole thing is
> > > 47 Megs, of which only about 12 Megs is of interest to me.
> >
> > So, I guess you want to just remove the rest of the stuff?
>
> I forgot to mention in the previous email - a separate tar file with kernel
> module code and patches would be nice too. Just to add to my previous
> wish list. ;)

OK, so you'll be the first user for kernel/user split that I wanted to do
right after 1.1.

-- MST

From mst at mellanox.co.il Mon Sep 18 12:17:08 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 22:17:08 +0300
Subject: [openib-general] [PATCH] IB/sa: fix ib_sa_selector names
Message-ID: <20060918191708.GI1371@mellanox.co.il>

IB/sa: Fix ib_sa_selector names

Relevant SA queries are actually "greater than" not "greater than or equal"
as the name implies. See IB spec 1.2 Vol 1, 15.2.5.16 PATHRECORD/Table 205
PathRecord.

Signed-off-by: Michael S. Tsirkin

Index: linux-2.6.18-rc2-devel/include/rdma/ib_sa.h
===================================================================
--- linux-2.6.18-rc2-devel.orig/include/rdma/ib_sa.h 2006-09-17 11:54:38.000000000 +0300
+++ linux-2.6.18-rc2-devel/include/rdma/ib_sa.h 2006-09-17 11:54:51.000000000 +0300
@@ -79,8 +79,8 @@ enum {
 };

 enum ib_sa_selector {
-	IB_SA_GTE = 0,
-	IB_SA_LTE = 1,
+	IB_SA_GT = 0,
+	IB_SA_LT = 1,
 	IB_SA_EQ = 2,
 	/*
 	 * The meaning of "best" depends on the attribute: for

-- MST

From eitan at mellanox.co.il Mon Sep 18 13:11:35 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 18 Sep 2006 23:11:35 +0300
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack : include/opensm/osm_pkey.h
In-Reply-To: <1158576813.18842.935.camel@hal.voltaire.com>
References: <86zmcylc2e.fsf@mtl066.yok.mtl.com> <1158576813.18842.935.camel@hal.voltaire.com>
Message-ID: <450EFD77.7000605@mellanox.co.il>

Hal Rosenstock wrote:

>Hi Eitan,
>
>On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote:
>
>
>>Hi Hal
>>
>>Partition tables blocks are always 16 bits.
>>This resolves the need to later cast back and forth. >> >>Thanks >> >>Eitan >> >>Signed-off-by: Eitan Zahavi >> >>Index: include/opensm/osm_pkey.h >>=================================================================== >>--- include/opensm/osm_pkey.h (revision 9502) >>+++ include/opensm/osm_pkey.h (working copy) >>@@ -143,7 +143,7 @@ typedef struct _osm_pkey_tbl >> typedef struct _osm_pending_pkey { >> cl_list_item_t list_item; >> uint16_t pkey; >>- uint32_t block; >>+ uint16_t block; >> uint8_t index; >> boolean_t is_new; >> } osm_pending_pkey_t; >>@@ -396,7 +396,7 @@ ib_api_status_t >> osm_pkey_tbl_get_block_and_idx( >> IN osm_pkey_tbl_t *p_pkey_tbl, >> IN uint16_t *p_pkey, >>- OUT uint32_t *block_idx, >>+ OUT uint16_t *block_idx, >> OUT uint8_t *pkey_index); >> /* >> * p_pkey_tbl >> >> > >Doesn't this require at least a similar change to >opensm/osm_pkey.c:osm_pkey_tbl_get_block_and_idx ? Anything else ? > > Sure this affects the osm_pkey.c. It is included in the mail named: [PATCH 10/13] osm: port to WinIB stack : opensm/osm_pkey.c >-- Hal > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From eitan at mellanox.co.il Mon Sep 18 13:17:20 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 18 Sep 2006 23:17:20 +0300 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: References: Message-ID: <450EFED0.7050909@mellanox.co.il> Rimmer, Todd wrote: >>From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] >>Sent: Monday, September 18, 2006 2:06 PM >>To: Rimmer, Todd >>Cc: Eitan Zahavi; Or Gerlitz; OPENIB >>Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU >> >> >for > > >>MT23108 devices >> >>Quoting r. 
Rimmer, Todd : >> >> >>>Subject: RE: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU >>> >>> >for > > >>MT23108 devices >> >> >>>>From: Eitan Zahavi [mailto:eitan at mellanox.co.il] >>>>Sent: Monday, September 18, 2006 11:20 AM >>>>To: Rimmer, Todd >>>>Cc: Or Gerlitz; Michael S. Tsirkin; OPENIB >>>>Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K >>>> >>>> >MTU > > >>>for >>> >>> >>>>MT23108 devices >>>> >>>>Hi Todd, >>>> >>>>Seems like your knowledge about the specific MTU best for the >>>>application (MPI) you are running is good >>>>enough such that you will be able to include the MTU in the >>>> >>>> >PathRecord > > >>>>request and thus the patch describe in here will not affect your >>>> >>>> >MPI > > >>>at >>> >>> >>>>all. >>>>The patch only applies if your request does not provide any MTU & >>>> >>>> >MTU > > >>>>SEL comp_mask >>>> >>>> >>>Eitan, >>> >>>The question is not about "our MPI", rather its to ensure the Open >>>Fabrics and OFED included MPIs and ULPs are capable of being tuned >>> >>> >for > > >>>optimal performance. When a fabric runs more than 1 application, >>> >>> >its > > >>>necessary to be able to tune this at the MPI, SDP, etc level, not at >>> >>> >the > > >>>SM level. >>> >>> >>We did not remove this ability at all. So it's there. >> >> >> >>>In order to be complete, this patch would need to >>>include ULP level tunability in all the relevant ULPs (MPI, SDP, >>> >>> >uDAPL, > > >>>etc) to select the "MAX MTU" to use or to request. >>> >>> >>This tunability is already there - that's what MTU selector in path >>queries >>does. >> >> >> >>>This then begs the question, if proper tuning requires all the ULPs >>> >>> >to > > >>>have a configurable MAX MTU, why should the SA need to implement the >>>quirk at all? >>> >>> >>> >>If ULP wants MAX MTU, it must set MTU selector to 3 in path query. >> >>If MTU selector is disabled in the query, SM will guess which MTU is >> >> >best > > >>to >>select. 
SM used a specific heuristic to perform that guess. All we >> >> >did > > >>is, >>provide an option to use a different heuristic. >> >>This is useful because, SM has data on the whole fabric as opposed to >> >> >ULPs > > >>which often only have data on the endnode. >> >> > >The patch you submitted only modified Open SM. So please show me the >patch where MVAPICH, Open MPI, SDP, SRP and other ULPs allow this to be >tuned by the user or application? Lacking that patch, all the "if a ULP >wants" statements above are mute. The goal is for OFED to provide a >high performance standard solution. If end users must modify the ULPs >source code to achieve that goal, OFED misses the mark. > > To our best knowledge the change which automatically selects 1K MTU for the above ULPs improves their performance. Do you have any measurement on OFED 1.1 that shows otherwise? Under what cases? If this is the case then you basically do not have to do anything and all just works as it used to. But if we are correct then a user can create an OpenSM cache file, modify the enable_quirks to TRUE and restart the SM. I am sure you could imaging this patch was not just another way for us to spend our time... >Todd Rimmer > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From trimmer at silverstorm.com Mon Sep 18 13:30:53 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 18 Sep 2006 16:30:53 -0400 Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices In-Reply-To: <450EFED0.7050909@mellanox.co.il> Message-ID: > From: Eitan Zahavi [mailto:eitan at mellanox.co.il] > Sent: Monday, September 18, 2006 4:17 PM > To: Rimmer, Todd > Cc: Michael S. 
Tsirkin; Or Gerlitz; OPENIB > Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for > MT23108 devices > > >The patch you submitted only modified Open SM. So please show me the > >patch where MVAPICH, Open MPI, SDP, SRP and other ULPs allow this to be > >tuned by the user or application? Lacking that patch, all the "if a ULP > >wants" statements above are mute. The goal is for OFED to provide a > >high performance standard solution. If end users must modify the ULPs > >source code to achieve that goal, OFED misses the mark. > > > > > To our best knowledge the change which automatically selects 1K MTU for > the above ULPs improves their performance. > Do you have any measurement on OFED 1.1 that shows otherwise? Under what > cases? > If this is the case then you basically do not have to do anything and > all just works as it used to. > But if we are correct then a user can create an OpenSM cache file, > modify the enable_quirks to TRUE and restart the SM. > I am sure you could imaging this patch was not just another way for us > to spend our time... > In my previous posting I included MPI performance numbers which showed >1K <2K performance was reduced when using 2K MTU. This also applies to SDP when an existing application is using message sizes in this range (which is quite common). Having control only in the SM is not sufficient. Having to bounce the SM is even worse. 
Todd Rimmer From halr at voltaire.com Mon Sep 18 13:46:58 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Sep 2006 16:46:58 -0400 Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack : include/opensm/osm_pkey.h In-Reply-To: <450EFD77.7000605@mellanox.co.il> References: <86zmcylc2e.fsf@mtl066.yok.mtl.com> <1158576813.18842.935.camel@hal.voltaire.com> <450EFD77.7000605@mellanox.co.il> Message-ID: <1158612363.18842.19619.camel@hal.voltaire.com> On Mon, 2006-09-18 at 16:11, Eitan Zahavi wrote: > Hal Rosenstock wrote: > > >Hi Eitan, > > > >On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote: > > > > > >>Hi Hal > >> > >>Partition tables blocks are always 16 bits. > >>This resolves the need to later cast back and forth. > >> > >>Thanks > >> > >>Eitan > >> > >>Signed-off-by: Eitan Zahavi > >> > >>Index: include/opensm/osm_pkey.h > >>=================================================================== > >>--- include/opensm/osm_pkey.h (revision 9502) > >>+++ include/opensm/osm_pkey.h (working copy) > >>@@ -143,7 +143,7 @@ typedef struct _osm_pkey_tbl > >> typedef struct _osm_pending_pkey { > >> cl_list_item_t list_item; > >> uint16_t pkey; > >>- uint32_t block; > >>+ uint16_t block; > >> uint8_t index; > >> boolean_t is_new; > >> } osm_pending_pkey_t; > >>@@ -396,7 +396,7 @@ ib_api_status_t > >> osm_pkey_tbl_get_block_and_idx( > >> IN osm_pkey_tbl_t *p_pkey_tbl, > >> IN uint16_t *p_pkey, > >>- OUT uint32_t *block_idx, > >>+ OUT uint16_t *block_idx, > >> OUT uint8_t *pkey_index); > >> /* > >> * p_pkey_tbl > >> > >> > > > >Doesn't this require at least a similar change to > >opensm/osm_pkey.c:osm_pkey_tbl_get_block_and_idx ? Anything else ? > > > > > Sure this affects the osm_pkey.c. > It is included in the mail named: [PATCH 10/13] osm: port to WinIB > stack : opensm/osm_pkey.c Each patch should be able to stand on its own (so 2 and 10 should have been one patch). No need to resubmit for this. 
-- Hal > >-- Hal > > From rdreier at cisco.com Mon Sep 18 14:24:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 14:24:05 -0700 Subject: [openib-general] [PATCH] IB/sa: fix ib_sa_selector names In-Reply-To: <20060918191708.GI1371@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 18 Sep 2006 22:17:08 +0300") References: <20060918191708.GI1371@mellanox.co.il> Message-ID: Thanks, queued for 2.6.19. > Signed-off-by: Michael S. Tsirkin > > Index: linux-2.6.18-rc2-devel/include/rdma/ib_sa.h > =================================================================== One trivial request: can you make sure your patches have a "---" line between the patch description and the actual patch? That way git tools can just apply the patch automagically for me. - R. From rdreier at cisco.com Mon Sep 18 14:26:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 14:26:34 -0700 Subject: [openib-general] [PATCH] IB/sa: fix ib_sa_selector names In-Reply-To: <20060918191708.GI1371@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 18 Sep 2006 22:17:08 +0300") References: <20060918191708.GI1371@mellanox.co.il> Message-ID: BTW, I think this means your original IPoIB patch that did: > + path->pathrec.mtu = priv->broadcast->mcmember.mtu; > + path->pathrec.mtu_selector = IB_SA_GTE; now needs to do something like + path->pathrec.mtu = max(IB_MTU_256, priv->broadcast->mcmember.mtu - 1); + path->pathrec.mtu_selector = IB_SA_GT; right? The strict inequality semantics defined by the spec are somewhat more awkward to actually use :( - R.
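A minimal sketch of the selector conversion discussed in the message above, written as standalone C rather than kernel code. The enums MTU_* and SA_* are local stand-ins assumed to mirror the spec's encodings, and at_least() and selector_matches() are hypothetical helpers introduced only for illustration; none of these names come from the patches in this thread.

```c
/* Local stand-ins (assumptions): encoded MTU values and SA selector
 * values as defined by the InfiniBand spec. These are not the kernel's
 * own enums. */
enum mtu_enc { MTU_256 = 1, MTU_512 = 2, MTU_1024 = 3, MTU_2048 = 4, MTU_4096 = 5 };
enum sa_selector { SA_GT = 0, SA_LT = 1, SA_EQ = 2 };

struct mtu_query {
	enum sa_selector selector;
	int mtu;	/* encoded MTU, one of MTU_* */
};

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Hypothetical helper: express "MTU >= wanted" with the strict GT
 * selector by asking for "greater than the next smaller encoding",
 * clamped at MTU_256 as in the mail above. */
static struct mtu_query at_least(int wanted)
{
	struct mtu_query q;

	q.selector = SA_GT;
	q.mtu = max_int(MTU_256, wanted - 1);
	return q;
}

/* Hypothetical helper: does a path with the given encoded MTU satisfy
 * the query? Returns 1 on match, 0 otherwise. */
static int selector_matches(struct mtu_query q, int path_mtu)
{
	switch (q.selector) {
	case SA_GT:
		return path_mtu > q.mtu;
	case SA_LT:
		return path_mtu < q.mtu;
	case SA_EQ:
		return path_mtu == q.mtu;
	}
	return 0;
}
```

With this clamp, a request for "at least 256" turns into "greater than 256", which a 256-byte path does not satisfy, so the smallest MTU would probably need separate handling (for example, leaving the MTU field wildcarded).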
From rdreier at cisco.com Mon Sep 18 15:34:10 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 15:34:10 -0700 Subject: [openib-general] Bug in OpenSM multicast group creation? Message-ID: Around line 1340 of osm_sa_mcmember_record.c, there is the code: /* the mcmember_record should have mtu_sel, rate_sel and pkt_lifetime_sel = 2 */ (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */ (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */ (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */ /* Initialize the mgrp */ (*pp_mgrp)->mcmember_rec = mcm_rec; (*pp_mgrp)->mcmember_rec.mlid = mlid; I don't know exactly what this is trying to do, but it looks very fishy to me: as far as I can see, the second block of code overwrites the effects of the first three lines. So either those "/* exactly */" lines aren't needed, or they need to be moved after the mgrp is initialized. - R. From halr at voltaire.com Mon Sep 18 16:30:33 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Sep 2006 19:30:33 -0400 Subject: [openib-general] Bug in OpenSM multicast group creation? In-Reply-To: References: Message-ID: <1158622202.18842.23874.camel@hal.voltaire.com> On Mon, 2006-09-18 at 18:34, Roland Dreier wrote: > Around line 1340 of osm_sa_mcmember_record.c, there is the code: > > /* the mcmember_record should have mtu_sel, rate_sel and pkt_lifetime_sel = 2 */ > (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */ > (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */ > (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */ > > /* Initialize the mgrp */ > (*pp_mgrp)->mcmember_rec = mcm_rec; > (*pp_mgrp)->mcmember_rec.mlid = mlid; > > I don't know exactly what this is trying to do, The response is required to have the selectors set to exactly regardless of what they were in the request. > but it looks very fishy to me: Now that you point it out, me too :-( > as far as I can see, the second block of code overwrites > the effects of the first three lines. 
So either those "/* exactly */" > lines aren't needed, or they need to be moved after the mgrp is > initialized. It appears to me that they should be moved after those 2 lines of mgrp initialization. -- Hal > - R. From rdreier at cisco.com Mon Sep 18 17:07:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 18 Sep 2006 17:07:26 -0700 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: (Roland Dreier's message of "Thu, 14 Sep 2006 14:11:32 -0700") References: <20060914141901.GG25691@mellanox.co.il> Message-ID: Here's a patch that tries to fix this. I only tried it with the Cisco embedded SM, so someone should probably check that this doesn't break under OpenSM. Look OK? - R. IPoIB: Create MCGs with all attributes required by RFC RFC 4391 ("Transmission of IP over InfiniBand (IPoIB)") says: If the IB multicast group does not already exist, one must be created first with the IPoIB link MTU. The MGID MUST use the same P_Key, Q_Key, SL, MTU, and HopLimit as those used in the broadcast-GID. The rest of attributes SHOULD follow the values used in the broadcast-GID as well. However, the current IPoIB driver is only setting the attributes required by the InfiniBand spec to create a multicast group, so in particular the MTU and HopLimit are not being set. Add these attributes when creating MCGs, and also set the Rate attribute, since IPoIB pays attention to that attribute as well. 
Signed-off-by: Roland Dreier --- diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index fb3e487..3faa182 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -472,15 +472,25 @@ static void ipoib_mcast_join(struct net_ if (create) { comp_mask |= - IB_SA_MCMEMBER_REC_QKEY | - IB_SA_MCMEMBER_REC_SL | - IB_SA_MCMEMBER_REC_FLOW_LABEL | - IB_SA_MCMEMBER_REC_TRAFFIC_CLASS; + IB_SA_MCMEMBER_REC_QKEY | + IB_SA_MCMEMBER_REC_MTU_SELECTOR | + IB_SA_MCMEMBER_REC_MTU | + IB_SA_MCMEMBER_REC_TRAFFIC_CLASS | + IB_SA_MCMEMBER_REC_RATE_SELECTOR | + IB_SA_MCMEMBER_REC_RATE | + IB_SA_MCMEMBER_REC_SL | + IB_SA_MCMEMBER_REC_FLOW_LABEL | + IB_SA_MCMEMBER_REC_HOP_LIMIT; rec.qkey = priv->broadcast->mcmember.qkey; + rec.mtu_selector = IB_SA_EQ; + rec.mtu = priv->broadcast->mcmember.mtu; + rec.traffic_class = priv->broadcast->mcmember.traffic_class; + rec.rate_selector = IB_SA_EQ; + rec.rate = priv->broadcast->mcmember.rate; rec.sl = priv->broadcast->mcmember.sl; rec.flow_label = priv->broadcast->mcmember.flow_label; - rec.traffic_class = priv->broadcast->mcmember.traffic_class; + rec.hop_limit = priv->broadcast->mcmember.hop_limit; } init_completion(&mcast->done); From halr at voltaire.com Mon Sep 18 17:30:37 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Sep 2006 20:30:37 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c: In osm_mcmr_rcv_create_new_mgrp, fix exactly selectors in response Message-ID: <1158625818.18842.25479.camel@hal.voltaire.com> OpenSM/osm_sa_mcmember_record.c: In osm_mcmr_rcv_create_new_mgrp, set exactly selectors after rather than before mgrp is initialized Pointed out by: Roland Dreier Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_mcmember_record.c =================================================================== --- opensm/osm_sa_mcmember_record.c (revision 9347) +++ opensm/osm_sa_mcmember_record.c (working 
copy) @@ -1337,15 +1337,18 @@ osm_mcmr_rcv_create_new_mgrp( goto Exit; } - /* the mcmember_record should have mtu_sel, rate_sel and pkt_lifetime_sel = 2 */ - (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */ - (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */ - (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */ - /* Initialize the mgrp */ (*pp_mgrp)->mcmember_rec = mcm_rec; (*pp_mgrp)->mcmember_rec.mlid = mlid; + /* the mcmember_record should have mtu_sel, rate_sel, and pkt_lifetime_sel = 2 */ + (*pp_mgrp)->mcmember_rec.mtu &= 0x3f; + (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */ + (*pp_mgrp)->mcmember_rec.rate &= 0x3f; + (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */ + (*pp_mgrp)->mcmember_rec.pkt_life &= 0x3f; + (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */ + /* Insert the new group in the data base */ /* since we might have an old group by that mlid From mst at mellanox.co.il Mon Sep 18 20:03:41 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 19 Sep 2006 06:03:41 +0300 Subject: [openib-general] [PATCH] IB/sa: fix ib_sa_selector names In-Reply-To: References: Message-ID: <20060919030341.GA30563@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/sa: fix ib_sa_selector names > > BTW, I think this means your original IPoIB patch that did: > > > + path->pathrec.mtu = priv->broadcast->mcmember.mtu; > > + path->pathrec.mtu_selector = IB_SA_GTE; > > now needs to do something like > > + path->pathrec.mtu = max(IB_MTU_256, priv->broadcast->mcmember.mtu - 1); > + path->pathrec.mtu_selector = IB_SA_GT; > > right? > > The strict inequality semantics defined by the spec are somewhat more > awkward to actually use :( But they also happen to work :). I'm testing that patch and will post it RSN. -- MST From mst at mellanox.co.il Mon Sep 18 20:06:41 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 19 Sep 2006 06:06:41 +0300 Subject: [openib-general] [PATCH] IB/sa: fix ib_sa_selector names In-Reply-To: References: Message-ID: <20060919030641.GB30563@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/sa: fix ib_sa_selector names > > Thanks, queued for 2.6.19. > > > Signed-off-by: Michael S. Tsirkin > > > > Index: linux-2.6.18-rc2-devel/include/rdma/ib_sa.h > > =================================================================== > > One trivial request: can you make sure your patches have a "---" line > between the patch description and the actual patch? That way git > tools can just apply the patch automagically for me. Sure. Are you using git-apply-mbox BTW? -- MST From mst at mellanox.co.il Mon Sep 18 20:15:34 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 19 Sep 2006 06:15:34 +0300 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: References: Message-ID: <20060919031534.GC30563@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Fwd: IPoIB Multicast > > Here's a patch that tries to fix this. I only tried it with the Cisco > embedded SM, so someone should probably check that this doesn't break > under OpenSM. > > Look OK? > > - R. We've been testing the following which looks exactly equivalent. I'll look at the regression results in the morning and will let you know. Please note this fixes an actual issue for us: on a mixed 1x/4x or SDR/DDR network, if a group is created with the wrong parameters, some nodes are unable to join. ----------------------------------------------------------- IB/ipoib: make multicast group creation spec compliant IPoIB spec says: The MGID MUST use the same P_Key, Q_Key, SL, MTU and HopLimit as those used in the broadcast-GID. For the rest of attributes too, the values used in the broadcast-GID SHOULD be used. IPoIB currently violates this rule, which breaks multicast on heterogeneous networks. Signed-off-by: Eli Cohen Signed-off-by: Michael S.
Tsirkin --- Index: openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- openib-1.1.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2006-09-17 12:23:25.000000000 +0300 +++ openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2006-09-17 16:31:08.000000000 +0300 @@ -472,15 +472,25 @@ static void ipoib_mcast_join(struct net_ if (create) { comp_mask |= - IB_SA_MCMEMBER_REC_QKEY | - IB_SA_MCMEMBER_REC_SL | - IB_SA_MCMEMBER_REC_FLOW_LABEL | - IB_SA_MCMEMBER_REC_TRAFFIC_CLASS; + IB_SA_MCMEMBER_REC_QKEY | + IB_SA_MCMEMBER_REC_SL | + IB_SA_MCMEMBER_REC_FLOW_LABEL | + IB_SA_MCMEMBER_REC_TRAFFIC_CLASS | + IB_SA_MCMEMBER_REC_RATE_SELECTOR | + IB_SA_MCMEMBER_REC_RATE | + IB_SA_MCMEMBER_REC_HOP_LIMIT | + IB_SA_MCMEMBER_REC_MTU_SELECTOR | + IB_SA_MCMEMBER_REC_MTU; rec.qkey = priv->broadcast->mcmember.qkey; rec.sl = priv->broadcast->mcmember.sl; rec.flow_label = priv->broadcast->mcmember.flow_label; rec.traffic_class = priv->broadcast->mcmember.traffic_class; + rec.rate_selector = IB_SA_EQ; + rec.rate = priv->broadcast->mcmember.rate; + rec.hop_limit = priv->broadcast->mcmember.hop_limit; + rec.mtu_selector = IB_SA_EQ; + rec.mtu = priv->broadcast->mcmember.mtu; } init_completion(&mcast->done); -- MST From halr at voltaire.com Mon Sep 18 22:07:25 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 01:07:25 -0400 Subject: [openib-general] [PATCH] osm: bug in OpenSM on broken fabrics In-Reply-To: <450EEAD9.1000503@dev.mellanox.co.il> References: <450EEAD9.1000503@dev.mellanox.co.il> Message-ID: <1158642400.18842.33716.camel@hal.voltaire.com> Hi Yevgeny, On Mon, 2006-09-18 at 14:52, Yevgeny Kliteynik wrote: > Hi Hal > > This patch fixes a bug in opensm that was discovered on > a 'broken' fabrics when opensm was executed with --stay_on_fatal. > Replacing assert with a real check. > > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Is this intended for trunk only or also 1.1 ? 
-- Hal From eli at dev.mellanox.co.il Mon Sep 18 23:30:53 2006 From: eli at dev.mellanox.co.il (Eli cohen) Date: Tue, 19 Sep 2006 09:30:53 +0300 Subject: [openib-general] ipoib multicast problem Message-ID: <1158647453.5392.66.camel@localhost> Hi, I have seen the following problem with ipoib: 1. An application registers to a multicast group as a full member. As a result all the groups are listed in dev->mclist. 2. The InfiniBand link drops momentarily, opensm is restarted, etc. 3. All multicast memberships are flushed. 4. The net device will not join again until something later causes ipoib_set_mcast_list() to be called. From eli at dev.mellanox.co.il Mon Sep 18 23:31:14 2006 From: eli at dev.mellanox.co.il (Eli cohen) Date: Tue, 19 Sep 2006 09:31:14 +0300 Subject: [openib-general] [PATCH] ipoib mcast restart Message-ID: <1158647474.5392.68.camel@localhost> Make sure that after ipoib_ib_dev_flush is executed, ipoib_mcast_restart_task is also executed to join all the mcast groups maintained by the kernel for the device. Signed-off-by: Eli Cohen Signed-off-by: Michael S.
Tsirkin Index: openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- openib-1.1.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2006-09-14 17:20:06.000000000 +0300 +++ openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2006-09-17 15:51:52.000000000 +0300 @@ -619,8 +619,10 @@ * The device could have been brought down between the start and when * we get here, don't bring it back up if it's not configured up */ - if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) + if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) { ipoib_ib_dev_up(dev); + ipoib_mcast_restart_task(dev); + } mutex_lock(&priv->vlan_mutex); From krkumar2 at in.ibm.com Tue Sep 19 00:02:10 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 19 Sep 2006 12:32:10 +0530 Subject: [openib-general] [PATCH] id_priv_list->list is not initialized sometimes Message-ID: <20060919070210.5476.68607.sendpatchset@localhost.localdomain> rdma_listen could be called from a context where id_priv->list is not initialized. Then at a later stage, a cma_cancel_listen does a list_del() which could oops since this element is not on any list. Eg, in rdma_listen(), if id->device is !NULL, it calls cma_ib_listen() which doesn't add this id to any list. A cma_cancel_listen() will do a list_del. 
Signed-off-by: Krishna Kumar -------- diff -ruNp org/core/cma.c new/core/cma.c --- org/core/cma.c 2006-09-14 15:31:27.000000000 +0530 +++ new/core/cma.c 2006-09-14 16:07:35.000000000 +0530 @@ -339,6 +339,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c atomic_set(&id_priv->dev_remove, 0); INIT_LIST_HEAD(&id_priv->listen_list); INIT_LIST_HEAD(&id_priv->mc_list); + INIT_LIST_HEAD(&id_priv->list); get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num); return &id_priv->id; From krkumar2 at in.ibm.com Tue Sep 19 00:02:06 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 19 Sep 2006 12:32:06 +0530 Subject: [openib-general] [PATCH] ucma : Encapsulate duplicate code to common routine Message-ID: <20060919070206.5476.64107.sendpatchset@localhost.localdomain> Encapsulate duplicate code to common routine - avoid checking same errors in multiple places. Signed-off-by: Krishna Kumar -------- diff -ruNp org/core/ucma.c new/core/ucma.c --- org/core/ucma.c 2006-09-18 17:38:12.000000000 +0530 +++ new/core/ucma.c 2006-09-18 17:39:34.000000000 +0530 @@ -87,20 +87,30 @@ struct ucma_event { static DEFINE_MUTEX(ctx_mutex); static DEFINE_IDR(ctx_idr); -static struct ucma_context* ucma_get_ctx(struct ucma_file *file, int id) +/* _ucma_find_context : internal find routine. 
Assumes ctx_mutex is held */ +static inline struct ucma_context* _ucma_find_context(struct ucma_file *file, int id) { struct ucma_context *ctx; - mutex_lock(&ctx_mutex); + BUG_ON(!mutex_is_locked(&ctx_mutex)); + ctx = idr_find(&ctx_idr, id); if (!ctx) ctx = ERR_PTR(-ENOENT); else if (ctx->file != file) ctx = ERR_PTR(-EINVAL); - else + return ctx; +} + +static struct ucma_context* ucma_get_ctx(struct ucma_file *file, int id) +{ + struct ucma_context *ctx; + + mutex_lock(&ctx_mutex); + ctx = _ucma_find_context(file, id); + if (!IS_ERR(ctx)) atomic_inc(&ctx->ref); mutex_unlock(&ctx_mutex); - return ctx; } @@ -354,12 +364,8 @@ static ssize_t ucma_destroy_id(struct uc return -EFAULT; mutex_lock(&ctx_mutex); - ctx = idr_find(&ctx_idr, cmd.id); - if (!ctx) - ctx = ERR_PTR(-ENOENT); - else if (ctx->file != file) - ctx = ERR_PTR(-EINVAL); - else + ctx = _ucma_find_context(file, cmd.id); + if (!IS_ERR(ctx)) idr_remove(&ctx_idr, ctx->id); mutex_unlock(&ctx_mutex); From krkumar2 at in.ibm.com Tue Sep 19 00:02:03 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 19 Sep 2006 12:32:03 +0530 Subject: [openib-general] [PATCH] fix cma_leave_mc_groups Message-ID: <20060919070203.5476.17650.sendpatchset@localhost.localdomain> - mthca_multicast_detach - as an example, frees up a bit for re-use later so if it is not called during destroy_id, it *appears* that those bits (index) are leaked. - cma_leave_mc_groups can race with other routines updating or reading the mclist, so use lock. Eg while doing a rdma_destroy_id(), other processes could be looking at this id and de-referencing mclist.
Signed-off-by: Krishna Kumar -------- diff -ruNp org/core/cma.c new/core/cma.c --- org/core/cma.c 2006-09-18 16:00:41.000000000 +0530 +++ new/core/cma.c 2006-09-18 16:12:58.000000000 +0530 @@ -761,14 +761,24 @@ static void cma_release_port(struct rdma static void cma_leave_mc_groups(struct rdma_id_private *id_priv) { struct cma_multicast *mc; + unsigned long flags; + spin_lock_irqsave(&id_priv->lock, flags); while (!list_empty(&id_priv->mc_list)) { mc = container_of(id_priv->mc_list.next, struct cma_multicast, list); list_del(&mc->list); + spin_unlock_irqrestore(&id_priv->lock, flags); + if (id_priv->id.qp) { + ib_detach_mcast(id_priv->id.qp, + &mc->multicast.ib->rec.mgid, + mc->multicast.ib->rec.mlid); + } ib_free_multicast(mc->multicast.ib); kfree(mc); + spin_lock_irqsave(&id_priv->lock, flags); } + spin_unlock_irqrestore(&id_priv->lock, flags); } void rdma_destroy_id(struct rdma_cm_id *id) From krkumar2 at in.ibm.com Tue Sep 19 00:02:14 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 19 Sep 2006 12:32:14 +0530 Subject: [openib-general] [PATCH] Typo in ib_set_client_data() Message-ID: <20060919070214.5476.99212.sendpatchset@localhost.localdomain> Signed-off-by: Krishna Kumar -------- diff -ruNp org/core/device.c new/core/device.c --- org/core/device.c 2006-09-14 15:38:14.000000000 +0530 +++ new/core/device.c 2006-09-14 15:38:29.000000000 +0530 @@ -385,7 +385,7 @@ void *ib_get_client_data(struct ib_devic EXPORT_SYMBOL(ib_get_client_data); /** - * ib_set_client_data - Get IB client context + * ib_set_client_data - Set IB client context * @device:Device to set context for * @client:Client to set context for * @data:Context to set From mst at mellanox.co.il Tue Sep 19 00:21:29 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 19 Sep 2006 10:21:29 +0300 Subject: [openib-general] [PATCH] Fix freed mem deref race in cma_process_remove/cma_req_handler In-Reply-To: <20060918073545.26067.41763.sendpatchset@localhost.localdomain> References: <20060918073545.26067.41763.sendpatchset@localhost.localdomain> Message-ID: <20060919072129.GD31498@mellanox.co.il> Quoting r. Krishna Kumar : > Subject: [PATCH] Fix freed mem deref race in cma_process_remove/cma_req_handler > > The race is as follows : > > A process : cma_process_remove() calls cma_remove_id_dev(), > which sets id state to CMA_DEVICE_REMOVAL and > calls wait_event(dev_remove). > > B process : cma_req_handler() had incremented dev_remove, > and calls cma_acquire_ib_dev() and on failure > calls cma_release_remove(), which does a > wake_up of cma_process_remove(). Then > cma_req_handler() calls rdma_destroy_id(); > > A Process : cma_remove_id_dev() gets woken and checks the > state of id, and since it is still (wrongly) > CMA_DEVICE_REMOVAL, it calls notify_user(id) > and if that fails, the caller - cma_process_remove() > calls rdma_destroy_id(id). Two processes can > call rdma_destroy_id(), resulting in one > de-referencing kfreed id_priv. > > Fix is for process B to set CMA_DESTROYING in cma_req_handler() > so that process A will return instead of doing a rdma_destroy_id(). > > Signed-off-by: Krishna Kumar Did you actually see these crashes? If yes, this looks serious enough even for 2.6.18. Sean? -- MST From mst at mellanox.co.il Tue Sep 19 00:25:09 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 19 Sep 2006 10:25:09 +0300 Subject: [openib-general] Fwd: [PATCH] id_priv_list->list is not initialized sometimes Message-ID: <20060919072509.GE31498@mellanox.co.il> ----- Forwarded message from Krishna Kumar ----- From: "Krishna Kumar" Date: Tue, 19 Sep 2006 12:32:10 +0530 Subject: [PATCH] id_priv_list->list is not initialized sometimes rdma_listen could be called from a context where id_priv->list is not initialized. Then at a later stage, a cma_cancel_listen does a list_del() which could oops since this element is not on any list. Eg, in rdma_listen(), if id->device is !NULL, it calls cma_ib_listen() which doesn't add this id to any list. A cma_cancel_listen() will do a list_del. Signed-off-by: Krishna Kumar -------- diff -ruNp org/core/cma.c new/core/cma.c --- org/core/cma.c 2006-09-14 15:31:27.000000000 +0530 +++ new/core/cma.c 2006-09-14 16:07:35.000000000 +0530 @@ -339,6 +339,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c atomic_set(&id_priv->dev_remove, 0); INIT_LIST_HEAD(&id_priv->listen_list); INIT_LIST_HEAD(&id_priv->mc_list); + INIT_LIST_HEAD(&id_priv->list); get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num); return &id_priv->id; ----- End forwarded message ----- Did you actually see these crashes? If yes, this might need to be fixed even for 2.6.18. Sean? -- MST From krkumar2 at in.ibm.com Tue Sep 19 00:42:15 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Tue, 19 Sep 2006 13:12:15 +0530 Subject: [openib-general] Fwd: [PATCH] id_priv_list->list is not initialized sometimes In-Reply-To: <20060919072509.GE31498@mellanox.co.il> Message-ID: Hi Michael, > Did you actually see these crashes? > If yes, this might need to be fixed even for 2.6.18. Sean? No I have not seen this crash, this is based on reading the code. 
thanks, - KK openib-general-bounces at openib.org wrote on 09/19/2006 12:55:09 PM: > > ----- Forwarded message from Krishna Kumar ----- > > From: "Krishna Kumar" > Date: Tue, 19 Sep 2006 12:32:10 +0530 > Subject: [PATCH] id_priv_list->list is not initialized > sometimes > > rdma_listen could be called from a context where id_priv->list > is not initialized. Then at a later stage, a cma_cancel_listen > does a list_del() which could oops since this element is not > on any list. > > Eg, in rdma_listen(), if id->device is !NULL, it calls > cma_ib_listen() which doesn't add this id to any list. A > cma_cancel_listen() will do a list_del. > > Signed-off-by: Krishna Kumar > -------- > > diff -ruNp org/core/cma.c new/core/cma.c > --- org/core/cma.c 2006-09-14 15:31:27.000000000 +0530 > +++ new/core/cma.c 2006-09-14 16:07:35.000000000 +0530 > @@ -339,6 +339,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c > atomic_set(&id_priv->dev_remove, 0); > INIT_LIST_HEAD(&id_priv->listen_list); > INIT_LIST_HEAD(&id_priv->mc_list); > + INIT_LIST_HEAD(&id_priv->list); > get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num); > > return &id_priv->id; > > ----- End forwarded message ----- > > > -- > MST > From mst at mellanox.co.il Tue Sep 19 01:13:24 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 19 Sep 2006 11:13:24 +0300 Subject: [openib-general] [PATCH] mthca: fix lid used for sending traps Message-ID: <20060919081324.GF31498@mellanox.co.il> From: "Jack Morgenstein" SM lid was incorrectly set to port lid. This is a regression from 2.6.17: after an event, no traps are sent to the SM LID; they go to the loopback interface instead, and are typically dropped there. The LID should be set to the sm_lid of the port info response.
Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin --- Roland, this fixes a serious regression from 2.6.17. The bug was introduced by commit 12bbb2b7be7f5564952ebe0196623e97464b8ac5: IB/mthca: Add client reregister event generation I'm taking the fix into OFED 1.1 and I think it should go into 2.6.18 or 2.6.18.1. Index: ofed_1_1/drivers/infiniband/hw/mthca/mthca_mad.c =================================================================== --- ofed_1_1.orig/drivers/infiniband/hw/mthca/mthca_mad.c 2006-08-16 10:16:19.000000000 +0300 +++ ofed_1_1/drivers/infiniband/hw/mthca/mthca_mad.c 2006-09-19 10:33:31.280328000 +0300 @@ -119,7 +119,7 @@ static void smp_snoop(struct ib_device * mthca_update_rate(to_mdev(ibdev), port_num); update_sm_ah(to_mdev(ibdev), port_num, - be16_to_cpu(pinfo->lid), + be16_to_cpu(pinfo->sm_lid), pinfo->neighbormtu_mastersmsl & 0xf); event.device = ibdev; -- MST From mst at mellanox.co.il Tue Sep 19 02:08:49 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 19 Sep 2006 12:08:49 +0300 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: <20060919031534.GC30563@mellanox.co.il> References: <20060919031534.GC30563@mellanox.co.il> Message-ID: <20060919090849.GC32603@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: Re: Fwd: IPoIB Multicast > > Quoting r. Roland Dreier : > Subject: Re: Fwd: IPoIB Multicast > > Here's a patch that tries to fix this. I only tried it with the Cisco > embedded SM, so someone should probably check that this doesn't break > under OpenSM. > > Look OK? > > - R. > > > We've been testing the following which looks exactly equivalent. > I'll look at the regression results in the morning and will let you know. Works OK here. Please commit. Please note this does fix a real issue for us, which is quite severe for clusters where ipoib is the only interconnect. I wonder whether this is 2.6.18 material.
-- MST From maheshbarve at gmail.com Tue Sep 19 02:29:17 2006 From: maheshbarve at gmail.com (Mahesh Barve) Date: Tue, 19 Sep 2006 14:59:17 +0530 Subject: [openib-general] Posting requests on multiple QPs simultaneously Message-ID: <507df10d0609190229g6855bd33g1f5973d4d489c6f6@mail.gmail.com> Hi, Infiniband allows the creation of 16M QPs. Suppose a programmer wants to post separate requests on each of the QPs simultaneously, what would be the most efficient way of doing it? regards, -mahesh From dotanb at dev.mellanox.co.il Tue Sep 19 04:28:36 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 19 Sep 2006 14:28:36 +0300 Subject: [openib-general] Posting requests on multiple QPs simultaneously In-Reply-To: <507df10d0609190229g6855bd33g1f5973d4d489c6f6@mail.gmail.com> References: <507df10d0609190229g6855bd33g1f5973d4d489c6f6@mail.gmail.com> Message-ID: <450FD464.9010809@dev.mellanox.co.il> Mahesh Barve wrote: > Hi, > Infiniband allows the creation of 16M QPs. > Suppose a programmer wants to post separate requests on each of the > QPs simultaneously, > what would be the most efficient way of doing it? > regards, > -mahesh > What is your question: should you use threads? Should you post one by one or post a list? In one post operation, you cannot post WRs to more than one QP. Dotan From eli at dev.mellanox.co.il Tue Sep 19 04:44:51 2006 From: eli at dev.mellanox.co.il (Eli cohen) Date: Tue, 19 Sep 2006 14:44:51 +0300 Subject: [openib-general] ipoib multicast problems on RHEL4.0 u4 Message-ID: <1158666291.24776.32.camel@localhost> Hi, while testing ipoib multicast on RHEL4.0 u4, I noticed that setsockopt() succeeds in adding a multicast group to an interface, but the multicast group is not actually added to the net_device. This means that an application cannot join a multicast group as a full member.
When I examined the differences between the kernel sources for u3 and u4 I noticed that essential code was removed: diff -ru net/ipv4/arp.c ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c --- net/ipv4/arp.c 2006-09-18 15:35:03.000000000 +0300 +++ ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c 2006-09-19 10:08:06.000000000 +0300 @@ -213,9 +213,6 @@ case ARPHRD_IEEE802_TR: ip_tr_mc_map(addr, haddr); return 0; - case ARPHRD_INFINIBAND: - ip_ib_mc_map(addr, haddr); - return 0; default: if (dir) { memcpy(haddr, dev->broadcast, dev->addr_len); Can anyone suggest a workaround to this issue? Thanks Eli From erezz at voltaire.com Tue Sep 19 04:45:28 2006 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 19 Sep 2006 14:45:28 +0300 Subject: [openib-general] [PATCH] IB/iser: fix iSER description and selections in Kconfig In-Reply-To: References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com> <450912C0.8070807@voltaire.com> <45091AC4.3090005@voltaire.com> <450D0FCB.1000401@voltaire.com> Message-ID: <450FD858.3000507@voltaire.com> Roland Dreier wrote: > Erez> There are 3 additional required config entries: NET, INET & > Erez> INFINIBAND_RDMA_CM. Do you suggest to 'depend' on them or > Erez> 'depend' on some of them and 'select' the rest? > > INET depends on NET, and INFINIBAND_RDMA_CM doesn't exist. So > depending on INET is sufficient. That's the reason 'depend' is better > than 'select' -- you don't have to worry about recreating the full > dependency tree of things you depend on. > > Erez> Also, since I'm not familiar enough with 'make randconfig', > Erez> here's a question: if iSER 'depends' on INET, is it possible > Erez> that 'make randconfig' will enable iSER without enabling > Erez> INET? > > No, of course not. The whole point of make randconfig is to make a > random but valid configuration.
> > Anyway, rather than waste more time going back and forth on this, I > added the following to my for-2.6.19 tree as the obvious fix: > > Author: Roland Dreier > Date: Sun Sep 17 22:58:27 2006 -0700 > > IB/iser: INFINIBAND_ISER depends on INET > > iSER won't build without CONFIG_INET enabled, so make Kconfig reflect that. > > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig > index fead87d..365a1b5 100644 > --- a/drivers/infiniband/ulp/iser/Kconfig > +++ b/drivers/infiniband/ulp/iser/Kconfig > @@ -1,6 +1,6 @@ > config INFINIBAND_ISER > tristate "ISCSI RDMA Protocol" > - depends on INFINIBAND && SCSI > + depends on INFINIBAND && SCSI && INET > select SCSI_ISCSI_ATTRS > ---help--- > Support for the ISCSI RDMA Protocol over InfiniBand. This > I don't agree with that. It is possible that INFINIBAND_ADDR_TRANS won't be selected according to your patch. How about this solution: iSER should depend on INFINIBAND && SCSI && INFINIBAND_ADDR_TRANS (which depends on INET, so the INET dependency is ok). Erez From kliteyn at dev.mellanox.co.il Tue Sep 19 05:23:17 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 19 Sep 2006 15:23:17 +0300 Subject: [openib-general] [PATCH] osm: bug in OpenSM on broken fabrics Message-ID: <450FE135.1060007@dev.mellanox.co.il> Hi Hal. Please apply this patch both to trunk and 1.1. Thanks. -- Yevgeny > Hi Yevgeny, > > On Mon, 2006-09-18 at 14:52, Yevgeny Kliteynik wrote: > > Hi Hal > > > > This patch fixes a bug in opensm that was discovered on > > a 'broken' fabrics when opensm was executed with --stay_on_fatal. > > Replacing assert with a real check. > > > > Yevgeny > > > > Signed-off-by: Yevgeny Kliteynik > > Is this intended for trunk only or also 1.1 ? > > -- Hal From mst at mellanox.co.il Tue Sep 19 05:45:46 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 19 Sep 2006 15:45:46 +0300 Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries Message-ID: <20060919124546.GF32603@mellanox.co.il> Roland, the patch is still under test (I'll leave it to run for a nigh), but I'd like to get comments on the following: IB/ipoib: user appropriate mtu selector for path queries IPoIB must set mtu selector in path record query according to dev->mtu: if we wildcard it, SM can select a path with lower MTU. This breaks IPoIB on networks with SM Tavor quirk activates. We can always require this, since IPoIB spec includes the following statement: The value (for IB MTU) assigned to the broadcast-GID must not be greater than any physical link MTU spanned by the IPoIB subnet. Signed-off-by: Michael S. Tsirkin --- Note the following uses IB_SA_GT so it should be applied on top of SA enum rename. Index: ofed_1_1/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- ofed_1_1.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ ofed_1_1/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -182,6 +182,8 @@ static int ipoib_change_mtu(struct net_d dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); + queue_work(ipoib_workqueue, &priv->flush_task); + return 0; } @@ -452,15 +454,39 @@ static int path_rec_start(struct net_dev struct ipoib_path *path) { struct ipoib_dev_priv *priv = netdev_priv(dev); + ib_sa_comp_mask comp_mask = IB_SA_PATH_REC_MTU_SELECTOR | IB_SA_PATH_REC_MTU; + + path->pathrec.mtu_selector = IB_SA_GT; - ipoib_dbg(priv, "Start path record lookup for " IPOIB_GID_FMT "\n", - IPOIB_GID_ARG(path->pathrec.dgid)); + switch (roundup_pow_of_two(dev->mtu + IPOIB_ENCAP_LEN)) { + case 512: + path->pathrec.mtu = IB_MTU_256; + break; + case 1024: + path->pathrec.mtu = IB_MTU_512; + break; + case 2048: + path->pathrec.mtu = IB_MTU_1024; + break; + case 4096: + path->pathrec.mtu = IB_MTU_2048; + break; + default: + /* Wildcard everything */ + 
comp_mask = 0; + path->pathrec.mtu = 0; + path->pathrec.mtu_selector = 0; + } + + ipoib_dbg(priv, "Start path record lookup for " IPOIB_GID_FMT " MTU > %d\n", + IPOIB_GID_ARG(path->pathrec.dgid), + comp_mask ? ib_mtu_enum_to_int(path->pathrec.mtu) : 0); init_completion(&path->done); path->query_id = ib_sa_path_rec_get(priv->ca, priv->port, - &path->pathrec, + &path->pathrec, comp_mask | IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | IB_SA_PATH_REC_NUMB_PATH | -- MST From ogerlitz at voltaire.com Tue Sep 19 05:49:06 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 19 Sep 2006 15:49:06 +0300 Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name In-Reply-To: References: <450D36E9.1000502@voltaire.com> Message-ID: <450FE742.7040005@voltaire.com> Roland Dreier wrote: > Or> I want it to be visible so if some other config **depends** on > Or> it the use can **see** this config and select it. > > Or> Also as of the importance of the rdma cm within the IB stack > Or> being along with the ib verbs the second access point to ULP > Or> coders, seeing its config and documenting it is important. > > I don't buy this. The only thing making this config option visible > does is make it more likely (far more likely) that someone will > disable it. Right now the RDMA CM is built as long as INFINIBAND and > INET are enabled. No one is going to turn off INET on any normal > system so effectively the RDMA CM is always built whenever INFINIBAND > is enabled. I am fine with having the CMA config selected whenever someone selects INFINIBAND so adding the help text and making it visible are not a must per my taste. However, are you fine with changing the **name** of the config directive to CONFIG_INFINIBAND_RDMA_CM so its better understood? > As far as making a config symbol to depend on, I think INET makes as > much sense or more: something using IP addressing naturally depends on > having IP networking. 
As Erez wrote you on the other thread, we must depend on the CMA, or else a user running make randconfig would be able to produce a config file where INFINIBAND is selected but the CMA (RDMA_ADDR_TRANS) config is not selected so linkage will fail. Or. From halr at voltaire.com Tue Sep 19 06:17:05 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 09:17:05 -0400 Subject: [openib-general] [PATCH] osm: bug in OpenSM on broken fabrics In-Reply-To: <450EEAD9.1000503@dev.mellanox.co.il> References: <450EEAD9.1000503@dev.mellanox.co.il> Message-ID: <1158671825.4509.3990.camel@hal.voltaire.com> On Mon, 2006-09-18 at 14:52, Yevgeny Kliteynik wrote: > Hi Hal > > This patch fixes a bug in opensm that was discovered on > 'broken' fabrics when opensm was executed with --stay_on_fatal. > Replacing assert with a real check. > > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied with some cosmetic changes (to both trunk and 1.1). Note that this patch was rejected (not sure why) and was manually applied. -- Hal From halr at voltaire.com Tue Sep 19 06:25:59 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 09:25:59 -0400 Subject: [openib-general] [PATCH][TRIVIAL]OpenSM/osm_node_info_rcv.c: Eliminate superfluous call level Message-ID: <1158672358.4509.4309.camel@hal.voltaire.com> OpenSM/osm_node_info_rcv.c: Eliminate superfluous call level Signed-off-by: Hal Rosenstock Index: opensm/osm_node_info_rcv.c =================================================================== --- opensm/osm_node_info_rcv.c (revision 9536) +++ opensm/osm_node_info_rcv.c (working copy) @@ -437,7 +437,7 @@ __osm_ni_rcv_process_new_ca( The plock must be held before calling this function. 
**********************************************************************/ static void -__osm_ni_rcv_process_ca_port( +__osm_ni_rcv_process_existing_ca( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, IN const osm_madw_t* const p_madw ) @@ -455,7 +455,7 @@ __osm_ni_rcv_process_ca_port( osm_bind_handle_t h_bind; cl_status_t cl_status; - OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_ca_port ); + OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_ca ); p_smp = osm_madw_get_smp_ptr( p_madw ); p_ni = (ib_node_info_t*)ib_smp_get_payload_ptr( p_smp ); @@ -473,7 +473,7 @@ __osm_ni_rcv_process_ca_port( if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl ) ) { osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, - "__osm_ni_rcv_process_ca_port: " + "__osm_ni_rcv_process_existing_ca: " "Creating new port object with GUID = 0x%" PRIx64 "\n", cl_ntoh64( p_ni->port_guid ) ); @@ -483,7 +483,7 @@ __osm_ni_rcv_process_ca_port( if( p_port == NULL ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "__osm_ni_rcv_process_ca_port: ERR 0D04: " + "__osm_ni_rcv_process_existing_ca: ERR 0D04: " "Unable to create new port object\n" ); goto Exit; } @@ -500,7 +500,7 @@ __osm_ni_rcv_process_ca_port( Somehow, this port GUID already exists in the table. 
*/ osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "__osm_ni_rcv_process_ca_port: ERR 0D12: " + "__osm_ni_rcv_process_existing_ca: ERR 0D12: " "Port 0x%" PRIx64 " already in the database!\n", cl_ntoh64( p_ni->port_guid ) ); @@ -521,7 +521,7 @@ __osm_ni_rcv_process_ca_port( if( cl_status != CL_SUCCESS ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "__osm_ni_rcv_process_ca_port: ERR 0D08: " + "__osm_ni_rcv_process_existing_ca: ERR 0D08: " "Error %s adding to list\n", CL_STATUS_MSG( cl_status ) ); osm_port_delete( &p_port ); @@ -530,7 +530,7 @@ __osm_ni_rcv_process_ca_port( else { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, - "__osm_ni_rcv_process_ca_port: " + "__osm_ni_rcv_process_existing_ca: " "Adding port GUID:0x%016" PRIx64 " to new_ports_list\n", cl_ntoh64(osm_node_get_node_guid( p_port->p_node )) ); } @@ -547,7 +547,7 @@ __osm_ni_rcv_process_ca_port( if ( !osm_physp_is_valid( p_physp ) ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "__osm_ni_rcv_process_ca_port: ERR 0D19: " + "__osm_ni_rcv_process_existing_ca: ERR 0D19: " "Invalid physical port. Aborting discovery\n"); goto Exit; } @@ -579,7 +579,7 @@ __osm_ni_rcv_process_ca_port( if( status != IB_SUCCESS ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "__osm_ni_rcv_process_ca_port: ERR 0D13: " + "__osm_ni_rcv_process_existing_ca: ERR 0D13: " "Failure initiating PortInfo request (%s)\n", ib_get_err_str(status)); } @@ -592,22 +592,6 @@ __osm_ni_rcv_process_ca_port( The plock must be held before calling this function. **********************************************************************/ static void -__osm_ni_rcv_process_existing_ca( - IN const osm_ni_rcv_t* const p_rcv, - IN osm_node_t* const p_node, - IN const osm_madw_t* const p_madw ) -{ - OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_ca ); - - __osm_ni_rcv_process_ca_port( p_rcv, p_node, p_madw ); - - OSM_LOG_EXIT( p_rcv->p_log ); -} - -/********************************************************************** - The plock must be held before calling this function. 
-**********************************************************************/ -static void __osm_ni_rcv_process_new_router( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, From aviram at dev.mellanox.co.il Tue Sep 19 08:07:54 2006 From: aviram at dev.mellanox.co.il (Aviram Gutman) Date: Tue, 19 Sep 2006 18:07:54 +0300 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 In-Reply-To: <450ECD3E.8020703@dev.mellanox.co.il> References: <450ECD3E.8020703@dev.mellanox.co.il> Message-ID: <451007CA.7050809@dev.mellanox.co.il> Aviram Gutman wrote: > We want to have RC6 on Wed and final release next week on Tues or Wed > Sep-27. > Is that acceptable by all EWG members? > > Regards, > Aviram > > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > We currently see two issues: 1) IPoIB multicast is not working on RHEL4 U4 2) iSER on SLES10 requires root privilege I hope that Voltaire can fix issue #2. It seems that issue #1 is not solvable (unless we require the user to replace the kernel). Are these issues showstoppers? Or can we issue RC6 with these issues outstanding? Regards, Aviram From rdreier at cisco.com Tue Sep 19 09:27:45 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 09:27:45 -0700 Subject: [openib-general] Fwd: IPoIB Multicast In-Reply-To: <20060919090849.GC32603@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 19 Sep 2006 12:08:49 +0300") References: <20060919031534.GC30563@mellanox.co.il> <20060919090849.GC32603@mellanox.co.il> Message-ID: Michael> Works OK here. Please commit. Please note this does fix a Michael> real issue for us, which is quite severe for clusters Michael> where ipoib is the only interconnect, I wander whether Michael> this is 2.6.18 material. I don't understand why this is a big problem. What breaks if we let OpenSM pick the MTU and Rate for a new multicast group? 
It's already picking them for the broadcast group. Anyway, I put this in my for-2.6.19 branch for now. - R. From rdreier at cisco.com Tue Sep 19 09:30:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 09:30:59 -0700 Subject: [openib-general] ipoib multicast problem In-Reply-To: <1158647453.5392.66.camel@localhost> (Eli cohen's message of "Tue, 19 Sep 2006 09:30:53 +0300") References: <1158647453.5392.66.camel@localhost> Message-ID: Eli> 1. An application registers to a multicast group as a full Eli> member. As a result all the groups are listed in dev->mclist. Eli> 2. The infiniband link falls momentarily, opensm restarted Eli> etc. 3. All multicast memberships are flushed. 4. The net Eli> device will not join again until at a later time something Eli> will cause ipoib_set_mcast_list() to be called. I don't understand. How could ipoib rejoin the broadcast group and then not rejoin the rest of the full member groups it has? - R. From rdreier at cisco.com Tue Sep 19 09:31:53 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 09:31:53 -0700 Subject: [openib-general] [PATCH] ipoib mcast restart In-Reply-To: <1158647474.5392.68.camel@localhost> (Eli cohen's message of "Tue, 19 Sep 2006 09:31:14 +0300") References: <1158647474.5392.68.camel@localhost> Message-ID: Eli> Make sure after ipoib_ib_dev_flush is executed, Eli> ipoib_mcast_restart_task is executed also to join all the Eli> mcast groups maintained by the kernel for the device. Why is the ipoib_mcast_start_thread() at the end of ipoib_ib_dev_up() not sufficient to rejoin all the mcgs? - R. 
From rdreier at cisco.com Tue Sep 19 09:40:54 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 09:40:54 -0700 Subject: [openib-general] [PATCH] IB/iser: fix iSER description and selections in Kconfig In-Reply-To: <450FD858.3000507@voltaire.com> (Erez Zilber's message of "Tue, 19 Sep 2006 14:45:28 +0300") References: <200609071902.57379.toralf.foerster@gmx.de> <200609101343.02740.toralf.foerster@gmx.de> <450401AE.2030606@voltaire.com> <200609101645.22695.toralf.foerster@gmx.de> <4505032B.3050706@voltaire.com> <450912C0.8070807@voltaire.com> <45091AC4.3090005@voltaire.com> <450D0FCB.1000401@voltaire.com> <450FD858.3000507@voltaire.com> Message-ID: Erez> I don't agree with that. It is possible that Erez> INFINIBAND_ADDR_TRANS won't be selected according to your Erez> patch. How about this solution: iSER should depend on Erez> INFINIBAND && SCSI && INFINIBAND_ADDR_TRANS (which depends Erez> on INET, so the INET dependency is ok). How is that possible? If INFINIBAND and INET are selected, then INFINIBAND_ADDR_TRANS is selected too (at least as far as I can see). How do you enable INET without INFINIBAND_ADDR_TRANS? I don't like making things depend on INFINIBAND_ADDR_TRANS, since it's really just an internal symbol to prevent building ib_addr when it won't build. - R. From rdreier at cisco.com Tue Sep 19 09:42:18 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 09:42:18 -0700 Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name In-Reply-To: <450FE742.7040005@voltaire.com> (Or Gerlitz's message of "Tue, 19 Sep 2006 15:49:06 +0300") References: <450D36E9.1000502@voltaire.com> <450FE742.7040005@voltaire.com> Message-ID: Or> I am fine with having the CMA config selected whenever someone Or> selects INFINIBAND so adding the help text and making it Or> visible are not a must per my taste. 
However, are you fine Or> with changing the **name** of the config directive to Or> CONFIG_INFINIBAND_RDMA_CM so its better understood? No, since really what it is controlling is the ib_addr module. Or> As Erez wrote you on the other thread, we must depend on the Or> CMA, or else a user running make randconfig would be able to Or> produce a config file where INFINIBAND is selected but the CMA Or> (RDMA_ADDR_TRANS) config is not selected so linkage will fail. How? make randconfig won't produce invalid configurations. - R. From rdreier at cisco.com Tue Sep 19 09:44:57 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 09:44:57 -0700 Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries In-Reply-To: <20060919124546.GF32603@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 19 Sep 2006 15:45:46 +0300") References: <20060919124546.GF32603@mellanox.co.il> Message-ID: Seems OK from a strict spec-compliance point of view, but I don't understand this: > This breaks IPoIB on networks with SM Tavor quirk activates. Even if opensm returns a path record with a lower MTU, the underlying links still have a 2K mtu really, so nothing breaks. IPoIB is just doing something naughty by ignoring the MTU in the path record. So what breaks really? (not to mention the fact that the "Tavor quirk" hasn't been accepted into OpenSM yet anyway) - R. From halr at voltaire.com Tue Sep 19 10:16:56 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 13:16:56 -0400 Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack : include/opensm/osm_pkey.h In-Reply-To: <86zmcylc2e.fsf@mtl066.yok.mtl.com> References: <86zmcylc2e.fsf@mtl066.yok.mtl.com> Message-ID: <1158686214.4509.12804.camel@hal.voltaire.com> On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote: > Hi Hal > > Partition table blocks are always 16 bits. > This resolves the need to later cast back and forth. > > Thanks > > Eitan > > Signed-off-by: Eitan Zahavi Thanks. 
Applied to trunk only in conjunction with patch 10/13 on osm_pkey.c. -- Hal From halr at voltaire.com Tue Sep 19 10:17:05 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 13:17:05 -0400 Subject: [openib-general] [PATCH 10/13] osm: port to WinIB stack : opensm/osm_pkey.c In-Reply-To: <86odtelbzy.fsf@mtl066.yok.mtl.com> References: <86odtelbzy.fsf@mtl066.yok.mtl.com> Message-ID: <1158686216.4509.12806.camel@hal.voltaire.com> On Sun, 2006-09-17 at 12:00, Eitan Zahavi wrote: > Hi Hal > > Some explicit casting required and also pkey blocks are only uint16_t . > > Thanks > > Eitan > > Signed-off-by: Eitan Zahavi Thanks. Applied to trunk only in conjunction with patch 2/13 on osm_pkey.h. -- Hal From halr at voltaire.com Tue Sep 19 10:28:38 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 13:28:38 -0400 Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack : opensm/osm_subnet.c In-Reply-To: <86venmlc1a.fsf@mtl066.yok.mtl.com> References: <86venmlc1a.fsf@mtl066.yok.mtl.com> Message-ID: <1158686918.4509.13210.camel@hal.voltaire.com> Hi Eitan, On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote: > Hi Hal I think this patch is really 5/13 rather than 2/13. > No need for stdio.h but do need stdlib.h ... It appears to be the other way around (stdio.h needed but stdlib.h isn't), right ? > Also map snprintf to _snprintf in windows case > > Thanks > > Eitan > > Signed-off-by: Eitan Zahavi > > Index: opensm/osm_subnet.c > =================================================================== > --- opensm/osm_subnet.c (revision 9502) > +++ opensm/osm_subnet.c (working copy) > @@ -53,6 +53,7 @@ > > #include Should this include of stdlib.h also be removed ? 
-- Hal > #include > +#include > #include > #include > #include > @@ -65,7 +66,6 @@ > #include > #include > #include > -#include > > /********************************************************************** > **********************************************************************/ > @@ -659,6 +659,9 @@ __osm_subn_opts_unpack_charp( > } > } > > +#ifdef WIN32 > +#define snprintf _snprintf > +#endif > /********************************************************************** > **********************************************************************/ > static void > From dledford at redhat.com Tue Sep 19 10:43:43 2006 From: dledford at redhat.com (Doug Ledford) Date: Tue, 19 Sep 2006 13:43:43 -0400 Subject: [openib-general] ipoib multicast problems on RHEL4.0 u4 In-Reply-To: <1158666291.24776.32.camel@localhost> References: <1158666291.24776.32.camel@localhost> Message-ID: <1158687823.17671.119.camel@fc6.xsintricity.com> On Tue, 2006-09-19 at 14:44 +0300, Eli cohen wrote: > Hi, > > while testing ipoib multicast on RHEL4.0 u4, I noticed that setsockopt() > succeeds to add a multicast group to an interface but actually the > multicast group is not added to the net_device. This means that an > application cannot join a multicast group as a full member. When I > examined the differences between the kernel sources for u3 and u4 I > noticed that essential code was removed: > > diff -ru net/ipv4/arp.c ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c > --- net/ipv4/arp.c 2006-09-18 15:35:03.000000000 +0300 > +++ ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c 2006-09-19 > 10:08:06.000000000 +0300 > @@ -213,9 +213,6 @@ > case ARPHRD_IEEE802_TR: > ip_tr_mc_map(addr, haddr); > return 0; > - case ARPHRD_INFINIBAND: > - ip_ib_mc_map(addr, haddr); > - return 0; > default: > if (dir) { > memcpy(haddr, dev->broadcast, dev->addr_len); > > > Can anyone suggest a workaround to this issue? Short of spinning a kernel, it's going to be hard to work around. 
Thanks for finding this, I'll track down how this got left out of the U4 kernel when it was in the U3 kernel :-/ -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From halr at voltaire.com Tue Sep 19 10:44:49 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 13:44:49 -0400 Subject: [openib-general] [PATCH 9/13] osm: port to WinIB stack : opensm/osm_prtn.c In-Reply-To: <86psdulc0b.fsf@mtl066.yok.mtl.com> References: <86psdulc0b.fsf@mtl066.yok.mtl.com> Message-ID: <1158687889.4509.13818.camel@hal.voltaire.com> On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote: > Hi Hal > > Required cl_debug.h for PRIx64 > Also map snprintf to _snprintf and stat to _stat > > Thanks > > Eitan > > Signed-off-by: Eitan Zahavi Thanks. Applied to trunk only. -- Hal From halr at voltaire.com Tue Sep 19 11:02:00 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 14:02:00 -0400 Subject: [openib-general] [PATCH 11/13] osm: port to WinIB stack : opensm/osm_log.c In-Reply-To: <86mz8ylbzn.fsf@mtl066.yok.mtl.com> References: <86mz8ylbzn.fsf@mtl066.yok.mtl.com> Message-ID: <1158688911.4509.14416.camel@hal.voltaire.com> On Sun, 2006-09-17 at 12:00, Eitan Zahavi wrote: > Hi Hal > > 1. function mappings for stat, fstat and fileno > 2. Currently no imp for log file truncation > > Thanks > > Eitan > > Signed-off-by: Eitan Zahavi Thanks. Applied to trunk only. 
-- Hal From eli at dev.mellanox.co.il Tue Sep 19 11:04:44 2006 From: eli at dev.mellanox.co.il (eli at dev.mellanox.co.il) Date: Tue, 19 Sep 2006 21:04:44 +0300 (IDT) Subject: [openib-general] ipoib multicast problem In-Reply-To: References: <1158647453.5392.66.camel@localhost> Message-ID: <61036.212.235.62.73.1158689084.squirrel@dev.mellanox.co.il> > > > I don't understand. How could ipoib rejoin the broadcast group and > then not rejoin the rest of the full member groups it has? > > That is because the broadcast group is not part of the multicast groups maintained by the kernel but rather is part of ipoib and is joined from a different function. The other full members are maintained by the kernel for the net device and come from dev->mclist. From eeb at bartonsoftware.com Tue Sep 19 11:14:28 2006 From: eeb at bartonsoftware.com (Eric Barton) Date: Tue, 19 Sep 2006 19:14:28 +0100 Subject: [openib-general] Completion callback /teardown race Message-ID: <200609191814.k8JIESjd007174@robert.bartonsoftware.com> Hi, I create 1 CQ just for receive completions on each of my QPs. When I tear down the QP, I rdma_disconnect(), change the QP state to IB_QPS_ERR and then wait for all currently posted receives to complete. This has worked just fine for me, but I've had a bug report from a site using this software (possibly with HCAs I've not tested with) that another completion callback can happen after all the posted receives have completed. I supplied a debug/workaround patch that checks the CQ in this situation. It confirms that all posted receives have completed and that the CQ is in fact empty. Is this a bug, or an unavoidable race between arming the callback and polling the CQ? All the CQ callback does is wake a thread to poll the queue. This effectively keeps polling completions out of the CQ until it is empty. Then it calls ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) and ib_poll_cq() 1 more time. 
If this last call to ib_poll_cq() finds something, it repeats the whole process - but can I be guaranteed another CQ callback in this case or is it indeterminate? -- Cheers, Eric From halr at voltaire.com Tue Sep 19 11:44:11 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 14:44:11 -0400 Subject: [openib-general] [PATCH 8/13] osm: port to WinIB stack : opensm/osm_opensm.c In-Reply-To: <86r6yalc0j.fsf@mtl066.yok.mtl.com> References: <86r6yalc0j.fsf@mtl066.yok.mtl.com> Message-ID: <1158691450.4509.15987.camel@hal.voltaire.com> On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote: > Hi Hal > > Explicit NULL in empty array initializer > > Thanks > > Eitan > > Signed-off-by: Eitan Zahavi Thanks. Applied to trunk only. -- Hal From rdreier at cisco.com Tue Sep 19 11:56:45 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 11:56:45 -0700 Subject: [openib-general] ipoib multicast problem In-Reply-To: <61036.212.235.62.73.1158689084.squirrel@dev.mellanox.co.il> (eli@dev.mellanox.co.il's message of "Tue, 19 Sep 2006 21:04:44 +0300 (IDT)") References: <1158647453.5392.66.camel@localhost> <61036.212.235.62.73.1158689084.squirrel@dev.mellanox.co.il> Message-ID: eli> That is because the broadcast group is not part of the eli> multicast groups maintained by the kernel but rather is part eli> of ipoib and is joined from a different function. The other eli> full members are maintained by the kernel for the net device eli> and come from dev->mclist. Oh I see, when we flush the multicast groups we actually delete all of them instead of just removing the attached flag. OK I guess your fix makes sense then. - R. 
From eli at dev.mellanox.co.il Tue Sep 19 12:08:30 2006 From: eli at dev.mellanox.co.il (eli at dev.mellanox.co.il) Date: Tue, 19 Sep 2006 22:08:30 +0300 (IDT) Subject: [openib-general] [PATCH] ipoib mcast restart In-Reply-To: References: <1158647474.5392.68.camel@localhost> Message-ID: <61651.212.235.62.73.1158692910.squirrel@dev.mellanox.co.il> > Why is the ipoib_mcast_start_thread() at the end of ipoib_ib_dev_up() > not sufficient to rejoin all the mcgs? > Because after a port event all the mcast groups on the device are flushed and all that remains is from the dev->mclist and we must renew the joins from there. From rdreier at cisco.com Tue Sep 19 12:10:39 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 12:10:39 -0700 Subject: [openib-general] [PATCH] mthca: fix lid used for sending traps In-Reply-To: <20060919081324.GF31498@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 19 Sep 2006 11:13:24 +0300") References: <20060919081324.GF31498@mellanox.co.il> Message-ID: > I'm taking the fix into OFED 1.1 and I think it should go into 2.6.18 or > 2.6.18.1. Makes sense -- I'll try to get this into 2.6.18, since it's a one-liner and fixes a regression from 2.6.17. - R. 
From rdreier at cisco.com Tue Sep 19 12:13:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 12:13:36 -0700 Subject: [openib-general] [GIT PULL] please pull infiniband.git (one-liner fix for 2.6.18) Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This contains another one-liner that fixes a regression from 2.6.17: Jack Morgenstein: IB/mthca: Fix lid used for sending traps drivers/infiniband/hw/mthca/mthca_mad.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c index d9bc030..45e106f 100644 --- a/drivers/infiniband/hw/mthca/mthca_mad.c +++ b/drivers/infiniband/hw/mthca/mthca_mad.c @@ -119,7 +119,7 @@ static void smp_snoop(struct ib_device * mthca_update_rate(to_mdev(ibdev), port_num); update_sm_ah(to_mdev(ibdev), port_num, - be16_to_cpu(pinfo->lid), + be16_to_cpu(pinfo->sm_lid), pinfo->neighbormtu_mastersmsl & 0xf); event.device = ibdev; From trimmer at silverstorm.com Tue Sep 19 12:24:27 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Tue, 19 Sep 2006 15:24:27 -0400 Subject: [openib-general] Completion callback /teardown race In-Reply-To: <200609191814.k8JIESjd007174@robert.bartonsoftware.com> Message-ID: > From: Eric Barton > Sent: Tuesday, September 19, 2006 2:14 PM > To: openib-general at openib.org > Subject: [openib-general] Completion callback /teardown race > > > > All the CQ callback does is wake a thread to poll the queue. This > effectively > keeps polling completions out of the CQ until it is empty. Then it calls > ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) and ib_poll_cq() 1 more time. 
> > If this last call to ib_poll_cq() finds something, it repeats the whole > process > - but can I be guaranteed another CQ callback in this case or is it > indeterminate? > The recommended algorithm would be: (1) poll_cq until empty, (2) ib_req_notify_cq, (3) poll_cq until empty. Once ib_req_notify_cq is called, it's possible for an additional callback to race with the poll_cq's which follow. There are some differences in HCA behaviour with regard to ib_req_notify_cq. Mellanox HCAs will provide a callback/interrupt if the CQ is not empty at this point (in which case the poll_cq's after the notify are optional). However, the behaviour defined in the IBTA spec indicates that ib_req_notify_cq will cause a callback/interrupt only on the next CQE which arrives, hence to be portable the poll_cq loop after ib_req_notify_cq is necessary to cover any CQEs which arrived between the prior poll and the ib_req_notify_cq. Within a given callback invocation, there is no reason to call notify more than once. Todd Rimmer From rdreier at cisco.com Tue Sep 19 12:24:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 12:24:37 -0700 Subject: [openib-general] [PATCH] ipoib mcast restart In-Reply-To: <1158647474.5392.68.camel@localhost> (Eli cohen's message of "Tue, 19 Sep 2006 09:31:14 +0300") References: <1158647474.5392.68.camel@localhost> Message-ID: OK, I applied this to for-2.6.19, although the patch was line-wrapped, didn't have a usable subject, etc.... So... I merge > 100 patches every kernel release. If I have to spend an extra 5 minutes for each one fixing a patch or pulling it out of svn, then I end up burning an extra 9 hours of stupid work. If 20+ people who contribute patches sent me clean patches, then everyone will be happier because I'll be able to merge things quicker and focus on productive work. 
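[Editorial sketch for the "Completion callback /teardown race" thread.] The poll, re-arm, poll-again sequence that Todd Rimmer recommends in his reply can be illustrated with a small stand-alone C model. This is only a sketch under stated assumptions: mock_cq, mock_poll, mock_req_notify and drain_and_rearm are hypothetical names invented for illustration, not the kernel verbs API (ib_poll_cq / ib_req_notify_cq), and the "late" counter merely simulates completions that land in the window between the last poll and the re-arm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a completion queue; NOT the kernel's struct ib_cq. */
struct mock_cq {
    int  pending; /* completions waiting to be polled */
    int  late;    /* completions that arrive just before the re-arm takes effect */
    bool armed;   /* set once a notification has been requested */
};

/* Model of polling one entry: returns 1 if a completion was consumed, 0 if empty. */
static int mock_poll(struct mock_cq *cq)
{
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/*
 * Model of requesting notification. Following the behaviour described in the
 * thread for the spec, entries already in the queue at this point do not
 * trigger a callback; only later arrivals would. The "late" completions are
 * moved into the queue here to simulate the race window.
 */
static void mock_req_notify(struct mock_cq *cq)
{
    cq->pending += cq->late;
    cq->late = 0;
    cq->armed = true;
}

/*
 * One callback invocation: drain, re-arm exactly once, then drain again so
 * that completions which slipped in before the re-arm are not stranded.
 * Returns the number of completions handled.
 */
static int drain_and_rearm(struct mock_cq *cq)
{
    int handled = 0;

    assert(cq != NULL);

    while (mock_poll(cq))
        handled++;

    mock_req_notify(cq);

    while (mock_poll(cq))
        handled++;

    return handled;
}
```

In this model, dropping the second loop would leave the "late" entries sitting in the queue with no callback pending on hardware that follows the spec behaviour, which is the portability concern raised in the reply. Whether a further callback can still fire after the queue has been emptied is the open question in the original post, and this sketch does not settle it.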
From rdreier at cisco.com Tue Sep 19 13:28:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Sep 2006 13:28:08 -0700 Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries In-Reply-To: <20060919124546.GF32603@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 19 Sep 2006 15:45:46 +0300") References: <20060919124546.GF32603@mellanox.co.il> Message-ID: I didn't really read the new patch before... anyway: Why have you changed from the approach of just using the broadcast group's MTU? As far as I can see, the issue being addressed here is purely theoretical anyway, but with the approach of taking the current device MTU, you now have to flush all the paths if the configured MTU changes, and you have to have a big switch in path_rec_start(). - R. From halr at voltaire.com Tue Sep 19 13:59:32 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 16:59:32 -0400 Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5) In-Reply-To: <450E976D.3070802@xiranet.com> References: <450E7C0E.3020001@xiranet.com> <1158577816.18842.1501.camel@hal.voltaire.com> <450E8119.4060405@xiranet.com> <1158583167.18842.4632.camel@hal.voltaire.com> <450E976D.3070802@xiranet.com> Message-ID: <1158699571.4509.21167.camel@hal.voltaire.com> Hi Mirko, On Mon, 2006-09-18 at 08:56, Mirko Benz wrote: > Hi Hal, > > Please prepare the bugzilla entry. I entered the following: http://openib.org/bugzilla/show_bug.cgi?id=238 http://openib.org/bugzilla/show_bug.cgi?id=239 Feel free to annotate it. -- Hal > It is not critical -- I just think it is not convenient for an end user. > > Regards, > Mirko > > Hal Rosenstock schrieb: > > Hi again Mirko, > > > > On Mon, 2006-09-18 at 07:20, Mirko Benz wrote: > > > >> Hi Hal, > >> > >> This was a default/build all OFED install. Either we should place these > >> tools under ../ofed/sbin or make it work for every body. 
> >> > > > > The issue with making it work for everyone is that there's a chicken and > > egg problem in that when the tools are built and installed, one doesn't > > know how udev will be configured for umad. I agree that since the > > default is to run as root, these should be in sbin rather than bin. Can > > you file a bugzilla report for this (or do you want me to do it on your > > behalf) ? Is this critical for OFED 1.1 ? > > > > > >> At least a error message that umad access failed would be required. > >> > > Those are scripts and the errors are being returned from the lower level > > programs invoked but not by the scripts. > > > > Would you please file a bug for this as well (or let me know whether I > > should do this) ? > > > > Thanks. > > > > -- Hal > > > > > >> Regards, > >> Mirko > >> > >> Hal Rosenstock schrieb: > >> > >>> Hi Mirko, > >>> > >>> On Mon, 2006-09-18 at 06:59, Mirko Benz wrote: > >>> > >>> > >>>> Hello, > >>>> > >>>> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone). > >>>> Some IB diagnostics tools e.g. ibhosts and ibswitches (located under > >>>> .../ofed/bin/) > >>>> do not work with a normal user account -- no output given. It works as > >>>> root though. > >>>> > >>>> > >>> It depends on how you have udev access for umad setup. With the default > >>> setup for IB, root is required as these diagnostics send SMPs which > >>> require umad access which is limited to root. 
> >>>
> >>> -- Hal
> >>>
> >>>
> >>>
> >>>> Regards,
> >>>> Mirko
> >>>>
> >>>> _______________________________________________
> >>>> openib-general mailing list
> >>>> openib-general at openib.org
> >>>> http://openib.org/mailman/listinfo/openib-general
> >>>>
> >>>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >>>>
> >>>>
> >>>>
>

From rdreier at cisco.com Tue Sep 19 14:17:03 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 14:17:03 -0700
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <200609191814.k8JIESjd007174@robert.bartonsoftware.com> (Eric Barton's message of "Tue, 19 Sep 2006 19:14:28 +0100")
References: <200609191814.k8JIESjd007174@robert.bartonsoftware.com>
Message-ID:

    Eric> If this last call to ib_poll_cq() finds something, it
    Eric> repeats the whole process - but can I be guaranteed another
    Eric> CQ callback in this case or is it indeterminate?

In general there is an unavoidable race, since you don't know whether the new completion you find in the CQ was generated before or after you requested notification. So with the completion semantics as defined in the IBA spec, you have the choice of two poisons:

 1) Don't poll after you request notification. Then you run the risk of a completion being added after your last poll but before you requested notification. If another completion never occurs, then you're stuck forever.

 2) Poll after you request notification. Then you run the risk of having a completion added after your request for notification but before your final poll. This means another completion event will be pending, but you will likely drain the CQ before you take the event.

However, Mellanox HCAs implement stronger semantics: they generate an event if the CQ is not empty at the time notification is requested, which closes the race between draining the CQ and requesting notification.
This means *for Mellanox HCAs only* it is safe to do:

    completion_handler():
        poll CQ until empty
        request notification on CQ

with no additional poll after the request for notification.

I'll have more to say on this in the context of IPoIB and NAPI shortly, since I've been thinking about this issue myself.

The ipath driver implements only the weaker semantics guaranteed by the IBA spec -- ie an event is generated if a completion is added after the request for notification. And I don't know what ehca and amso1100 implement to be honest.

(The Mellanox semantics are conforming though, since it's not well-defined exactly when a completion is added to a CQ if no one looks...)

 - R.

From bevans at ocf.co.uk Tue Sep 19 14:36:43 2006
From: bevans at ocf.co.uk (Barry Evans)
Date: Tue, 19 Sep 2006 22:36:43 +0100
Subject: [openib-general] Fluent and OFED
Message-ID: <43185D89536AD545B065B7ECEA630065AF28@mailserver.ocf.co.uk>

Hello,

Has anyone had any luck getting Fluent 6.2 to cooperate with OFED? I think I've got all the libraries pointing to the right place, but I'm ending up with the dreaded:

"[1] Abort: [0] Abort: mpirun: executable version 1 does not match our version 3."

from mvapich. Ugh.

Cheers,
Barry

From trimmer at silverstorm.com Tue Sep 19 14:47:19 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Tue, 19 Sep 2006 17:47:19 -0400
Subject: [openib-general] Completion callback /teardown race
In-Reply-To:
Message-ID:

> From: Roland Dreier
> Sent: Tuesday, September 19, 2006 5:17 PM
> To: Eric Barton
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] Completion callback /teardown race
>
>
> I'll have more to say on this in the context of IPoIB and NAPI
> shortly, since I've been thinking about this issue myself.
>
> The ipath driver implements only the weaker semantics guaranteed by
> the IBA spec -- ie an event is generated if a completion is added
> after the request for notification. And I don't know what ehca and
> amso1100 implement to be honest.
>
> (The Mellanox semantics are conforming though, since it's not
> well-defined exactly when a completion is added to a CQ if no one
> looks...)

An approach we implemented a few years ago in our proprietary stack was a new verb (in addition to poll_cq and notify_req): poll_and_notify (we called it iba_poll_and_rearm).

This verb always did a poll_cq, but if the CQ was drained it then did a rearm of the CQ. The return value from the call indicated what the next step for the caller should be:

- SUCCESS - call poll_and_notify again (CQE returned)
- COMPLETED - nothing to do after this CQE (CQE returned, rearmed, no need to poll anymore)
- POLL_NEEDED - loop on poll (CQE returned, rearmed, need to poll_cq til empty)
- NOT_DONE - nothing more to do, no CQE (no CQE returned, rearmed, CQ still empty, no need to poll anymore)
- error (invalid call, etc)

The callback would loop on poll_and_notify as long as SUCCESS was returned, after which, if POLL_NEEDED had been returned, it would loop on poll_cq.

This approach provided 2 advantages:

1. For performance, an extra 1-2 calls into the HCA driver per callback were avoided. The win here was saving some spin locks (in high CQE rate drivers like IPoIB this was noticeable).

2. On HCAs such as Mellanox, POLL_NEEDED was never returned and the caller never did unnecessary polls; however, the caller and API were also able to handle HCAs which did not have the Mellanox semantics.
Todd Rimmer

From rdreier at cisco.com Tue Sep 19 15:04:42 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 15:04:42 -0700
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: (Todd Rimmer's message of "Tue, 19 Sep 2006 17:47:19 -0400")
References:
Message-ID:

    Todd> An approach we implemented a few years ago in our
    Todd> proprietary stack was a new verb (in addition to poll_cq and
    Todd> notify_req): poll_and_notify (we called it
    Todd> iba_poll_and_rearm).

Makes sense but it doesn't actually help for NAPI for ipoib (to be fair I haven't described the issue there yet).

 - R.

From ftillier at silverstorm.com Tue Sep 19 15:27:39 2006
From: ftillier at silverstorm.com (Fabian Tillier)
Date: Tue, 19 Sep 2006 15:27:39 -0700
Subject: [openib-general] Completion callback /teardown race
In-Reply-To:
References: <200609191814.k8JIESjd007174@robert.bartonsoftware.com>
Message-ID: <79ae2f320609191527j6fbeeafbu74f7ba468cce1f73@mail.gmail.com>

On 9/19/06, Rimmer, Todd wrote:
> > From: Eric Barton
> > Sent: Tuesday, September 19, 2006 2:14 PM
> > To: openib-general at openib.org
> > Subject: [openib-general] Completion callback /teardown race
> >
> > All the CQ callback does is wake a thread to poll the queue. This
> > effectively
> > keeps polling completions out of the CQ until it is empty. Then it
> > calls
> > ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) and ib_poll_cq() 1 more time.
> >
> > If this last call to ib_poll_cq() finds something, it repeats the
> > whole process
> > - but can I be guaranteed another CQ callback in this case or is it
> > indeterminate?
> >
> The recommended algorithm would be:
>
> poll_cq until empty
> ib_req_notify_cq
> poll_cq until empty

Note that if you are going to poll after ib_req_notify_cq, you can simplify the above algorithm and just do:

    ib_req_notify_cq
    poll_cq until empty

However, such an algorithm will result in extra CQ events on Mellanox HCAs.
On HCAs where the new CQ event is only generated for new CQEs it works just as well as the opposite, which works only on Mellanox HCAs:

    poll_cq until empty
    ib_req_notify_cq

> There are some differences in HCA behaviour with regard to
> ib_req_notify_cq. Mellanox HCAs will provide a callback/interrupt if
> the CQ is not empty at this point (in which case the poll_cq's after the
> notify are optional).
>
> However the behaviour defined in the IBTA spec indicates that
> ib_req_notify_cq will cause a callback/interrupt only on the next CQE
> which arrives, hence to be portable the poll_cq loop after
> ib_req_notify_cq is necessary to cover any CQEs which arrived between
> the prior poll and the ib_req_notify_cq.

I remember a while ago a mention that the behavior of the Mellanox HCAs could be controlled in the firmware, so that they would follow the IBTA spec defined behavior. I don't know what the impact on performance would be if such a change were made.

Perhaps someone from Mellanox can confirm/deny the HCAs' ability to implement the IBA spec behavior, and quantify the effects.

- Fab

From rjwalsh at pathscale.com Tue Sep 19 17:09:03 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 17:09:03 -0700
Subject: [openib-general] gen2_basic patches
Message-ID: <4510869F.60309@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

We've got some patches to gen2_basic to fix some problems with the test suite. Some are trivial (fix typos, etc.) and some are more serious (handle max_qp counts correctly, etc.)

I'm going to be sending them out piecemeal as we review them internally, and I'll make sure to send them out in sequence (i.e. in the order they should be applied), so don't be surprised to hear nothing for a day or two, then see some more patches ;-)

Regards,
Robert.
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iQEVAwUBRRCGnvzvnpzTd9fxAQKcegf/UtzQJiZFPRkcd4ZvBTHbNUdVK2NcQNkw pAu/Mh2xRDboQ28btoJJbrERZ9VUpIlnyc8rQ2wRmDbkCQL/7vpDZkLK5XRYXZfg DrwiXimRd8NHLfKVR/wbrR6QtuTDbIUpMWSpCFxkOoAYmKSRusjEoLK/Yf3gXggt NsxoomFKSEPV3W2tgEn8Aanq0ZzfTPmBhFNbHPOrpyfb/tWFVc+IAQF/QFSai1Tm PSjagRxTHY1eHCBHC7w1WZc7OOrSOBeKev5tzzcFO2PpzQ/3fAztcKRfDJ0UakIi xvMOO+C0qM1EUowIRW+ymCoeFF5SXR6p2fuFeZ+vF6S6Sf9X1o7PLg== =YULT -----END PGP SIGNATURE----- From rjwalsh at pathscale.com Tue Sep 19 17:12:26 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 19 Sep 2006 17:12:26 -0700 Subject: [openib-general] gen2_basic patch 1/10: fix some minor typos Message-ID: <4510876A.6070602@pathscale.com> An embedded and charset-unspecified text was scrubbed... Name: 01_typos.patch URL: From rjwalsh at pathscale.com Tue Sep 19 17:12:49 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 19 Sep 2006 17:12:49 -0700 Subject: [openib-general] gen2_basic patch 2/10: fix up some compiler warnings Message-ID: <45108781.8060602@pathscale.com> An embedded and charset-unspecified text was scrubbed... Name: 02_warnings.patch URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 02_warnings.patch.sig Type: application/octet-stream Size: 280 bytes Desc: not available URL: From rjwalsh at pathscale.com Tue Sep 19 17:26:21 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 19 Sep 2006 17:26:21 -0700 Subject: [openib-general] gen2_basic patch 3/10: fix is_global settings for AH attributes Message-ID: <45108AAD.9040001@pathscale.com> An embedded and charset-unspecified text was scrubbed... 
Name: 03_global.patch URL: From rjwalsh at pathscale.com Tue Sep 19 17:27:43 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 19 Sep 2006 17:27:43 -0700 Subject: [openib-general] gen2_basic patch 4/10: make sure the DLID is valid Message-ID: <45108AFF.6070703@pathscale.com> An embedded and charset-unspecified text was scrubbed... Name: 04_valid_lids.patch URL: From rjwalsh at pathscale.com Tue Sep 19 17:28:31 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 19 Sep 2006 17:28:31 -0700 Subject: [openib-general] gen2_basic patch 5/10: select a valid port number Message-ID: <45108B2F.8080207@pathscale.com> An embedded and charset-unspecified text was scrubbed... Name: 05_valid_port.patch URL: From rjwalsh at pathscale.com Tue Sep 19 17:29:27 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 19 Sep 2006 17:29:27 -0700 Subject: [openib-general] gen2_basic patch 6/10: handle case where max_sge > 100 Message-ID: <45108B67.1000606@pathscale.com> An embedded and charset-unspecified text was scrubbed... Name: 06_max_sge.patch URL: From halr at voltaire.com Tue Sep 19 18:05:53 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2006 21:05:53 -0400 Subject: [openib-general] gen2_basic patch 5/10: select a valid port number In-Reply-To: <45108B2F.8080207@pathscale.com> References: <45108B2F.8080207@pathscale.com> Message-ID: <1158714353.4509.30709.camel@hal.voltaire.com> On Tue, 2006-09-19 at 20:28, Robert Walsh wrote: > gen2_basic - select a valid port number > > Port numbers start at 1, not 0. True for CA and routers but not switches. 
> Signed-off by: Robert Walsh > > diff -rNu a/gen2_basic/test_poll_post.c b/gen2_basic/test_poll_post.c > --- a/gen2_basic/test_poll_post.c 2006-09-13 19:09:47.410808000 -0700 > +++ b/gen2_basic/test_poll_post.c 2006-08-14 14:17:03.705821000 -0700 > @@ -283,7 +283,7 @@ > .dlid = VL_range(rand_gen, 1, 0xffff), > .sl = VL_range(rand_gen, 0, 15), > .src_path_bits = VL_range(rand_gen, 0, 0x8f), > - .port_num = VL_random(rand_gen, device_attr.phys_port_cnt), > + .port_num = VL_range(rand_gen, 1, device_attr.phys_port_cnt), > .static_rate = get_static_rate(1, rand_gen), > .grh = { > .traffic_class = VL_range(rand_gen, 1, 0xff), > > ______________________________________________________________________ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rjwalsh at pathscale.com Tue Sep 19 18:16:13 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 19 Sep 2006 18:16:13 -0700 Subject: [openib-general] gen2_basic patch 5/10: select a valid port number In-Reply-To: <1158714353.4509.30709.camel@hal.voltaire.com> References: <45108B2F.8080207@pathscale.com> <1158714353.4509.30709.camel@hal.voltaire.com> Message-ID: <4510965D.4040103@pathscale.com> Hal Rosenstock wrote: > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote: >> gen2_basic - select a valid port number >> >> Port numbers start at 1, not 0. > > True for CA and routers but not switches. Yeah. Does anyone run gen2_basic on switches, though? I assumed it was HCA-centric. Regards, Robert. From mst at mellanox.co.il Tue Sep 19 21:26:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 20 Sep 2006 07:26:18 +0300 Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries In-Reply-To: References: Message-ID: <20060920042618.GA1710@mellanox.co.il> Quoting r. 
Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries > > I didn't really read the new patch before... anyway: > > Why have you changed from the approach of just using the broadcast > group's MTU? As far as I can see, the issue being addressed here is > purely theoretical anyway, but with the approach of taking the current > device MTU, you now have to flush all the paths if the configured MTU > changes, and you have to have a big switch in path_rec_start(). > > - R. > I'm not sure priv->broadcast is always initialized when we start a path record query. Is there a reason why it is? -- MST From halr at voltaire.com Tue Sep 19 21:39:48 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2006 00:39:48 -0400 Subject: [openib-general] gen2_basic patch 5/10: select a valid port number In-Reply-To: <4510965D.4040103@pathscale.com> References: <45108B2F.8080207@pathscale.com> <1158714353.4509.30709.camel@hal.voltaire.com> <4510965D.4040103@pathscale.com> Message-ID: <1158727188.4509.39096.camel@hal.voltaire.com> On Tue, 2006-09-19 at 21:16, Robert Walsh wrote: > Hal Rosenstock wrote: > > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote: > >> gen2_basic - select a valid port number > >> > >> Port numbers start at 1, not 0. > > > > True for CA and routers but not switches. > > Yeah. Does anyone run gen2_basic on switches, though? I assumed it was > HCA-centric. Yes, that appears to be the scope but I'm not 100% sure. -- Hal > Regards, > Robert. From mst at mellanox.co.il Tue Sep 19 21:58:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 20 Sep 2006 07:58:06 +0300 Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries In-Reply-To: References: <20060919124546.GF32603@mellanox.co.il> Message-ID: <20060920045806.GE1710@mellanox.co.il> Quoting r. 
Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries > > Seems OK from an anal spec compliance point of view, but I don't > understand this: > > > This breaks IPoIB on networks with SM Tavor quirk activates. > > Even if opensm returns a path record with a lower MTU, the underlying > links still have a 2K mtu really, so nothing breaks. IPoIB is just > doing something naughty by ignoring the MTU in the path record. So > what breaks really? Maybe "breaks" was too strong a word. Let's change that to "This makes IPoIB behave in a naughty way on networks with SM Tavor quirk active" :) > (not to mention the fact that the "Tavor quirk" hasn't been accepted > into OpenSM yet anyway) AFAIK it has been accepted. -- MST From mst at mellanox.co.il Tue Sep 19 22:01:11 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 20 Sep 2006 08:01:11 +0300 Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries In-Reply-To: <20060920042618.GA1710@mellanox.co.il> References: <20060920042618.GA1710@mellanox.co.il> Message-ID: <20060920050111.GF1710@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries > > Quoting r. Roland Dreier : > > Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries > > > > I didn't really read the new patch before... anyway: > > > > Why have you changed from the approach of just using the broadcast > > group's MTU? As far as I can see, the issue being addressed here is > > purely theoretical anyway, but with the approach of taking the current > > device MTU, you now have to flush all the paths if the configured MTU > > changes, and you have to have a big switch in path_rec_start(). > > > > - R. > > > > I'm not sure priv->broadcast is always initialized when we start > a path record query. Is there a reason why it is? 
It also seemed kind of nice to be able to control the path MTU from dev->mtu - and I don't think path flush on mtu change is an issue from the performance POV.

What do you think?

--
MST

From mst at mellanox.co.il Tue Sep 19 22:05:30 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Sep 2006 08:05:30 +0300
Subject: [openib-general] gen2_basic patch 5/10: select a valid port number
In-Reply-To: <1158727188.4509.39096.camel@hal.voltaire.com>
References: <45108B2F.8080207@pathscale.com> <1158714353.4509.30709.camel@hal.voltaire.com> <4510965D.4040103@pathscale.com> <1158727188.4509.39096.camel@hal.voltaire.com>
Message-ID: <20060920050530.GG1710@mellanox.co.il>

Quoting r. Hal Rosenstock :
> Subject: Re: gen2_basic patch 5/10: select a valid port number
>
> On Tue, 2006-09-19 at 21:16, Robert Walsh wrote:
> > Hal Rosenstock wrote:
> > > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
> > >> gen2_basic - select a valid port number
> > >>
> > >> Port numbers start at 1, not 0.
> > >
> > > True for CA and routers but not switches.
> >
> > Yeah. Does anyone run gen2_basic on switches, though? I assumed it was
> > HCA-centric.
>
> Yes, that appears to be the scope but I'm not 100% sure.

It's easy to get Linux running on a switch, so why not? You just need to write a low level driver that can send/receive MADs. We did run a gen1 port on a switch at some point, and someone might want to do it again.

--
MST

From mst at mellanox.co.il Tue Sep 19 22:14:20 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Sep 2006 08:14:20 +0300
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <79ae2f320609191527j6fbeeafbu74f7ba468cce1f73@mail.gmail.com>
References: <200609191814.k8JIESjd007174@robert.bartonsoftware.com> <79ae2f320609191527j6fbeeafbu74f7ba468cce1f73@mail.gmail.com>
Message-ID: <20060920051420.GH1710@mellanox.co.il>

Quoting r.
Fabian Tillier :
> > There are some differences in HCA behaviour with regard to
> > ib_req_notify_cq. Mellanox HCAs will provide a callback/interrupt if
> > the CQ is not empty at this point (in which case the poll_cq's after the
> > notify are optional).
> >
> > However the behaviour defined in the IBTA spec indicates that
> > ib_req_notify_cq will cause a callback/interrupt only on the next CQE
> > which arrives, hence to be portable the poll_cq loop after
> > ib_req_notify_cq is necessary to cover any CQEs which arrived between
> > the prior poll and the ib_req_notify_cq.
>
> I remember a while ago a mention that the behavior of the Mellanox
> HCAs could be controlled in the firmware, so that they would follow
> the IBTA spec defined behavior.

There's a mistake here. Mellanox HCAs will generate an event upon ib_req_notify_cq only if new completions have arrived after the previous event was reported. AFAIK this is IBTA spec compliant.

--
MST

From rjwalsh at pathscale.com Tue Sep 19 22:42:23 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 22:42:23 -0700
Subject: [openib-general] gen2_basic patch 5/10: select a valid port number
In-Reply-To: <20060920050530.GG1710@mellanox.co.il>
References: <45108B2F.8080207@pathscale.com> <1158714353.4509.30709.camel@hal.voltaire.com> <4510965D.4040103@pathscale.com> <1158727188.4509.39096.camel@hal.voltaire.com> <20060920050530.GG1710@mellanox.co.il>
Message-ID: <4510D4BF.30907@pathscale.com>

> It's easy to get Linux running on a switch, so why not? You just
> need to write a low level driver that can send/receive MADs.
> We did run a gen1 port on a switch at some point, and someone might want to
> do it again.
OK - that's a fine project idea, but I'm not about to start coding it up any time soon :-) In any case, if we're going to insist that this test run on a hypothetical switch gen2 distribution, then the "choose a random port" code needs to check if it's running on a CA or router versus a switch and choose the port range appropriately. Regards, Robert. From mst at mellanox.co.il Tue Sep 19 23:02:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 20 Sep 2006 09:02:04 +0300 Subject: [openib-general] [PATCH] mthca: fix lid used for sending traps In-Reply-To: References: <20060919081324.GF31498@mellanox.co.il> Message-ID: <20060920060204.GA2870@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] mthca: fix lid used for sending traps > > > I'm taking the fix into OFED 1.1 and I think it should go into 2.6.18 or > > 2.6.18.1. > > Makes sense -- I'll try to get this into 2.6.18, since it's a > one-liner and fixes a regression from 2.6.17. Arrr! http://lkml.org/lkml/2006/9/20/2 Missed 2.6.18 by a small margin. Gar! Acked for 2.6.18.1? -- MST From kliteyn at dev.mellanox.co.il Tue Sep 19 23:36:52 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 20 Sep 2006 09:36:52 +0300 Subject: [openib-general] [PATCH] osm: fixing bugs in osmtest Message-ID: <4510E184.8070900@dev.mellanox.co.il> Hi Hal I'm doing a major review of the osmtest. This patch is fixing a few bugs in osmtest where failures were ignored. More precisely, osmtest was expecting error, but got IB_SUCCESS and ignored the fact that it should have gotten an error. There are also a few changes to improve the code and osmtest log readability. More patches expected. This patch is for trunk only. I tested applying this patch before sending it. If you get the patch rejected again - let me know. Thanks. 
Yevgeny Signed-off-by: Yevgeny Kliteynik Index: osmtest/include/osmtest.h =================================================================== --- osmtest/include/osmtest.h (revision 9552) +++ osmtest/include/osmtest.h (working copy) @@ -506,4 +506,13 @@ ib_api_status_t osmtest_get_local_port_lmc( IN osmtest_t * const p_osmt, IN ib_net16_t lid, OUT uint8_t * const p_lmc ); + + +/* + * A few auxiliary macros for logging + */ + +#define EXPECTING_ERRORS_START "[[ ===== Expecting Errors - START ===== " +#define EXPECTING_ERRORS_END " ===== Expecting Errors - END ===== ]]" + #endif /* _OSMTEST_H_ */ Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 9552) +++ osmtest/osmtest.c (working copy) @@ -552,6 +552,7 @@ osmtest_init( IN osmtest_t * const p_osm osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_init: ERR 0001: " "Unable to allocate vendor object" ); + status = IB_ERROR; goto Exit; } @@ -1817,6 +1818,11 @@ osmtest_wrong_sm_key_ignored( IN osmtest osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_wrong_sm_key_ignored: ERR 0011: " "Did not get a timeout but got (%s)\n", ib_get_err_str( status ) ); + if ( status == IB_SUCCESS ) + { + /* assign some error value to status, since IB_SUCCESS is a bad rc */ + status = IB_ERROR; + } goto Exit; } else @@ -5448,14 +5454,23 @@ osmtest_validate_against_db( IN osmtest_ memset( &context, 0, sizeof( context ) ); memset( &request, 0, sizeof( request ) ); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" ); + if( status == IB_SUCCESS ) - goto Exit; - else { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - 
"osmtest_get_multipath_rec: " - "IS EXPECTED ERROR ^^^^\n"); + status = IB_ERROR; + goto Exit; } memset( &context, 0, sizeof( context ) ); @@ -5463,14 +5478,23 @@ osmtest_validate_against_db( IN osmtest_ request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT; request.sgid_count = 1; ib_gid_set_default( &request.gids[0], portguid ); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_multipath_rec( p_osmt, &request, &context ); - if( status == IB_SUCCESS ) - goto Exit; - else + if( status != IB_SUCCESS ) { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_multipath_rec: " - "IS EXPECTED ERROR ^^^^\n"); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" ); + + if( status == IB_SUCCESS ) + { + status = IB_ERROR; + goto Exit; } memset( &context, 0, sizeof( context ) ); @@ -5482,14 +5506,23 @@ osmtest_validate_against_db( IN osmtest_ /* Set IPoIB broadcast MGID */ request.gids[1].unicast.prefix = CL_HTON64(0xff12401bffff0000ULL); request.gids[1].unicast.interface_id = CL_HTON64(0x00000000ffffffffULL); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" ); + if( status == IB_SUCCESS ) - goto Exit; - else { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_multipath_rec: " - "IS EXPECTED ERROR ^^^^\n"); + status = IB_ERROR; + goto Exit; } memset( &context, 0, sizeof( context ) ); @@ -5500,14 +5533,23 @@ osmtest_validate_against_db( IN osmtest_ 
request.gids[0].unicast.prefix = CL_HTON64(0xff12401bffff0000ULL); request.gids[0].unicast.interface_id = CL_HTON64(0x00000000ffffffffULL); ib_gid_set_default( &request.gids[1], portguid ); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" ); + if( status == IB_SUCCESS ) - goto Exit; - else { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_multipath_rec_gid_ipoib_bcast: " - "IS EXPECTED ERROR ^^^^\n"); + status = IB_ERROR; + goto Exit; } memset( &context, 0, sizeof( context ) ); @@ -5569,14 +5611,23 @@ osmtest_validate_against_db( IN osmtest_ goto Exit; memset( &context, 0, sizeof( context ) ); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_pkeytbl_rec_by_lid( p_osmt, test_lid, 0, &context ); - if ( status == IB_SUCCESS ) - goto Exit; - else + if( status != IB_SUCCESS ) { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_pkeytbl_rec_by_lid: " - "IS EXPECTED ERROR ^^^^\n"); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" ); + + if( status == IB_SUCCESS ) + { + status = IB_ERROR; + goto Exit; } memset( &context, 0, sizeof( context ) ); @@ -5679,26 +5730,43 @@ osmtest_validate_against_db( IN osmtest_ goto Exit; memset( &context, 0, sizeof( context ) ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_path_rec_by_lid_pair( p_osmt, 0xffff, 0xffff, &context ); - if 
(status == IB_SUCCESS ) - goto Exit; - else + if( status != IB_SUCCESS ) { - osm_log ( &p_osmt->log, OSM_LOG_ERROR, + osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_get_path_rec_by_lid_pair: " - "IS EXPECTED ERROR ^^^^\n" ); + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_END "\n" ); + + if( status == IB_SUCCESS ) + { + status = IB_ERROR; + goto Exit; } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_START "\n" ); + status = osmtest_get_path_rec_by_lid_pair( p_osmt, test_lid, 0xffff, &context ); - if (status == IB_SUCCESS ) - goto Exit; - else + if( status != IB_SUCCESS ) { - osm_log ( &p_osmt->log, OSM_LOG_ERROR, + osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_get_path_rec_by_lid_pair: " - "IS EXPECTED ERROR ^^^^\n" ); + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_END "\n" ); + + if( status == IB_SUCCESS ) + { + status = IB_ERROR; + goto Exit; } } } @@ -7141,6 +7209,9 @@ osmtest_run( IN osmtest_t * const p_osmt if( p_osmt->opt.flow == 1 ) { + /* + * Creating an inventory file with all nodes, ports and paths + */ status = osmtest_create_inventory_file( p_osmt ); if( status != IB_SUCCESS ) { @@ -7155,6 +7226,9 @@ osmtest_run( IN osmtest_t * const p_osmt { if( p_osmt->opt.flow == 5 ) { + /* + * Stress SA - flood the it with queries + */ switch ( p_osmt->opt.stress ) { case 0: @@ -7215,8 +7289,11 @@ osmtest_run( IN osmtest_t * const p_osmt /* * Run normal validition tests. 
*/ - if (!p_osmt->opt.flow || p_osmt->opt.flow == 2) + if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 2) { + /* + * Only validate the given inventory file + */ status = osmtest_create_db( p_osmt ); if( status != IB_SUCCESS ) { @@ -7238,7 +7315,7 @@ osmtest_run( IN osmtest_t * const p_osmt } } - if (!p_osmt->opt.flow) + if (p_osmt->opt.flow == 0) { status = osmtest_wrong_sm_key_ignored( p_osmt ); if( status != IB_SUCCESS ) @@ -7251,8 +7328,11 @@ osmtest_run( IN osmtest_t * const p_osmt } } - if (!p_osmt->opt.flow || p_osmt->opt.flow == 3) + if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 3) { + /* + * run service registration, deregistration, and lease test + */ status = osmt_run_service_records_flow( p_osmt ); if( status != IB_SUCCESS ) { @@ -7264,8 +7344,11 @@ osmtest_run( IN osmtest_t * const p_osmt } } - if (!p_osmt->opt.flow || p_osmt->opt.flow == 4) + if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 4) { + /* + * Run event forwarding test + */ #ifdef OSM_VENDOR_INTF_MTL status = osmt_run_inform_info_flow( p_osmt ); @@ -7286,12 +7369,13 @@ osmtest_run( IN osmtest_t * const p_osmt #endif } - /* - * since it generates a huge file, we run it only - * if explicitly required to - */ if (p_osmt->opt.flow == 7) { + /* + * QoS info: dump VLArb and SLtoVL tables. 
+ * Since it generates a huge file, we run it only + * if explicitly required to + */ status = osmtest_create_db( p_osmt ); if( status != IB_SUCCESS ) { @@ -7315,6 +7399,9 @@ osmtest_run( IN osmtest_t * const p_osmt if (p_osmt->opt.flow == 8) { + /* + * Run trap 64/65 flow + */ #ifdef OSM_VENDOR_INTF_MTL status = osmt_run_trap64_65_flow( p_osmt ); if( status != IB_SUCCESS ) @@ -7334,8 +7421,11 @@ osmtest_run( IN osmtest_t * const p_osmt #endif } - if (!p_osmt->opt.flow || p_osmt->opt.flow == 6) + if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 6) { + /* + * Multicast flow + */ status = osmt_run_mcast_flow( p_osmt ); if( status != IB_SUCCESS ) { From dotanb at dev.mellanox.co.il Wed Sep 20 00:40:40 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 20 Sep 2006 10:40:40 +0300 Subject: [openib-general] gen2_basic patches In-Reply-To: <4510869F.60309@pathscale.com> References: <4510869F.60309@pathscale.com> Message-ID: <4510F078.4030401@dev.mellanox.co.il> Robert Walsh wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all, > > We've got some patches to gen2_basic to fix some problems with the test > suite. Some are trivial (fix typos, etc.) and some are more serious > (handle max_qp counts correctly, etc.) I'm going to be sending them out > piecemeal as we review them internally, and I'll make sure to send them > out in sequence (i.e. in the order they should be applied), so don't be > surprised to hear nothing for a day or two, then see some more patches ;-) > > Regards, > Robert. > Thank you (in advanced) for all of the patches that you will send us. I will take the patches (and maybe modify them a little bit) and check it to the openib svn the final fixed version. Thanks again. 
Dotan From ogerlitz at voltaire.com Wed Sep 20 01:11:09 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 20 Sep 2006 11:11:09 +0300 Subject: [openib-general] Completion callback /teardown race In-Reply-To: <200609191814.k8JIESjd007174@robert.bartonsoftware.com> References: <200609191814.k8JIESjd007174@robert.bartonsoftware.com> Message-ID: <4510F79D.8010203@voltaire.com> Eric Barton wrote: > I create 1 CQ just for receive completions on each of my QPs. When I tear down > the QP, I rdma_disconnect(), change the QP state to IB_QPS_ERR and then wait > for all currently posted receives to complete. I understand your driver is a CMA consumer whose QP state transitions are carried out by the CMA. So you need ***not*** modify the QP state to error, as the CMA does it for you in rdma_disconnect() before sending the DREQ or DREP. Please note that you need to call rdma_disconnect() in both sides, the one that initiates the disconnection but also on the side that gets the DREQ, that is suddenly gets RDMA_CM_EVENT_DISCONNECTED event (note that also the disconnection initiator would get this event and if you call there again to rdma_disconnect() its not going to break anything, i think). Is it possible that manual QP modify to error in your code actually covered the latter case where you did not call rdma_disconnect()? Or. From dotanb at dev.mellanox.co.il Wed Sep 20 02:15:59 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 20 Sep 2006 12:15:59 +0300 Subject: [openib-general] gen2_basic patches In-Reply-To: <4510F078.4030401@dev.mellanox.co.il> References: <4510869F.60309@pathscale.com> <4510F078.4030401@dev.mellanox.co.il> Message-ID: <451106CF.10007@dev.mellanox.co.il> Dotan Barak wrote: > Robert Walsh wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Hi all, >> >> We've got some patches to gen2_basic to fix some problems with the test >> suite. Some are trivial (fix typos, etc.) 
and some are more serious >> (handle max_qp counts correctly, etc.) I'm going to be sending them out >> piecemeal as we review them internally, and I'll make sure to send them >> out in sequence (i.e. in the order they should be applied), so don't be >> surprised to hear nothing for a day or two, then see some more patches ;-) >> >> Regards, >> Robert. >> >> > Thank you (in advanced) for all of the patches that you will send us. > > I will take the patches (and maybe modify them a little bit) and check > it to the openib svn the final fixed version. > > Thanks again. > Dotan > I applied all of the fixed that you sent me (1..6). i will be happy if the next patches will be based on the latest test version that i just have committed. thanks Dotan From ogerlitz at voltaire.com Wed Sep 20 02:23:22 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 20 Sep 2006 12:23:22 +0300 Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name In-Reply-To: References: <450D36E9.1000502@voltaire.com> <450FE742.7040005@voltaire.com> Message-ID: <4511088A.6000108@voltaire.com> Roland Dreier wrote: > Or> I am fine with having the CMA config selected whenever someone > Or> selects INFINIBAND so adding the help text and making it > Or> visible are not a must per my taste. However, are you fine > Or> with changing the **name** of the config directive to > Or> CONFIG_INFINIBAND_RDMA_CM so its better understood? > > No, since really what it is controlling is the ib_addr module. Just for the record it is controlling the build of both ib_addr and rdma_cm modules where rdma address resolution is a part from the overall rdma communication management managed by the rdma_cm module. Anyway, if you prefer to leave the config name as is, let it be. 
> Or> As Erez wrote you on the other thread, we must depend on the > Or> CMA else a user running make randconfig would be able to > Or> produce a config file where INFINIBAND is selected but the CMA > Or> (RDMA_ADDR_TRANS) config is not selected so linkage will fail. > > How? make randconfig won't produce invalid configurations. I think I got it (at last): if INFINIBAND is selected, it causes the selection of INFINIBAND_ADDR_TRANS as long as INET is selected. So if something (e.g. iSER) is dependent on INFINIBAND and INET, make randconfig would do the job of selecting both of them when it selects iSER, correct? Or. From halr at voltaire.com Wed Sep 20 03:11:19 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2006 06:11:19 -0400 Subject: [openib-general] gen2_basic patch 5/10: select a valid port number In-Reply-To: <20060920050530.GG1710@mellanox.co.il> References: <45108B2F.8080207@pathscale.com> <1158714353.4509.30709.camel@hal.voltaire.com> <4510965D.4040103@pathscale.com> <1158727188.4509.39096.camel@hal.voltaire.com> <20060920050530.GG1710@mellanox.co.il> Message-ID: <1158747078.4509.52336.camel@hal.voltaire.com> On Wed, 2006-09-20 at 01:05, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: gen2_basic patch 5/10: select a valid port number > > > > On Tue, 2006-09-19 at 21:16, Robert Walsh wrote: > > > Hal Rosenstock wrote: > > > > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote: > > > >> gen2_basic - select a valid port number > > > >> > > > >> Port numbers start at 1, not 0. > > > > > > > > True for CA and routers but not switches. > > > > > > Yeah. Does anyone run gen2_basic on switches, though? I assumed it was > > > HCA-centric. > > > > Yes, that appears to be the scope but I'm not 100% sure. > > It's easy to get linux running on a switch, so why not? You just > need to write a low level driver that can send/receive MADs. > We did run a gen1 port on a switch at some point, and someone might want to > do it again.
And the only limitation would be what switch port 0 (extended, base) supports relative to these tests. -- Hal From eeb at bartonsoftware.com Wed Sep 20 03:25:00 2006 From: eeb at bartonsoftware.com (Eric Barton) Date: Wed, 20 Sep 2006 11:25:00 +0100 Subject: [openib-general] Completion callback /teardown race In-Reply-To: <4510F79D.8010203@voltaire.com> Message-ID: <051b01c6dc9f$07900e70$0281a8c0@ebpc> Or, > Eric Barton wrote: > > I create 1 CQ just for receive completions on each of my QPs. When I tear down > > the QP, I rdma_disconnect(), change the QP state to IB_QPS_ERR and then wait > > for all currently posted receives to complete. > > I understand your driver is a CMA consumer whose QP state transitions > are carried out by the CMA. So you need ***not*** modify the QP state to > error, as the CMA does it for you in rdma_disconnect() before sending > the DREQ or DREP. Yes - understood. It's not actually harmful at this point I think. Please correct me if I'm wrong. > Please note that you need to call rdma_disconnect() in both sides, the > one that initiates the disconnection but also on the side that gets the > DREQ, that is suddenly gets RDMA_CM_EVENT_DISCONNECTED event... I ensure I always call rdma_disconnect() once, no matter whether I am the initiator or not. Cheers, Eric From ogerlitz at voltaire.com Wed Sep 20 05:29:31 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 20 Sep 2006 15:29:31 +0300 Subject: [openib-general] Completion callback /teardown race In-Reply-To: <051b01c6dc9f$07900e70$0281a8c0@ebpc> References: <051b01c6dc9f$07900e70$0281a8c0@ebpc> Message-ID: <4511342B.8090601@voltaire.com> >> I understand your driver is a CMA consumer whose QP state transitions >> are carried out by the CMA. So you need ***not*** modify the QP state to >> error, as the CMA does it for you in rdma_disconnect() before sending >> the DREQ or DREP. > Yes - understood. It's not actually harmful at this point I think. Please > correct me if I'm wrong. 
Not that its harmful, but its not needed, so it can confuse people looking/debugging this code... >> Please note that you need to call rdma_disconnect() in both sides, the >> one that initiates the disconnection but also on the side that gets the >> DREQ, that is suddenly gets RDMA_CM_EVENT_DISCONNECTED event... > I ensure I always call rdma_disconnect() once, no matter whether I am the > initiator or not. cool. Or. From halr at voltaire.com Wed Sep 20 05:38:18 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2006 08:38:18 -0400 Subject: [openib-general] [PATCH] osm: fixing bugs in osmtest In-Reply-To: <4510E184.8070900@dev.mellanox.co.il> References: <4510E184.8070900@dev.mellanox.co.il> Message-ID: <1158755895.4509.58062.camel@hal.voltaire.com> Hi Yevgeny, On Wed, 2006-09-20 at 02:36, Yevgeny Kliteynik wrote: > Hi Hal > > I'm doing a major review of the osmtest. Good. This has been long overdue. > This patch is fixing a few bugs in osmtest where failures > were ignored. More precisely, osmtest was expecting error, > but got IB_SUCCESS and ignored the fact that it should have > gotten an error. > There are also a few changes to improve the code and osmtest > log readability. Looks good at the code inspection level. > More patches expected. Thanks for the heads up. > This patch is for trunk only. > > I tested applying this patch before sending it. If you get the > patch rejected again - let me know. It took the header file part but rejected all code blocks for osmtest.c :-( -- Hal From erezz at voltaire.com Wed Sep 20 05:45:13 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 20 Sep 2006 15:45:13 +0300 Subject: [openib-general] 2 SLES 10 backport directories In-Reply-To: <20060917044626.GA26054@mellanox.co.il> References: <450915EE.1090705@voltaire.com> <20060917044626.GA26054@mellanox.co.il> Message-ID: <451137D9.3060607@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. 
Erez Zilber : > >> Subject: 2 SLES 10 backport directories >> >> Michael, >> >> I saw that there are 2 SLES 10 backport directories in the svn: >> >> https://openib.org/svn/gen2/branches/backport/sles10/ - this one >> contains patches that we added for SLES 10 >> >> https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/ - this one >> was added later by you. >> >> Can we unite them? >> >> Here's my motivation: I want to be able to install SLES 10, replace its >> infiniband dir with infiniband from openib's svn, apply all SLES 10 >> patches (from a single directory) and then it should work. >> >> This should help us in future OFED releases. >> > > I'd like that too, but there's a difficulty here. > > The rest of the backport patches make it possible to build > IB support out of kernel, without patching the kernel code itself. > This is an explicit requirement of some users, so we made an effort > to preserve this ability, and so far it works with the rest of the IB stack - > assuming that user has built infiniband support as a module or disabled it - > but that's what most people currenty have, anyway. > > Unfortunately sles10 patches for iser that you mention violate this rule - they > patch the iscsi support that is already there as part of the kernel. > So unless this can be fixed somehow, we need the iscsi stuff separate, so that > 1. we know to apply it in kernel source directory, not where we unpacked IB code > 2. it can be applied conditionally when the user has enabled iser, so that > others still have the ability not to touch their kernel > I think that we can throw away https://openib.org/svn/gen2/branches/backport/sles10/. These patches apply to SLES 10 beta 8. They are no longer needed. As for https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/, it contains 2 iSER patches. Both affect only iSER code (nothing in open-iscsi or any other kernel code). Therefore, I think that it's ok. What do you think? 
Erez From rjwalsh at pathscale.com Wed Sep 20 09:19:01 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 20 Sep 2006 09:19:01 -0700 Subject: [openib-general] gen2_basic patches In-Reply-To: <451106CF.10007@dev.mellanox.co.il> References: <4510869F.60309@pathscale.com> <4510F078.4030401@dev.mellanox.co.il> <451106CF.10007@dev.mellanox.co.il> Message-ID: <451169F5.9040804@pathscale.com> Dotan Barak wrote: > Dotan Barak wrote: >> Robert Walsh wrote: >> >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> Hi all, >>> >>> We've got some patches to gen2_basic to fix some problems with the test >>> suite. Some are trivial (fix typos, etc.) and some are more serious >>> (handle max_qp counts correctly, etc.) I'm going to be sending them out >>> piecemeal as we review them internally, and I'll make sure to send them >>> out in sequence (i.e. in the order they should be applied), so don't be >>> surprised to hear nothing for a day or two, then see some more >>> patches ;-) >>> >>> Regards, >>> Robert. >>> >> Thank you (in advanced) for all of the patches that you will send us. >> >> I will take the patches (and maybe modify them a little bit) and check >> it to the openib svn the final fixed version. >> >> Thanks again. >> Dotan >> > I applied all of the fixed that you sent me (1..6). > i will be happy if the next patches will be based on the latest test > version that i just have committed. Thanks, Dotan. I'll make sure the next bunch are against the latest stuff. 
From ftillier at silverstorm.com Wed Sep 20 09:30:59 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Wed, 20 Sep 2006 09:30:59 -0700 Subject: [openib-general] Completion callback /teardown race In-Reply-To: <20060920051420.GH1710@mellanox.co.il> References: <200609191814.k8JIESjd007174@robert.bartonsoftware.com> <79ae2f320609191527j6fbeeafbu74f7ba468cce1f73@mail.gmail.com> <20060920051420.GH1710@mellanox.co.il> Message-ID: <79ae2f320609200930xd15aaf8safc62dfee6064cc4@mail.gmail.com> Hi Michael, On 9/19/06, Michael S. Tsirkin wrote: > Quoting r. Fabian Tillier : > > > There are some differences in HCA behaviour with regard to > > > ib_req_notify_cq. Mellanox HCAs will provide a callback/interrupt if > > > the CQ is not empty at this point (in which case the poll_cq's after the > > > notify are optional). > > > > > > However the behaviour defined in the IBTA spec indicates that > > > ib_req_notify_cq will cause a callback/interrupt only on the next CQE > > > which arrives, hence to be portable the poll_cq loop after > > > ib_req_notify_cq is necessary to cover any CQEs which arrived between > > > the prior poll and the ib_req_notify_cq. > > > > I remember a while ago a mention that the behavior of the Mellanox > > HCAs could be controlled in the firmware, so that they would follow > > the IBTA spec defined behavior. > > There's a mistake here. Mellanox HCAs will generate an event upon > ib_req_notify_cq only if new completions has arrived after the previous event > has been reported. Thanks for correcting me - I expected my memory to be a bit rusty. In this case, is there any benefit in polling before calling ib_req_notify_cq? > AFAIK this is IBTA spec compliant. Yes, I believe it is too. 
Do you know if there is any impact on performance in doing the following for completion processing:

    ib_req_notify_cq
    poll_cq until empty

Thanks, - Fab From dotanb at dev.mellanox.co.il Wed Sep 20 09:52:58 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 20 Sep 2006 19:52:58 +0300 Subject: [openib-general] gen2_basic patch 5/10: select a valid port number In-Reply-To: <1158747078.4509.52336.camel@hal.voltaire.com> References: <45108B2F.8080207@pathscale.com> <1158714353.4509.30709.camel@hal.voltaire.com> <4510965D.4040103@pathscale.com> <1158727188.4509.39096.camel@hal.voltaire.com> <20060920050530.GG1710@mellanox.co.il> <1158747078.4509.52336.camel@hal.voltaire.com> Message-ID: <451171EA.7010602@dev.mellanox.co.il> Hal Rosenstock wrote: > On Wed, 2006-09-20 at 01:05, Michael S. Tsirkin wrote: > >> Quoting r. Hal Rosenstock : >> >>> Subject: Re: gen2_basic patch 5/10: select a valid port number >>> >>> On Tue, 2006-09-19 at 21:16, Robert Walsh wrote: >>> >>>> Hal Rosenstock wrote: >>>> >>>>> On Tue, 2006-09-19 at 20:28, Robert Walsh wrote: >>>>> >>>>>> gen2_basic - select a valid port number >>>>>> >>>>>> Port numbers start at 1, not 0. >>>>>> >>>>> True for CA and routers but not switches. >>>>> >>>> Yeah. Does anyone run gen2_basic on switches, though? I assumed it was >>>> HCA-centric. >>>> >>> Yes, that appears to be the scope but I'm not 100% sure. >>> >> It's easy to get linux running on a switch, so why not? You just >> need to write a low level driver that can send/receive MADs. >> We did run a gen1 port on a switch at some point, and someone might want to >> do it again. >> > > And the only limitation would be what switch port 0 (extended, base) > supports relative to these tests. > > -- Hal > Hi. This test was written in order to check the verbs layer, and it was developed over an HCA (and is executed every day on all of our HCAs).
I don't know what is the expected result of executing this test over a switch and if there should be some changes in order to check switch features. If I'll get this input, i will add the needed features/code to the test. Dotan From ralphc at pathscale.com Wed Sep 20 10:29:38 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Wed, 20 Sep 2006 10:29:38 -0700 Subject: [openib-general] gen2_basic patch 5/10: select a valid port number In-Reply-To: <1158727188.4509.39096.camel@hal.voltaire.com> References: <45108B2F.8080207@pathscale.com> <1158714353.4509.30709.camel@hal.voltaire.com> <4510965D.4040103@pathscale.com> <1158727188.4509.39096.camel@hal.voltaire.com> Message-ID: <1158773378.3608.9.camel@brick.pathscale.com> In either case, if we want to support testing switches and HCAs, we should have a command line option to change the tests as appropriate for each. On Wed, 2006-09-20 at 00:39 -0400, Hal Rosenstock wrote: > On Tue, 2006-09-19 at 21:16, Robert Walsh wrote: > > Hal Rosenstock wrote: > > > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote: > > >> gen2_basic - select a valid port number > > >> > > >> Port numbers start at 1, not 0. > > > > > > True for CA and routers but not switches. > > > > Yeah. Does anyone run gen2_basic on switches, though? I assumed it was > > HCA-centric. > > Yes, that appears to be the scope but I'm not 100% sure. > > -- Hal > > > Regards, > > Robert. 
From mshefty at ichips.intel.com Wed Sep 20 10:59:18 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 20 Sep 2006 10:59:18 -0700 Subject: [openib-general] Negotiation of Rsponder resource & Initiator depth In-Reply-To: <450E8991.5080603@voltaire.com> References: <450E8991.5080603@voltaire.com> Message-ID: <45118176.6040106@ichips.intel.com> Erez Zilber wrote: > In the IB spec it says in 12.7.29: > > The recipient of the REQ message shall choose a local Initiator Depth that > does not exceed the Responder Resources offered in the REQ. If the recipient > of the REQ message is unwilling or unable to do so, it shall send a > REJ message to discontinue the connection establishment. > > From reading the CMA code, I see that it does not negotiate these > values (responder resources & initiator depth). It expects the ULP to > negotiate it. Why? Shouldn't it be done by the CMA? There's a bug in the CMA interface in that it doesn't expose the requested connection parameters up to a listener. I have plans to fix this in the short term, but the negotiation is still left to the user. I don't think that the CMA knows enough about what the application is trying to do to set this for it. - Sean From amit_byron at yahoo.com Wed Sep 20 11:47:36 2006 From: amit_byron at yahoo.com (amit byron) Date: Wed, 20 Sep 2006 11:47:36 -0700 (PDT) Subject: [openib-general] max message size for IB_WR_SEND Message-ID: <20060920184736.28168.qmail@web38513.mail.mud.yahoo.com> hi, if i evoke/call ib_post_send(IB_WR_SEND) with message size 512 bytes, the message gets received on the peer (second) node. the 2 nodes are connected point-to -point.
but if message size is increased to 4096 bytes then second node receives the message; but message content is missing (empty). won't infiniband stack break down message in smaller chunks and assemble on peer node? thanks, Amit. From rjwalsh at pathscale.com Wed Sep 20 12:08:29 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 20 Sep 2006 12:08:29 -0700 Subject: [openib-general] gen2_basic patch boogum? Message-ID: <451191AD.80608@pathscale.com> Hi Dotan, I just noticed that you didn't apply one of my patch hunks that called get_is_global(). I know why you didn't do it (the dlid is always 0: see line 218 of test_av.c), but should you still be setting the is_global field in the ah_attr structure to some value? Right now, it will just be set to some random unitialized stack value. Regards, Robert. From mst at mellanox.co.il Wed Sep 20 13:42:20 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 20 Sep 2006 23:42:20 +0300 Subject: [openib-general] OFED 1.1 and OpenSM on SLES 10 for PPC64 In-Reply-To: <1158765336.4509.64116.camel@hal.voltaire.com> References: <1158765336.4509.64116.camel@hal.voltaire.com> Message-ID: <20060920204220.GA9724@mellanox.co.il> Quoting r.
Hal Rosenstock : > Subject: RE: OFED 1.1 and OpenSM on SLES 10 for PPC64 > > On Wed, 2006-09-20 at 11:11, Eitan Zahavi wrote: > > I will try to get to that tomorrow > > It's not an OpenSM issue. See the latest info in the bug report: > http://openib.org/bugzilla/show_bug.cgi?id=241 I dug in a bit and I'm not sure what the root cause is, but what is triggering the problem is that the saquery diag utility depends on opensm, which makes a mess of dependencies, and at some point libtool goes berserk. Short term, can we just skip the saquery utility in OFED 1.1? Hal, can you approve this please? Longer term, I think saquery should be fixed not to depend on opensm - opensm is a large tool, complicated by portability requirements etc, and it is a waste to need parts of it on endnodes just to be able to run some diagnostics. With RMPP support in kernel, we really shouldn't need an extra dependency just to push a query and get a response. Comments? -- MST From mst at mellanox.co.il Wed Sep 20 13:57:00 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 20 Sep 2006 23:57:00 +0300 Subject: [openib-general] Completion callback /teardown race In-Reply-To: <79ae2f320609200930xd15aaf8safc62dfee6064cc4@mail.gmail.com> References: <79ae2f320609200930xd15aaf8safc62dfee6064cc4@mail.gmail.com> Message-ID: <20060920205700.GB9724@mellanox.co.il> Quoting r. Fabian Tillier : > > There's a mistake here. Mellanox HCAs will generate an event upon > > ib_req_notify_cq only if new completions has arrived after the previous event > > has been reported. > > Thanks for correcting me - I expected my memory to be a bit rusty. In > this case, is there any benefit in polling before calling > ib_req_notify_cq? > > > AFAIK this is IBTA spec compliant. > > Yes, I believe it is too.
Do you know if there is any impact on > performance in doing the following for completion processing: > > ib_req_notify_cq > poll_cq until empty

Some additional polling has a chance to improve performance on any hardware: it increases the chance that you do a cheap poll for completion instead of getting a (typically expensive) notification interrupt. And it's a win on any hardware to delay ib_req_notify_cq as long as possible, so that a single event reports as many completions as possible. That's why it's common to e.g.

    poll_cq until empty
    ib_req_notify_cq
    poll_cq until empty

This might work well for bursty traffic, where once the CQ is empty it will stay empty for a while. There's no reason why polling twice will work best in all cases, however - it's easy to invent other heuristics:

    for(i=0;i<1000;++i)
        poll_cq until empty
    ib_req_notify_cq
    for(i=0;i<10;++i)
        poll_cq until empty

etc. What works best depends on the application. -- MST

From mst at mellanox.co.il Wed Sep 20 14:07:16 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Sep 2006 00:07:16 +0300 Subject: [openib-general] 2 SLES 10 backport directories In-Reply-To: <451137D9.3060607@voltaire.com> References: <450915EE.1090705@voltaire.com> <20060917044626.GA26054@mellanox.co.il> <451137D9.3060607@voltaire.com> Message-ID: <20060920210716.GD9724@mellanox.co.il> Quoting r. Erez Zilber : > >> I saw that there are 2 SLES 10 backport directories in the svn: > >> > >> https://openib.org/svn/gen2/branches/backport/sles10/ - this one > >> contains patches that we added for SLES 10 > >> > >> https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/ - this one > >> was added later by you. > >> > >> Can we unite them? > >> > I think that we can throw away > https://openib.org/svn/gen2/branches/backport/sles10/. These patches > apply to SLES 10 beta 8. They are no longer needed. As for > https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/, it > contains 2 iSER patches.
Both affect only iSER code (nothing in > open-iscsi or any other kernel code). Therefore, I think that it's ok. Go ahead and kill backport/sles10 then. But the whole backport dir should be updated from the OFED tree or, better, killed once Sean switches to git. -- MST From rjwalsh at pathscale.com Wed Sep 20 14:17:04 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 20 Sep 2006 14:17:04 -0700 Subject: [openib-general] gen2_basic patch boogum? In-Reply-To: <451191AD.80608@pathscale.com> References: <451191AD.80608@pathscale.com> Message-ID: <4511AFD0.9030902@pathscale.com> Another quick question: I noticed that in the latest changes you pushed, including my patches, you removed the following check in test_qp.c:

@@ -1702,7 +1700,6 @@
 CHECK_VALUE("qp_type", query_init_attr.qp_type, attr.qp_type, goto cleanup);
 CHECK_VALUE_PTR("recv_cq", query_init_attr.recv_cq, attr.recv_cq, goto cleanup);
 CHECK_VALUE_PTR("send_cq", query_init_attr.send_cq, attr.send_cq, goto cleanup);
-CHECK_VALUE("sq_sig_all", query_init_attr.sq_sig_all, attr.sq_sig_all, goto cleanup);
 CHECK_VALUE_PTR("srq", query_init_attr.srq, attr.srq, goto cleanup);
 }
 PASSED;

Any particular reason why you removed this? I don't ever remember this being a problem on ipath or mthca. Regards, Robert. From ftillier at silverstorm.com Wed Sep 20 14:16:54 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Wed, 20 Sep 2006 14:16:54 -0700 Subject: [openib-general] Completion callback /teardown race In-Reply-To: <20060920205700.GB9724@mellanox.co.il> References: <79ae2f320609200930xd15aaf8safc62dfee6064cc4@mail.gmail.com> <20060920205700.GB9724@mellanox.co.il> Message-ID: <79ae2f320609201416n6c61bd02p5c92701253f6c6b3@mail.gmail.com> On 9/20/06, Michael S. Tsirkin wrote: > Quoting r. Fabian Tillier : > > > There's a mistake here. Mellanox HCAs will generate an event upon > > > ib_req_notify_cq only if new completions has arrived after the previous event > > > has been reported.
> >
> > Thanks for correcting me - I expected my memory to be a bit rusty. In
> > this case, is there any benefit in polling before calling
> > ib_req_notify_cq?
> >
> > > AFAIK this is IBTA spec compliant.
> >
> > Yes, I believe it is too. Do you know if there is any impact on
> > performance in doing the following for completion processing:
> >
> >         ib_req_notify_cq
> >         poll_cq until empty
>
> Some additional polling has a chance to improve performance on any
> hardware: it increases the chance that you do a cheap poll for completion
> instead of getting a (typically expensive) notification interrupt.
> And it's a win on any hardware to delay ib_req_notify_cq
> as long as possible, so that a single event reports as many completions
> as possible.

Ok, now you have me confused. Based on what you said for Mellanox
HCAs, a new CQ event will be generated when the CQ is rearmed if any
CQEs were written since the last event was generated. To me this
means that it doesn't matter if these CQEs were reaped or not.

That is, at t0 you have a CQE written and a CQ notification. At t1
you have another CQE written. At t2 you poll both CQEs and rearm.
Since the CQE from t1 was written after the last event, I would expect
(based on your description) that I would get another CQ notification,
even though I already reaped the CQE.

Did you mean that the hardware will only generate a new event if there
are any un-reaped CQEs?

- Fab

From rjwalsh at pathscale.com  Wed Sep 20 14:20:04 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 20 Sep 2006 14:20:04 -0700
Subject: [openib-general] gen2_basic patch 7/10: choose illegal max_qp_init_rd_atom values correctly
Message-ID: <4511B084.8070100@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 07_rd_atom.patch URL: From rjwalsh at pathscale.com Wed Sep 20 14:28:32 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 20 Sep 2006 14:28:32 -0700 Subject: [openib-general] gen2_basic patch 8/10: handle auto path migration properly Message-ID: <4511B280.7060906@pathscale.com> An embedded and charset-unspecified text was scrubbed... Name: 08_cleanup_mask.patch URL: From rjwalsh at pathscale.com Wed Sep 20 14:29:13 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 20 Sep 2006 14:29:13 -0700 Subject: [openib-general] gen2_basic patch 9/10: fix static_rate check Message-ID: <4511B2A9.8070105@pathscale.com> An embedded and charset-unspecified text was scrubbed... Name: 09_static_rate.patch URL: From rjwalsh at pathscale.com Wed Sep 20 14:29:49 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 20 Sep 2006 14:29:49 -0700 Subject: [openib-general] gen2_basic patch 10/10: handle other vendor devices for max QP count Message-ID: <4511B2CD.2020104@pathscale.com> An embedded and charset-unspecified text was scrubbed... Name: 10_num_qp.patch URL: From halr at voltaire.com Wed Sep 20 14:31:59 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2006 17:31:59 -0400 Subject: [openib-general] OFED 1.1 and OpenSM on SLES 10 for PPC64 In-Reply-To: <20060920204220.GA9724@mellanox.co.il> References: <1158765336.4509.64116.camel@hal.voltaire.com> <20060920204220.GA9724@mellanox.co.il> Message-ID: <1158787917.4509.78684.camel@hal.voltaire.com> On Wed, 2006-09-20 at 16:42, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: RE: OFED 1.1 and OpenSM on SLES 10 for PPC64 > > > > On Wed, 2006-09-20 at 11:11, Eitan Zahavi wrote: > > > I will try to get to that tomorrow > > > > It's not an OpenSM issue. 
See the latest info in the bug report: > > http://openib.org/bugzilla/show_bug.cgi?id=241 > > I dug in a bit and I'm not sure what's the root cause, > but what is triggering the problem is that the saquery diag > utility depends on opensm, No, it depends on the opensm library not opensm. This was all fine until the libraries were broken into a separate RPM for OFED to attempt to decouple them from OpenSM. > which makes a mess of dependencies, It requires opensm library for the SA client interface and complib for portability. I believe this is no different than some other IB utilities in OFED too. > and at some point libtool goes berserk. Huh ? > Short term, can we just skip saquery utility in OFED 1.1? > Hal, can you approve this please? I would prefer that is not the case and this is part of OFED 1.1. > Longer term, I think saquery should be fixed not to depend on opensm - opensm is > a large tool, complicated by portability requirements etc, and it is a waste to > need parts of it on endnodes just to be able to run some diagnostics. > With RMPP support in kernel, we really sholdn't need an extra depenency > just to push a query and get a response. > Comments? It could depend on Sean's new user space SA client API (which perhaps needs some more infrastructure) but we are not there yet. -- Hal From mst at mellanox.co.il Wed Sep 20 14:38:40 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Sep 2006 00:38:40 +0300 Subject: [openib-general] OFED 1.1 and OpenSM on SLES 10 for PPC64 In-Reply-To: <1158787917.4509.78684.camel@hal.voltaire.com> References: <1158787917.4509.78684.camel@hal.voltaire.com> Message-ID: <20060920213840.GB10173@mellanox.co.il> Quoting r. Hal Rosenstock : > > Short term, can we just skip saquery utility in OFED 1.1? > > Hal, can you approve this please? > > I would prefer that is not the case and this is part of OFED 1.1. So - what do you suggest? Can you fix the OFED build today then? I don't have SLES10 ppc. 
If no - do we delay the release to have this utility in? -- MST From halr at voltaire.com Wed Sep 20 15:09:17 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2006 18:09:17 -0400 Subject: [openib-general] OFED 1.1 and OpenSM on SLES 10 for PPC64 In-Reply-To: <20060920213840.GB10173@mellanox.co.il> References: <1158787917.4509.78684.camel@hal.voltaire.com> <20060920213840.GB10173@mellanox.co.il> Message-ID: <1158790156.4509.80047.camel@hal.voltaire.com> On Wed, 2006-09-20 at 17:38, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > > Short term, can we just skip saquery utility in OFED 1.1? > > > Hal, can you approve this please? > > > > I would prefer that is not the case and this is part of OFED 1.1. > > So - what do you suggest? > Can you fix the OFED build today then? I don't have SLES10 ppc. Me neither. > If no - do we delay the release to have this utility in? Doesn't sound like there is much choice. If the release has to be today, then go without it. -- Hal From rdreier at cisco.com Wed Sep 20 15:47:56 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Sep 2006 15:47:56 -0700 Subject: [openib-general] [PATCH] mthca: fix lid used for sending traps In-Reply-To: <20060920060204.GA2870@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 20 Sep 2006 09:02:04 +0300") References: <20060919081324.GF31498@mellanox.co.il> <20060920060204.GA2870@mellanox.co.il> Message-ID: Michael> Missed 2.6.18 by a small margin. Gar! Acked for 2.6.18.1? I already sent it to stable at kernel.org. Thanks... From rdreier at cisco.com Wed Sep 20 15:52:31 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Sep 2006 15:52:31 -0700 Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries In-Reply-To: <20060920042618.GA1710@mellanox.co.il> (Michael S. 
Tsirkin's message of "Wed, 20 Sep 2006 07:26:18 +0300") References: <20060920042618.GA1710@mellanox.co.il> Message-ID: Michael> I'm not sure priv->broadcast is always initialized when Michael> we start a path record query. Is there a reason why it is? IPoIB can't send a packet until the broadcast group is joined, since it doesn't do netif_carrier_on() until then. So I don't see any way that a path record query could start before we know the real mtu. - R. From rdreier at cisco.com Wed Sep 20 15:53:31 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Sep 2006 15:53:31 -0700 Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries In-Reply-To: <20060920050111.GF1710@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 20 Sep 2006 08:01:11 +0300") References: <20060920042618.GA1710@mellanox.co.il> <20060920050111.GF1710@mellanox.co.il> Message-ID: Michael> It also seemed kind of nice to be able to control the Michael> path MTU from dev->mtu - and I don't think path flush on Michael> mtu change is an issue from the performance POV. Michael> What do you think? It just seems weird to me... - R. From sashak at voltaire.com Wed Sep 20 18:01:08 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 21 Sep 2006 04:01:08 +0300 Subject: [openib-general] [PATCH] osm: fixing bugs in osmtest In-Reply-To: <1158755895.4509.58062.camel@hal.voltaire.com> References: <4510E184.8070900@dev.mellanox.co.il> <1158755895.4509.58062.camel@hal.voltaire.com> Message-ID: <20060921010108.GA18938@sashak.voltaire.com> On 08:38 Wed 20 Sep , Hal Rosenstock wrote: > Hi Yevgeny, > > On Wed, 2006-09-20 at 02:36, Yevgeny Kliteynik wrote: > > Hi Hal > > > > I'm doing a major review of the osmtest. > > Good. This has been long overdue. > > > This patch is fixing a few bugs in osmtest where failures > > were ignored. More precisely, osmtest was expecting error, > > but got IB_SUCCESS and ignored the fact that it should have > > gotten an error. 
> > There are also a few changes to improve the code and osmtest > > log readability. > > Looks good at the code inspection level. > > > More patches expected. > > Thanks for the heads up. > > > This patch is for trunk only. > > > > I tested applying this patch before sending it. If you get the > > patch rejected again - let me know. > > It took the header file part but rejected all code blocks for osmtest.c > :-( It looks like modified and context lines have different numbers of prefixed spaces. Sasha From sashak at voltaire.com Wed Sep 20 18:27:47 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 21 Sep 2006 04:27:47 +0300 Subject: [openib-general] [PATCH TRIVIAL] opensm: remove osm_switch_get_lid() prototype Message-ID: <20060921012747.GC18938@sashak.voltaire.com> Hi Hal, Some trivial cleanup. Sasha. opensm: remove osm_switch_get_lid() prototype Remove prototype of non-existing osm_switch_get_lid() function. Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_switch.h | 27 --------------------------- 1 files changed, 0 insertions(+), 27 deletions(-) diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h index 5f33e4a..8c4799f 100644 --- a/osm/include/opensm/osm_switch.h +++ b/osm/include/opensm/osm_switch.h @@ -542,33 +542,6 @@ osm_switch_get_port_by_lid( * Switch object *********/ -/****f* OpenSM: Switch/osm_switch_get_lid -* NAME -* osm_switch_get_lid -* -* DESCRIPTION -* Gets the switch's LID. -* -* SYNOPSIS -*/ -ib_net16_t -osm_switch_get_lid( - IN const osm_switch_t* const p_sw ); -/* -* PARAMETERS -* p_sw -* [in] Pointer to an osm_switch_t object. -* -* RETURN VALUES -* Returns the switch's LID. A value of zero means no LID has -* been assigned to the switch. 
-* -* NOTES -* -* SEE ALSO -* Switch object -*********/ - /****f* OpenSM: Switch/osm_switch_get_physp_ptr * NAME * osm_switch_get_physp_ptr From sashak at voltaire.com Wed Sep 20 18:50:09 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 21 Sep 2006 04:50:09 +0300 Subject: [openib-general] [PATCH TRIVIAL] opensm: LOG_ENTER name fix Message-ID: <20060921015009.GD18938@sashak.voltaire.com> Hi Hal, Some trivial stuff. Sasha opensm: LOG_ENTER name fix In osm_pkey_get_tables() fix the name used with LOG_ENTER(). Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_port_info_rcv.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/osm/opensm/osm_port_info_rcv.c b/osm/opensm/osm_port_info_rcv.c index 9e2fc11..af6fdc8 100644 --- a/osm/opensm/osm_port_info_rcv.c +++ b/osm/opensm/osm_port_info_rcv.c @@ -442,7 +442,7 @@ void osm_pkey_get_tables( uint32_t attr_mod_ho; osm_switch_t* p_switch; - OSM_LOG_ENTER( p_log, osm_physp_has_pkey ); + OSM_LOG_ENTER( p_log, osm_pkey_get_tables ); path = *osm_physp_get_dr_path_ptr( p_physp ); From halr at voltaire.com Wed Sep 20 18:55:18 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2006 21:55:18 -0400 Subject: [openib-general] [PATCH TRIVIAL] opensm: remove osm_switch_get_lid() prototype In-Reply-To: <20060921012747.GC18938@sashak.voltaire.com> References: <20060921012747.GC18938@sashak.voltaire.com> Message-ID: <1158803696.4509.88358.camel@hal.voltaire.com> On Wed, 2006-09-20 at 21:27, Sasha Khapyorsky wrote: > Hi Hal, > > Some trivial cleanup. > > Sasha. > > > opensm: remove osm_switch_get_lid() prototype > > Remove prototype of non-existing osm_switch_get_lid() function. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. 
-- Hal From halr at voltaire.com Wed Sep 20 19:03:16 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2006 22:03:16 -0400 Subject: [openib-general] [PATCH TRIVIAL] opensm: LOG_ENTER name fix In-Reply-To: <20060921015009.GD18938@sashak.voltaire.com> References: <20060921015009.GD18938@sashak.voltaire.com> Message-ID: <1158804195.4509.88656.camel@hal.voltaire.com> On Wed, 2006-09-20 at 21:50, Sasha Khapyorsky wrote: > Hi Hal, > > Some trivial stuff. > > Sasha > > > opensm: LOG_ENTER name fix > > In osm_pkey_get_tables() fix the name used with LOG_ENTER(). > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. -- Hal From mst at mellanox.co.il Wed Sep 20 21:45:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Sep 2006 07:45:04 +0300 Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries In-Reply-To: References: Message-ID: <20060921044504.GA5830@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries > > Michael> It also seemed kind of nice to be able to control the > Michael> path MTU from dev->mtu - and I don't think path flush on > Michael> mtu change is an issue from the performance POV. > > Michael> What do you think? > > It just seems weird to me... Well, I like this better, but you are the final arbiter here. I gather you want me to rework the patch to the original approach? -- MST From mst at mellanox.co.il Wed Sep 20 21:57:13 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Sep 2006 07:57:13 +0300 Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries In-Reply-To: References: Message-ID: <20060921045713.GA5983@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries > > Michael> I'm not sure priv->broadcast is always initialized when > Michael> we start a path record query. 
Is there a reason why it is?
>
> IPoIB can't send a packet until the broadcast group is joined, since
> it doesn't do netif_carrier_on() until then. So I don't see any way
> that a path record query could start before we know the real mtu.

Good point, thanks. If we switch to that, a BUG_ON(priv->broadcast)
just in case won't hurt though, would it?

-- 
MST

From dotanb at dev.mellanox.co.il  Wed Sep 20 22:34:05 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 08:34:05 +0300
Subject: [openib-general] max message size for IB_WR_SEND
In-Reply-To: 
References: 
Message-ID: <4512244D.4040404@dev.mellanox.co.il>

Hi.

amit byron wrote:
> hi,
>
> if i evoke/call ib_post_send(IB_WR_SEND) with message
> size 512 bytes, the message gets received on the
> peer (second) node. the 2 nodes are connected point-to
> -point.
>
> but if message size is increased to 4096 bytes then
> second node receives the message; but message content
> is missing (empty).
>
> won't infiniband stack break down message in smaller
> chunks and assemble on peer node?
>
> thanks,
> Amit.
>
Which transport type are you using?
If you are using a UD QP, then the answer is no.
For any other transport type, the answer is yes (the message is broken
down into packets of the MTU size specified in the QP context).

Maybe you have a different problem in your code.
Did you check the completion status on both of the nodes?

Dotan

From mlleinin at hpcn.ca.sandia.gov  Wed Sep 20 22:32:56 2006
From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger)
Date: Wed, 20 Sep 2006 22:32:56 -0700
Subject: [openib-general] OpenFabrics server scheduled downtime Sat. Sept 23
Message-ID: <1158816776.6412.89.camel@localhost>

The OpenFabrics server will be offline Saturday September 23 from 6am
PST to 6pm PST due to a scheduled maintenance on a power substation at
Sandia. These outages usually last less than the scheduled 12 hours.
We will bring the OpenFabrics server back online as soon as possible
after the scheduled outage.

Thanks,

 - Matt

From dotanb at dev.mellanox.co.il  Wed Sep 20 22:38:30 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 08:38:30 +0300
Subject: [openib-general] gen2_basic patch boogum?
In-Reply-To: <451191AD.80608@pathscale.com>
References: <451191AD.80608@pathscale.com>
Message-ID: <45122556.8090401@dev.mellanox.co.il>

Hi Robert.

Robert Walsh wrote:
> Hi Dotan,
>
> I just noticed that you didn't apply one of my patch hunks that called
> get_is_global(). I know why you didn't do it (the dlid is always 0: see
> line 218 of test_av.c), but should you still be setting the is_global
> field in the ah_attr structure to some value? Right now, it will just
> be set to some random uninitialized stack value.
>
> Regards,
> Robert.
>
You are right, I removed this line because the dlid is constant 0.

Here is the initialization of the variable:

        struct ibv_ah_attr av_attr = {
                .dlid           = 0,
                .sl             = VL_range(rand_gen, 0, 15),
                .src_path_bits  = VL_range(rand_gen, 0, 0x8f),
                .port_num       = port,
                .static_rate    = get_static_rate(1, rand_gen),
                .grh            = {
                        .traffic_class  = VL_range(rand_gen, 1, 0xff),
                        .flow_label     = VL_random(rand_gen, 0x100000),
                        .hop_limit      = VL_range(rand_gen, 1, 0xff),
                }
        };

The attributes of av_attr that are being initialized in this code will
have the value that is being assigned to them.
All the rest of the attributes (for example: is_global) are being set
with 0.

Dotan

From eeb at bartonsoftware.com  Wed Sep 20 22:40:11 2006
From: eeb at bartonsoftware.com (Eric Barton)
Date: Thu, 21 Sep 2006 06:40:11 +0100
Subject: [openib-general] RDMA CM callback status
Message-ID: <200609210540.k8L5eBce029142@robert.bartonsoftware.com>

Sean,

I have some questions regarding my RDMA CM callback handler....
int kiblnd_cm_callback(struct rdma_cm_id *cmid, struct rdma_cm_event *event)
{
        switch (event->event) {
        default:
                ASSERT (0);

        case RDMA_CM_EVENT_CONNECT_REQUEST:
                return kiblnd_passive_connect(...);

        case RDMA_CM_EVENT_ADDR_ERROR:
                ASSERT(event->status != 0);
                /* handle error */
                return event->status;

        case RDMA_CM_EVENT_ADDR_RESOLVED:
                if (event->status == 0)
                        return kiblnd_resolve_route(...);
                /* handle error */
                return event->status;

        case RDMA_CM_EVENT_ROUTE_ERROR:
                ASSERT(event->status != 0);
                /* handle error */
                return event->status;

        case RDMA_CM_EVENT_ROUTE_RESOLVED:
                if (event->status == 0)
                        return kiblnd_active_connect(...);
                /* handle error */
                return event->status;

        case RDMA_CM_EVENT_UNREACHABLE:
                ASSERT (event->status != 0);
                /* handle error out-of-line */
                return 0;

        case RDMA_CM_EVENT_CONNECT_ERROR:
                ASSERT (event->status != 0);
                /* handle error out-of-line */
                return 0;

        case RDMA_CM_EVENT_REJECTED:
                /* handle error out-of-line */
                return 0;

        case RDMA_CM_EVENT_ESTABLISHED:
                /* handle success */
                return 0;

        case RDMA_CM_EVENT_DISCONNECTED:
                /* teardown */
                return 0;

        case RDMA_CM_EVENT_DEVICE_REMOVAL:
                /* bleat on the console */
                return 0;
        }
}

1. Should I even be looking at event->status or does the event type tell me
   everything I need to know? I've had a report that the assertion
   (event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR.

2. /* handle error out-of-line */ above means I record failure in my
   connection data structure, start teardown and drop the callback's
   reference on it. When the last reference goes, the connection data
   structure is queued for final destruction (including
   rdma_destroy_id(cmid)). Given that this might race with the callback's
   caller is this OK?
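
The scheme described in question 2 can be sketched roughly as follows. This
is only a minimal, single-threaded illustration: sketch_conn and its helpers
are hypothetical names, not part of the RDMA CM API or of the handler above,
and "queue for final destruction" is reduced to setting a flag. It shows the
bookkeeping only and does not answer whether the race with the callback's
caller is safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection object (assumption, not from the original code). */
struct sketch_conn {
        atomic_int refs;    /* one reference per holder, incl. the CM callback */
        int        error;   /* first failure recorded, 0 if none */
        bool       queued;  /* stand-in for "queued for final destruction" */
};

void sketch_conn_init(struct sketch_conn *c, int initial_refs)
{
        atomic_init(&c->refs, initial_refs);
        c->error = 0;
        c->queued = false;
}

/* Drop one reference. Returns true if it was the last one, in which case
 * the object is marked as handed over for destruction. */
bool sketch_conn_put(struct sketch_conn *c)
{
        int old = atomic_fetch_sub(&c->refs, 1);

        assert(old > 0);            /* catch a put without a matching ref */
        if (old == 1) {
                c->queued = true;   /* real code would queue it for a reaper */
                return true;
        }
        return false;
}

/* "handle error out-of-line": record the failure, then drop the callback's
 * reference. Returns 0, as the out-of-line cases in the handler do. */
int sketch_handle_error(struct sketch_conn *c, int status)
{
        if (c->error == 0)
                c->error = status;  /* keep only the first failure */
        sketch_conn_put(c);
        return 0;
}
```

With two references held (say, the callback's and one other holder's), a call
to sketch_handle_error() records the status and drops the first reference;
the object is only marked for destruction once the remaining holder calls
sketch_conn_put().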
-- Cheers, Eric --------------------------------------------------- |Eric Barton Barton Software | |9 York Gardens Tel: +44 (117) 330 1575 | |Clifton Mobile: +44 (7909) 680 356 | |Bristol BS8 4LL Fax: call first | |United Kingdom E-Mail: | --------------------------------------------------- From dotanb at dev.mellanox.co.il Wed Sep 20 23:51:11 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 21 Sep 2006 09:51:11 +0300 Subject: [openib-general] gen2_basic patch 7/10: choose illegal max_qp_init_rd_atom values correctly In-Reply-To: <4511B084.8070100@pathscale.com> References: <4511B084.8070100@pathscale.com> Message-ID: <4512365F.9090204@dev.mellanox.co.il> Robert Walsh wrote: > gen2_basic - choose illegal max_qp_init_rd_atom values correctly > > Signed-off by: Robert Walsh > > diff -rNu a/gen2_basic/test_qp.c b/gen2_basic/test_qp.c > --- a/gen2_basic/test_qp.c 2006-09-13 19:09:47.419791000 -0700 > +++ b/gen2_basic/test_qp.c 2006-08-14 14:16:57.911621000 -0700 > @@ -369,7 +369,7 @@ > if (legal) > return VL_range(rand_gen, 0, attr->max_qp_init_rd_atom); > else > - return VL_range(rand_gen, attr->max_qp_init_rd_atom, 0xFF); > + return VL_range(rand_gen, attr->max_qp_init_rd_atom + 1, 0xFF); > } > > uint8_t get_max_dest_rd_atomic( > @@ -380,7 +380,7 @@ > if (legal) > return VL_range(rand_gen, 0, attr->max_qp_rd_atom); > else > - return VL_range(rand_gen, attr->max_qp_rd_atom, 0xFF); > + return VL_range(rand_gen, attr->max_qp_rd_atom + 1, 0xFF); > } > > uint8_t get_min_rnr_timer( > committed. 
thanks Dotan From dotanb at dev.mellanox.co.il Wed Sep 20 23:51:23 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 21 Sep 2006 09:51:23 +0300 Subject: [openib-general] gen2_basic patch 8/10: handle auto path migration properly In-Reply-To: <4511B280.7060906@pathscale.com> References: <4511B280.7060906@pathscale.com> Message-ID: <4512366B.5000704@dev.mellanox.co.il> Robert Walsh wrote: > gen2_basic - handle auto path migration properly > > Signed-off by: Robert Walsh > > diff -rNu a/gen2_basic/test_qp.c b/gen2_basic/test_qp.c > --- a/gen2_basic/test_qp.c 2006-09-13 19:15:59.829006000 -0700 > +++ b/gen2_basic/test_qp.c 2006-08-14 14:16:57.911621000 -0700 > @@ -586,6 +586,7 @@ > } > > void cleanup_mask( > + IN struct ibv_device_attr *device_attr, > IN enum ibv_qp_type qp_type, > IN OUT int* mask) > { > @@ -607,6 +608,8 @@ > *mask &= ~IBV_QP_MAX_DEST_RD_ATOMIC; > *mask &= ~IBV_QP_MAX_QP_RD_ATOMIC; > } > + if (!(device_attr->device_cap_flags & IBV_DEVICE_AUTO_PATH_MIG)) > + *mask &= ~IBV_QP_ALT_PATH; > } > > int my_query_qp( > @@ -774,7 +777,7 @@ > case REQUIRED_ATTR: > mask |= test_vector[idx].required_attr; > > - cleanup_mask(qp_type, &mask); > + cleanup_mask(device_attr, qp_type, &mask); > mask &= ~IBV_QP_PATH_MIG_STATE; > > if (test_vector[idx].to == IBV_QPS_SQD && test_vector[idx].from == IBV_QPS_SQD && qp_type != IBV_QPT_RC) > @@ -798,8 +801,8 @@ > temp_mask = test_vector[idx].optional_attr; > mask = test_vector[idx].required_attr | test_vector[idx].optional_attr; > } > - cleanup_mask(qp_type, &mask); > - cleanup_mask(qp_type, &temp_mask); > + cleanup_mask(device_attr, qp_type, &mask); > + cleanup_mask(device_attr, qp_type, &temp_mask); > > for (i = 1; i <= 20; ++i) { > if ((1 << i) & temp_mask) { > @@ -820,7 +823,7 @@ > > case NOT_ALL_REQUIRED: > mask = test_vector[idx].required_attr; > - cleanup_mask(qp_type, &mask); > + cleanup_mask(device_attr, qp_type, &mask); > > for (i = 1; i <= 20; ++i) { > if ((1 << i) & mask) { > @@ -835,7 +838,7 @@ > 
break; > case NOT_ALL_OPTIONAL: > mask = test_vector[idx].required_attr | test_vector[idx].optional_attr; > - cleanup_mask(qp_type, &mask); > + cleanup_mask(device_attr, qp_type, &mask); > > if (test_vector[idx].to == IBV_QPS_SQD && test_vector[idx].from == IBV_QPS_SQD && qp_type != IBV_QPT_RC) > mask &= ~IBV_QP_PORT; > @@ -855,7 +858,7 @@ > break; > case INVALID_ATTR: > mask = test_vector[idx].required_attr | test_vector[idx].optional_attr; > - cleanup_mask(qp_type, &mask); > + cleanup_mask(device_attr, qp_type, &mask); > > mask = get_random_mask(rand_gen, mask); > > @@ -1420,7 +1422,7 @@ > > for (j = 1; j < 20; ++j) { > int mask = test_vector[i].optional_attr; > - cleanup_mask(qp_type, &mask); > + cleanup_mask(&device_attr, qp_type, &mask); > if ((1 << j) & mask) { > get_qp_cap(rand_gen, 1, &device_attr, &attr.cap); > > @@ -1540,7 +1542,7 @@ > mask = IBV_QP_STATE | IBV_QP_TIMEOUT | IBV_QP_RETRY_CNT | IBV_QP_RNR_RETRY | > IBV_QP_SQ_PSN | IBV_QP_MAX_QP_RD_ATOMIC | IBV_QP_PATH_MIG_STATE; > > - cleanup_mask(qp_type, &mask); > + cleanup_mask(&device_attr, qp_type, &mask); > > qp_attr.path_mig_state = IBV_MIG_REARM; > > @@ -1556,7 +1558,7 @@ > > mask = IBV_QP_STATE | IBV_QP_PATH_MIG_STATE; > > - cleanup_mask(qp_type, &mask); > + cleanup_mask(&device_attr, qp_type, &mask); > > qp_attr.path_mig_state = IBV_MIG_REARM; > > @@ -1584,7 +1586,7 @@ > > mask = IBV_QP_STATE | IBV_QP_PATH_MIG_STATE; > > - cleanup_mask(qp_type, &mask); > + cleanup_mask(&device_attr, qp_type, &mask); > > qp_attr.path_mig_state = IBV_MIG_REARM; > > committed. 
thanks Dotan From dotanb at dev.mellanox.co.il Wed Sep 20 23:51:35 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 21 Sep 2006 09:51:35 +0300 Subject: [openib-general] gen2_basic patch 9/10: fix static_rate check In-Reply-To: <4511B2A9.8070105@pathscale.com> References: <4511B2A9.8070105@pathscale.com> Message-ID: <45123677.4050803@dev.mellanox.co.il> Robert Walsh wrote: > gen2_basic - fix static_rate check > > Make sure we're comparing apples to apples in the static_rate check. > > Signed-off by: Robert Walsh > > diff -rNu a/gen2_basic/test_qp.c b/gen2_basic/test_qp.c > --- a/gen2_basic/test_qp.c 2006-09-13 19:17:17.835923000 -0700 > +++ b/gen2_basic/test_qp.c 2006-08-14 14:16:57.911621000 -0700 > @@ -659,7 +659,7 @@ > /* CHECK_VALUE("AV port_num", query_attr.ah_attr.port_num, attr->ah_attr.port_num, return -1); */ > CHECK_VALUE("AV sl", query_attr.ah_attr.sl, attr->ah_attr.sl, return -1); > CHECK_VALUE("AV src_path_bits", query_attr.ah_attr.src_path_bits, attr->ah_attr.src_path_bits, return -1); > - CHECK_VALUE("AV static_rate", query_attr.ah_attr.static_rate, !!attr->ah_attr.static_rate, return -1); > + CHECK_VALUE("AV static_rate", !!query_attr.ah_attr.static_rate, !!attr->ah_attr.static_rate, return -1); > if (query_attr.ah_attr.is_global) { > int i; > > committed. thanks Dotan From dotanb at dev.mellanox.co.il Wed Sep 20 23:51:53 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 21 Sep 2006 09:51:53 +0300 Subject: [openib-general] gen2_basic patch 10/10: handle other vendor devices for max QP count In-Reply-To: <4511B2CD.2020104@pathscale.com> References: <4511B2CD.2020104@pathscale.com> Message-ID: <45123689.1080002@dev.mellanox.co.il> Robert Walsh wrote: > gen2_basic - handle other vendor devices for max QP count > > When choosing the actual max QP number, handle non-Mellanox devices too. > Make sure we only clean up the QPs we actually created. 
> > Signed-off by: Robert Walsh > > diff -rNu a/gen2_basic/test_qp.c b/gen2_basic/test_qp.c > --- a/gen2_basic/test_qp.c 2006-09-13 19:18:03.655058000 -0700 > +++ b/gen2_basic/test_qp.c 2006-08-14 14:16:57.911621000 -0700 > @@ -1289,13 +1289,12 @@ > CHECK_PTR("ibv_create_cq", cq, goto cleanup); > > switch (device_attr.vendor_part_id) { > - case 23108: > - case 25208: > - num_qp = device_attr.max_qp; > - break; > case 25218: > case 25204: > num_qp = 15872; /* Found in experiments to be the max for memfree per process */ > + break; > + default: > + num_qp = device_attr.max_qp; > break; > } > > @@ -1330,7 +1329,7 @@ > WAIT_CLEANUP; > > if (qp) { > - for (i = 0; i < device_attr.max_qp + 1; ++ i) { > + for (i = 0; i < num_qp + 1; ++ i) { > if (qp[i]) { > rc = ibv_destroy_qp(qp[i]); > CHECK_VALUE("ibv_destroy_qp", rc, 0, test_result = -1); > committed. thanks Dotan From rjwalsh at pathscale.com Wed Sep 20 23:59:27 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 20 Sep 2006 23:59:27 -0700 Subject: [openib-general] gen2_basic patch boogum? In-Reply-To: <45122556.8090401@dev.mellanox.co.il> References: <451191AD.80608@pathscale.com> <45122556.8090401@dev.mellanox.co.il> Message-ID: <4512384F.3060304@pathscale.com> > All the rest of the attributes (for example: is_global) are being set > with 0. Oh, OK - I wasn't sure that you wanted it set that way or randomly like it used to be. No biggie. Regards, Robert. From kliteyn at dev.mellanox.co.il Thu Sep 21 00:30:38 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 21 Sep 2006 10:30:38 +0300 Subject: [openib-general] [PATCHv2] osm: fixing bugs in osmtest Message-ID: Hi Hal It appears that each mailer is messing with white spaces in its own very special way... Anyway, this time it is ok for sure. 
Yevgeny Signed-off-by: Yevgeny Kliteynik Index: osmtest/include/osmtest.h =================================================================== --- osmtest/include/osmtest.h (revision 9585) +++ osmtest/include/osmtest.h (working copy) @@ -506,4 +506,13 @@ ib_api_status_t osmtest_get_local_port_lmc( IN osmtest_t * const p_osmt, IN ib_net16_t lid, OUT uint8_t * const p_lmc ); + + +/* + * A few auxiliary macros for logging + */ + +#define EXPECTING_ERRORS_START "[[ ===== Expecting Errors - START ===== " +#define EXPECTING_ERRORS_END " ===== Expecting Errors - END ===== ]]" + #endif /* _OSMTEST_H_ */ Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 9585) +++ osmtest/osmtest.c (working copy) @@ -552,6 +552,7 @@ osmtest_init( IN osmtest_t * const p_osm osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_init: ERR 0001: " "Unable to allocate vendor object" ); + status = IB_ERROR; goto Exit; } @@ -1817,6 +1818,11 @@ osmtest_wrong_sm_key_ignored( IN osmtest osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_wrong_sm_key_ignored: ERR 0011: " "Did not get a timeout but got (%s)\n", ib_get_err_str( status ) ); + if ( status == IB_SUCCESS ) + { + /* assign some error value to status, since IB_SUCCESS is a bad rc */ + status = IB_ERROR; + } goto Exit; } else @@ -5448,14 +5454,23 @@ osmtest_validate_against_db( IN osmtest_ memset( &context, 0, sizeof( context ) ); memset( &request, 0, sizeof( request ) ); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" ); + if( status == IB_SUCCESS ) - goto Exit; - else { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - 
"osmtest_get_multipath_rec: " - "IS EXPECTED ERROR ^^^^\n"); + status = IB_ERROR; + goto Exit; } memset( &context, 0, sizeof( context ) ); @@ -5463,14 +5478,23 @@ osmtest_validate_against_db( IN osmtest_ request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT; request.sgid_count = 1; ib_gid_set_default( &request.gids[0], portguid ); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_multipath_rec( p_osmt, &request, &context ); - if( status == IB_SUCCESS ) - goto Exit; - else + if( status != IB_SUCCESS ) { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_multipath_rec: " - "IS EXPECTED ERROR ^^^^\n"); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" ); + + if( status == IB_SUCCESS ) + { + status = IB_ERROR; + goto Exit; } memset( &context, 0, sizeof( context ) ); @@ -5482,14 +5506,23 @@ osmtest_validate_against_db( IN osmtest_ /* Set IPoIB broadcast MGID */ request.gids[1].unicast.prefix = CL_HTON64(0xff12401bffff0000ULL); request.gids[1].unicast.interface_id = CL_HTON64(0x00000000ffffffffULL); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" ); + if( status == IB_SUCCESS ) - goto Exit; - else { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_multipath_rec: " - "IS EXPECTED ERROR ^^^^\n"); + status = IB_ERROR; + goto Exit; } memset( &context, 0, sizeof( context ) ); @@ -5500,14 +5533,23 @@ osmtest_validate_against_db( IN osmtest_ 
request.gids[0].unicast.prefix = CL_HTON64(0xff12401bffff0000ULL); request.gids[0].unicast.interface_id = CL_HTON64(0x00000000ffffffffULL); ib_gid_set_default( &request.gids[1], portguid ); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" ); + if( status == IB_SUCCESS ) - goto Exit; - else { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_multipath_rec_gid_ipoib_bcast: " - "IS EXPECTED ERROR ^^^^\n"); + status = IB_ERROR; + goto Exit; } memset( &context, 0, sizeof( context ) ); @@ -5569,14 +5611,23 @@ osmtest_validate_against_db( IN osmtest_ goto Exit; memset( &context, 0, sizeof( context ) ); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_pkeytbl_rec_by_lid( p_osmt, test_lid, 0, &context ); - if ( status == IB_SUCCESS ) - goto Exit; - else + if( status != IB_SUCCESS ) { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_pkeytbl_rec_by_lid: " - "IS EXPECTED ERROR ^^^^\n"); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" ); + + if( status == IB_SUCCESS ) + { + status = IB_ERROR; + goto Exit; } memset( &context, 0, sizeof( context ) ); @@ -5679,26 +5730,43 @@ osmtest_validate_against_db( IN osmtest_ goto Exit; memset( &context, 0, sizeof( context ) ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_START "\n" ); status = osmtest_get_path_rec_by_lid_pair( p_osmt, 0xffff, 0xffff, &context ); - if 
(status == IB_SUCCESS ) - goto Exit; - else + if( status != IB_SUCCESS ) { - osm_log ( &p_osmt->log, OSM_LOG_ERROR, + osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_get_path_rec_by_lid_pair: " - "IS EXPECTED ERROR ^^^^\n" ); + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_END "\n" ); + + if( status == IB_SUCCESS ) + { + status = IB_ERROR; + goto Exit; } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_START "\n" ); + status = osmtest_get_path_rec_by_lid_pair( p_osmt, test_lid, 0xffff, &context ); - if (status == IB_SUCCESS ) - goto Exit; - else + if( status != IB_SUCCESS ) { - osm_log ( &p_osmt->log, OSM_LOG_ERROR, + osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_get_path_rec_by_lid_pair: " - "IS EXPECTED ERROR ^^^^\n" ); + "Got error %s\n", ib_get_err_str(status) ); + } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_END "\n" ); + + if( status == IB_SUCCESS ) + { + status = IB_ERROR; + goto Exit; } } } @@ -7141,6 +7209,9 @@ osmtest_run( IN osmtest_t * const p_osmt if( p_osmt->opt.flow == 1 ) { + /* + * Creating an inventory file with all nodes, ports and paths + */ status = osmtest_create_inventory_file( p_osmt ); if( status != IB_SUCCESS ) { @@ -7155,6 +7226,9 @@ osmtest_run( IN osmtest_t * const p_osmt { if( p_osmt->opt.flow == 5 ) { + /* + * Stress SA - flood the it with queries + */ switch ( p_osmt->opt.stress ) { case 0: @@ -7215,8 +7289,11 @@ osmtest_run( IN osmtest_t * const p_osmt /* * Run normal validition tests. 
*/ - if (!p_osmt->opt.flow || p_osmt->opt.flow == 2) + if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 2) { + /* + * Only validate the given inventory file + */ status = osmtest_create_db( p_osmt ); if( status != IB_SUCCESS ) { @@ -7238,7 +7315,7 @@ osmtest_run( IN osmtest_t * const p_osmt } } - if (!p_osmt->opt.flow) + if (p_osmt->opt.flow == 0) { status = osmtest_wrong_sm_key_ignored( p_osmt ); if( status != IB_SUCCESS ) @@ -7251,8 +7328,11 @@ osmtest_run( IN osmtest_t * const p_osmt } } - if (!p_osmt->opt.flow || p_osmt->opt.flow == 3) + if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 3) { + /* + * run service registration, deregistration, and lease test + */ status = osmt_run_service_records_flow( p_osmt ); if( status != IB_SUCCESS ) { @@ -7264,8 +7344,11 @@ osmtest_run( IN osmtest_t * const p_osmt } } - if (!p_osmt->opt.flow || p_osmt->opt.flow == 4) + if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 4) { + /* + * Run event forwarding test + */ #ifdef OSM_VENDOR_INTF_MTL status = osmt_run_inform_info_flow( p_osmt ); @@ -7286,12 +7369,13 @@ osmtest_run( IN osmtest_t * const p_osmt #endif } - /* - * since it generates a huge file, we run it only - * if explicitly required to - */ if (p_osmt->opt.flow == 7) { + /* + * QoS info: dump VLArb and SLtoVL tables. 
+ * Since it generates a huge file, we run it only + * if explicitly required to + */ status = osmtest_create_db( p_osmt ); if( status != IB_SUCCESS ) { @@ -7315,6 +7399,9 @@ osmtest_run( IN osmtest_t * const p_osmt if (p_osmt->opt.flow == 8) { + /* + * Run trap 64/65 flow (this flow requires running of external tool) + */ #ifdef OSM_VENDOR_INTF_MTL status = osmt_run_trap64_65_flow( p_osmt ); if( status != IB_SUCCESS ) @@ -7334,8 +7421,11 @@ osmtest_run( IN osmtest_t * const p_osmt #endif } - if (!p_osmt->opt.flow || p_osmt->opt.flow == 6) + if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 6) { + /* + * Multicast flow + */ status = osmt_run_mcast_flow( p_osmt ); if( status != IB_SUCCESS ) { From mst at mellanox.co.il Thu Sep 21 01:09:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Sep 2006 11:09:08 +0300 Subject: [openib-general] [PATCHv2] osm: fixing bugs in osmtest In-Reply-To: References: Message-ID: <20060921080908.GC27123@mellanox.co.il> Quoting r. Yevgeny Kliteynik : > Subject: [PATCHv2] osm: fixing bugs in osmtest > > Hi Hal > > It appears that each mailer is messing with white spaces > in its own very special way... It's often not the mail agent's, but editor's fault: lots of editors mess up code if you cut and paste the patch - since it looks like regular text input to them. You want an editor that lets you inline the patch as is, taking it directly from file, not through the clipboard. E.g. I hear that kmail composer let's you do file/insert file to do that. -- MST From kliteyn at mellanox.co.il Thu Sep 21 01:31:46 2006 From: kliteyn at mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 21 Sep 2006 11:31:46 +0300 Subject: [openib-general] [PATCH][TRIVIAL]OpenSM/osm_node_info_rcv.c: Eliminate superfluous call level In-Reply-To: <1158672358.4509.4309.camel@hal.voltaire.com> References: <1158672358.4509.4309.camel@hal.voltaire.com> Message-ID: <1158827506.8655.28.camel@kliteynik.yok.mtl.com> Hi Hal. The patch looks OK. 
Regards, -- Yevgeny On Tue, 2006-09-19 at 09:25 -0400, Hal Rosenstock wrote: > OpenSM/osm_node_info_rcv.c: Eliminate superfluous call level > > Signed-off-by: Hal Rosenstock > Index: opensm/osm_node_info_rcv.c > =================================================================== > --- opensm/osm_node_info_rcv.c (revision 9536) > +++ opensm/osm_node_info_rcv.c (working copy) > @@ -437,7 +437,7 @@ __osm_ni_rcv_process_new_ca( > The plock must be held before calling this function. > **********************************************************************/ > static void > -__osm_ni_rcv_process_ca_port( > +__osm_ni_rcv_process_existing_ca( > IN const osm_ni_rcv_t* const p_rcv, > IN osm_node_t* const p_node, > IN const osm_madw_t* const p_madw ) > @@ -455,7 +455,7 @@ __osm_ni_rcv_process_ca_port( > osm_bind_handle_t h_bind; > cl_status_t cl_status; > > - OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_ca_port ); > + OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_ca ); > > p_smp = osm_madw_get_smp_ptr( p_madw ); > p_ni = (ib_node_info_t*)ib_smp_get_payload_ptr( p_smp ); > @@ -473,7 +473,7 @@ __osm_ni_rcv_process_ca_port( > if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl ) ) > { > osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, > - "__osm_ni_rcv_process_ca_port: " > + "__osm_ni_rcv_process_existing_ca: " > "Creating new port object with GUID = 0x%" PRIx64 "\n", > cl_ntoh64( p_ni->port_guid ) ); > > @@ -483,7 +483,7 @@ __osm_ni_rcv_process_ca_port( > if( p_port == NULL ) > { > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > - "__osm_ni_rcv_process_ca_port: ERR 0D04: " > + "__osm_ni_rcv_process_existing_ca: ERR 0D04: " > "Unable to create new port object\n" ); > goto Exit; > } > @@ -500,7 +500,7 @@ __osm_ni_rcv_process_ca_port( > Somehow, this port GUID already exists in the table. 
> */ > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > - "__osm_ni_rcv_process_ca_port: ERR 0D12: " > + "__osm_ni_rcv_process_existing_ca: ERR 0D12: " > "Port 0x%" PRIx64 " already in the database!\n", > cl_ntoh64( p_ni->port_guid ) ); > > @@ -521,7 +521,7 @@ __osm_ni_rcv_process_ca_port( > if( cl_status != CL_SUCCESS ) > { > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > - "__osm_ni_rcv_process_ca_port: ERR 0D08: " > + "__osm_ni_rcv_process_existing_ca: ERR 0D08: " > "Error %s adding to list\n", > CL_STATUS_MSG( cl_status ) ); > osm_port_delete( &p_port ); > @@ -530,7 +530,7 @@ __osm_ni_rcv_process_ca_port( > else > { > osm_log( p_rcv->p_log, OSM_LOG_DEBUG, > - "__osm_ni_rcv_process_ca_port: " > + "__osm_ni_rcv_process_existing_ca: " > "Adding port GUID:0x%016" PRIx64 " to new_ports_list\n", > cl_ntoh64(osm_node_get_node_guid( p_port->p_node )) ); > } > @@ -547,7 +547,7 @@ __osm_ni_rcv_process_ca_port( > if ( !osm_physp_is_valid( p_physp ) ) > { > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > - "__osm_ni_rcv_process_ca_port: ERR 0D19: " > + "__osm_ni_rcv_process_existing_ca: ERR 0D19: " > "Invalid physical port. Aborting discovery\n"); > goto Exit; > } > @@ -579,7 +579,7 @@ __osm_ni_rcv_process_ca_port( > if( status != IB_SUCCESS ) > { > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > - "__osm_ni_rcv_process_ca_port: ERR 0D13: " > + "__osm_ni_rcv_process_existing_ca: ERR 0D13: " > "Failure initiating PortInfo request (%s)\n", > ib_get_err_str(status)); > } > @@ -592,22 +592,6 @@ __osm_ni_rcv_process_ca_port( > The plock must be held before calling this function. 
> **********************************************************************/
> static void
> -__osm_ni_rcv_process_existing_ca(
> -  IN const osm_ni_rcv_t* const p_rcv,
> -  IN osm_node_t* const p_node,
> -  IN const osm_madw_t* const p_madw )
> -{
> -  OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_ca );
> -
> -  __osm_ni_rcv_process_ca_port( p_rcv, p_node, p_madw );
> -
> -  OSM_LOG_EXIT( p_rcv->p_log );
> -}
> -
> -/**********************************************************************
> - The plock must be held before calling this function.
> -**********************************************************************/
> -static void
> __osm_ni_rcv_process_new_router(
>   IN const osm_ni_rcv_t* const p_rcv,
>   IN osm_node_t* const p_node,
>

From ogerlitz at voltaire.com Thu Sep 21 01:35:10 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 21 Sep 2006 11:35:10 +0300 (IDT)
Subject: [openib-general] timer_pending kernel assertion while stopping IPoIB
Message-ID:

I just got the following assertion:

KERNEL: assertion (!timer_pending(&dev->watchdog_timer)) failed at net/sched/sch_generic.c (631)

which is the outcome of this line in dev_shutdown()

BUG_TRAP(!timer_pending(&dev->watchdog_timer));

when running a script that does

modprobe ib_ipoib
echo 1 > /sys/module/ib_ipoib/parameters/mcast_debug_level
echo 1 > /sys/module/ib_ipoib/parameters/debug_level
ifconfig ib0 192.168.10.118

and then, after some time, a script that does

ifconfig ib0 down
ifconfig ib1 down
modprobe -r ib_ipoib

Below is the dmesg, which might help. The kernel is net-2.6.19 git.

Or.

ib0: bringing up interface ib0: starting multicast thread ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff ib0: restarting multicast task ib0: stopping multicast thread ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4) ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001 ib0: starting multicast thread ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff ib0: restarting multicast task ib0: stopping multicast thread ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4) ib0: starting multicast thread ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0) ib0: Created ah ffff8100200517c0 ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff8100200517c0, LID 0xc000, SL 0 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001 ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0) ib0: Created ah ffff810035efec40 ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff810035efec40, LID 0xc001, SL 0 ib0: successfully joined all multicast groups ib0: restarting multicast task ib0: stopping multicast thread ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001 ib0: starting multicast thread ib0: successfully joined all multicast groups ib0: stopping interface ib0: downing ib_dev ib0: stopping multicast thread ib0: flushing multicast list ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001 ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff ib0: All sends and receives done. 
KERNEL: assertion (!timer_pending(&dev->watchdog_timer)) failed at net/sched/sch_generic.c (631) ib0: cleaning up ib_dev ib0: stopping multicast thread ib0: flushing multicast list ib1: cleaning up ib_dev ib1: stopping multicast thread ib1: flushing multicast list From erezz at voltaire.com Thu Sep 21 02:52:12 2006 From: erezz at voltaire.com (Erez Zilber) Date: Thu, 21 Sep 2006 12:52:12 +0300 Subject: [openib-general] 2 SLES 10 backport directories In-Reply-To: <20060920210716.GD9724@mellanox.co.il> References: <450915EE.1090705@voltaire.com> <20060917044626.GA26054@mellanox.co.il> <451137D9.3060607@voltaire.com> <20060920210716.GD9724@mellanox.co.il> Message-ID: <451260CC.6090805@voltaire.com> >>>> >> I think that we can throw away >> https://openib.org/svn/gen2/branches/backport/sles10/. These patches >> apply to SLES 10 beta 8. They are no longer needed. As for >> https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/, it >> contains 2 iSER patches. Both affect only iSER code (nothing in >> open-iscsi or any other kernel code). Therefore, I think that it's ok. >> > > Go ahead and kill backport/sles10 then. > But the whole backport dir should be updated from OFED tree > or better killed once Sean switches to git. > I will delete it. However, I don't understand what do you mean by saying "the whole backport dir should be updated from OFED tree". Erez From dotanb at dev.mellanox.co.il Thu Sep 21 04:51:40 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 21 Sep 2006 14:51:40 +0300 Subject: [openib-general] gen2_basic patch boogum? 
In-Reply-To: <4511AFD0.9030902@pathscale.com>
References: <451191AD.80608@pathscale.com> <4511AFD0.9030902@pathscale.com>
Message-ID: <45127CCC.80609@dev.mellanox.co.il>

Robert Walsh wrote:
> Another quick question: I noticed that in the latest changes you
> pushed, including my patches, you removed the following check in test_qp.c:
>
> @@ -1702,7 +1700,6 @@
> CHECK_VALUE("qp_type", query_init_attr.qp_type,
> attr.qp_type, goto cleanup);
> CHECK_VALUE_PTR("recv_cq", query_init_attr.recv_cq,
> attr.recv_cq, goto cleanup);
> CHECK_VALUE_PTR("send_cq", query_init_attr.send_cq,
> attr.send_cq, goto cleanup);
> - CHECK_VALUE("sq_sig_all", query_init_attr.sq_sig_all,
> attr.sq_sig_all, goto cleanup);
> CHECK_VALUE_PTR("srq", query_init_attr.srq, attr.srq,
> goto cleanup);
> }
> PASSED;
>
> Any particular reason why you removed this? I don't ever remember this
> being a problem on ipath or mthca.
>
> Regards,
> Robert.
>

Yes: according to the IB spec, query QP doesn't have to return the signal type of the SQ, so I prefer not to check an attribute that is not mandatory.

Dotan

From aviram at mellanox.co.il Thu Sep 21 05:34:52 2006
From: aviram at mellanox.co.il (Aviram Gutman)
Date: Thu, 21 Sep 2006 15:34:52 +0300
Subject: [openib-general] OFED 1.1 RC6
Message-ID: <2D5DEE3C6A0E0244B0133244731D4C4BC03C@mtlexch01.mtl.com>

Hi,

OFED-1.1-rc6 is available on
https://openib.org/svn/gen2/branches/1.1/ofed/releases/OFED-1.1-rc6.tgz

Please report any issues in bugzilla http://openib.org/bugzilla/

OS support:
===========
Novell:
- SLES 9.0 SP3
- SLES10
Redhat:
- Redhat EL4 up3
- Redhat EL4 up4*
kernel.org:
- Kernel 2.6.17

* Please note that there is no IPoIB multicast support in Redhat EL4 up4. A kernel patch will be included in the documentation, and customers who need multicast will have to apply it.

Limitations and known issues:
=============================
1. OpenIB Diags build on SLES10 ppc
   - Mellanox to fix - Vlad
2. iSER build on SLES10 needs root privilege
   - Voltaire supplied a fix that will be integrated next week
3. Bug #233 SDP crash on ipath
   - not a showstopper for the release
4. Bug #33 ping fails on ib1 with Silverstorm switch
   - We couldn't reproduce the problem. Silverstorm, please debug
5. Fix IBDM to allow multiple devices on the same machine
   - Eitan Zahavi

Schedule:
=========
It seems that we need another RC to fix the issues. Is RC7 on Monday or Tuesday next week (hopefully with only minor fixes) and a final release by the end of next week OK with everyone?

Thanks,
Aviram

From halr at voltaire.com Thu Sep 21 05:39:02 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Sep 2006 08:39:02 -0400
Subject: [openib-general] [PATCHv2] osm: fixing bugs in osmtest
In-Reply-To:
References:
Message-ID: <1158842325.4509.111417.camel@hal.voltaire.com>

Hi Yevgeny,

On Thu, 2006-09-21 at 03:30, Yevgeny Kliteynik wrote:
> Hi Hal
>
> It appears that each mailer is messing with white spaces
> in its own very special way...
>
> Anyway, this time it is ok for sure.
>
> Yevgeny
>
> Signed-off-by: Yevgeny Kliteynik

Yes, that's better! Thanks. Applied to trunk only.

-- Hal

From eli at dev.mellanox.co.il Thu Sep 21 06:10:25 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 21 Sep 2006 16:10:25 +0300
Subject: [openib-general] [PATCH] IB/ipoib: likely/unlikely annotations
Message-ID: <1158844225.24776.118.camel@localhost>

Use likely/unlikely in data tx flow

Signed-off-by: Eli Cohen
Acked-by: Michael S.
Tsirkin --- Index: openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- openib-1.1-rc6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-09-21 15:43:49.000000000 +0300 +++ openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-09-21 15:46:26.000000000 +0300 @@ -643,7 +643,7 @@ struct ipoib_neigh *neigh; unsigned long flags; - if (!spin_trylock_irqsave(&priv->tx_lock, flags)) + if (unlikely(!spin_trylock_irqsave(&priv->tx_lock, flags))) return NETDEV_TX_LOCKED; /* @@ -656,7 +656,7 @@ return NETDEV_TX_BUSY; } - if (skb->dst && skb->dst->neighbour) { + if (likely(skb->dst && skb->dst->neighbour)) { if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) { ipoib_path_lookup(skb, dev); goto out; From eli at dev.mellanox.co.il Thu Sep 21 06:39:26 2006 From: eli at dev.mellanox.co.il (Eli cohen) Date: Thu, 21 Sep 2006 16:39:26 +0300 Subject: [openib-general] [PATCH] IB/ipoib: unlikely in send Message-ID: <1158845966.24776.122.camel@localhost> Use unlikely in send flow Signed-off-by: Eli Cohen --- Index: openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- openib-1.1-rc6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2006-09-21 16:19:33.000000000 +0300 +++ openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2006-09-21 16:20:39.000000000 +0300 @@ -385,7 +385,7 @@ struct ipoib_tx_buf *tx_req; dma_addr_t addr; - if (skb->len > dev->mtu + INFINIBAND_ALEN) { + if (unlikely(skb->len > dev->mtu + INFINIBAND_ALEN)) { ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", skb->len, dev->mtu + INFINIBAND_ALEN); ++priv->stats.tx_dropped; From eli at dev.mellanox.co.il Thu Sep 21 07:56:32 2006 From: eli at dev.mellanox.co.il (Eli cohen) Date: Thu, 21 Sep 2006 17:56:32 +0300 Subject: [openib-general] heads-up - ipoib NAPI Message-ID: <1158850592.24776.156.camel@localhost> Hi, I have a draft implementation of NAPI in 
ipoib and got the following results:

System description
==================
Quad CPU E64T 2.4 GHz
4 GB RAM
MT25204 Sinai HCA

I used netperf for benchmarking; the BW test ran for 600 seconds with 8 clients and 8 servers. The results I received are below:

netperf TCP_STREAM:
               BW [MByte/sec]  clients side [irqs/sec]  server side [irqs/sec]
               --------------  -----------------------  ----------------------
without NAPI:  506             86441                    66311
with NAPI:     550             6830                     13600

netperf TCP_RR:
               rate [tran/sec]
               ---------------
without NAPI:  39600
with NAPI:     39470

Please note this is still a work in progress; we plan to do more tests and to measure on other devices.

From eli at dev.mellanox.co.il Thu Sep 21 07:57:37 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 21 Sep 2006 17:57:37 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
Message-ID: <1158850657.24776.158.camel@localhost>

This patch implements NAPI for ipoib. It is a draft implementation. I would like your opinion on whether we need a module parameter to control whether NAPI is activated. Also, there is a need to implement peek_cq and call it for ib_req_notify_cq() so as to know if there is a need to call netif_rx_schedule_prep() again.

Signed-off-by: Eli Cohen --- Index: openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- openib-1.1-rc6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-09-21 16:30:35.000000000 +0300 +++ openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-09-21 16:30:42.000000000 +0300 @@ -69,6 +69,8 @@ MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0"); #endif +static const int poll_def_weight = 64; + struct ipoib_path_iter { struct net_device *dev; struct ipoib_path path; @@ -91,6 +93,9 @@ .remove = ipoib_remove_one }; + +int ipoib_poll(struct net_device *dev, int *budget); + int ipoib_open(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -689,6 +694,7 @@ goto out; } + if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) { spin_lock(&priv->lock); __skb_queue_tail(&neigh->queue, skb); @@ -892,6 +898,7 @@ /* Delete any child interfaces first */ list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) { + netif_poll_disable(priv->dev); unregister_netdev(cpriv->dev); ipoib_dev_cleanup(cpriv->dev); free_netdev(cpriv->dev); @@ -919,6 +926,8 @@ dev->hard_header = ipoib_hard_header; dev->set_multicast_list = ipoib_set_mcast_list; dev->neigh_setup = ipoib_neigh_setup_dev; + dev->poll = ipoib_poll; + dev->weight = poll_def_weight; dev->watchdog_timeo = HZ; @@ -1097,6 +1106,8 @@ goto register_failed; } + netif_poll_enable(priv->dev); + ipoib_create_debug_files(priv->dev); if (ipoib_add_pkey_attr(priv->dev)) @@ -1111,6 +1122,7 @@ return priv->dev; sysfs_failed: + netif_poll_disable(priv->dev); ipoib_delete_debug_files(priv->dev); unregister_netdev(priv->dev); @@ -1168,6 +1180,7 @@ dev_list = ib_get_client_data(device, &ipoib_client); list_for_each_entry_safe(priv, tmp, dev_list, list) { + netif_poll_disable(priv->dev); ib_unregister_event_handler(&priv->event_handler); flush_scheduled_work(); Index: 
openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- openib-1.1-rc6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2006-09-21 16:30:38.000000000 +0300 +++ openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2006-09-21 17:24:59.000000000 +0300 @@ -169,7 +169,7 @@ return 0; } -static void ipoib_ib_handle_wc(struct net_device *dev, +static void ipoib_ib_handle_rwc(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -178,122 +178,186 @@ ipoib_dbg_data(priv, "called: id %d, op %d, status: %d\n", wr_id, wc->opcode, wc->status); - if (wr_id & IPOIB_OP_RECV) { - wr_id &= ~IPOIB_OP_RECV; - - if (wr_id < ipoib_recvq_size) { - struct sk_buff *skb = priv->rx_ring[wr_id].skb; - dma_addr_t addr = priv->rx_ring[wr_id].mapping; - - if (unlikely(wc->status != IB_WC_SUCCESS)) { - if (wc->status != IB_WC_WR_FLUSH_ERR) - ipoib_warn(priv, "failed recv event " - "(status=%d, wrid=%d vend_err %x)\n", - wc->status, wr_id, wc->vendor_err); - dma_unmap_single(priv->ca->dma_device, addr, - IPOIB_BUF_SIZE, DMA_FROM_DEVICE); - dev_kfree_skb_any(skb); - priv->rx_ring[wr_id].skb = NULL; - return; - } - - /* - * If we can't allocate a new RX buffer, dump - * this packet and reuse the old buffer. 
- */ - if (unlikely(ipoib_alloc_rx_skb(dev, wr_id))) { - ++priv->stats.rx_dropped; - goto repost; - } - - ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", - wc->byte_len, wc->slid); + wr_id &= ~IPOIB_OP_RECV; + if (wr_id < ipoib_recvq_size) { + struct sk_buff *skb = priv->rx_ring[wr_id].skb; + dma_addr_t addr = priv->rx_ring[wr_id].mapping; + + if (unlikely(wc->status != IB_WC_SUCCESS)) { + if (wc->status != IB_WC_WR_FLUSH_ERR) + ipoib_warn(priv, "failed recv event " + "(status=%d, wrid=%d vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); dma_unmap_single(priv->ca->dma_device, addr, IPOIB_BUF_SIZE, DMA_FROM_DEVICE); + dev_kfree_skb_any(skb); + priv->rx_ring[wr_id].skb = NULL; + return; + } - skb_put(skb, wc->byte_len); - skb_pull(skb, IB_GRH_BYTES); + /* + * If we can't allocate a new RX buffer, dump + * this packet and reuse the old buffer. + */ + if (unlikely(ipoib_alloc_rx_skb(dev, wr_id))) { + ++priv->stats.rx_dropped; + goto repost; + } - if (wc->slid != priv->local_lid || - wc->src_qp != priv->qp->qp_num) { - skb->protocol = ((struct ipoib_header *) skb->data)->proto; - skb->mac.raw = skb->data; - skb_pull(skb, IPOIB_ENCAP_LEN); - - dev->last_rx = jiffies; - ++priv->stats.rx_packets; - priv->stats.rx_bytes += skb->len; - - skb->dev = dev; - /* XXX get correct PACKET_ type here */ - skb->pkt_type = PACKET_HOST; - netif_rx_ni(skb); - } else { - ipoib_dbg_data(priv, "dropping loopback packet\n"); - dev_kfree_skb_any(skb); - } + ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", + wc->byte_len, wc->slid); - repost: - if (unlikely(ipoib_ib_post_receive(dev, wr_id))) - ipoib_warn(priv, "ipoib_ib_post_receive failed " - "for buf %d\n", wr_id); - } else - ipoib_warn(priv, "completion event with wrid %d\n", - wr_id); + dma_unmap_single(priv->ca->dma_device, addr, + IPOIB_BUF_SIZE, DMA_FROM_DEVICE); - } else { - struct ipoib_tx_buf *tx_req; - unsigned long flags; + skb_put(skb, wc->byte_len); + skb_pull(skb, IB_GRH_BYTES); - if (wr_id >= 
ipoib_sendq_size) { - ipoib_warn(priv, "completion event with wrid %d (> %d)\n", - wr_id, ipoib_sendq_size); - return; + if (wc->slid != priv->local_lid || + wc->src_qp != priv->qp->qp_num) { + skb->protocol = ((struct ipoib_header *) skb->data)->proto; + skb->mac.raw = skb->data; + skb_pull(skb, IPOIB_ENCAP_LEN); + + dev->last_rx = jiffies; + ++priv->stats.rx_packets; + priv->stats.rx_bytes += skb->len; + + skb->dev = dev; + /* XXX get correct PACKET_ type here */ + skb->pkt_type = PACKET_HOST; + netif_receive_skb(skb); + } else { + ipoib_dbg_data(priv, "dropping loopback packet\n"); + dev_kfree_skb_any(skb); } - ipoib_dbg_data(priv, "send complete, wrid %d\n", wr_id); + repost: + if (unlikely(ipoib_ib_post_receive(dev, wr_id))) + ipoib_warn(priv, "ipoib_ib_post_receive failed " + "for buf %d\n", wr_id); + } else + ipoib_warn(priv, "completion event with wrid %d\n", + wr_id); - tx_req = &priv->tx_ring[wr_id]; +} - dma_unmap_single(priv->ca->dma_device, - pci_unmap_addr(tx_req, mapping), - tx_req->skb->len, - DMA_TO_DEVICE); - ++priv->stats.tx_packets; - priv->stats.tx_bytes += tx_req->skb->len; +static void ipoib_ib_handle_swc(struct net_device *dev, + struct ib_wc *wc) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned int wr_id = wc->wr_id; + struct ipoib_tx_buf *tx_req; + unsigned long flags; - dev_kfree_skb_any(tx_req->skb); + ipoib_dbg_data(priv, "called: id %d, op %d, status: %d\n", + wr_id, wc->opcode, wc->status); - spin_lock_irqsave(&priv->tx_lock, flags); - ++priv->tx_tail; - if (netif_queue_stopped(dev) && - test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && - priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) - netif_wake_queue(dev); - spin_unlock_irqrestore(&priv->tx_lock, flags); - - if (wc->status != IB_WC_SUCCESS && - wc->status != IB_WC_WR_FLUSH_ERR) - ipoib_warn(priv, "failed send event " - "(status=%d, wrid=%d vend_err %x)\n", - wc->status, wr_id, wc->vendor_err); + if (wr_id >= ipoib_sendq_size) { + ipoib_warn(priv, "completion 
event with wrid %d (> %d)\n", + wr_id, ipoib_sendq_size); + return; } + + ipoib_dbg_data(priv, "send complete, wrid %d\n", wr_id); + + tx_req = &priv->tx_ring[wr_id]; + + dma_unmap_single(priv->ca->dma_device, + pci_unmap_addr(tx_req, mapping), + tx_req->skb->len, + DMA_TO_DEVICE); + + ++priv->stats.tx_packets; + priv->stats.tx_bytes += tx_req->skb->len; + + dev_kfree_skb_any(tx_req->skb); + + spin_lock_irqsave(&priv->tx_lock, flags); + ++priv->tx_tail; + if (netif_queue_stopped(dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && + priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) + netif_wake_queue(dev); + spin_unlock_irqrestore(&priv->tx_lock, flags); + + if (wc->status != IB_WC_SUCCESS && + wc->status != IB_WC_WR_FLUSH_ERR) + ipoib_warn(priv, "failed send event " + "(status=%d, wrid=%d vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); } -void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) +static inline int is_rx_comp(struct ib_wc *wc) +{ + unsigned int wr_id = wc->wr_id; + + if (wr_id & IPOIB_OP_RECV) + return 1; + + return 0; +} + +int ipoib_poll(struct net_device *dev, int *budget) { - struct net_device *dev = (struct net_device *) dev_ptr; struct ipoib_dev_priv *priv = netdev_priv(dev); int n, i; + struct ib_cq *cq = priv->cq; + int quota = dev->quota; + int wc; + int rx = 0; + int tx = 0; - ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); do { - n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc); - for (i = 0; i < n; ++i) - ipoib_ib_handle_wc(dev, priv->ibwc + i); - } while (n == IPOIB_NUM_WC); + wc = min_t(int, quota, IPOIB_NUM_WC); + n = ib_poll_cq(cq, wc, priv->ibwc); + for (i = 0; i < n; ++i) { + if (is_rx_comp(priv->ibwc + i)) { + ++rx; + --quota; + ipoib_ib_handle_rwc(dev, priv->ibwc + i); + } + else { + ++tx; + ipoib_ib_handle_swc(dev, priv->ibwc + i); + } + + if (unlikely(quota <= 0)) + goto not_done; + } + } while (n == wc); + + if (rx || tx) + goto not_done; + + + netif_rx_complete(dev); + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + /* TODO we 
need peek_cq here for hw devices that + could would not generate interrupts for completions + arriving between end of polling till request notify */ + + return 0; + +not_done: + *budget -= rx; + dev->quota = quota; + return 1; +} + +void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) +{ + struct net_device *dev = (struct net_device *) dev_ptr; + struct ipoib_dev_priv *priv = netdev_priv(dev); + + /* tell the network layer we have packts */ + if (netif_rx_schedule_prep(dev)) + __netif_rx_schedule(dev); + else { + ipoib_warn(priv, "received interupt while in polling\n"); + } } static inline int post_send(struct ipoib_dev_priv *priv, From trimmer at silverstorm.com Thu Sep 21 08:10:09 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Thu, 21 Sep 2006 11:10:09 -0400 Subject: [openib-general] Completion callback /teardown race In-Reply-To: <20060920051420.GH1710@mellanox.co.il> Message-ID: > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Wednesday, September 20, 2006 1:14 AM > To: Tillier, Fabian > Cc: Rimmer, Todd; openib-general at openib.org > Subject: Re: Completion callback /teardown race > > Quoting r. Fabian Tillier : > > > There are some differences in HCA behaviour with regard to > > > ib_req_notify_cq. Mellanox HCAs will provide a callback/interrupt if > > > the CQ is not empty at this point (in which case the poll_cq's after > the > > > notify are optional). > > > > > > However the behaviour defined in the IBTA spec indicates that > > > ib_req_notify_cq will cause a callback/interrupt only on the next CQE > > > which arrives, hence to be portable the poll_cq loop after > > > ib_req_notify_cq is necessary to cover any CQEs which arrived between > > > the prior poll and the ib_req_notify_cq. > > > > I remember a while ago a mention that the behavior of the Mellanox > > HCAs could be controlled in the firmware, so that they would follow > > the IBTA spec defined behavior. > > There's a mistake here. 
Mellanox HCAs will generate an event upon
> ib_req_notify_cq only if new completions have arrived after the previous
> event has been reported.
>
> AFAIK this is IBTA spec compliant.

I agree the Mellanox HCA is spec compliant.

The difference between HCAs is how they handle the situation:
	CQE arrives
	HCA generates event/callback
	poll CQ, remove CQE
	poll CQ, detect CQ is empty
	CQE arrives
	ib_req_notify_cq

At this point a Mellanox HCA will generate an event (as Michael indicates, an unprocessed CQE has arrived since the previous event).

Many other HCAs given this situation will not generate an event; instead they generate an event when a CQE arrives after the ib_req_notify_cq.

Hence to support other HCAs, ULPs should poll the CQ after the ib_req_notify_cq.

On any HCA model, ULPs should be prepared for a callback where the CQ is empty. There are situations in either approach which can introduce an extra callback after the CQ has been emptied.

Todd Rimmer

From mst at mellanox.co.il Thu Sep 21 08:09:29 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Sep 2006 18:09:29 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <1158850657.24776.158.camel@localhost>
References: <1158850657.24776.158.camel@localhost>
Message-ID: <20060921150929.GC28717@mellanox.co.il>

Quoting r. Eli cohen :
> Also there is a need to implement peek_cq and call it for
> ib_req_notify_cq() so as to know if there is a need to call
> netif_rx_schedule_prep() again.

Thanks, Eli. Implementing peek_cq is not hard, at least for mthca.
I wonder what we should do if peek_cq is not available in the low level driver.
I guess we could just disable NAPI for this case - Roland, would that be acceptable?
-- MST From rdreier at cisco.com Thu Sep 21 08:20:28 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Sep 2006 08:20:28 -0700 Subject: [openib-general] timer_pending kernel assertion while stopping IPoIB In-Reply-To: (Or Gerlitz's message of "Thu, 21 Sep 2006 11:35:10 +0300 (IDT)") References: Message-ID: Or> the kernel is net-2.6.19 git My first guess would be it's a bug introduced in the net-2.6.19 tree. Can you reproduce it with plain 2.6.18 and/or my for-2.6.19 branch? - R. From dotanb at dev.mellanox.co.il Thu Sep 21 08:26:43 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 21 Sep 2006 18:26:43 +0300 Subject: [openib-general] [ipoib] [PATCH] - Removed unused include of vmalloc.h Message-ID: <4512AF33.7090002@dev.mellanox.co.il> IPoIB: Removed unused include of vmalloc.h. Signed-off-by: Dotan Barak --- Index: last_stable/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-08-07 17:45:02.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-08-08 09:36:45.000000000 +0300 @@ -40,7 +40,6 @@ #include #include -#include #include #include /* For ARPHRD_xxx */ From dotanb at dev.mellanox.co.il Thu Sep 21 08:45:14 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 21 Sep 2006 18:45:14 +0300 Subject: [openib-general] all of the man pages should change the package name to OFED Message-ID: <4512B38A.8060002@dev.mellanox.co.il> Hi. When i executed "man ibv_devinfo" or "man ibstat" (for example) i notices that those man pages are marked as part of the OpenIB package. I believe that the package name should be changed to OFED. what do you think? 
thanks Dotan From sean.hefty at intel.com Thu Sep 21 09:54:22 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 21 Sep 2006 09:54:22 -0700 Subject: [openib-general] RDMA CM callback status In-Reply-To: <200609210540.k8L5eBce029142@robert.bartonsoftware.com> Message-ID: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com> >1. Should I even be looking at event->status or does the event type tell me > everything I need to know? I've had a report that the assertion > (event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR. The event type is usually sufficient. In the case of an error, the status should provide some additional information regarding the type of error. It sounds like (and looks like from reading the code) that you've hit a bug with the ROUTE_ERROR event. The failure status isn't being propagated up to the user. >2. /* handle error out-of-line */ above means I record failure in my connection > data structure, start teardown and drop the callback's reference on it. > When the last reference goes, the connection data structure is queued for > final destruction (including rdma_destroy_id(cmid)). > > Given that this might race with the callback's caller is this OK? Yes - The RDMA CM holds a reference on the cmid while in a callback, and drops it once the callback returns. rdma_destroy_id() will block until all references are released on the cmid. - Sean From sweitzen at cisco.com Thu Sep 21 10:03:57 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 21 Sep 2006 10:03:57 -0700 Subject: [openib-general] Cisco SQA test results for OFED 1.1 rc5 Message-ID: All testing was done on RHEL4 U3. No new bugs were found, overall things are looking very good. We have still not done any testing on RHEL4 U4, SLES 10, IPoIB HA, and SRP HA yet. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ofed_sqa_results.xls Type: application/vnd.ms-excel Size: 106496 bytes Desc: ofed_sqa_results.xls URL: From rdreier at cisco.com Thu Sep 21 10:10:15 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Sep 2006 10:10:15 -0700 Subject: [openib-general] all of the man pages should change the package name to OFED In-Reply-To: <4512B38A.8060002@dev.mellanox.co.il> (Dotan Barak's message of "Thu, 21 Sep 2006 18:45:14 +0300") References: <4512B38A.8060002@dev.mellanox.co.il> Message-ID: Dotan> Hi. When i executed "man ibv_devinfo" or "man ibstat" (for Dotan> example) i notices that those man pages are marked as part Dotan> of the OpenIB package. I believe that the package name Dotan> should be changed to OFED. Not for the libibverbs stuff, since many distributions (Debian, Ubuntu, Fedora) include libibverbs without OFED. I guess I'll just delete the OpenIB references from the man pages. - R. From mvharish at gmail.com Thu Sep 21 10:24:20 2006 From: mvharish at gmail.com (harish) Date: Thu, 21 Sep 2006 10:24:20 -0700 Subject: [openib-general] heads-up - ipoib NAPI In-Reply-To: <1158850592.24776.156.camel@localhost> References: <1158850592.24776.156.camel@localhost> Message-ID: Hi Eli, Thanks for sharing the results with us. It is great to see the reduction in Interrupts. Could you please specify the netperf test specifications [message size; socket size]. Wondering what the numbers would be if we use large socket and message sizes [128K & 64K respectively]. The reason for the request is to make sure we are not hitting any TCP related bottleneck while comparing NAPI vs. no NAPI cases. Please let me know what you think. 
Thanks, harish On 9/21/06, Eli cohen wrote: > > Hi, > > I have a draft implementation of NAPI in ipoib and got the following > results: > > System descriptions > =================== > Quad CPU E64T 2.4 Ghz > 4 GB RAM > MT25204 Sinai HCA > > I used netperf for benchmarking, the BW test ran for 600 seconds with 8 > clients and 8 servers. > > The results I received are bellow: > > netperf TCP_STREAM: > BW [MByte/sec] clients side [irqs/sec] server side > [irqs/sec] > -------------- ----------------------- > ---------------------- > without NAPI: 506 86441 66311 > with NAPI: 550 6830 13600 > > > netperf TCP_RR: > rate [tran/sec] > --------------- > without NAPI: 39600 > with NAPI: 39470 > > > > Please note this is still under work and we plan to do more tests and > measure on other devices. > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eli at dev.mellanox.co.il Thu Sep 21 10:37:59 2006 From: eli at dev.mellanox.co.il (eli at dev.mellanox.co.il) Date: Thu, 21 Sep 2006 20:37:59 +0300 (IDT) Subject: [openib-general] heads-up - ipoib NAPI In-Reply-To: References: <1158850592.24776.156.camel@localhost> Message-ID: <61399.85.250.167.59.1158860279.squirrel@dev.mellanox.co.il> > Hi Eli, > > Thanks for sharing the results with us. It is great to see the reduction > in > Interrupts. Could you please specify the netperf test specifications > [message size; socket size]. Wondering what the numbers would be if we use > large socket and message sizes [128K & 64K respectively]. The reason for > the > request is to make sure we are not hitting any TCP related bottleneck > while > comparing NAPI vs. no NAPI cases. Please let me know what you think. I used large socket buffer sizes. 
Here is the command line I used. The result for the bandwidth is the sum of all the connections.

netperf -H 11.4.3.144 -l 600 -f M -p $port -- -s 200000,200000 -S 200000,200000

From rdreier at cisco.com Thu Sep 21 11:07:30 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Sep 2006 11:07:30 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <1158850657.24776.158.camel@localhost> (Eli cohen's message of "Thu, 21 Sep 2006 17:57:37 +0300")
References: <1158850657.24776.158.camel@localhost>
Message-ID:

Looks pretty good. I took a stab at implementing this myself, and it seems we came to the same conclusion: for generic HCAs that have a race between request notify and poll CQ, there is no alternative except "peek CQ".

However I don't think we want to use peek CQ always -- I think that extra CQ lock/unlock may kill a lot of the performance gain you see with NAPI (and I don't think even mthca can do a lockless CQ peek, since we need to protect against races with resize CQ, etc). So probably what we need is a feature bit in the struct ib_device to say whether the peek CQ is needed or whether req notify will generate events for existing CQEs.

You might want to respin your patch against my for-2.6.19 branch -- I already split up the handle WC routine into separate send and receive functions, so the patch will become much smaller.

Also, the handling of how many completions to poll and the logic of when to call netif_rx_complete() seems very strange to me. First, you totally ignore the budget parameter, so you may end up doing more work than the networking upper layers want. Second, you often leave the poll routine to run one more time, even if you've drained the CQ without using up your work quota.
My poll routine ended up looking like the following, which I think is more correct:

+int ipoib_poll(struct net_device *dev, int *budget)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int max = min(*budget, dev->quota);
+	int done = 0;
+	int t;
+	int empty = 0;
+	int n, i;
+
+	while (max) {
+		t = min(IPOIB_NUM_WC, max);
+		n = ib_poll_cq(priv->cq, t, priv->ibwc);
+
+		for (i = 0; i < n; ++i) {
+			if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
+				++done;
+				--max;
+				ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
+			} else
+				ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
+		}
+
+		if (n != t) {
+			empty = 1;
+			break;
+		}
+	}
+
+	dev->quota -= done;
+	*budget -= done;
+
+	if (empty) {
+		netif_rx_complete(dev);
+		ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP);
+		/* XXX rotting packet! */
+		return 0;
+	}
+
+	return 1;
+}

From rdreier at cisco.com Thu Sep 21 11:10:27 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Sep 2006 11:10:27 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060921150929.GC28717@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 21 Sep 2006 18:09:29 +0300")
References: <1158850657.24776.158.camel@localhost> <20060921150929.GC28717@mellanox.co.il>
Message-ID:

    Michael> Thanks, Eli. Implementing peek_cq is not hard, at least
    Michael> for mthca. I wonder what we should do if peek_cq is not
    Michael> available in the low level driver. I guess we could just
    Michael> disable NAPI for this case - Roland, would that be
    Michael> acceptable?

Actually as I mentioned in my reply to Eli, mthca doesn't actually need the peek CQ operation, and I don't think we want IPoIB to be doing a peek CQ for mthca devices.

But I'd rather not have to maintain both a NAPI and non-NAPI IPoIB completion path, so I think the thing to do would be to implement peek CQ for all devices that don't have the "event for existing CQE" behavior of req_notify_cq.

 - R.
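
For anyone following the thread, the budget handling and the notify/poll race discussed in these messages can be tried out in a small userspace model. This is only a sketch under stated assumptions: `sim_cq`, `sim_poll_cq`, `sim_req_notify` and `sim_poll` are hypothetical names standing in for the CQ and the verbs calls, `SIM_NUM_WC` is an arbitrary batch size, and nothing here is kernel code or part of either posted patch. The check after arming models the "poll again after ib_req_notify_cq" advice for devices that only signal completions arriving after the request.

```c
#define SIM_NUM_WC 4	/* entries requested per poll call (illustrative) */

/* Hypothetical stand-in for a completion queue. */
struct sim_cq {
	int pending;	/* completions waiting to be polled */
	int racing;	/* completions that slip in just before arming */
	int armed;	/* set once a notification has been requested */
};

/* Remove up to max entries; return how many were removed. */
static int sim_poll_cq(struct sim_cq *cq, int max)
{
	int n = cq->pending < max ? cq->pending : max;

	cq->pending -= n;
	return n;
}

/*
 * Stand-in for requesting a notification.  The "racing" entries model
 * completions that land between the last poll and the arm, which a
 * device that only signals later completions would stay silent about.
 */
static void sim_req_notify(struct sim_cq *cq)
{
	cq->pending += cq->racing;
	cq->racing = 0;
	cq->armed = 1;
}

/*
 * Process at most *budget completions and subtract the work done.
 * Returns 0 when the queue is drained and armed, 1 when the caller
 * should poll again.
 */
static int sim_poll(struct sim_cq *cq, int *budget)
{
	int max = *budget;
	int done = 0;
	int empty = 0;

	while (max > 0) {
		int t = max < SIM_NUM_WC ? max : SIM_NUM_WC;
		int n = sim_poll_cq(cq, t);

		done += n;
		max -= n;
		if (n != t) {	/* short poll: the queue was drained */
			empty = 1;
			break;
		}
	}

	*budget -= done;

	if (!empty)
		return 1;	/* budget used up, stay in polling mode */

	sim_req_notify(cq);
	/* Look again after arming so late entries are not left to rot. */
	return cq->pending > 0;
}
```

A caller would keep invoking `sim_poll()` with a fresh budget for as long as it returns 1. Note that in this toy the queue stays armed when the re-check finds work, so a spurious event on an already empty queue remains possible, which matches the remark that consumers should tolerate such callbacks.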
From rdreier at cisco.com Thu Sep 21 11:46:28 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Sep 2006 11:46:28 -0700 Subject: [openib-general] [ipoib] [PATCH] - Removed unused include of vmalloc.h In-Reply-To: <4512AF33.7090002@dev.mellanox.co.il> (Dotan Barak's message of "Thu, 21 Sep 2006 18:26:43 +0300") References: <4512AF33.7090002@dev.mellanox.co.il> Message-ID: Thanks, applied by hand to for-2.6.19 although your patch was corrupted (line wrapped, whitespace damage at least) I merge > 100 patches every kernel release. If I have to spend an extra 5 minutes for each one fixing a patch or pulling it out of svn, then I end up burning an extra 9 hours of stupid work. If 20+ people who contribute patches sent me clean patches, then everyone will be happier because I'll be able to merge things quicker and focus on productive work. From amit_byron at yahoo.com Thu Sep 21 12:47:28 2006 From: amit_byron at yahoo.com (amit byron) Date: Thu, 21 Sep 2006 19:47:28 +0000 (UTC) Subject: [openib-general] =?utf-8?q?max_message_size_for_IB=5FWR=5FSEND?= References: <4512244D.4040404@dev.mellanox.co.il> Message-ID: Dotan Barak dev.mellanox.co.il> writes: > > Hi. > > amit byron wrote: > > hi, > > > > if i evoke/call ib_post_send(IB_WR_SEND) with message > > size 512 bytes, the message gets received on the > > peer (second) node. the 2 nodes are connected point-to > > -point. > > > > but if message size is increased to 4096 bytes then > > second node receives the message; but message content > > is missing (empty). > > > > won't infiniband stack break down message in smaller > > chunks and assemble on peer node? > > > > thanks, > > Amit. > > > Which transport type are you using? > if you are using a UD QP, then the answer is no. > for any other transport type, the answer is yes (the message is being > break down to packets with the MTU side as specified in the QP context. > > maybe you have a different problem in you code. 
did you check the > completion status in both of the nodes? > > Dotan > > i'm using RC connection. the issue seems to occur only when running in xen's domain 0 (xen0). on core linux kernel, the code works -- i'm able to do both send message and perform rdma write with size greater than 4096. i don't see any errors reported while sending a message with size greater than 4096 (same hold true for rdma write). i'm able send message (greater than 4096 bytes) from code running in core linux kernel to peer node code that is running in xen's domain 0. this suggest that there is some hard-limit that prevents infiniband to send message; but no errors are reported from infiniband stack. any suggestions on how to enable tracing in hca driver? thanks, Amit. From bgreen at nas.nasa.gov Thu Sep 21 13:22:01 2006 From: bgreen at nas.nasa.gov (Bryan Green) Date: Thu, 21 Sep 2006 13:22:01 -0700 Subject: [openib-general] ib_rdma_bw measures 1.2G vs. 1.4G Message-ID: <200609212022.k8LKM1eM007702@ece06.nas.nasa.gov> Hello, I've been testing rdma bandwidth between a number of machines using ib_rdma_bw, and I consistently see two approximate bandwidths, 1.4 GBytes/s or 1.2 GBytes/s. The 1.4G/s rate is what I expect from the link, but I don't know why in some cases I get 1.2G/s. What could cause this particular quantized degradation in performance? So far, these are the datapoints I have (all systems are Mellanox DDR, 8x PCI-E): There are 7 machines: ZeonA, ZeonB: dual-Core2 zeon systems running Suse 10.1, OFED 1.0 OptiA, OptiB, OptiC: older dual-cpu/dual-core Opteron systems running Gentoo with specialized 2.6.15 kernel, openib-1.0 userland. OptiX, OptiZ: brand new dual-cpu/dual-core Opteron systems, running Gentoo with 2.6.17-gentoo kernel, openib-1.1 userland. 
Between ZeonA, ZeonB: 1.4 G/s Between OptiA, OptiB, and OptiC: 1.2 G/s Between OptiB and OptiC after kernel upgrades to 2.6.17-gentoo: 1.4 G/s Between OptiA (2.6.15) and OptiB (2.6.17-gentoo): 1.4 G/s Between OptiX, OptiZ: 1.2 G/s Between OptiX, ZeonA: 1.2 G/s I'm bummed to be seeing 1.2 G/s on the newer systems with the 2.6.17 kernel. What might be the explanation? -bryan From rdreier at cisco.com Thu Sep 21 13:44:10 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Sep 2006 13:44:10 -0700 Subject: [openib-general] [PATCH] IB/ipoib: unlikely in send In-Reply-To: <1158845966.24776.122.camel@localhost> (Eli cohen's message of "Thu, 21 Sep 2006 16:39:26 +0300") References: <1158845966.24776.122.camel@localhost> Message-ID: Thanks, applied to for-2.6.19 From rdreier at cisco.com Thu Sep 21 13:47:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Sep 2006 13:47:09 -0700 Subject: [openib-general] [PATCH] Typo in ib_set_client_data() In-Reply-To: <20060919070214.5476.99212.sendpatchset@localhost.localdomain> (Krishna Kumar's message of "Tue, 19 Sep 2006 12:32:14 +0530") References: <20060919070214.5476.99212.sendpatchset@localhost.localdomain> Message-ID: Thanks, applied by hand to for-2.6.19, although you didn't make a patch that applies with '-p1'. I merge > 100 patches every kernel release. If I have to spend an extra 5 minutes for each one fixing a patch or pulling it out of svn, then I end up burning an extra 9 hours of stupid work. If 20+ people who contribute patches sent me clean patches, then everyone will be happier because I'll be able to merge things quicker and focus on productive work. 
From eli at dev.mellanox.co.il Thu Sep 21 15:33:17 2006 From: eli at dev.mellanox.co.il (eli at dev.mellanox.co.il) Date: Fri, 22 Sep 2006 01:33:17 +0300 (IDT) Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: <1158850657.24776.158.camel@localhost> Message-ID: <62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il> > > However I don't think we want to use peek CQ always -- I think that > extra CQ lock/unlock may kill a lot of the performance gain you see > with NAPI (and I don't think even mthca can do a lockless CQ peek, > since we need to protect against races with resize CQ, etc). So > probably what we need is a feature bit in the struct ib_device to say > whether the peek CQ is needed or whether req notify will generate > events for existing CQEs. > Sounds good to me > You might want to respin your patch against my for-2.6.19 branch -- I > already split up the handle WC routine into separate send and receive > functions, so the patch will become much smaller. > Sure. I can send an updated patch > Also, the handling of how many completions to poll and the logic of > when to call netif_rx_complete() seems very strange to me. First, you > totally ignore the budget parameter, Not totally. I update it whenever I decide it is a "not_done" which happens in two cases: a. I finish my quota b. I handled any completions - rx or tx. I do count tx as well since I want to coalesce as many completions as possible. I do not update quota and budget when I exit polling mode because I think there is no point in doing that (this is unlike the example in NAPI-howto.txt but I will check again if this is the right thing to do). 
> > My poll routine ended up looking like the following, which I think is > more correct: > > +int ipoib_poll(struct net_device *dev, int *budget) > +{ > + struct ipoib_dev_priv *priv = netdev_priv(dev); > + int max = min(*budget, dev->quota); > + int done = 0; > + int t; > + int empty = 0; > + int n, i; > + > + while (max) { > + t = min(IPOIB_NUM_WC, max); > + n = ib_poll_cq(priv->cq, t, priv->ibwc); > + > + for (i = 0; i < n; ++i) { > + if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) { > + ++done; > + --max; > + ipoib_ib_handle_rx_wc(dev, priv->ibwc + i); > + } else > + ipoib_ib_handle_tx_wc(dev, priv->ibwc + i); > + } > + > + if (n != t) { > + empty = 1; > + break; I don't think this is the right thing to do. Polling less completions then you could is no reason to quit polling mode. You may receive more completions in the next time you got called. > + } > + } > + > + dev->quota -= done; > + *budget -= done; > + > + if (empty) { > + netif_rx_complete(dev); > + ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP); > + /* XXX rotting packet! */ > + return 0; > + } > + > + return 1; > +} > From rdreier at cisco.com Thu Sep 21 16:00:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Sep 2006 16:00:32 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il> (eli@dev.mellanox.co.il's message of "Fri, 22 Sep 2006 01:33:17 +0300 (IDT)") References: <1158850657.24776.158.camel@localhost> <62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il> Message-ID: > > So probably what we need is a feature bit in the struct ib_device > > to say whether the peek CQ is needed or whether req notify will > > generate events for existing CQEs. > Sounds good to me The biggest problem I have with this is that I don't know what to call the feature bit. Any suggestions? > > Also, the handling of how many completions to poll and the logic of > > when to call netif_rx_complete() seems very strange to me. 
First, you > > totally ignore the budget parameter, > Not totally. I update it whenever I decide it is a "not_done" which > happens in two cases: > a. I finish my quota > b. I handled any completions - rx or tx. I do count tx as well since I > want to coalesce as many completions as possible. Right, but as far as I can see you don't handle the case where *budget is less than dev->quota -- you only update *budget, you never look at the original value passed through it. > > + if (n != t) { > > + empty = 1; > > + break; > I don't think this is the right thing to do. Polling less completions then > you could is no reason to quit polling mode. You may receive more > completions in the next time you got called. I misread your code slightly the first time through, so I don't think it's actually wrong now. But I am pretty confident that my code is "more correct": if we ask for n CQEs and only poll t < n of them, then we know we have drained the CQ without exhausting our quota of work. So we should switch back to event-driven mode at that point. I don't think a correct NAPI implementation would drain the CQ and then schedule another poll on an empty CQ. - R. From sashak at voltaire.com Thu Sep 21 18:27:05 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 22 Sep 2006 04:27:05 +0300 Subject: [openib-general] OFED-1.1-rc6 fails to build ibutils on PPC64 Message-ID: <20060922012705.GN11259@sashak.voltaire.com> Hi, Recently I've played with PPC64/SLES10 machine and found that it fails to build ibutils package. The build log says: gcc -DHAVE_CONFIG_H -I. -I. -I.. 
-I/usr/include -I/var/tmp/OFED/usr/local/ofed/include/infiniband -I/var/tmp/OFED/usr/local/ofed/include -DOSM_VENDOR_INTF_OPENIB -DOSM_BUILD_OPENIB -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -O2 -Wall -fno-strict-aliasing -fPIC -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -MT ibis_wrap.lo -MD -MP -MF .deps/ibis_wrap.Tpo -c ibis_wrap.c -o ibis_wrap.o >/dev/null 2>&1 /bin/sh ../libtool --tag=CC --mode=link gcc -I/usr/include -I/var/tmp/OFED/usr/local/ofed/include/infiniband -I/var/tmp/OFED/usr/local/ofed/include -DOSM_VENDOR_INTF_OPENIB -DOSM_BUILD_OPENIB -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -O2 -Wall -fno-strict-aliasing -fPIC -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -o libibis.la -rpath /usr/local/ofed/lib64 -version-info "1:0:0" -no-undefined -Wl,-rpath -Wl,/var/tmp/OFED/usr/local/ofed/lib64 -L/var/tmp/OFED/usr/local/ofed/lib64 -lopensm -losmvendor -losmcomp -libumad -libcommon -L/usr/lib64 -ltcl8.4 -ldl -lm ibis_wrap.lo ibbbm.lo ibcr.lo ibis.lo ibis_gsi_mad_ctrl.lo ibpm.lo ibsac.lo ibsm.lo ibvs.lo libtool: link: warning: `/var/tmp/OFED/usr/local/ofed/lib64/libibcommon.la' seems to be moved libtool: link: warning: `/var/tmp/OFED/usr/local/ofed/lib64/libibcommon.la' seems to be moved libtool: link: warning: `/var/tmp/OFED/usr/local/ofed/lib64/libibcommon.la' seems to be moved libtool: link: warning: `/var/tmp/OFED/usr/local/ofed/lib64/libibcommon.la' seems to be moved libtool: link: warning: library `/var/tmp/OFED/usr/local/ofed/lib64/libibcommon.la' was moved. 
gcc -shared .libs/ibis_wrap.o .libs/ibbbm.o .libs/ibcr.o .libs/ibis.o .libs/ibis_gsi_mad_ctrl.o .libs/ibpm.o .libs/ibsac.o .libs/ibsm.o .libs/ibvs.o -Wl,--rpath -Wl,/var/tmp/OFED/usr/local/ofed/lib64 -Wl,--rpath -Wl,/var/tmp/OFED/usr/local/ofed/lib64 -L/var/tmp/OFED/usr/local/ofed/lib64 /var/tmp/OFED/usr/local/ofed/lib64/libopensm.so -L/var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/management/libibcommon -L/usr/lib64 -L/var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/management/libibumad /var/tmp/OFED/usr/local/ofed/lib64/libosmvendor.so /var/tmp/OFED/usr/local/ofed/lib64/libosmcomp.so /var/tmp/OFED/usr/local/ofed/lib64/libibumad.so /var/tmp/OFED/usr/local/ofed/lib64/libibcommon.so -ltcl8.4 -ldl -lm -Wl,-rpath -Wl,/var/tmp/OFED/usr/local/ofed/lib64 -Wl,-soname -Wl,libibis.so.1 -o .libs/libibis.so.1.0.0 /var/tmp/OFED/usr/local/ofed/lib64/libopensm.so: could not read symbols: File in wrong format collect2: ld returned 1 exit status make[3]: *** [libibis.la] Error 1 make[3]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ibutils-1.0/ibis/src' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ibutils-1.0/ibis' make[1]: *** [all] Error 2 make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ibutils-1.0/ibis' make: *** [all-recursive] Error 1 error: Bad exit status from /var/tmp/rpm-tmp.16324 (%install) Seems that ibis uses gcc without -m64 flag and then tries to link with "pure" 64 library: $ file /var/tmp/OFED/usr/local/ofed/lib64/libopensm.so.1.1.0 /var/tmp/OFED/usr/local/ofed/lib64/libopensm.so.1.1.0: ELF 64-bit MSB shared object, cisco 7500, version 1 (SYSV), not stripped , and $ file ibis_wrap.o ibis_wrap.o: ELF 32-bit MSB relocatable, PowerPC or cisco 4500, version 1 (SYSV), not stripped Other less critical issue is warnings like: warning: user vlad does not exist - using root warning: group mtl does not exist - using root warning: user vlad does not exist - using root warning: group mtl does not exist - using root in the build log 
(in a different places). I don't think this is PPC64 related. I can add this report as update to [Bug 241] (PPC64/SLES10 OFED build) if needed. Sasha From mvharish at gmail.com Thu Sep 21 19:31:02 2006 From: mvharish at gmail.com (harish) Date: Thu, 21 Sep 2006 19:31:02 -0700 Subject: [openib-general] heads-up - ipoib NAPI In-Reply-To: <61399.85.250.167.59.1158860279.squirrel@dev.mellanox.co.il> References: <1158850592.24776.156.camel@localhost> <61399.85.250.167.59.1158860279.squirrel@dev.mellanox.co.il> Message-ID: Hi Eli, How did the CPU utilizations compare for the NAPI vs. no NAPI case? What are your thoughts on what bottleneck you are hitting? Sorry to bother you ;) thanks harish On 9/21/06, eli at dev.mellanox.co.il wrote: > > > Hi Eli, > > > > Thanks for sharing the results with us. It is great to see the reduction > > in > > Interrupts. Could you please specify the netperf test specifications > > [message size; socket size]. Wondering what the numbers would be if we > use > > large socket and message sizes [128K & 64K respectively]. The reason for > > the > > request is to make sure we are not hitting any TCP related bottleneck > > while > > comparing NAPI vs. no NAPI cases. Please let me know what you think. > > I used large socket buffer sizes. Here is the command line I used. The > reult for the bandwidth is the some of all the connections. > > netperf -H 11.4.3.144 -l 600 -f M -p $port -- -s 200000,200000 -S > 200000,200000 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweitzen at cisco.com Thu Sep 21 22:45:44 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 21 Sep 2006 22:45:44 -0700 Subject: [openib-general] all of the man pages should change the package name to OFED Message-ID: OpenFabrics maybe, but not OFED in my opinion. 
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Dotan Barak > Sent: Thursday, September 21, 2006 8:45 AM > To: Roland Dreier (rdreier); Hal Rosenstock > Cc: openib > Subject: [openib-general] all of the man pages should change > the package name to OFED > > Hi. > > When i executed "man ibv_devinfo" or "man ibstat" (for example) i > notices that those man pages are marked as part of the OpenIB package. > I believe that the package name should be changed to OFED. > > what do you think? > > thanks > Dotan > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From delaitt at cpc.wmin.ac.uk Fri
Sep 22 04:38:18 2006 From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre) Date: Fri, 22 Sep 2006 12:38:18 +0100 (BST) Subject: [openib-general] openib on SLES10 and ksym errors Message-ID: hi, I've compiled OFED-1.0-plus-Open-MPI-1.1 on SLES10/32 bits. Linux n31 2.6.16.21-0.8-smp #3 SMP Thu Sep 21 17:18:27 BST 2006 i686 i686 However, when i install the rpms, i get the following ksym errors. I also tried with ofed-1.1 and get the same. I'm using an sles10 kernel with lustre patches. maybe there is a mismatch between the kernel i have and recompiled ofed rpms ? any help would be grately appreciated! Thanks, Thierry. ksym(fd_install) = d291f2c9 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(dma_free_coherent) = d4c86700 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(contig_page_data) = d748c5fa is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(dev_base) = db8cb539 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(dev_queue_xmit) = dbcc4c81 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(skb_under_panic) = dc41edc4 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(arp_tbl) = e0239e26 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(seq_lseek) = e040abe0 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(get_sb_pseudo) = e5a89af7 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(register_netdev) = e5c7634f is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(__alloc_skb) = e89833c0 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(__free_pages) = e981956f is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(fget) = ea0f36ab is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(__pci_register_driver) = eabba033 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(alloc_netdev) = ec64cd34 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(kill_fasync) = ec855dd5 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(mem_map) = ee2ba07 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 
ksym(sysfs_create_group) = f5711d54 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(fasync_helper) = f76f81a9 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(unregister_netdev) = f7ad65b5 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ksym(pci_find_capability) = f906e7af is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586 ---------------------------------------- Dr Thierry DELAITRE Systems and Services Manager, CSCS University of Westminster 115 New Cavendish Street, London W1W 6UW Tel: 020 7911 5000 ext: 3586 Fax: 020 7911 5089 Mobile short dial code 1788 http://www.cscs.wmin.ac.uk/~delaitt ---------------------------------------- This e-mail and its attachments are intended for the above named only and may be confidential. If they have come to you in error you must not copy or show them to anyone, nor should you take any action based on them, other than to notify the error by replying to the sender.
From halr at voltaire.com Fri Sep 22 06:36:36 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Sep 2006 09:36:36 -0400 Subject: [openib-general] [PATCH 3/13] osm: port to WinIB stack : include/iba/ib_types.h In-Reply-To: <86y7silc1w.fsf@mtl066.yok.mtl.com> References: <86y7silc1w.fsf@mtl066.yok.mtl.com> Message-ID: <1158932195.4353.20494.camel@hal.voltaire.com> On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote: > Hi Hal > > Most are just adding OSM_API for fucntion declarations. > Some minor indentations. > > Thanks > > Eitan > > Signed-off-by: Eitan Zahavi Thanks. Applied (with some cosmetic changes) to trunk only. -- Hal From delaitt at cpc.wmin.ac.uk Fri Sep 22 07:00:21 2006 From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre) Date: Fri, 22 Sep 2006 15:00:21 +0100 (BST) Subject: [openib-general] OFED for SLES10 Message-ID: Hi, Could someone point me in the right direction on how to compile ofed for sles10 ? Do i need to recompile modules for the kernel ? Do i need to patch the kernel before compiling the module ? Do i need to delete the existing infiniband directory in the kernel and replace it with ofed ? Does the ofed distri include the kernel modules ? Thanks, Thierry. ---------------------------------------- Dr Thierry DELAITRE Systems and Services Manager, CSCS University of Westminster 115 New Cavendish Street, London W1W 6UW Tel: 020 7911 5000 ext: 3586 Fax: 020 7911 5089 Mobile short dial code 1788 http://www.cscs.wmin.ac.uk/~delaitt ---------------------------------------- This e-mail and its attachments are intended for the above named only and may be confidential. 
If they have come to you in error you must not copy or show them to anyone, nor should you take any action based on them, other than to notify the error by replying to the sender. From thomas.bub at thomson.net Fri Sep 22 07:41:04 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Fri, 22 Sep 2006 16:41:04 +0200 Subject: [openib-general] OFED for SLES10 Message-ID: Thierry, support for SLES10 was introduced after the OFED-1.0 release. The best thing is to download the OFED-1.1 Release Candidate 6 from: https://openib.org/svn/gen2/branches/1.1/ofed/releases/OFED-1.1-rc6.tgz You should have the kernel sources of the SLES10 kernel installed. Unpack the tgz and run the install script found in the directory OFED-1.1-rc6. It does everything you need to get the drivers built and installed. That should be all that's needed. Thomas Bub ............................................................ Thomas Bub Grass Valley Germany GmbH Brunnenweg 9 64331 Weiterstadt, Germany Tel: +49 6150 104 147 Fax: +49 6150 104 656 Email: Thomas.Bub at thomson.net www.GrassValley.com ............................................................ > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Thierry Delaitre > Sent: Friday, September 22, 2006 4:00 PM > To: openib-general at openib.org > Subject: [openib-general] OFED for SLES10 > > > Hi, > > Could someone point me in the right direction on how to compile ofed for > sles10 ? > > Do i need to recompile modules for the kernel ? > Do i need to patch the kernel before compiling the module ? > Do i need to delete the existing infiniband directory in the kernel and > replace it with ofed ? > Does the ofed distri include the kernel modules ? > > Thanks, > > Thierry. 
> > ---------------------------------------- > Dr Thierry DELAITRE > Systems and Services Manager, CSCS > University of Westminster > 115 New Cavendish Street, London W1W 6UW > > Tel: 020 7911 5000 ext: 3586 > Fax: 020 7911 5089 > Mobile short dial code 1788 > > http://www.cscs.wmin.ac.uk/~delaitt > ---------------------------------------- > > This e-mail and its attachments are intended for the above named only > and may be confidential. If they have come to you in error you must > not copy or show them to anyone, nor should you take any action based > on them, other than to notify the error by replying to the sender. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general From hnguyen at de.ibm.com Fri Sep 22 08:20:23 2006 From: hnguyen at de.ibm.com (Hoang-Nam Nguyen) Date: Fri, 22 Sep 2006 17:20:23 +0200 Subject: [openib-general] [PATCH 2.6.19-rc1] ehca firmware interface based on Anton Blanchard's new hvcall interface Message-ID: <200609221720.24191.hnguyen@de.ibm.com> Hello Roland! Below is the patch of ehca according to Anton's new hvcall interface, which has been committed in Paul's git tree: git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git Besides the changes above this patch contains some coding style updates. I created this patch against your git tree, branch for-2.6.19. Thanks! 
Hoang-Nam Nguyen Signed-off-by: Hoang-Nam Nguyen --- ehca_main.c | 9 hcp_if.c | 845 ++++++++++++++++++++---------------------------------------- hcp_if.h | 2 hipz_hw.h | 2 ipz_pt_fn.h | 7 5 files changed, 300 insertions(+), 565 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 159b0be..0a0248f 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -5,6 +5,7 @@ * * Authors: Heiko J Schick * Hoang-Nam Nguyen + * Joachim Fenkes * * Copyright (c) 2005 IBM Corporation * @@ -48,7 +49,7 @@ #include "hcp_if.h" MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver"); -MODULE_VERSION("SVNEHCA_0015"); +MODULE_VERSION("SVNEHCA_0016"); int ehca_open_aqp1 = 0; int ehca_debug_level = 0; @@ -268,7 +269,7 @@ int ehca_register_device(struct ehca_shc (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST) | (1ull << IB_USER_VERBS_CMD_DETACH_MCAST); - shca->ib_device.node_type = RDMA_NODE_IB_CA; + shca->ib_device.node_type = IB_NODE_CA; shca->ib_device.phys_port_cnt = shca->num_ports; shca->ib_device.dma_device = &shca->ibmebus_dev->ofdev.dev; shca->ib_device.query_device = ehca_query_device; @@ -446,7 +447,7 @@ static ssize_t ehca_show_##name(struct kfree(rblock); \ return 0; \ } \ - \ + \ data = rblock->name; \ kfree(rblock); \ \ @@ -749,7 +750,7 @@ int __init ehca_module_init(void) int ret; printk(KERN_INFO "eHCA Infiniband Device Driver " - "(Rel.: SVNEHCA_0015)\n"); + "(Rel.: SVNEHCA_0016)\n"); idr_init(&ehca_qp_idr); idr_init(&ehca_cq_idr); spin_lock_init(&ehca_qp_idr_lock); diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c index 260e82a..3fb46e6 100644 --- a/drivers/infiniband/hw/ehca/hcp_if.c +++ b/drivers/infiniband/hw/ehca/hcp_if.c @@ -48,27 +48,27 @@ #include "hcp_phyp.h" #include "hipz_fns.h" #include "ipz_pt_fn.h" -#define H_ALL_RES_QP_ENHANCED_OPS EHCA_BMASK_IBM(9,11) 
-#define H_ALL_RES_QP_PTE_PIN EHCA_BMASK_IBM(12,12) -#define H_ALL_RES_QP_SERVICE_TYPE EHCA_BMASK_IBM(13,15) -#define H_ALL_RES_QP_LL_RQ_CQE_POSTING EHCA_BMASK_IBM(18,18) -#define H_ALL_RES_QP_LL_SQ_CQE_POSTING EHCA_BMASK_IBM(19,21) -#define H_ALL_RES_QP_SIGNALING_TYPE EHCA_BMASK_IBM(22,23) -#define H_ALL_RES_QP_UD_AV_LKEY_CTRL EHCA_BMASK_IBM(31,31) -#define H_ALL_RES_QP_RESOURCE_TYPE EHCA_BMASK_IBM(56,63) - -#define H_ALL_RES_QP_MAX_OUTST_SEND_WR EHCA_BMASK_IBM(0,15) -#define H_ALL_RES_QP_MAX_OUTST_RECV_WR EHCA_BMASK_IBM(16,31) -#define H_ALL_RES_QP_MAX_SEND_SGE EHCA_BMASK_IBM(32,39) -#define H_ALL_RES_QP_MAX_RECV_SGE EHCA_BMASK_IBM(40,47) - -#define H_ALL_RES_QP_ACT_OUTST_SEND_WR EHCA_BMASK_IBM(16,31) -#define H_ALL_RES_QP_ACT_OUTST_RECV_WR EHCA_BMASK_IBM(48,63) -#define H_ALL_RES_QP_ACT_SEND_SGE EHCA_BMASK_IBM(8,15) -#define H_ALL_RES_QP_ACT_RECV_SGE EHCA_BMASK_IBM(24,31) - -#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES EHCA_BMASK_IBM(0,31) -#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES EHCA_BMASK_IBM(32,63) +#define H_ALL_RES_QP_ENHANCED_OPS EHCA_BMASK_IBM(9, 11) +#define H_ALL_RES_QP_PTE_PIN EHCA_BMASK_IBM(12, 12) +#define H_ALL_RES_QP_SERVICE_TYPE EHCA_BMASK_IBM(13, 15) +#define H_ALL_RES_QP_LL_RQ_CQE_POSTING EHCA_BMASK_IBM(18, 18) +#define H_ALL_RES_QP_LL_SQ_CQE_POSTING EHCA_BMASK_IBM(19, 21) +#define H_ALL_RES_QP_SIGNALING_TYPE EHCA_BMASK_IBM(22, 23) +#define H_ALL_RES_QP_UD_AV_LKEY_CTRL EHCA_BMASK_IBM(31, 31) +#define H_ALL_RES_QP_RESOURCE_TYPE EHCA_BMASK_IBM(56, 63) + +#define H_ALL_RES_QP_MAX_OUTST_SEND_WR EHCA_BMASK_IBM(0, 15) +#define H_ALL_RES_QP_MAX_OUTST_RECV_WR EHCA_BMASK_IBM(16, 31) +#define H_ALL_RES_QP_MAX_SEND_SGE EHCA_BMASK_IBM(32, 39) +#define H_ALL_RES_QP_MAX_RECV_SGE EHCA_BMASK_IBM(40, 47) + +#define H_ALL_RES_QP_ACT_OUTST_SEND_WR EHCA_BMASK_IBM(16, 31) +#define H_ALL_RES_QP_ACT_OUTST_RECV_WR EHCA_BMASK_IBM(48, 63) +#define H_ALL_RES_QP_ACT_SEND_SGE EHCA_BMASK_IBM(8, 15) +#define H_ALL_RES_QP_ACT_RECV_SGE EHCA_BMASK_IBM(24, 31) + +#define 
H_ALL_RES_QP_SQUEUE_SIZE_PAGES EHCA_BMASK_IBM(0, 31) +#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES EHCA_BMASK_IBM(32, 63) /* direct access qp controls */ #define DAQP_CTRL_ENABLE 0x01 @@ -95,35 +95,25 @@ static u32 get_longbusy_msecs(int longbu } } -static long ehca_hcall_7arg_7ret(unsigned long opcode, - unsigned long arg1, - unsigned long arg2, - unsigned long arg3, - unsigned long arg4, - unsigned long arg5, - unsigned long arg6, - unsigned long arg7, - unsigned long *out1, - unsigned long *out2, - unsigned long *out3, - unsigned long *out4, - unsigned long *out5, - unsigned long *out6, - unsigned long *out7) +static long ehca_plpar_hcall_norets(unsigned long opcode, + unsigned long arg1, + unsigned long arg2, + unsigned long arg3, + unsigned long arg4, + unsigned long arg5, + unsigned long arg6, + unsigned long arg7) { long ret; int i, sleep_msecs; - ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx " - "arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5, - arg6, arg7); + ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx " + "arg5=%lx arg6=%lx arg7=%lx", + opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7); for (i = 0; i < 5; i++) { - ret = plpar_hcall_7arg_7ret(opcode, - arg1, arg2, arg3, arg4, - arg5, arg6, arg7, - out1, out2, out3, out4, - out5, out6,out7); + ret = plpar_hcall_norets(opcode, arg1, arg2, arg3, arg4, + arg5, arg6, arg7); if (H_IS_LONG_BUSY(ret)) { sleep_msecs = get_longbusy_msecs(ret); @@ -134,44 +124,30 @@ static long ehca_hcall_7arg_7ret(unsigne if (ret < H_SUCCESS) ehca_gen_err("opcode=%lx ret=%lx" " arg1=%lx arg2=%lx arg3=%lx arg4=%lx" - " arg5=%lx arg6=%lx arg7=%lx" - " out1=%lx out2=%lx out3=%lx out4=%lx" - " out5=%lx out6=%lx out7=%lx", + " arg5=%lx arg6=%lx arg7=%lx ", opcode, ret, - arg1, arg2, arg3, arg4, - arg5, arg6, arg7, - *out1, *out2, *out3, *out4, - *out5, *out6, *out7); + arg1, arg2, arg3, arg4, arg5, + arg6, arg7); - ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx " - "out4=%lx 
out5=%lx out6=%lx out7=%lx", - opcode, ret, *out1, *out2, *out3, *out4, *out5, - *out6, *out7); + ehca_gen_dbg("opcode=%lx ret=%lx", opcode, ret); return ret; + } return H_BUSY; } -static long ehca_hcall_9arg_9ret(unsigned long opcode, - unsigned long arg1, - unsigned long arg2, - unsigned long arg3, - unsigned long arg4, - unsigned long arg5, - unsigned long arg6, - unsigned long arg7, - unsigned long arg8, - unsigned long arg9, - unsigned long *out1, - unsigned long *out2, - unsigned long *out3, - unsigned long *out4, - unsigned long *out5, - unsigned long *out6, - unsigned long *out7, - unsigned long *out8, - unsigned long *out9) +static long ehca_plpar_hcall9(unsigned long opcode, + unsigned long *outs, /* array of 9 outputs */ + unsigned long arg1, + unsigned long arg2, + unsigned long arg3, + unsigned long arg4, + unsigned long arg5, + unsigned long arg6, + unsigned long arg7, + unsigned long arg8, + unsigned long arg9) { long ret; int i, sleep_msecs; @@ -182,13 +158,9 @@ static long ehca_hcall_9arg_9ret(unsigne arg8, arg9); for (i = 0; i < 5; i++) { - ret = plpar_hcall_9arg_9ret(opcode, - arg1, arg2, arg3, arg4, - arg5, arg6, arg7, arg8, - arg9, - out1, out2, out3, out4, - out5, out6, out7, out8, - out9); + ret = plpar_hcall9(opcode, outs, + arg1, arg2, arg3, arg4, arg5, + arg6, arg7, arg8, arg9); if (H_IS_LONG_BUSY(ret)) { sleep_msecs = get_longbusy_msecs(ret); @@ -205,37 +177,35 @@ static long ehca_hcall_9arg_9ret(unsigne " out5=%lx out6=%lx out7=%lx out8=%lx" " out9=%lx", opcode, ret, - arg1, arg2, arg3, arg4, - arg5, arg6, arg7, arg8, - arg9, - *out1, *out2, *out3, *out4, - *out5, *out6, *out7, *out8, - *out9); + arg1, arg2, arg3, arg4, arg5, + arg6, arg7, arg8, arg9, + outs[0], outs[1], outs[2], outs[3], + outs[4], outs[5], outs[6], outs[7], + outs[8]); ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx " "out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx " - "out9=%lx", opcode, ret,*out1, *out2, *out3, *out4, - *out5, *out6, *out7, *out8, *out9); + 
"out9=%lx", + opcode, ret, outs[0], outs[1], outs[2], outs[3], + outs[4], outs[5], outs[6], outs[7], outs[8]); return ret; } return H_BUSY; } - u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle, struct ehca_pfeq *pfeq, const u32 neq_control, const u32 number_of_entries, struct ipz_eq_handle *eq_handle, - u32 * act_nr_of_entries, - u32 * act_pages, - u32 * eq_ist) + u32 *act_nr_of_entries, + u32 *act_pages, + u32 *eq_ist) { u64 ret; - u64 dummy; + u64 outs[PLPAR_HCALL9_BUFSIZE]; u64 allocate_controls; - u64 act_nr_of_entries_out, act_pages_out, eq_ist_out; /* resource type */ allocate_controls = 3ULL; @@ -246,22 +216,15 @@ u64 hipz_h_alloc_resource_eq(const struc else /* notification event queue */ allocate_controls = (1ULL << 63) | allocate_controls; - ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, - adapter_handle.handle, /* r4 */ - allocate_controls, /* r5 */ - number_of_entries, /* r6 */ - 0, 0, 0, 0, - &eq_handle->handle, /* r4 */ - &dummy, /* r5 */ - &dummy, /* r6 */ - &act_nr_of_entries_out, /* r7 */ - &act_pages_out, /* r8 */ - &eq_ist_out, /* r8 */ - &dummy); - - *act_nr_of_entries = (u32)act_nr_of_entries_out; - *act_pages = (u32)act_pages_out; - *eq_ist = (u32)eq_ist_out; + ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, + adapter_handle.handle, /* r4 */ + allocate_controls, /* r5 */ + number_of_entries, /* r6 */ + 0, 0, 0, 0, 0, 0); + eq_handle->handle = outs[0]; + *act_nr_of_entries = (u32)outs[3]; + *act_pages = (u32)outs[4]; + *eq_ist = (u32)outs[5]; if (ret == H_NOT_ENOUGH_RESOURCES) ehca_gen_err("Not enough resource - ret=%lx ", ret); @@ -273,20 +236,11 @@ u64 hipz_h_reset_event(const struct ipz_ struct ipz_eq_handle eq_handle, const u64 event_mask) { - u64 dummy; - - return ehca_hcall_7arg_7ret(H_RESET_EVENTS, - adapter_handle.handle, /* r4 */ - eq_handle.handle, /* r5 */ - event_mask, /* r6 */ - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return 
ehca_plpar_hcall_norets(H_RESET_EVENTS, + adapter_handle.handle, /* r4 */ + eq_handle.handle, /* r5 */ + event_mask, /* r6 */ + 0, 0, 0, 0); } u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle, @@ -294,30 +248,21 @@ u64 hipz_h_alloc_resource_cq(const struc struct ehca_alloc_cq_parms *param) { u64 ret; - u64 dummy; - u64 act_nr_of_entries_out, act_pages_out; - u64 g_la_privileged_out, g_la_user_out; - - ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, - adapter_handle.handle, /* r4 */ - 2, /* r5 */ - param->eq_handle.handle, /* r6 */ - cq->token, /* r7 */ - param->nr_cqe, /* r8 */ - 0, 0, - &cq->ipz_cq_handle.handle, /* r4 */ - &dummy, /* r5 */ - &dummy, /* r6 */ - &act_nr_of_entries_out, /* r7 */ - &act_pages_out, /* r8 */ - &g_la_privileged_out, /* r9 */ - &g_la_user_out); /* r10 */ - - param->act_nr_of_entries = (u32)act_nr_of_entries_out; - param->act_pages = (u32)act_pages_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, + adapter_handle.handle, /* r4 */ + 2, /* r5 */ + param->eq_handle.handle, /* r6 */ + cq->token, /* r7 */ + param->nr_cqe, /* r8 */ + 0, 0, 0, 0); + cq->ipz_cq_handle.handle = outs[0]; + param->act_nr_of_entries = (u32)outs[3]; + param->act_pages = (u32)outs[4]; if (ret == H_SUCCESS) - hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out); + hcp_galpas_ctor(&cq->galpas, outs[5], outs[6]); if (ret == H_NOT_ENOUGH_RESOURCES) ehca_gen_err("Not enough resources. 
ret=%lx", ret); @@ -330,8 +275,9 @@ u64 hipz_h_alloc_resource_qp(const struc struct ehca_alloc_qp_parms *parms) { u64 ret; - u64 dummy, allocate_controls, max_r10_reg; - u64 qp_nr_out, r6_out, r7_out, r8_out, g_la_user_out, r11_out; + u64 allocate_controls; + u64 max_r10_reg; + u64 outs[PLPAR_HCALL9_BUFSIZE]; u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1; u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1; int daqp_ctrl = parms->daqp_ctrl; @@ -360,48 +306,36 @@ u64 hipz_h_alloc_resource_qp(const struc | EHCA_BMASK_SET(H_ALL_RES_QP_MAX_RECV_SGE, parms->max_recv_sge); - - ret = ehca_hcall_9arg_9ret(H_ALLOC_RESOURCE, - adapter_handle.handle, /* r4 */ - allocate_controls, /* r5 */ - qp->send_cq->ipz_cq_handle.handle, - qp->recv_cq->ipz_cq_handle.handle, - parms->ipz_eq_handle.handle, - ((u64)qp->token << 32) | parms->pd.value, - max_r10_reg, /* r10 */ - parms->ud_av_l_key_ctl, /* r11 */ - 0, - &qp->ipz_qp_handle.handle, - &qp_nr_out, /* r5 */ - &r6_out, /* r6 */ - &r7_out, /* r7 */ - &r8_out, /* r8 */ - &dummy, /* r9 */ - &g_la_user_out, /* r10 */ - &r11_out, - &dummy); - - /* extract outputs */ - qp->real_qp_num = (u32)qp_nr_out; - + ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, + adapter_handle.handle, /* r4 */ + allocate_controls, /* r5 */ + qp->send_cq->ipz_cq_handle.handle, + qp->recv_cq->ipz_cq_handle.handle, + parms->ipz_eq_handle.handle, + ((u64)qp->token << 32) | parms->pd.value, + max_r10_reg, /* r10 */ + parms->ud_av_l_key_ctl, /* r11 */ + 0); + qp->ipz_qp_handle.handle = outs[0]; + qp->real_qp_num = (u32)outs[1]; parms->act_nr_send_sges = - (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, r6_out); + (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, outs[2]); parms->act_nr_recv_wqes = - (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, r6_out); + (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, outs[2]); parms->act_nr_send_sges = - (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, r7_out); + 
(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, outs[3]); parms->act_nr_recv_sges = - (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, r7_out); + (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, outs[3]); parms->nr_sq_pages = - (u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, r8_out); + (u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, outs[4]); parms->nr_rq_pages = - (u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, r8_out); + (u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, outs[4]); if (ret == H_SUCCESS) - hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out); + hcp_galpas_ctor(&qp->galpas, outs[6], outs[6]); if (ret == H_NOT_ENOUGH_RESOURCES) - ehca_gen_err("Not enough resources. ret=%lx",ret); + ehca_gen_err("Not enough resources. ret=%lx", ret); return ret; } @@ -411,7 +345,6 @@ u64 hipz_h_query_port(const struct ipz_a struct hipz_query_port *query_port_response_block) { u64 ret; - u64 dummy; u64 r_cb = virt_to_abs(query_port_response_block); if (r_cb & (EHCA_PAGESIZE-1)) { @@ -419,18 +352,11 @@ u64 hipz_h_query_port(const struct ipz_a return H_PARAMETER; } - ret = ehca_hcall_7arg_7ret(H_QUERY_PORT, - adapter_handle.handle, /* r4 */ - port_id, /* r5 */ - r_cb, /* r6 */ - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + ret = ehca_plpar_hcall_norets(H_QUERY_PORT, + adapter_handle.handle, /* r4 */ + port_id, /* r5 */ + r_cb, /* r6 */ + 0, 0, 0, 0); if (ehca_debug_level) ehca_dmp(query_port_response_block, 64, "response_block"); @@ -441,7 +367,6 @@ u64 hipz_h_query_port(const struct ipz_a u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle, struct hipz_query_hca *query_hca_rblock) { - u64 dummy; u64 r_cb = virt_to_abs(query_hca_rblock); if (r_cb & (EHCA_PAGESIZE-1)) { @@ -450,17 +375,10 @@ u64 hipz_h_query_hca(const struct ipz_ad return H_PARAMETER; } - return ehca_hcall_7arg_7ret(H_QUERY_HCA, - adapter_handle.handle, /* r4 */ - r_cb, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - 
&dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_QUERY_HCA, + adapter_handle.handle, /* r4 */ + r_cb, /* r5 */ + 0, 0, 0, 0, 0); } u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle, @@ -470,22 +388,13 @@ u64 hipz_h_register_rpage(const struct i const u64 logical_address_of_page, u64 count) { - u64 dummy; - - return ehca_hcall_7arg_7ret(H_REGISTER_RPAGES, - adapter_handle.handle, /* r4 */ - queue_type | pagesize << 8, /* r5 */ - resource_handle, /* r6 */ - logical_address_of_page, /* r7 */ - count, /* r8 */ - 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_REGISTER_RPAGES, + adapter_handle.handle, /* r4 */ + queue_type | pagesize << 8, /* r5 */ + resource_handle, /* r6 */ + logical_address_of_page, /* r7 */ + count, /* r8 */ + 0, 0); } u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle, @@ -507,23 +416,14 @@ u64 hipz_h_register_rpage_eq(const struc logical_address_of_page, count); } -u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle, +u64 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle, u32 ist) { - u32 ret; - u64 dummy; - - ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE, - adapter_handle.handle, /* r4 */ - ist, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + u64 ret; + ret = ehca_plpar_hcall_norets(H_QUERY_INT_STATE, + adapter_handle.handle, /* r4 */ + ist, /* r5 */ + 0, 0, 0, 0, 0); if (ret != H_SUCCESS && ret != H_BUSY) ehca_gen_err("Could not query interrupt state."); @@ -576,25 +476,20 @@ u64 hipz_h_disable_and_get_wqe(const str void **log_addr_next_rq_wqe2processed, int dis_and_get_function_code) { - u64 dummy, dummy1, dummy2; - - if (!log_addr_next_sq_wqe2processed) - log_addr_next_sq_wqe2processed = (void**)&dummy1; - if (!log_addr_next_rq_wqe2processed) - log_addr_next_rq_wqe2processed = (void**)&dummy2; - - return 
ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC, - adapter_handle.handle, /* r4 */ - dis_and_get_function_code, /* r5 */ - qp_handle.handle, /* r6 */ - 0, 0, 0, 0, - (void*)log_addr_next_sq_wqe2processed, - (void*)log_addr_next_rq_wqe2processed, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + u64 ret; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs, + adapter_handle.handle, /* r4 */ + dis_and_get_function_code, /* r5 */ + qp_handle.handle, /* r6 */ + 0, 0, 0, 0, 0, 0); + if (log_addr_next_sq_wqe2processed) + *log_addr_next_sq_wqe2processed = (void*)outs[0]; + if (log_addr_next_rq_wqe2processed) + *log_addr_next_rq_wqe2processed = (void*)outs[1]; + + return ret; } u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle, @@ -605,22 +500,13 @@ u64 hipz_h_modify_qp(const struct ipz_ad struct h_galpa gal) { u64 ret; - u64 dummy; - u64 invalid_attribute_identifier, rc_attrib_mask; - - ret = ehca_hcall_7arg_7ret(H_MODIFY_QP, - adapter_handle.handle, /* r4 */ - qp_handle.handle, /* r5 */ - update_mask, /* r6 */ - virt_to_abs(mqpcb), /* r7 */ - 0, 0, 0, - &invalid_attribute_identifier, /* r4 */ - &dummy, /* r5 */ - &dummy, /* r6 */ - &dummy, /* r7 */ - &dummy, /* r8 */ - &rc_attrib_mask, /* r9 */ - &dummy); + u64 outs[PLPAR_HCALL9_BUFSIZE]; + ret = ehca_plpar_hcall9(H_MODIFY_QP, outs, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + update_mask, /* r6 */ + virt_to_abs(mqpcb), /* r7 */ + 0, 0, 0, 0, 0); if (ret == H_NOT_ENOUGH_RESOURCES) ehca_gen_err("Insufficient resources ret=%lx", ret); @@ -634,61 +520,37 @@ u64 hipz_h_query_qp(const struct ipz_ada struct hcp_modify_qp_control_block *qqpcb, struct h_galpa gal) { - u64 dummy; - - return ehca_hcall_7arg_7ret(H_QUERY_QP, - adapter_handle.handle, /* r4 */ - qp_handle.handle, /* r5 */ - virt_to_abs(qqpcb), /* r6 */ - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_QUERY_QP, + 
adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + virt_to_abs(qqpcb), /* r6 */ + 0, 0, 0, 0); } u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle, struct ehca_qp *qp) { u64 ret; - u64 dummy; - u64 ladr_next_sq_wqe_out, ladr_next_rq_wqe_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; ret = hcp_galpas_dtor(&qp->galpas); if (ret) { ehca_gen_err("Could not destruct qp->galpas"); return H_RESOURCE; } - ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC, - adapter_handle.handle, /* r4 */ - /* function code */ - 1, /* r5 */ - qp->ipz_qp_handle.handle, /* r6 */ - 0, 0, 0, 0, - &ladr_next_sq_wqe_out, /* r4 */ - &ladr_next_rq_wqe_out, /* r5 */ - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs, + adapter_handle.handle, /* r4 */ + /* function code */ + 1, /* r5 */ + qp->ipz_qp_handle.handle, /* r6 */ + 0, 0, 0, 0, 0, 0); if (ret == H_HARDWARE) ehca_gen_err("HCA not operational. ret=%lx", ret); - ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE, - adapter_handle.handle, /* r4 */ - qp->ipz_qp_handle.handle, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + qp->ipz_qp_handle.handle, /* r5 */ + 0, 0, 0, 0, 0); if (ret == H_RESOURCE) ehca_gen_err("Resource still in use. 
ret=%lx", ret); @@ -701,20 +563,11 @@ u64 hipz_h_define_aqp0(const struct ipz_ struct h_galpa gal, u32 port) { - u64 dummy; - - return ehca_hcall_7arg_7ret(H_DEFINE_AQP0, - adapter_handle.handle, /* r4 */ - qp_handle.handle, /* r5 */ - port, /* r6 */ - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_DEFINE_AQP0, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + port, /* r6 */ + 0, 0, 0, 0); } u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle, @@ -724,24 +577,15 @@ u64 hipz_h_define_aqp1(const struct ipz_ u32 * bma_qp_nr) { u64 ret; - u64 dummy; - u64 pma_qp_nr_out, bma_qp_nr_out; - - ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1, - adapter_handle.handle, /* r4 */ - qp_handle.handle, /* r5 */ - port, /* r6 */ - 0, 0, 0, 0, - &pma_qp_nr_out, /* r4 */ - &bma_qp_nr_out, /* r5 */ - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); - - *pma_qp_nr = (u32)pma_qp_nr_out; - *bma_qp_nr = (u32)bma_qp_nr_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_DEFINE_AQP1, outs, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + port, /* r6 */ + 0, 0, 0, 0, 0, 0); + *pma_qp_nr = (u32)outs[0]; + *bma_qp_nr = (u32)outs[1]; if (ret == H_ALIAS_EXIST) ehca_gen_err("AQP1 already exists. ret=%lx", ret); @@ -756,22 +600,14 @@ u64 hipz_h_attach_mcqp(const struct ipz_ u64 subnet_prefix, u64 interface_id) { u64 ret; - u64 dummy; - - ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP, - adapter_handle.handle, /* r4 */ - qp_handle.handle, /* r5 */ - mcg_dlid, /* r6 */ - interface_id, /* r7 */ - subnet_prefix, /* r8 */ - 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + + ret = ehca_plpar_hcall_norets(H_ATTACH_MCQP, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + mcg_dlid, /* r6 */ + interface_id, /* r7 */ + subnet_prefix, /* r8 */ + 0, 0); if (ret == H_NOT_ENOUGH_RESOURCES) ehca_gen_err("Not enough resources. 
ret=%lx", ret);
@@ -785,22 +621,13 @@ u64 hipz_h_detach_mcqp(const struct ipz_
 		u16 mcg_dlid,
 		u64 subnet_prefix, u64 interface_id)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DETACH_MCQP,
-		adapter_handle.handle, /* r4 */
-		qp_handle.handle, /* r5 */
-		mcg_dlid, /* r6 */
-		interface_id, /* r7 */
-		subnet_prefix, /* r8 */
-		0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_DETACH_MCQP,
+		adapter_handle.handle, /* r4 */
+		qp_handle.handle, /* r5 */
+		mcg_dlid, /* r6 */
+		interface_id, /* r7 */
+		subnet_prefix, /* r8 */
+		0, 0);
 }
 u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle,
@@ -808,7 +635,6 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 		u8 force_flag)
 {
 	u64 ret;
-	u64 dummy;
 	ret = hcp_galpas_dtor(&cq->galpas);
 	if (ret) {
@@ -816,18 +642,11 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 		return H_RESOURCE;
 	}
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		cq->ipz_cq_handle.handle, /* r5 */
-		force_flag != 0 ? 1L : 0L, /* r6 */
-		0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+		adapter_handle.handle, /* r4 */
+		cq->ipz_cq_handle.handle, /* r5 */
+		force_flag != 0 ? 1L : 0L, /* r6 */
+		0, 0, 0, 0);
 	if (ret == H_RESOURCE)
 		ehca_gen_err("H_FREE_RESOURCE failed ret=%lx ", ret);
@@ -839,7 +658,6 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 		struct ehca_eq *eq)
 {
 	u64 ret;
-	u64 dummy;
 	ret = hcp_galpas_dtor(&eq->galpas);
 	if (ret) {
@@ -847,18 +665,10 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 		return H_RESOURCE;
 	}
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		eq->ipz_eq_handle.handle, /* r5 */
-		0, 0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
-
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+		adapter_handle.handle, /* r4 */
+		eq->ipz_eq_handle.handle, /* r5 */
+		0, 0, 0, 0, 0);
 	if (ret == H_RESOURCE)
 		ehca_gen_err("Resource in use. ret=%lx ", ret);
@@ -875,27 +685,19 @@ u64 hipz_h_alloc_resource_mr(const struc
 		struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out;
-	u64 rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		5, /* r5 */
-		vaddr, /* r6 */
-		length, /* r7 */
-		(((u64)access_ctrl) << 32ULL), /* r8 */
-		pd.value, /* r9 */
-		0,
-		&(outparms->handle.handle), /* r4 */
-		&dummy, /* r5 */
-		&lkey_out, /* r6 */
-		&rkey_out, /* r7 */
-		&dummy,
-		&dummy,
-		&dummy);
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+		adapter_handle.handle, /* r4 */
+		5, /* r5 */
+		vaddr, /* r6 */
+		length, /* r7 */
+		(((u64)access_ctrl) << 32ULL), /* r8 */
+		pd.value, /* r9 */
+		0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];
 	return ret;
 }
@@ -923,7 +725,6 @@ u64 hipz_h_register_rpage_mr(const struc
 		queue_type,
 		mr->ipz_mr_handle.handle,
 		logical_address_of_page, count);
-
 	return ret;
 }
@@ -932,24 +733,17 @@ u64 hipz_h_query_mr(const struct ipz_ada
 		struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 remote_len_out, remote_vaddr_out, acc_ctrl_pd_out, r9_out;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_MR,
-		adapter_handle.handle, /* r4 */
-		mr->ipz_mr_handle.handle, /* r5 */
-		0, 0, 0, 0, 0,
-		&outparms->len, /* r4 */
-		&outparms->vaddr, /* r5 */
-		&remote_len_out, /* r6 */
-		&remote_vaddr_out, /* r7 */
-		&acc_ctrl_pd_out, /* r8 */
-		&r9_out,
-		&dummy);
-
-	outparms->acl = acc_ctrl_pd_out >> 32;
-	outparms->lkey = (u32)(r9_out >> 32);
-	outparms->rkey = (u32)(r9_out & (0xffffffff));
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_QUERY_MR, outs,
+		adapter_handle.handle, /* r4 */
+		mr->ipz_mr_handle.handle, /* r5 */
+		0, 0, 0, 0, 0, 0, 0);
+	outparms->len = outs[0];
+	outparms->vaddr = outs[1];
+	outparms->acl = outs[4] >> 32;
+	outparms->lkey = (u32)(outs[5] >> 32);
+	outparms->rkey = (u32)(outs[5] & (0xffffffff));
 	return ret;
 }
@@ -957,19 +751,10 @@ u64 hipz_h_query_mr(const struct ipz_ada
 u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle,
 		const struct ehca_mr *mr)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		mr->ipz_mr_handle.handle, /* r5 */
-		0, 0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+		adapter_handle.handle, /* r4 */
+		mr->ipz_mr_handle.handle, /* r5 */
+		0, 0, 0, 0, 0);
 }
 u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle,
@@ -982,28 +767,20 @@ u64 hipz_h_reregister_pmr(const struct i
 		struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR,
-		adapter_handle.handle, /* r4 */
-		mr->ipz_mr_handle.handle, /* r5 */
-		vaddr_in, /* r6 */
-		length, /* r7 */
-		/* r8 */
-		((((u64)access_ctrl) << 32ULL) | pd.value),
-		mr_addr_cb, /* r9 */
-		0,
-		&dummy, /* r4 */
-		&outparms->vaddr, /* r5 */
-		&lkey_out, /* r6 */
-		&rkey_out, /* r7 */
-		&dummy,
-		&dummy,
-		&dummy);
-
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_REREGISTER_PMR, outs,
+		adapter_handle.handle, /* r4 */
+		mr->ipz_mr_handle.handle, /* r5 */
+		vaddr_in, /* r6 */
+		length, /* r7 */
+		/* r8 */
+		((((u64)access_ctrl) << 32ULL) | pd.value),
+		mr_addr_cb, /* r9 */
+		0, 0, 0);
+	outparms->vaddr = outs[1];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];
 	return ret;
 }
@@ -1017,25 +794,18 @@ u64 hipz_h_register_smr(const struct ipz
 		struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR,
-		adapter_handle.handle, /* r4 */
-		orig_mr->ipz_mr_handle.handle, /* r5 */
-		vaddr_in, /* r6 */
-		(((u64)access_ctrl) << 32ULL), /* r7 */
-		pd.value, /* r8 */
-		0, 0,
-		&(outparms->handle.handle), /* r4 */
-		&dummy, /* r5 */
-		&lkey_out, /* r6 */
-		&rkey_out, /* r7 */
-		&dummy,
-		&dummy,
-		&dummy);
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_REGISTER_SMR, outs,
+		adapter_handle.handle, /* r4 */
+		orig_mr->ipz_mr_handle.handle, /* r5 */
+		vaddr_in, /* r6 */
+		(((u64)access_ctrl) << 32ULL), /* r7 */
+		pd.value, /* r8 */
+		0, 0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];
 	return ret;
 }
@@ -1046,23 +816,15 @@ u64 hipz_h_alloc_resource_mw(const struc
 		struct ehca_mw_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		6, /* r5 */
-		pd.value, /* r6 */
-		0, 0, 0, 0,
-		&(outparms->handle.handle), /* r4 */
-		&dummy, /* r5 */
-		&dummy, /* r6 */
-		&rkey_out, /* r7 */
-		&dummy,
-		&dummy,
-		&dummy);
-
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+		adapter_handle.handle, /* r4 */
+		6, /* r5 */
+		pd.value, /* r6 */
+		0, 0, 0, 0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->rkey = (u32)outs[3];
 	return ret;
 }
@@ -1072,21 +834,13 @@ u64 hipz_h_query_mw(const struct ipz_ada
 		struct ehca_mw_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 pd_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_MW,
-		adapter_handle.handle, /* r4 */
-		mw->ipz_mw_handle.handle, /* r5 */
-		0, 0, 0, 0, 0,
-		&dummy, /* r4 */
-		&dummy, /* r5 */
-		&dummy, /* r6 */
-		&rkey_out, /* r7 */
-		&pd_out, /* r8 */
-		&dummy,
-		&dummy);
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_QUERY_MW, outs,
+		adapter_handle.handle, /* r4 */
+		mw->ipz_mw_handle.handle, /* r5 */
+		0, 0, 0, 0, 0, 0, 0);
+	outparms->rkey = (u32)outs[3];
 	return ret;
 }
@@ -1094,19 +848,10 @@ u64 hipz_h_query_mw(const struct ipz_ada
 u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle,
 		const struct ehca_mw *mw)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		mw->ipz_mw_handle.handle, /* r5 */
-		0, 0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+		adapter_handle.handle, /* r4 */
+		mw->ipz_mw_handle.handle, /* r5 */
+		0, 0, 0, 0, 0);
 }
 u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle,
@@ -1114,7 +859,6 @@ u64 hipz_h_error_data(const struct ipz_a
 		void *rblock,
 		unsigned long *byte_count)
 {
-	u64 dummy;
 	u64 r_cb = virt_to_abs(rblock);
 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -1122,16 +866,9 @@ u64 hipz_h_error_data(const struct ipz_a
 		return H_PARAMETER;
 	}
-	return ehca_hcall_7arg_7ret(H_ERROR_DATA,
-		adapter_handle.handle,
-		ressource_handle,
-		r_cb,
-		0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_ERROR_DATA,
+		adapter_handle.handle,
+		ressource_handle,
+		r_cb,
+		0, 0, 0, 0);
 }
diff --git a/drivers/infiniband/hw/ehca/hcp_if.h b/drivers/infiniband/hw/ehca/hcp_if.h
index 39956d8..587ebd4 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.h
+++ b/drivers/infiniband/hw/ehca/hcp_if.h
@@ -107,7 +107,7 @@ u64 hipz_h_register_rpage_eq(const struc
 		const u64 logical_address_of_page,
 		const u64 count);
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle
 		hcp_adapter_handle,
 		u32 ist);
diff --git a/drivers/infiniband/hw/ehca/hipz_hw.h b/drivers/infiniband/hw/ehca/hipz_hw.h
index f5f4871..3fc92b0 100644
--- a/drivers/infiniband/hw/ehca/hipz_hw.h
+++ b/drivers/infiniband/hw/ehca/hipz_hw.h
@@ -184,8 +184,6 @@ struct hipz_mrmwmm {
 };
-#define MRX_HCR_LPARID_VALID EHCA_BMASK_IBM(0,0)
-
 #define MRMWMM_OFFSET(x) offsetof(struct hipz_mrmwmm,x)
 struct hipz_qpedmm {
diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
index 7e55a31..2f13509 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
@@ -226,10 +226,9 @@ static inline void *ipz_eqit_eq_get_inc_
 {
 	void *ret = ipz_qeit_get(queue);
 	u32 qe = *(u8 *) ret;
-	if ((qe >> 7) == (queue->toggle_state & 1))
-		ipz_qeit_eq_get_inc(queue); /* this is a good one */
-	else
-		ret = NULL;
+	if ((qe >> 7) != (queue->toggle_state & 1))
+		return NULL;
+	ipz_qeit_eq_get_inc(queue); /* this is a good one */
 	return ret;
 }

-------------- next part --------------

diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 159b0be..0a0248f 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -5,6 +5,7 @@
  *
  * Authors: Heiko J Schick
  * Hoang-Nam Nguyen
+ * Joachim Fenkes
  *
  * Copyright (c) 2005 IBM Corporation
  *
@@ -48,7 +49,7 @@ #include "hcp_if.h"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch ");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0015");
+MODULE_VERSION("SVNEHCA_0016");
 int ehca_open_aqp1 = 0;
 int ehca_debug_level = 0;
@@ -268,7 +269,7 @@ int ehca_register_device(struct ehca_shc
 		(1ull << IB_USER_VERBS_CMD_ATTACH_MCAST) |
 		(1ull << IB_USER_VERBS_CMD_DETACH_MCAST);
-	shca->ib_device.node_type = RDMA_NODE_IB_CA;
+	shca->ib_device.node_type = IB_NODE_CA;
 	shca->ib_device.phys_port_cnt = shca->num_ports;
 	shca->ib_device.dma_device = &shca->ibmebus_dev->ofdev.dev;
 	shca->ib_device.query_device = ehca_query_device;
@@ -446,7 +447,7 @@ static ssize_t ehca_show_##name(struct
 		kfree(rblock); \
 		return 0; \
 	} \
- \
+ \
 	data = rblock->name; \
 	kfree(rblock); \
 	\
@@ -749,7 +750,7 @@ int __init ehca_module_init(void)
 	int ret;
 	printk(KERN_INFO "eHCA Infiniband Device Driver "
-		"(Rel.: SVNEHCA_0015)\n");
+		"(Rel.: SVNEHCA_0016)\n");
 	idr_init(&ehca_qp_idr);
 	idr_init(&ehca_cq_idr);
 	spin_lock_init(&ehca_qp_idr_lock);
diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c
index 260e82a..3fb46e6 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.c
+++ b/drivers/infiniband/hw/ehca/hcp_if.c
@@ -48,27 +48,27 @@ #include "hcp_phyp.h"
 #include "hipz_fns.h"
 #include "ipz_pt_fn.h"
-#define H_ALL_RES_QP_ENHANCED_OPS EHCA_BMASK_IBM(9,11)
-#define H_ALL_RES_QP_PTE_PIN EHCA_BMASK_IBM(12,12)
-#define H_ALL_RES_QP_SERVICE_TYPE EHCA_BMASK_IBM(13,15)
-#define H_ALL_RES_QP_LL_RQ_CQE_POSTING EHCA_BMASK_IBM(18,18)
-#define H_ALL_RES_QP_LL_SQ_CQE_POSTING EHCA_BMASK_IBM(19,21)
-#define H_ALL_RES_QP_SIGNALING_TYPE EHCA_BMASK_IBM(22,23)
-#define H_ALL_RES_QP_UD_AV_LKEY_CTRL EHCA_BMASK_IBM(31,31)
-#define H_ALL_RES_QP_RESOURCE_TYPE EHCA_BMASK_IBM(56,63)
-
-#define H_ALL_RES_QP_MAX_OUTST_SEND_WR EHCA_BMASK_IBM(0,15)
-#define H_ALL_RES_QP_MAX_OUTST_RECV_WR EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_MAX_SEND_SGE EHCA_BMASK_IBM(32,39)
-#define H_ALL_RES_QP_MAX_RECV_SGE EHCA_BMASK_IBM(40,47)
-
-#define H_ALL_RES_QP_ACT_OUTST_SEND_WR EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_ACT_OUTST_RECV_WR EHCA_BMASK_IBM(48,63)
-#define H_ALL_RES_QP_ACT_SEND_SGE EHCA_BMASK_IBM(8,15)
-#define H_ALL_RES_QP_ACT_RECV_SGE EHCA_BMASK_IBM(24,31)
-
-#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES EHCA_BMASK_IBM(0,31)
-#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES EHCA_BMASK_IBM(32,63)
+#define H_ALL_RES_QP_ENHANCED_OPS EHCA_BMASK_IBM(9, 11)
+#define H_ALL_RES_QP_PTE_PIN EHCA_BMASK_IBM(12, 12)
+#define H_ALL_RES_QP_SERVICE_TYPE EHCA_BMASK_IBM(13, 15)
+#define H_ALL_RES_QP_LL_RQ_CQE_POSTING EHCA_BMASK_IBM(18, 18)
+#define H_ALL_RES_QP_LL_SQ_CQE_POSTING EHCA_BMASK_IBM(19, 21)
+#define H_ALL_RES_QP_SIGNALING_TYPE EHCA_BMASK_IBM(22, 23)
+#define H_ALL_RES_QP_UD_AV_LKEY_CTRL EHCA_BMASK_IBM(31, 31)
+#define H_ALL_RES_QP_RESOURCE_TYPE EHCA_BMASK_IBM(56, 63)
+
+#define H_ALL_RES_QP_MAX_OUTST_SEND_WR EHCA_BMASK_IBM(0, 15)
+#define H_ALL_RES_QP_MAX_OUTST_RECV_WR EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_MAX_SEND_SGE EHCA_BMASK_IBM(32, 39)
+#define H_ALL_RES_QP_MAX_RECV_SGE EHCA_BMASK_IBM(40, 47)
+
+#define H_ALL_RES_QP_ACT_OUTST_SEND_WR EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_ACT_OUTST_RECV_WR EHCA_BMASK_IBM(48, 63)
+#define H_ALL_RES_QP_ACT_SEND_SGE EHCA_BMASK_IBM(8, 15)
+#define H_ALL_RES_QP_ACT_RECV_SGE EHCA_BMASK_IBM(24, 31)
+
+#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES EHCA_BMASK_IBM(0, 31)
+#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES EHCA_BMASK_IBM(32, 63)
 /* direct access qp controls */
 #define DAQP_CTRL_ENABLE 0x01
@@ -95,35 +95,25 @@ static u32 get_longbusy_msecs(int longbu
 	}
 }
-static long ehca_hcall_7arg_7ret(unsigned long opcode,
-		unsigned long arg1,
-		unsigned long arg2,
-		unsigned long arg3,
-		unsigned long arg4,
-		unsigned long arg5,
-		unsigned long arg6,
-		unsigned long arg7,
-		unsigned long *out1,
-		unsigned long *out2,
-		unsigned long *out3,
-		unsigned long *out4,
-		unsigned long *out5,
-		unsigned long *out6,
-		unsigned long *out7)
+static long ehca_plpar_hcall_norets(unsigned long opcode,
+		unsigned long arg1,
+		unsigned long arg2,
+		unsigned long arg3,
+		unsigned long arg4,
+		unsigned long arg5,
+		unsigned long arg6,
+		unsigned long arg7)
 {
 	long ret;
 	int i, sleep_msecs;
-	ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx "
-		"arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5,
-		arg6, arg7);
+	ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx "
+		"arg5=%lx arg6=%lx arg7=%lx",
+		opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7);
 	for (i = 0; i < 5; i++) {
-		ret = plpar_hcall_7arg_7ret(opcode,
-			arg1, arg2, arg3, arg4,
-			arg5, arg6, arg7,
-			out1, out2, out3, out4,
-			out5, out6,out7);
+		ret = plpar_hcall_norets(opcode, arg1, arg2, arg3, arg4,
+			arg5, arg6, arg7);
 		if (H_IS_LONG_BUSY(ret)) {
 			sleep_msecs = get_longbusy_msecs(ret);
@@ -134,44 +124,30 @@ static long ehca_hcall_7arg_7ret(unsigne
 		if (ret < H_SUCCESS)
 			ehca_gen_err("opcode=%lx ret=%lx"
 				" arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
-				" arg5=%lx arg6=%lx arg7=%lx"
-				" out1=%lx out2=%lx out3=%lx out4=%lx"
-				" out5=%lx out6=%lx out7=%lx",
+				" arg5=%lx arg6=%lx arg7=%lx ",
 				opcode, ret,
-				arg1, arg2, arg3, arg4,
-				arg5, arg6, arg7,
-				*out1, *out2, *out3, *out4,
-				*out5, *out6, *out7);
+				arg1, arg2, arg3, arg4, arg5,
+				arg6, arg7);
-		ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
-			"out4=%lx out5=%lx out6=%lx out7=%lx",
-			opcode, ret, *out1, *out2, *out3, *out4, *out5,
-			*out6, *out7);
+		ehca_gen_dbg("opcode=%lx ret=%lx", opcode, ret);
 		return ret;
+
 	}
 	return H_BUSY;
 }
-static long ehca_hcall_9arg_9ret(unsigned long opcode,
-		unsigned long arg1,
-		unsigned long arg2,
-		unsigned long arg3,
-		unsigned long arg4,
-		unsigned long arg5,
-		unsigned long arg6,
-		unsigned long arg7,
-		unsigned long arg8,
-		unsigned long arg9,
-		unsigned long *out1,
-		unsigned long *out2,
-		unsigned long *out3,
-		unsigned long *out4,
-		unsigned long *out5,
-		unsigned long *out6,
-		unsigned long *out7,
-		unsigned long *out8,
-		unsigned long *out9)
+static long ehca_plpar_hcall9(unsigned long opcode,
+		unsigned long *outs, /* array of 9 outputs */
+		unsigned long arg1,
+		unsigned long arg2,
+		unsigned long arg3,
+		unsigned long arg4,
+		unsigned long arg5,
+		unsigned long arg6,
+		unsigned long arg7,
+		unsigned long arg8,
+		unsigned long arg9)
 {
 	long ret;
 	int i, sleep_msecs;
@@ -182,13 +158,9 @@ static long ehca_hcall_9arg_9ret(unsigne
 		arg8, arg9);
 	for (i = 0; i < 5; i++) {
-		ret = plpar_hcall_9arg_9ret(opcode,
-			arg1, arg2, arg3, arg4,
-			arg5, arg6, arg7, arg8,
-			arg9,
-			out1, out2, out3, out4,
-			out5, out6, out7, out8,
-			out9);
+		ret = plpar_hcall9(opcode, outs,
+			arg1, arg2, arg3, arg4, arg5,
+			arg6, arg7, arg8, arg9);
 		if (H_IS_LONG_BUSY(ret)) {
 			sleep_msecs = get_longbusy_msecs(ret);
@@ -205,37 +177,35 @@ static long ehca_hcall_9arg_9ret(unsigne
 				" out5=%lx out6=%lx out7=%lx out8=%lx"
 				" out9=%lx",
 				opcode, ret,
-				arg1, arg2, arg3, arg4,
-				arg5, arg6, arg7, arg8,
-				arg9,
-				*out1, *out2, *out3, *out4,
-				*out5, *out6, *out7, *out8,
-				*out9);
+				arg1, arg2, arg3, arg4, arg5,
+				arg6, arg7, arg8, arg9,
+				outs[0], outs[1], outs[2], outs[3],
+				outs[4], outs[5], outs[6], outs[7],
+				outs[8]);
 		ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
 			"out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx "
-			"out9=%lx", opcode, ret,*out1, *out2, *out3, *out4,
-			*out5, *out6, *out7, *out8, *out9);
+			"out9=%lx",
+			opcode, ret, outs[0], outs[1], outs[2], outs[3],
+			outs[4], outs[5], outs[6], outs[7], outs[8]);
 		return ret;
 	}
 	return H_BUSY;
 }
-
 u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle,
 		struct ehca_pfeq *pfeq,
 		const u32 neq_control,
 		const u32 number_of_entries,
 		struct ipz_eq_handle *eq_handle,
-		u32 * act_nr_of_entries,
-		u32 * act_pages,
-		u32 * eq_ist)
+		u32 *act_nr_of_entries,
+		u32 *act_pages,
+		u32 *eq_ist)
 {
 	u64 ret;
-	u64 dummy;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 	u64 allocate_controls;
-	u64 act_nr_of_entries_out, act_pages_out, eq_ist_out;
 	/* resource type */
 	allocate_controls = 3ULL;
@@ -246,22 +216,15 @@ u64 hipz_h_alloc_resource_eq(const struc
 	else /* notification event queue */
 		allocate_controls = (1ULL << 63) | allocate_controls;
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		allocate_controls, /* r5 */
-		number_of_entries, /* r6 */
-		0, 0, 0, 0,
-		&eq_handle->handle, /* r4 */
-		&dummy, /* r5 */
-		&dummy, /* r6 */
-		&act_nr_of_entries_out, /* r7 */
-		&act_pages_out, /* r8 */
-		&eq_ist_out, /* r8 */
-		&dummy);
-
-	*act_nr_of_entries = (u32)act_nr_of_entries_out;
-	*act_pages = (u32)act_pages_out;
-	*eq_ist = (u32)eq_ist_out;
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+		adapter_handle.handle, /* r4 */
+		allocate_controls, /* r5 */
+		number_of_entries, /* r6 */
+		0, 0, 0, 0, 0, 0);
+	eq_handle->handle = outs[0];
+	*act_nr_of_entries = (u32)outs[3];
+	*act_pages = (u32)outs[4];
+	*eq_ist = (u32)outs[5];
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resource - ret=%lx ", ret);
@@ -273,20 +236,11 @@ u64 hipz_h_reset_event(const struct ipz_
 		struct ipz_eq_handle eq_handle,
 		const u64 event_mask)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_RESET_EVENTS,
-		adapter_handle.handle, /* r4 */
-		eq_handle.handle, /* r5 */
-		event_mask, /* r6 */
-		0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_RESET_EVENTS,
+		adapter_handle.handle, /* r4 */
+		eq_handle.handle, /* r5 */
+		event_mask, /* r6 */
+		0, 0, 0, 0);
 }
 u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle,
@@ -294,30 +248,21 @@ u64 hipz_h_alloc_resource_cq(const struc
 		struct ehca_alloc_cq_parms *param)
 {
 	u64 ret;
-	u64 dummy;
-	u64 act_nr_of_entries_out, act_pages_out;
-	u64 g_la_privileged_out, g_la_user_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		2, /* r5 */
-		param->eq_handle.handle, /* r6 */
-		cq->token, /* r7 */
-		param->nr_cqe, /* r8 */
-		0, 0,
-		&cq->ipz_cq_handle.handle, /* r4 */
-		&dummy, /* r5 */
-		&dummy, /* r6 */
-		&act_nr_of_entries_out, /* r7 */
-		&act_pages_out, /* r8 */
-		&g_la_privileged_out, /* r9 */
-		&g_la_user_out); /* r10 */
-
-	param->act_nr_of_entries = (u32)act_nr_of_entries_out;
-	param->act_pages = (u32)act_pages_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+		adapter_handle.handle, /* r4 */
+		2, /* r5 */
+		param->eq_handle.handle, /* r6 */
+		cq->token, /* r7 */
+		param->nr_cqe, /* r8 */
+		0, 0, 0, 0);
+	cq->ipz_cq_handle.handle = outs[0];
+	param->act_nr_of_entries = (u32)outs[3];
+	param->act_pages = (u32)outs[4];
 	if (ret == H_SUCCESS)
-		hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out);
+		hcp_galpas_ctor(&cq->galpas, outs[5], outs[6]);
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -330,8 +275,9 @@ u64 hipz_h_alloc_resource_qp(const struc
 		struct ehca_alloc_qp_parms *parms)
 {
 	u64 ret;
-	u64 dummy, allocate_controls, max_r10_reg;
-	u64 qp_nr_out, r6_out, r7_out, r8_out, g_la_user_out, r11_out;
+	u64 allocate_controls;
+	u64 max_r10_reg;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 	u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1;
 	u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1;
 	int daqp_ctrl = parms->daqp_ctrl;
@@ -360,48 +306,36 @@ u64 hipz_h_alloc_resource_qp(const struc
 		| EHCA_BMASK_SET(H_ALL_RES_QP_MAX_RECV_SGE,
 			parms->max_recv_sge);
-
-	ret = ehca_hcall_9arg_9ret(H_ALLOC_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		allocate_controls, /* r5 */
-		qp->send_cq->ipz_cq_handle.handle,
-		qp->recv_cq->ipz_cq_handle.handle,
-		parms->ipz_eq_handle.handle,
-		((u64)qp->token << 32) | parms->pd.value,
-		max_r10_reg, /* r10 */
-		parms->ud_av_l_key_ctl, /* r11 */
-		0,
-		&qp->ipz_qp_handle.handle,
-		&qp_nr_out, /* r5 */
-		&r6_out, /* r6 */
-		&r7_out, /* r7 */
-		&r8_out, /* r8 */
-		&dummy, /* r9 */
-		&g_la_user_out, /* r10 */
-		&r11_out,
-		&dummy);
-
-	/* extract outputs */
-	qp->real_qp_num = (u32)qp_nr_out;
-
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+		adapter_handle.handle, /* r4 */
+		allocate_controls, /* r5 */
+		qp->send_cq->ipz_cq_handle.handle,
+		qp->recv_cq->ipz_cq_handle.handle,
+		parms->ipz_eq_handle.handle,
+		((u64)qp->token << 32) | parms->pd.value,
+		max_r10_reg, /* r10 */
+		parms->ud_av_l_key_ctl, /* r11 */
+		0);
+	qp->ipz_qp_handle.handle = outs[0];
+	qp->real_qp_num = (u32)outs[1];
 	parms->act_nr_send_sges =
-		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, r6_out);
+		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, outs[2]);
 	parms->act_nr_recv_wqes =
-		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, r6_out);
+		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, outs[2]);
 	parms->act_nr_send_sges =
-		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, r7_out);
+		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, outs[3]);
 	parms->act_nr_recv_sges =
-		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, r7_out);
+		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, outs[3]);
 	parms->nr_sq_pages =
-		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, r8_out);
+		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, outs[4]);
 	parms->nr_rq_pages =
-		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, r8_out);
+		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, outs[4]);
 	if (ret == H_SUCCESS)
-		hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out);
+		hcp_galpas_ctor(&qp->galpas, outs[6], outs[6]);
 	if (ret == H_NOT_ENOUGH_RESOURCES)
-		ehca_gen_err("Not enough resources. ret=%lx",ret);
+		ehca_gen_err("Not enough resources. ret=%lx", ret);
 	return ret;
 }
@@ -411,7 +345,6 @@ u64 hipz_h_query_port(const struct ipz_a
 		struct hipz_query_port *query_port_response_block)
 {
 	u64 ret;
-	u64 dummy;
 	u64 r_cb = virt_to_abs(query_port_response_block);
 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -419,18 +352,11 @@ u64 hipz_h_query_port(const struct ipz_a
 		return H_PARAMETER;
 	}
-	ret = ehca_hcall_7arg_7ret(H_QUERY_PORT,
-		adapter_handle.handle, /* r4 */
-		port_id, /* r5 */
-		r_cb, /* r6 */
-		0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	ret = ehca_plpar_hcall_norets(H_QUERY_PORT,
+		adapter_handle.handle, /* r4 */
+		port_id, /* r5 */
+		r_cb, /* r6 */
+		0, 0, 0, 0);
 	if (ehca_debug_level)
 		ehca_dmp(query_port_response_block, 64, "response_block");
@@ -441,7 +367,6 @@ u64 hipz_h_query_port(const struct ipz_a
 u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle,
 		struct hipz_query_hca *query_hca_rblock)
 {
-	u64 dummy;
 	u64 r_cb = virt_to_abs(query_hca_rblock);
 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -450,17 +375,10 @@ u64 hipz_h_query_hca(const struct ipz_ad
 		return H_PARAMETER;
 	}
-	return ehca_hcall_7arg_7ret(H_QUERY_HCA,
-		adapter_handle.handle, /* r4 */
-		r_cb, /* r5 */
-		0, 0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_QUERY_HCA,
+		adapter_handle.handle, /* r4 */
+		r_cb, /* r5 */
+		0, 0, 0, 0, 0);
 }
 u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle,
@@ -470,22 +388,13 @@ u64 hipz_h_register_rpage(const struct i
 		const u64 logical_address_of_page,
 		u64 count)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_REGISTER_RPAGES,
-		adapter_handle.handle, /* r4 */
-		queue_type | pagesize << 8, /* r5 */
-		resource_handle, /* r6 */
-		logical_address_of_page, /* r7 */
-		count, /* r8 */
-		0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_REGISTER_RPAGES,
+		adapter_handle.handle, /* r4 */
+		queue_type | pagesize << 8, /* r5 */
+		resource_handle, /* r6 */
+		logical_address_of_page, /* r7 */
+		count, /* r8 */
+		0, 0);
 }
 u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle,
@@ -507,23 +416,14 @@ u64 hipz_h_register_rpage_eq(const struc
 		logical_address_of_page, count);
 }
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
 		u32 ist)
 {
-	u32 ret;
-	u64 dummy;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE,
-		adapter_handle.handle, /* r4 */
-		ist, /* r5 */
-		0, 0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	u64 ret;
+	ret = ehca_plpar_hcall_norets(H_QUERY_INT_STATE,
+		adapter_handle.handle, /* r4 */
+		ist, /* r5 */
+		0, 0, 0, 0, 0);
 	if (ret != H_SUCCESS && ret != H_BUSY)
 		ehca_gen_err("Could not query interrupt state.");
@@ -576,25 +476,20 @@ u64 hipz_h_disable_and_get_wqe(const str
 		void **log_addr_next_rq_wqe2processed,
 		int dis_and_get_function_code)
 {
-	u64 dummy, dummy1, dummy2;
-
-	if (!log_addr_next_sq_wqe2processed)
-		log_addr_next_sq_wqe2processed = (void**)&dummy1;
-	if (!log_addr_next_rq_wqe2processed)
-		log_addr_next_rq_wqe2processed = (void**)&dummy2;
-
-	return ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-		adapter_handle.handle, /* r4 */
-		dis_and_get_function_code, /* r5 */
-		qp_handle.handle, /* r6 */
-		0, 0, 0, 0,
-		(void*)log_addr_next_sq_wqe2processed,
-		(void*)log_addr_next_rq_wqe2processed,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	u64 ret;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+		adapter_handle.handle, /* r4 */
+		dis_and_get_function_code, /* r5 */
+		qp_handle.handle, /* r6 */
+		0, 0, 0, 0, 0, 0);
+	if (log_addr_next_sq_wqe2processed)
+		*log_addr_next_sq_wqe2processed = (void*)outs[0];
+	if (log_addr_next_rq_wqe2processed)
+		*log_addr_next_rq_wqe2processed = (void*)outs[1];
+
+	return ret;
 }
 u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle,
@@ -605,22 +500,13 @@ u64 hipz_h_modify_qp(const struct ipz_ad
 		struct h_galpa gal)
 {
 	u64 ret;
-	u64 dummy;
-	u64 invalid_attribute_identifier, rc_attrib_mask;
-
-	ret = ehca_hcall_7arg_7ret(H_MODIFY_QP,
-		adapter_handle.handle, /* r4 */
-		qp_handle.handle, /* r5 */
-		update_mask, /* r6 */
-		virt_to_abs(mqpcb), /* r7 */
-		0, 0, 0,
-		&invalid_attribute_identifier, /* r4 */
-		&dummy, /* r5 */
-		&dummy, /* r6 */
-		&dummy, /* r7 */
-		&dummy, /* r8 */
-		&rc_attrib_mask, /* r9 */
-		&dummy);
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+	ret = ehca_plpar_hcall9(H_MODIFY_QP, outs,
+		adapter_handle.handle, /* r4 */
+		qp_handle.handle, /* r5 */
+		update_mask, /* r6 */
+		virt_to_abs(mqpcb), /* r7 */
+		0, 0, 0, 0, 0);
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Insufficient resources ret=%lx", ret);
@@ -634,61 +520,37 @@ u64 hipz_h_query_qp(const struct ipz_ada
 		struct hcp_modify_qp_control_block *qqpcb,
 		struct h_galpa gal)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_QUERY_QP,
-		adapter_handle.handle, /* r4 */
-		qp_handle.handle, /* r5 */
-		virt_to_abs(qqpcb), /* r6 */
-		0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_QUERY_QP,
+		adapter_handle.handle, /* r4 */
+		qp_handle.handle, /* r5 */
+		virt_to_abs(qqpcb), /* r6 */
+		0, 0, 0, 0);
 }
 u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle,
 		struct ehca_qp *qp)
 {
 	u64 ret;
-	u64 dummy;
-	u64 ladr_next_sq_wqe_out, ladr_next_rq_wqe_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 	ret = hcp_galpas_dtor(&qp->galpas);
 	if (ret) {
 		ehca_gen_err("Could not destruct qp->galpas");
 		return H_RESOURCE;
 	}
-	ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-		adapter_handle.handle, /* r4 */
-		/* function code */
-		1, /* r5 */
-		qp->ipz_qp_handle.handle, /* r6 */
-		0, 0, 0, 0,
-		&ladr_next_sq_wqe_out, /* r4 */
-		&ladr_next_rq_wqe_out, /* r5 */
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+		adapter_handle.handle, /* r4 */
+		/* function code */
+		1, /* r5 */
+		qp->ipz_qp_handle.handle, /* r6 */
+		0, 0, 0, 0, 0, 0);
 	if (ret == H_HARDWARE)
 		ehca_gen_err("HCA not operational. ret=%lx", ret);
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		qp->ipz_qp_handle.handle, /* r5 */
-		0, 0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+		adapter_handle.handle, /* r4 */
+		qp->ipz_qp_handle.handle, /* r5 */
+		0, 0, 0, 0, 0);
 	if (ret == H_RESOURCE)
 		ehca_gen_err("Resource still in use. ret=%lx", ret);
@@ -701,20 +563,11 @@ u64 hipz_h_define_aqp0(const struct ipz_
 		struct h_galpa gal,
 		u32 port)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DEFINE_AQP0,
-		adapter_handle.handle, /* r4 */
-		qp_handle.handle, /* r5 */
-		port, /* r6 */
-		0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_DEFINE_AQP0,
+		adapter_handle.handle, /* r4 */
+		qp_handle.handle, /* r5 */
+		port, /* r6 */
+		0, 0, 0, 0);
 }
 u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle,
@@ -724,24 +577,15 @@ u64 hipz_h_define_aqp1(const struct ipz_
 		u32 * bma_qp_nr)
 {
 	u64 ret;
-	u64 dummy;
-	u64 pma_qp_nr_out, bma_qp_nr_out;
-
-	ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1,
-		adapter_handle.handle, /* r4 */
-		qp_handle.handle, /* r5 */
-		port, /* r6 */
-		0, 0, 0, 0,
-		&pma_qp_nr_out, /* r4 */
-		&bma_qp_nr_out, /* r5 */
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
-
-	*pma_qp_nr = (u32)pma_qp_nr_out;
-	*bma_qp_nr = (u32)bma_qp_nr_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_DEFINE_AQP1, outs,
+		adapter_handle.handle, /* r4 */
+		qp_handle.handle, /* r5 */
+		port, /* r6 */
+		0, 0, 0, 0, 0, 0);
+	*pma_qp_nr = (u32)outs[0];
+	*bma_qp_nr = (u32)outs[1];
 	if (ret == H_ALIAS_EXIST)
 		ehca_gen_err("AQP1 already exists. ret=%lx", ret);
@@ -756,22 +600,14 @@ u64 hipz_h_attach_mcqp(const struct ipz_
 		u64 subnet_prefix, u64 interface_id)
 {
 	u64 ret;
-	u64 dummy;
-
-	ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP,
-		adapter_handle.handle, /* r4 */
-		qp_handle.handle, /* r5 */
-		mcg_dlid, /* r6 */
-		interface_id, /* r7 */
-		subnet_prefix, /* r8 */
-		0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+
+	ret = ehca_plpar_hcall_norets(H_ATTACH_MCQP,
+		adapter_handle.handle, /* r4 */
+		qp_handle.handle, /* r5 */
+		mcg_dlid, /* r6 */
+		interface_id, /* r7 */
+		subnet_prefix, /* r8 */
+		0, 0);
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -785,22 +621,13 @@ u64 hipz_h_detach_mcqp(const struct ipz_
 		u16 mcg_dlid,
 		u64 subnet_prefix, u64 interface_id)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DETACH_MCQP,
-		adapter_handle.handle, /* r4 */
-		qp_handle.handle, /* r5 */
-		mcg_dlid, /* r6 */
-		interface_id, /* r7 */
-		subnet_prefix, /* r8 */
-		0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_DETACH_MCQP,
+		adapter_handle.handle, /* r4 */
+		qp_handle.handle, /* r5 */
+		mcg_dlid, /* r6 */
+		interface_id, /* r7 */
+		subnet_prefix, /* r8 */
+		0, 0);
 }
 u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle,
@@ -808,7 +635,6 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 		u8 force_flag)
 {
 	u64 ret;
-	u64 dummy;
 	ret = hcp_galpas_dtor(&cq->galpas);
 	if (ret) {
@@ -816,18 +642,11 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 		return H_RESOURCE;
 	}
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		cq->ipz_cq_handle.handle, /* r5 */
-		force_flag != 0 ? 1L : 0L, /* r6 */
-		0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+		adapter_handle.handle, /* r4 */
+		cq->ipz_cq_handle.handle, /* r5 */
+		force_flag != 0 ? 1L : 0L, /* r6 */
+		0, 0, 0, 0);
 	if (ret == H_RESOURCE)
 		ehca_gen_err("H_FREE_RESOURCE failed ret=%lx ", ret);
@@ -839,7 +658,6 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 		struct ehca_eq *eq)
 {
 	u64 ret;
-	u64 dummy;
 	ret = hcp_galpas_dtor(&eq->galpas);
 	if (ret) {
@@ -847,18 +665,10 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 		return H_RESOURCE;
 	}
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		eq->ipz_eq_handle.handle, /* r5 */
-		0, 0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
-
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+		adapter_handle.handle, /* r4 */
+		eq->ipz_eq_handle.handle, /* r5 */
+		0, 0, 0, 0, 0);
 	if (ret == H_RESOURCE)
 		ehca_gen_err("Resource in use. ret=%lx ", ret);
@@ -875,27 +685,19 @@ u64 hipz_h_alloc_resource_mr(const struc
 		struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out;
-	u64 rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		5, /* r5 */
-		vaddr, /* r6 */
-		length, /* r7 */
-		(((u64)access_ctrl) << 32ULL), /* r8 */
-		pd.value, /* r9 */
-		0,
-		&(outparms->handle.handle), /* r4 */
-		&dummy, /* r5 */
-		&lkey_out, /* r6 */
-		&rkey_out, /* r7 */
-		&dummy,
-		&dummy,
-		&dummy);
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+		adapter_handle.handle, /* r4 */
+		5, /* r5 */
+		vaddr, /* r6 */
+		length, /* r7 */
+		(((u64)access_ctrl) << 32ULL), /* r8 */
+		pd.value, /* r9 */
+		0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];
 	return ret;
 }
@@ -923,7 +725,6 @@ u64 hipz_h_register_rpage_mr(const struc
 		queue_type,
 		mr->ipz_mr_handle.handle,
 		logical_address_of_page, count);
-
 	return ret;
 }
@@ -932,24 +733,17 @@ u64 hipz_h_query_mr(const struct ipz_ada
 		struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 remote_len_out, remote_vaddr_out, acc_ctrl_pd_out, r9_out;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_MR,
-		adapter_handle.handle, /* r4 */
-		mr->ipz_mr_handle.handle, /* r5 */
-		0, 0, 0, 0, 0,
-		&outparms->len, /* r4 */
-		&outparms->vaddr, /* r5 */
-		&remote_len_out, /* r6 */
-		&remote_vaddr_out, /* r7 */
-		&acc_ctrl_pd_out, /* r8 */
-		&r9_out,
-		&dummy);
-
-	outparms->acl = acc_ctrl_pd_out >> 32;
-	outparms->lkey = (u32)(r9_out >> 32);
-	outparms->rkey = (u32)(r9_out & (0xffffffff));
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_QUERY_MR, outs,
+		adapter_handle.handle, /* r4 */
+		mr->ipz_mr_handle.handle, /* r5 */
+		0, 0, 0, 0, 0, 0, 0);
+	outparms->len = outs[0];
+	outparms->vaddr = outs[1];
+	outparms->acl = outs[4] >> 32;
+	outparms->lkey = (u32)(outs[5] >> 32);
+	outparms->rkey = (u32)(outs[5] & (0xffffffff));
 	return ret;
 }
@@ -957,19 +751,10 @@ u64 hipz_h_query_mr(const struct ipz_ada
 u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle,
 		const struct ehca_mr *mr)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-		adapter_handle.handle, /* r4 */
-		mr->ipz_mr_handle.handle, /* r5 */
-		0, 0, 0, 0, 0,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy,
-		&dummy);
+	return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+		adapter_handle.handle, /* r4 */
+		mr->ipz_mr_handle.handle, /* r5 */
+		0, 0, 0, 0, 0);
 }
 u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle,
@@ -982,28 +767,20 @@ u64 hipz_h_reregister_pmr(const struct i
 		struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR,
-		adapter_handle.handle, /* r4 */
-		mr->ipz_mr_handle.handle, /* r5 */
-		vaddr_in, /* r6 */
-		length, /* r7 */
-		/* r8 */
-		((((u64)access_ctrl) << 32ULL) | pd.value),
-		mr_addr_cb, /* r9 */
-		0,
-		&dummy, /* r4 */
-		&outparms->vaddr, /* r5 */
-		&lkey_out, /* r6 */
-		&rkey_out, /* r7 */
-		&dummy,
-		&dummy,
-		&dummy);
-
-	outparms->lkey = (u32)lkey_out;
- outparms->rkey = (u32)rkey_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_REREGISTER_PMR, outs, + adapter_handle.handle, /* r4 */ + mr->ipz_mr_handle.handle, /* r5 */ + vaddr_in, /* r6 */ + length, /* r7 */ + /* r8 */ + ((((u64)access_ctrl) << 32ULL) | pd.value), + mr_addr_cb, /* r9 */ + 0, 0, 0); + outparms->vaddr = outs[1]; + outparms->lkey = (u32)outs[2]; + outparms->rkey = (u32)outs[3]; return ret; } @@ -1017,25 +794,18 @@ u64 hipz_h_register_smr(const struct ipz struct ehca_mr_hipzout_parms *outparms) { u64 ret; - u64 dummy; - u64 lkey_out, rkey_out; - - ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR, - adapter_handle.handle, /* r4 */ - orig_mr->ipz_mr_handle.handle, /* r5 */ - vaddr_in, /* r6 */ - (((u64)access_ctrl) << 32ULL), /* r7 */ - pd.value, /* r8 */ - 0, 0, - &(outparms->handle.handle), /* r4 */ - &dummy, /* r5 */ - &lkey_out, /* r6 */ - &rkey_out, /* r7 */ - &dummy, - &dummy, - &dummy); - outparms->lkey = (u32)lkey_out; - outparms->rkey = (u32)rkey_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_REGISTER_SMR, outs, + adapter_handle.handle, /* r4 */ + orig_mr->ipz_mr_handle.handle, /* r5 */ + vaddr_in, /* r6 */ + (((u64)access_ctrl) << 32ULL), /* r7 */ + pd.value, /* r8 */ + 0, 0, 0, 0); + outparms->handle.handle = outs[0]; + outparms->lkey = (u32)outs[2]; + outparms->rkey = (u32)outs[3]; return ret; } @@ -1046,23 +816,15 @@ u64 hipz_h_alloc_resource_mw(const struc struct ehca_mw_hipzout_parms *outparms) { u64 ret; - u64 dummy; - u64 rkey_out; - - ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, - adapter_handle.handle, /* r4 */ - 6, /* r5 */ - pd.value, /* r6 */ - 0, 0, 0, 0, - &(outparms->handle.handle), /* r4 */ - &dummy, /* r5 */ - &dummy, /* r6 */ - &rkey_out, /* r7 */ - &dummy, - &dummy, - &dummy); - - outparms->rkey = (u32)rkey_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, + adapter_handle.handle, /* r4 */ + 6, /* r5 */ + pd.value, /* r6 */ + 0, 0, 0, 0, 0, 0); + 
outparms->handle.handle = outs[0]; + outparms->rkey = (u32)outs[3]; return ret; } @@ -1072,21 +834,13 @@ u64 hipz_h_query_mw(const struct ipz_ada struct ehca_mw_hipzout_parms *outparms) { u64 ret; - u64 dummy; - u64 pd_out, rkey_out; - - ret = ehca_hcall_7arg_7ret(H_QUERY_MW, - adapter_handle.handle, /* r4 */ - mw->ipz_mw_handle.handle, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, /* r4 */ - &dummy, /* r5 */ - &dummy, /* r6 */ - &rkey_out, /* r7 */ - &pd_out, /* r8 */ - &dummy, - &dummy); - outparms->rkey = (u32)rkey_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_QUERY_MW, outs, + adapter_handle.handle, /* r4 */ + mw->ipz_mw_handle.handle, /* r5 */ + 0, 0, 0, 0, 0, 0, 0); + outparms->rkey = (u32)outs[3]; return ret; } @@ -1094,19 +848,10 @@ u64 hipz_h_query_mw(const struct ipz_ada u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle, const struct ehca_mw *mw) { - u64 dummy; - - return ehca_hcall_7arg_7ret(H_FREE_RESOURCE, - adapter_handle.handle, /* r4 */ - mw->ipz_mw_handle.handle, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + mw->ipz_mw_handle.handle, /* r5 */ + 0, 0, 0, 0, 0); } u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle, @@ -1114,7 +859,6 @@ u64 hipz_h_error_data(const struct ipz_a void *rblock, unsigned long *byte_count) { - u64 dummy; u64 r_cb = virt_to_abs(rblock); if (r_cb & (EHCA_PAGESIZE-1)) { @@ -1122,16 +866,9 @@ u64 hipz_h_error_data(const struct ipz_a return H_PARAMETER; } - return ehca_hcall_7arg_7ret(H_ERROR_DATA, - adapter_handle.handle, - ressource_handle, - r_cb, - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_ERROR_DATA, + adapter_handle.handle, + ressource_handle, + r_cb, + 0, 0, 0, 0); } diff --git a/drivers/infiniband/hw/ehca/hcp_if.h b/drivers/infiniband/hw/ehca/hcp_if.h index 
39956d8..587ebd4 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.h
+++ b/drivers/infiniband/hw/ehca/hcp_if.h
@@ -107,7 +107,7 @@ u64 hipz_h_register_rpage_eq(const struc
 			     const u64 logical_address_of_page,
 			     const u64 count);
 
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle
 			   hcp_adapter_handle,
 			   u32 ist);
 
diff --git a/drivers/infiniband/hw/ehca/hipz_hw.h b/drivers/infiniband/hw/ehca/hipz_hw.h
index f5f4871..3fc92b0 100644
--- a/drivers/infiniband/hw/ehca/hipz_hw.h
+++ b/drivers/infiniband/hw/ehca/hipz_hw.h
@@ -184,8 +184,6 @@ struct hipz_mrmwmm {
 
 };
 
-#define MRX_HCR_LPARID_VALID EHCA_BMASK_IBM(0,0)
-
 #define MRMWMM_OFFSET(x) offsetof(struct hipz_mrmwmm,x)
 
 struct hipz_qpedmm {
diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
index 7e55a31..2f13509 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
@@ -226,10 +226,9 @@ static inline void *ipz_eqit_eq_get_inc_
 {
 	void *ret = ipz_qeit_get(queue);
 	u32 qe = *(u8 *) ret;
-	if ((qe >> 7) == (queue->toggle_state & 1))
-		ipz_qeit_eq_get_inc(queue); /* this is a good one */
-	else
-		ret = NULL;
+	if ((qe >> 7) != (queue->toggle_state & 1))
+		return NULL;
+	ipz_qeit_eq_get_inc(queue); /* this is a good one */
 	return ret;
 }
 

From rdreier at cisco.com Fri Sep 22 08:27:49 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Sep 2006 08:27:49 -0700
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca firmware interface based on Anton Blanchard's new hvcall interface
In-Reply-To: <200609221720.24191.hnguyen@de.ibm.com> (Hoang-Nam Nguyen's message of "Fri, 22 Sep 2006 17:20:23 +0200")
References: <200609221720.24191.hnguyen@de.ibm.com>
Message-ID:

> - shca->ib_device.node_type = RDMA_NODE_IB_CA;
> + shca->ib_device.node_type = IB_NODE_CA;

Did you test this at all? I can't see how this would build against my
for-2.6.19 tree... Please resend a patch that you know is working.
- R.

From hnguyen at de.ibm.com Fri Sep 22 13:00:12 2006
From: hnguyen at de.ibm.com (Hoang-Nam Nguyen)
Date: Fri, 22 Sep 2006 22:00:12 +0200
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca firmware interface based on Anton Blanchard's new hvcall interface
Message-ID: <200609222200.12722.hnguyen@de.ibm.com>

> - shca->ib_device.node_type = RDMA_NODE_IB_CA;
> + shca->ib_device.node_type = IB_NODE_CA;

My mistake, I tested against Paul's git tree only and then used a wrong
patch script, which exchanged those defines. This time I did all manually
and tested also against your git tree with Anton's patch
http://ozlabs.org/pipermail/linuxppc-dev/2006-July/024556.html.
Thanks!
Nam Nguyen

Signed-off-by: Hoang-Nam Nguyen
---

 ehca_main.c | 5
 hcp_if.c | 845 ++++++++++++++++++++----------------------------------------
 hcp_if.h | 2
 hipz_hw.h | 2
 ipz_pt_fn.h | 7
 5 files changed, 298 insertions(+), 563 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 159b0be..2380994 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -5,6 +5,7 @@
  *
  *  Authors: Heiko J Schick
  *           Hoang-Nam Nguyen
+ *           Joachim Fenkes
  *
  *  Copyright (c) 2005 IBM Corporation
  *
@@ -48,7 +49,7 @@ #include "hcp_if.h"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch ");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0015");
+MODULE_VERSION("SVNEHCA_0016");
 
 int ehca_open_aqp1 = 0;
 int ehca_debug_level = 0;
@@ -749,7 +750,7 @@ int __init ehca_module_init(void)
 	int ret;
 
 	printk(KERN_INFO "eHCA Infiniband Device Driver "
-	       "(Rel.: SVNEHCA_0015)\n");
+	       "(Rel.: SVNEHCA_0016)\n");
 	idr_init(&ehca_qp_idr);
 	idr_init(&ehca_cq_idr);
 	spin_lock_init(&ehca_qp_idr_lock);
diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c
index 260e82a..3fb46e6 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.c
+++ b/drivers/infiniband/hw/ehca/hcp_if.c
@@
-48,27 +48,27 @@ #include "hcp_phyp.h"
 #include "hipz_fns.h"
 #include "ipz_pt_fn.h"
 
-#define H_ALL_RES_QP_ENHANCED_OPS EHCA_BMASK_IBM(9,11)
-#define H_ALL_RES_QP_PTE_PIN EHCA_BMASK_IBM(12,12)
-#define H_ALL_RES_QP_SERVICE_TYPE EHCA_BMASK_IBM(13,15)
-#define H_ALL_RES_QP_LL_RQ_CQE_POSTING EHCA_BMASK_IBM(18,18)
-#define H_ALL_RES_QP_LL_SQ_CQE_POSTING EHCA_BMASK_IBM(19,21)
-#define H_ALL_RES_QP_SIGNALING_TYPE EHCA_BMASK_IBM(22,23)
-#define H_ALL_RES_QP_UD_AV_LKEY_CTRL EHCA_BMASK_IBM(31,31)
-#define H_ALL_RES_QP_RESOURCE_TYPE EHCA_BMASK_IBM(56,63)
-
-#define H_ALL_RES_QP_MAX_OUTST_SEND_WR EHCA_BMASK_IBM(0,15)
-#define H_ALL_RES_QP_MAX_OUTST_RECV_WR EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_MAX_SEND_SGE EHCA_BMASK_IBM(32,39)
-#define H_ALL_RES_QP_MAX_RECV_SGE EHCA_BMASK_IBM(40,47)
-
-#define H_ALL_RES_QP_ACT_OUTST_SEND_WR EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_ACT_OUTST_RECV_WR EHCA_BMASK_IBM(48,63)
-#define H_ALL_RES_QP_ACT_SEND_SGE EHCA_BMASK_IBM(8,15)
-#define H_ALL_RES_QP_ACT_RECV_SGE EHCA_BMASK_IBM(24,31)
-
-#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES EHCA_BMASK_IBM(0,31)
-#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES EHCA_BMASK_IBM(32,63)
+#define H_ALL_RES_QP_ENHANCED_OPS EHCA_BMASK_IBM(9, 11)
+#define H_ALL_RES_QP_PTE_PIN EHCA_BMASK_IBM(12, 12)
+#define H_ALL_RES_QP_SERVICE_TYPE EHCA_BMASK_IBM(13, 15)
+#define H_ALL_RES_QP_LL_RQ_CQE_POSTING EHCA_BMASK_IBM(18, 18)
+#define H_ALL_RES_QP_LL_SQ_CQE_POSTING EHCA_BMASK_IBM(19, 21)
+#define H_ALL_RES_QP_SIGNALING_TYPE EHCA_BMASK_IBM(22, 23)
+#define H_ALL_RES_QP_UD_AV_LKEY_CTRL EHCA_BMASK_IBM(31, 31)
+#define H_ALL_RES_QP_RESOURCE_TYPE EHCA_BMASK_IBM(56, 63)
+
+#define H_ALL_RES_QP_MAX_OUTST_SEND_WR EHCA_BMASK_IBM(0, 15)
+#define H_ALL_RES_QP_MAX_OUTST_RECV_WR EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_MAX_SEND_SGE EHCA_BMASK_IBM(32, 39)
+#define H_ALL_RES_QP_MAX_RECV_SGE EHCA_BMASK_IBM(40, 47)
+
+#define H_ALL_RES_QP_ACT_OUTST_SEND_WR EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_ACT_OUTST_RECV_WR EHCA_BMASK_IBM(48, 63)
+#define H_ALL_RES_QP_ACT_SEND_SGE EHCA_BMASK_IBM(8, 15) +#define H_ALL_RES_QP_ACT_RECV_SGE EHCA_BMASK_IBM(24, 31) + +#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES EHCA_BMASK_IBM(0, 31) +#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES EHCA_BMASK_IBM(32, 63) /* direct access qp controls */ #define DAQP_CTRL_ENABLE 0x01 @@ -95,35 +95,25 @@ static u32 get_longbusy_msecs(int longbu } } -static long ehca_hcall_7arg_7ret(unsigned long opcode, - unsigned long arg1, - unsigned long arg2, - unsigned long arg3, - unsigned long arg4, - unsigned long arg5, - unsigned long arg6, - unsigned long arg7, - unsigned long *out1, - unsigned long *out2, - unsigned long *out3, - unsigned long *out4, - unsigned long *out5, - unsigned long *out6, - unsigned long *out7) +static long ehca_plpar_hcall_norets(unsigned long opcode, + unsigned long arg1, + unsigned long arg2, + unsigned long arg3, + unsigned long arg4, + unsigned long arg5, + unsigned long arg6, + unsigned long arg7) { long ret; int i, sleep_msecs; - ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx " - "arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5, - arg6, arg7); + ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx " + "arg5=%lx arg6=%lx arg7=%lx", + opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7); for (i = 0; i < 5; i++) { - ret = plpar_hcall_7arg_7ret(opcode, - arg1, arg2, arg3, arg4, - arg5, arg6, arg7, - out1, out2, out3, out4, - out5, out6,out7); + ret = plpar_hcall_norets(opcode, arg1, arg2, arg3, arg4, + arg5, arg6, arg7); if (H_IS_LONG_BUSY(ret)) { sleep_msecs = get_longbusy_msecs(ret); @@ -134,44 +124,30 @@ static long ehca_hcall_7arg_7ret(unsigne if (ret < H_SUCCESS) ehca_gen_err("opcode=%lx ret=%lx" " arg1=%lx arg2=%lx arg3=%lx arg4=%lx" - " arg5=%lx arg6=%lx arg7=%lx" - " out1=%lx out2=%lx out3=%lx out4=%lx" - " out5=%lx out6=%lx out7=%lx", + " arg5=%lx arg6=%lx arg7=%lx ", opcode, ret, - arg1, arg2, arg3, arg4, - arg5, arg6, arg7, - *out1, *out2, *out3, *out4, - *out5, *out6, *out7); + 
arg1, arg2, arg3, arg4, arg5, + arg6, arg7); - ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx " - "out4=%lx out5=%lx out6=%lx out7=%lx", - opcode, ret, *out1, *out2, *out3, *out4, *out5, - *out6, *out7); + ehca_gen_dbg("opcode=%lx ret=%lx", opcode, ret); return ret; + } return H_BUSY; } -static long ehca_hcall_9arg_9ret(unsigned long opcode, - unsigned long arg1, - unsigned long arg2, - unsigned long arg3, - unsigned long arg4, - unsigned long arg5, - unsigned long arg6, - unsigned long arg7, - unsigned long arg8, - unsigned long arg9, - unsigned long *out1, - unsigned long *out2, - unsigned long *out3, - unsigned long *out4, - unsigned long *out5, - unsigned long *out6, - unsigned long *out7, - unsigned long *out8, - unsigned long *out9) +static long ehca_plpar_hcall9(unsigned long opcode, + unsigned long *outs, /* array of 9 outputs */ + unsigned long arg1, + unsigned long arg2, + unsigned long arg3, + unsigned long arg4, + unsigned long arg5, + unsigned long arg6, + unsigned long arg7, + unsigned long arg8, + unsigned long arg9) { long ret; int i, sleep_msecs; @@ -182,13 +158,9 @@ static long ehca_hcall_9arg_9ret(unsigne arg8, arg9); for (i = 0; i < 5; i++) { - ret = plpar_hcall_9arg_9ret(opcode, - arg1, arg2, arg3, arg4, - arg5, arg6, arg7, arg8, - arg9, - out1, out2, out3, out4, - out5, out6, out7, out8, - out9); + ret = plpar_hcall9(opcode, outs, + arg1, arg2, arg3, arg4, arg5, + arg6, arg7, arg8, arg9); if (H_IS_LONG_BUSY(ret)) { sleep_msecs = get_longbusy_msecs(ret); @@ -205,37 +177,35 @@ static long ehca_hcall_9arg_9ret(unsigne " out5=%lx out6=%lx out7=%lx out8=%lx" " out9=%lx", opcode, ret, - arg1, arg2, arg3, arg4, - arg5, arg6, arg7, arg8, - arg9, - *out1, *out2, *out3, *out4, - *out5, *out6, *out7, *out8, - *out9); + arg1, arg2, arg3, arg4, arg5, + arg6, arg7, arg8, arg9, + outs[0], outs[1], outs[2], outs[3], + outs[4], outs[5], outs[6], outs[7], + outs[8]); ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx " "out4=%lx out5=%lx 
out6=%lx out7=%lx out8=%lx " - "out9=%lx", opcode, ret,*out1, *out2, *out3, *out4, - *out5, *out6, *out7, *out8, *out9); + "out9=%lx", + opcode, ret, outs[0], outs[1], outs[2], outs[3], + outs[4], outs[5], outs[6], outs[7], outs[8]); return ret; } return H_BUSY; } - u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle, struct ehca_pfeq *pfeq, const u32 neq_control, const u32 number_of_entries, struct ipz_eq_handle *eq_handle, - u32 * act_nr_of_entries, - u32 * act_pages, - u32 * eq_ist) + u32 *act_nr_of_entries, + u32 *act_pages, + u32 *eq_ist) { u64 ret; - u64 dummy; + u64 outs[PLPAR_HCALL9_BUFSIZE]; u64 allocate_controls; - u64 act_nr_of_entries_out, act_pages_out, eq_ist_out; /* resource type */ allocate_controls = 3ULL; @@ -246,22 +216,15 @@ u64 hipz_h_alloc_resource_eq(const struc else /* notification event queue */ allocate_controls = (1ULL << 63) | allocate_controls; - ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, - adapter_handle.handle, /* r4 */ - allocate_controls, /* r5 */ - number_of_entries, /* r6 */ - 0, 0, 0, 0, - &eq_handle->handle, /* r4 */ - &dummy, /* r5 */ - &dummy, /* r6 */ - &act_nr_of_entries_out, /* r7 */ - &act_pages_out, /* r8 */ - &eq_ist_out, /* r8 */ - &dummy); - - *act_nr_of_entries = (u32)act_nr_of_entries_out; - *act_pages = (u32)act_pages_out; - *eq_ist = (u32)eq_ist_out; + ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, + adapter_handle.handle, /* r4 */ + allocate_controls, /* r5 */ + number_of_entries, /* r6 */ + 0, 0, 0, 0, 0, 0); + eq_handle->handle = outs[0]; + *act_nr_of_entries = (u32)outs[3]; + *act_pages = (u32)outs[4]; + *eq_ist = (u32)outs[5]; if (ret == H_NOT_ENOUGH_RESOURCES) ehca_gen_err("Not enough resource - ret=%lx ", ret); @@ -273,20 +236,11 @@ u64 hipz_h_reset_event(const struct ipz_ struct ipz_eq_handle eq_handle, const u64 event_mask) { - u64 dummy; - - return ehca_hcall_7arg_7ret(H_RESET_EVENTS, - adapter_handle.handle, /* r4 */ - eq_handle.handle, /* r5 */ - event_mask, /* r6 */ - 0, 0, 
0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_RESET_EVENTS, + adapter_handle.handle, /* r4 */ + eq_handle.handle, /* r5 */ + event_mask, /* r6 */ + 0, 0, 0, 0); } u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle, @@ -294,30 +248,21 @@ u64 hipz_h_alloc_resource_cq(const struc struct ehca_alloc_cq_parms *param) { u64 ret; - u64 dummy; - u64 act_nr_of_entries_out, act_pages_out; - u64 g_la_privileged_out, g_la_user_out; - - ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, - adapter_handle.handle, /* r4 */ - 2, /* r5 */ - param->eq_handle.handle, /* r6 */ - cq->token, /* r7 */ - param->nr_cqe, /* r8 */ - 0, 0, - &cq->ipz_cq_handle.handle, /* r4 */ - &dummy, /* r5 */ - &dummy, /* r6 */ - &act_nr_of_entries_out, /* r7 */ - &act_pages_out, /* r8 */ - &g_la_privileged_out, /* r9 */ - &g_la_user_out); /* r10 */ - - param->act_nr_of_entries = (u32)act_nr_of_entries_out; - param->act_pages = (u32)act_pages_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, + adapter_handle.handle, /* r4 */ + 2, /* r5 */ + param->eq_handle.handle, /* r6 */ + cq->token, /* r7 */ + param->nr_cqe, /* r8 */ + 0, 0, 0, 0); + cq->ipz_cq_handle.handle = outs[0]; + param->act_nr_of_entries = (u32)outs[3]; + param->act_pages = (u32)outs[4]; if (ret == H_SUCCESS) - hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out); + hcp_galpas_ctor(&cq->galpas, outs[5], outs[6]); if (ret == H_NOT_ENOUGH_RESOURCES) ehca_gen_err("Not enough resources. 
ret=%lx", ret); @@ -330,8 +275,9 @@ u64 hipz_h_alloc_resource_qp(const struc struct ehca_alloc_qp_parms *parms) { u64 ret; - u64 dummy, allocate_controls, max_r10_reg; - u64 qp_nr_out, r6_out, r7_out, r8_out, g_la_user_out, r11_out; + u64 allocate_controls; + u64 max_r10_reg; + u64 outs[PLPAR_HCALL9_BUFSIZE]; u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1; u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1; int daqp_ctrl = parms->daqp_ctrl; @@ -360,48 +306,36 @@ u64 hipz_h_alloc_resource_qp(const struc | EHCA_BMASK_SET(H_ALL_RES_QP_MAX_RECV_SGE, parms->max_recv_sge); - - ret = ehca_hcall_9arg_9ret(H_ALLOC_RESOURCE, - adapter_handle.handle, /* r4 */ - allocate_controls, /* r5 */ - qp->send_cq->ipz_cq_handle.handle, - qp->recv_cq->ipz_cq_handle.handle, - parms->ipz_eq_handle.handle, - ((u64)qp->token << 32) | parms->pd.value, - max_r10_reg, /* r10 */ - parms->ud_av_l_key_ctl, /* r11 */ - 0, - &qp->ipz_qp_handle.handle, - &qp_nr_out, /* r5 */ - &r6_out, /* r6 */ - &r7_out, /* r7 */ - &r8_out, /* r8 */ - &dummy, /* r9 */ - &g_la_user_out, /* r10 */ - &r11_out, - &dummy); - - /* extract outputs */ - qp->real_qp_num = (u32)qp_nr_out; - + ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, + adapter_handle.handle, /* r4 */ + allocate_controls, /* r5 */ + qp->send_cq->ipz_cq_handle.handle, + qp->recv_cq->ipz_cq_handle.handle, + parms->ipz_eq_handle.handle, + ((u64)qp->token << 32) | parms->pd.value, + max_r10_reg, /* r10 */ + parms->ud_av_l_key_ctl, /* r11 */ + 0); + qp->ipz_qp_handle.handle = outs[0]; + qp->real_qp_num = (u32)outs[1]; parms->act_nr_send_sges = - (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, r6_out); + (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, outs[2]); parms->act_nr_recv_wqes = - (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, r6_out); + (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, outs[2]); parms->act_nr_send_sges = - (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, r7_out); + 
(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, outs[3]); parms->act_nr_recv_sges = - (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, r7_out); + (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, outs[3]); parms->nr_sq_pages = - (u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, r8_out); + (u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, outs[4]); parms->nr_rq_pages = - (u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, r8_out); + (u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, outs[4]); if (ret == H_SUCCESS) - hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out); + hcp_galpas_ctor(&qp->galpas, outs[6], outs[6]); if (ret == H_NOT_ENOUGH_RESOURCES) - ehca_gen_err("Not enough resources. ret=%lx",ret); + ehca_gen_err("Not enough resources. ret=%lx", ret); return ret; } @@ -411,7 +345,6 @@ u64 hipz_h_query_port(const struct ipz_a struct hipz_query_port *query_port_response_block) { u64 ret; - u64 dummy; u64 r_cb = virt_to_abs(query_port_response_block); if (r_cb & (EHCA_PAGESIZE-1)) { @@ -419,18 +352,11 @@ u64 hipz_h_query_port(const struct ipz_a return H_PARAMETER; } - ret = ehca_hcall_7arg_7ret(H_QUERY_PORT, - adapter_handle.handle, /* r4 */ - port_id, /* r5 */ - r_cb, /* r6 */ - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + ret = ehca_plpar_hcall_norets(H_QUERY_PORT, + adapter_handle.handle, /* r4 */ + port_id, /* r5 */ + r_cb, /* r6 */ + 0, 0, 0, 0); if (ehca_debug_level) ehca_dmp(query_port_response_block, 64, "response_block"); @@ -441,7 +367,6 @@ u64 hipz_h_query_port(const struct ipz_a u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle, struct hipz_query_hca *query_hca_rblock) { - u64 dummy; u64 r_cb = virt_to_abs(query_hca_rblock); if (r_cb & (EHCA_PAGESIZE-1)) { @@ -450,17 +375,10 @@ u64 hipz_h_query_hca(const struct ipz_ad return H_PARAMETER; } - return ehca_hcall_7arg_7ret(H_QUERY_HCA, - adapter_handle.handle, /* r4 */ - r_cb, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - 
&dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_QUERY_HCA, + adapter_handle.handle, /* r4 */ + r_cb, /* r5 */ + 0, 0, 0, 0, 0); } u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle, @@ -470,22 +388,13 @@ u64 hipz_h_register_rpage(const struct i const u64 logical_address_of_page, u64 count) { - u64 dummy; - - return ehca_hcall_7arg_7ret(H_REGISTER_RPAGES, - adapter_handle.handle, /* r4 */ - queue_type | pagesize << 8, /* r5 */ - resource_handle, /* r6 */ - logical_address_of_page, /* r7 */ - count, /* r8 */ - 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_REGISTER_RPAGES, + adapter_handle.handle, /* r4 */ + queue_type | pagesize << 8, /* r5 */ + resource_handle, /* r6 */ + logical_address_of_page, /* r7 */ + count, /* r8 */ + 0, 0); } u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle, @@ -507,23 +416,14 @@ u64 hipz_h_register_rpage_eq(const struc logical_address_of_page, count); } -u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle, +u64 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle, u32 ist) { - u32 ret; - u64 dummy; - - ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE, - adapter_handle.handle, /* r4 */ - ist, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + u64 ret; + ret = ehca_plpar_hcall_norets(H_QUERY_INT_STATE, + adapter_handle.handle, /* r4 */ + ist, /* r5 */ + 0, 0, 0, 0, 0); if (ret != H_SUCCESS && ret != H_BUSY) ehca_gen_err("Could not query interrupt state."); @@ -576,25 +476,20 @@ u64 hipz_h_disable_and_get_wqe(const str void **log_addr_next_rq_wqe2processed, int dis_and_get_function_code) { - u64 dummy, dummy1, dummy2; - - if (!log_addr_next_sq_wqe2processed) - log_addr_next_sq_wqe2processed = (void**)&dummy1; - if (!log_addr_next_rq_wqe2processed) - log_addr_next_rq_wqe2processed = (void**)&dummy2; - - return 
ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC, - adapter_handle.handle, /* r4 */ - dis_and_get_function_code, /* r5 */ - qp_handle.handle, /* r6 */ - 0, 0, 0, 0, - (void*)log_addr_next_sq_wqe2processed, - (void*)log_addr_next_rq_wqe2processed, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + u64 ret; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs, + adapter_handle.handle, /* r4 */ + dis_and_get_function_code, /* r5 */ + qp_handle.handle, /* r6 */ + 0, 0, 0, 0, 0, 0); + if (log_addr_next_sq_wqe2processed) + *log_addr_next_sq_wqe2processed = (void*)outs[0]; + if (log_addr_next_rq_wqe2processed) + *log_addr_next_rq_wqe2processed = (void*)outs[1]; + + return ret; } u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle, @@ -605,22 +500,13 @@ u64 hipz_h_modify_qp(const struct ipz_ad struct h_galpa gal) { u64 ret; - u64 dummy; - u64 invalid_attribute_identifier, rc_attrib_mask; - - ret = ehca_hcall_7arg_7ret(H_MODIFY_QP, - adapter_handle.handle, /* r4 */ - qp_handle.handle, /* r5 */ - update_mask, /* r6 */ - virt_to_abs(mqpcb), /* r7 */ - 0, 0, 0, - &invalid_attribute_identifier, /* r4 */ - &dummy, /* r5 */ - &dummy, /* r6 */ - &dummy, /* r7 */ - &dummy, /* r8 */ - &rc_attrib_mask, /* r9 */ - &dummy); + u64 outs[PLPAR_HCALL9_BUFSIZE]; + ret = ehca_plpar_hcall9(H_MODIFY_QP, outs, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + update_mask, /* r6 */ + virt_to_abs(mqpcb), /* r7 */ + 0, 0, 0, 0, 0); if (ret == H_NOT_ENOUGH_RESOURCES) ehca_gen_err("Insufficient resources ret=%lx", ret); @@ -634,61 +520,37 @@ u64 hipz_h_query_qp(const struct ipz_ada struct hcp_modify_qp_control_block *qqpcb, struct h_galpa gal) { - u64 dummy; - - return ehca_hcall_7arg_7ret(H_QUERY_QP, - adapter_handle.handle, /* r4 */ - qp_handle.handle, /* r5 */ - virt_to_abs(qqpcb), /* r6 */ - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_QUERY_QP, + 
adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + virt_to_abs(qqpcb), /* r6 */ + 0, 0, 0, 0); } u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle, struct ehca_qp *qp) { u64 ret; - u64 dummy; - u64 ladr_next_sq_wqe_out, ladr_next_rq_wqe_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; ret = hcp_galpas_dtor(&qp->galpas); if (ret) { ehca_gen_err("Could not destruct qp->galpas"); return H_RESOURCE; } - ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC, - adapter_handle.handle, /* r4 */ - /* function code */ - 1, /* r5 */ - qp->ipz_qp_handle.handle, /* r6 */ - 0, 0, 0, 0, - &ladr_next_sq_wqe_out, /* r4 */ - &ladr_next_rq_wqe_out, /* r5 */ - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs, + adapter_handle.handle, /* r4 */ + /* function code */ + 1, /* r5 */ + qp->ipz_qp_handle.handle, /* r6 */ + 0, 0, 0, 0, 0, 0); if (ret == H_HARDWARE) ehca_gen_err("HCA not operational. ret=%lx", ret); - ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE, - adapter_handle.handle, /* r4 */ - qp->ipz_qp_handle.handle, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + qp->ipz_qp_handle.handle, /* r5 */ + 0, 0, 0, 0, 0); if (ret == H_RESOURCE) ehca_gen_err("Resource still in use. 
ret=%lx", ret);
@@ -701,20 +563,11 @@ u64 hipz_h_define_aqp0(const struct ipz_
 			struct h_galpa gal,
 			u32 port)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DEFINE_AQP0,
-				adapter_handle.handle, /* r4 */
-				qp_handle.handle, /* r5 */
-				port, /* r6 */
-				0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_DEFINE_AQP0,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle, /* r5 */
+				port, /* r6 */
+				0, 0, 0, 0);
 }

 u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle,
@@ -724,24 +577,15 @@ u64 hipz_h_define_aqp1(const struct ipz_
 			u32 * bma_qp_nr)
 {
 	u64 ret;
-	u64 dummy;
-	u64 pma_qp_nr_out, bma_qp_nr_out;
-
-	ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1,
-				adapter_handle.handle, /* r4 */
-				qp_handle.handle, /* r5 */
-				port, /* r6 */
-				0, 0, 0, 0,
-				&pma_qp_nr_out, /* r4 */
-				&bma_qp_nr_out, /* r5 */
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
-
-	*pma_qp_nr = (u32)pma_qp_nr_out;
-	*bma_qp_nr = (u32)bma_qp_nr_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_DEFINE_AQP1, outs,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle, /* r5 */
+				port, /* r6 */
+				0, 0, 0, 0, 0, 0);
+	*pma_qp_nr = (u32)outs[0];
+	*bma_qp_nr = (u32)outs[1];

 	if (ret == H_ALIAS_EXIST)
 		ehca_gen_err("AQP1 already exists. ret=%lx", ret);
@@ -756,22 +600,14 @@ u64 hipz_h_attach_mcqp(const struct ipz_
 			u64 subnet_prefix, u64 interface_id)
 {
 	u64 ret;
-	u64 dummy;
-
-	ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP,
-				adapter_handle.handle, /* r4 */
-				qp_handle.handle, /* r5 */
-				mcg_dlid, /* r6 */
-				interface_id, /* r7 */
-				subnet_prefix, /* r8 */
-				0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+
+	ret = ehca_plpar_hcall_norets(H_ATTACH_MCQP,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle, /* r5 */
+				mcg_dlid, /* r6 */
+				interface_id, /* r7 */
+				subnet_prefix, /* r8 */
+				0, 0);

 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -785,22 +621,13 @@ u64 hipz_h_detach_mcqp(const struct ipz_
 			u16 mcg_dlid,
 			u64 subnet_prefix, u64 interface_id)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DETACH_MCQP,
-				adapter_handle.handle, /* r4 */
-				qp_handle.handle, /* r5 */
-				mcg_dlid, /* r6 */
-				interface_id, /* r7 */
-				subnet_prefix, /* r8 */
-				0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_DETACH_MCQP,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle, /* r5 */
+				mcg_dlid, /* r6 */
+				interface_id, /* r7 */
+				subnet_prefix, /* r8 */
+				0, 0);
 }

 u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle,
@@ -808,7 +635,6 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 			u8 force_flag)
 {
 	u64 ret;
-	u64 dummy;

 	ret = hcp_galpas_dtor(&cq->galpas);
 	if (ret) {
@@ -816,18 +642,11 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 		return H_RESOURCE;
 	}

-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				cq->ipz_cq_handle.handle, /* r5 */
-				force_flag != 0 ? 1L : 0L, /* r6 */
-				0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				adapter_handle.handle, /* r4 */
+				cq->ipz_cq_handle.handle, /* r5 */
+				force_flag != 0 ? 1L : 0L, /* r6 */
+				0, 0, 0, 0);

 	if (ret == H_RESOURCE)
 		ehca_gen_err("H_FREE_RESOURCE failed ret=%lx ", ret);
@@ -839,7 +658,6 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 			struct ehca_eq *eq)
 {
 	u64 ret;
-	u64 dummy;

 	ret = hcp_galpas_dtor(&eq->galpas);
 	if (ret) {
@@ -847,18 +665,10 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 		return H_RESOURCE;
 	}

-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				eq->ipz_eq_handle.handle, /* r5 */
-				0, 0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
-
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				adapter_handle.handle, /* r4 */
+				eq->ipz_eq_handle.handle, /* r5 */
+				0, 0, 0, 0, 0);
 	if (ret == H_RESOURCE)
 		ehca_gen_err("Resource in use. ret=%lx ", ret);

@@ -875,27 +685,19 @@ u64 hipz_h_alloc_resource_mr(const struc
 			struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out;
-	u64 rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				5, /* r5 */
-				vaddr, /* r6 */
-				length, /* r7 */
-				(((u64)access_ctrl) << 32ULL), /* r8 */
-				pd.value, /* r9 */
-				0,
-				&(outparms->handle.handle), /* r4 */
-				&dummy, /* r5 */
-				&lkey_out, /* r6 */
-				&rkey_out, /* r7 */
-				&dummy,
-				&dummy,
-				&dummy);
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle, /* r4 */
+				5, /* r5 */
+				vaddr, /* r6 */
+				length, /* r7 */
+				(((u64)access_ctrl) << 32ULL), /* r8 */
+				pd.value, /* r9 */
+				0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];

 	return ret;
 }
@@ -923,7 +725,6 @@ u64 hipz_h_register_rpage_mr(const struc
 				queue_type,
 				mr->ipz_mr_handle.handle,
 				logical_address_of_page, count);
-
 	return ret;
 }

@@ -932,24 +733,17 @@ u64 hipz_h_query_mr(const struct ipz_ada
 			struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 remote_len_out, remote_vaddr_out, acc_ctrl_pd_out, r9_out;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_MR,
-				adapter_handle.handle, /* r4 */
-				mr->ipz_mr_handle.handle, /* r5 */
-				0, 0, 0, 0, 0,
-				&outparms->len, /* r4 */
-				&outparms->vaddr, /* r5 */
-				&remote_len_out, /* r6 */
-				&remote_vaddr_out, /* r7 */
-				&acc_ctrl_pd_out, /* r8 */
-				&r9_out,
-				&dummy);
-
-	outparms->acl = acc_ctrl_pd_out >> 32;
-	outparms->lkey = (u32)(r9_out >> 32);
-	outparms->rkey = (u32)(r9_out & (0xffffffff));
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_QUERY_MR, outs,
+				adapter_handle.handle, /* r4 */
+				mr->ipz_mr_handle.handle, /* r5 */
+				0, 0, 0, 0, 0, 0, 0);
+	outparms->len = outs[0];
+	outparms->vaddr = outs[1];
+	outparms->acl = outs[4] >> 32;
+	outparms->lkey = (u32)(outs[5] >> 32);
+	outparms->rkey = (u32)(outs[5] & (0xffffffff));

 	return ret;
 }
@@ -957,19 +751,10 @@ u64 hipz_h_query_mr(const struct ipz_ada
 u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle,
 			const struct ehca_mr *mr)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				mr->ipz_mr_handle.handle, /* r5 */
-				0, 0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				adapter_handle.handle, /* r4 */
+				mr->ipz_mr_handle.handle, /* r5 */
+				0, 0, 0, 0, 0);
 }

 u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle,
@@ -982,28 +767,20 @@ u64 hipz_h_reregister_pmr(const struct i
 			struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR,
-				adapter_handle.handle, /* r4 */
-				mr->ipz_mr_handle.handle, /* r5 */
-				vaddr_in, /* r6 */
-				length, /* r7 */
-				/* r8 */
-				((((u64)access_ctrl) << 32ULL) | pd.value),
-				mr_addr_cb, /* r9 */
-				0,
-				&dummy, /* r4 */
-				&outparms->vaddr, /* r5 */
-				&lkey_out, /* r6 */
-				&rkey_out, /* r7 */
-				&dummy,
-				&dummy,
-				&dummy);
-
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_REREGISTER_PMR, outs,
+				adapter_handle.handle, /* r4 */
+				mr->ipz_mr_handle.handle, /* r5 */
+				vaddr_in, /* r6 */
+				length, /* r7 */
+				/* r8 */
+				((((u64)access_ctrl) << 32ULL) | pd.value),
+				mr_addr_cb, /* r9 */
+				0, 0, 0);
+	outparms->vaddr = outs[1];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];

 	return ret;
 }
@@ -1017,25 +794,18 @@ u64 hipz_h_register_smr(const struct ipz
 			struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR,
-				adapter_handle.handle, /* r4 */
-				orig_mr->ipz_mr_handle.handle, /* r5 */
-				vaddr_in, /* r6 */
-				(((u64)access_ctrl) << 32ULL), /* r7 */
-				pd.value, /* r8 */
-				0, 0,
-				&(outparms->handle.handle), /* r4 */
-				&dummy, /* r5 */
-				&lkey_out, /* r6 */
-				&rkey_out, /* r7 */
-				&dummy,
-				&dummy,
-				&dummy);
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_REGISTER_SMR, outs,
+				adapter_handle.handle, /* r4 */
+				orig_mr->ipz_mr_handle.handle, /* r5 */
+				vaddr_in, /* r6 */
+				(((u64)access_ctrl) << 32ULL), /* r7 */
+				pd.value, /* r8 */
+				0, 0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];

 	return ret;
 }
@@ -1046,23 +816,15 @@ u64 hipz_h_alloc_resource_mw(const struc
 			struct ehca_mw_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				6, /* r5 */
-				pd.value, /* r6 */
-				0, 0, 0, 0,
-				&(outparms->handle.handle), /* r4 */
-				&dummy, /* r5 */
-				&dummy, /* r6 */
-				&rkey_out, /* r7 */
-				&dummy,
-				&dummy,
-				&dummy);
-
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle, /* r4 */
+				6, /* r5 */
+				pd.value, /* r6 */
+				0, 0, 0, 0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->rkey = (u32)outs[3];

 	return ret;
 }
@@ -1072,21 +834,13 @@ u64 hipz_h_query_mw(const struct ipz_ada
 			struct ehca_mw_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 pd_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_MW,
-				adapter_handle.handle, /* r4 */
-				mw->ipz_mw_handle.handle, /* r5 */
-				0, 0, 0, 0, 0,
-				&dummy, /* r4 */
-				&dummy, /* r5 */
-				&dummy, /* r6 */
-				&rkey_out, /* r7 */
-				&pd_out, /* r8 */
-				&dummy,
-				&dummy);
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_QUERY_MW, outs,
+				adapter_handle.handle, /* r4 */
+				mw->ipz_mw_handle.handle, /* r5 */
+				0, 0, 0, 0, 0, 0, 0);
+	outparms->rkey = (u32)outs[3];

 	return ret;
 }
@@ -1094,19 +848,10 @@ u64 hipz_h_query_mw(const struct ipz_ada
 u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle,
 			const struct ehca_mw *mw)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				mw->ipz_mw_handle.handle, /* r5 */
-				0, 0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				adapter_handle.handle, /* r4 */
+				mw->ipz_mw_handle.handle, /* r5 */
+				0, 0, 0, 0, 0);
 }

 u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle,
@@ -1114,7 +859,6 @@ u64 hipz_h_error_data(const struct ipz_a
 			void *rblock,
 			unsigned long *byte_count)
 {
-	u64 dummy;
 	u64 r_cb = virt_to_abs(rblock);

 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -1122,16 +866,9 @@ u64 hipz_h_error_data(const struct ipz_a
 		return H_PARAMETER;
 	}

-	return ehca_hcall_7arg_7ret(H_ERROR_DATA,
-				adapter_handle.handle,
-				ressource_handle,
-				r_cb,
-				0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_ERROR_DATA,
+				adapter_handle.handle,
+				ressource_handle,
+				r_cb,
+				0, 0, 0, 0);
 }
diff --git a/drivers/infiniband/hw/ehca/hcp_if.h b/drivers/infiniband/hw/ehca/hcp_if.h
index 39956d8..587ebd4 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.h
+++ b/drivers/infiniband/hw/ehca/hcp_if.h
@@ -107,7 +107,7 @@ u64 hipz_h_register_rpage_eq(const struc
 			const u64 logical_address_of_page,
 			const u64 count);

-u32 hipz_h_query_int_state(const struct ipz_adapter_handle
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle
 			hcp_adapter_handle,
 			u32 ist);

diff --git a/drivers/infiniband/hw/ehca/hipz_hw.h b/drivers/infiniband/hw/ehca/hipz_hw.h
index f5f4871..3fc92b0 100644
--- a/drivers/infiniband/hw/ehca/hipz_hw.h
+++ b/drivers/infiniband/hw/ehca/hipz_hw.h
@@ -184,8 +184,6 @@ struct hipz_mrmwmm {

 };

-#define MRX_HCR_LPARID_VALID EHCA_BMASK_IBM(0,0)
-
 #define MRMWMM_OFFSET(x) offsetof(struct hipz_mrmwmm,x)

 struct hipz_qpedmm {
diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
index 7e55a31..2f13509 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
@@ -226,10 +226,9 @@ static inline void *ipz_eqit_eq_get_inc_
 {
 	void *ret = ipz_qeit_get(queue);
 	u32 qe = *(u8 *) ret;
-	if ((qe >> 7) == (queue->toggle_state & 1))
-		ipz_qeit_eq_get_inc(queue); /* this is a good one */
-	else
-		ret = NULL;
+	if ((qe >> 7) != (queue->toggle_state & 1))
+		return NULL;
+	ipz_qeit_eq_get_inc(queue); /* this is a good one */
 	return ret;
 }

-------------- next part --------------
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 159b0be..2380994 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -5,6 +5,7 @@
  *
  * Authors: Heiko J Schick
  * Hoang-Nam Nguyen
+ * Joachim Fenkes
  *
  * Copyright (c) 2005 IBM Corporation
  *
@@ -48,7 +49,7 @@ #include "hcp_if.h"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch ");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0015");
+MODULE_VERSION("SVNEHCA_0016");

 int ehca_open_aqp1 = 0;
 int ehca_debug_level = 0;
@@ -749,7 +750,7 @@ int __init ehca_module_init(void)
 	int ret;

 	printk(KERN_INFO "eHCA Infiniband Device Driver "
-			"(Rel.: SVNEHCA_0015)\n");
+			"(Rel.: SVNEHCA_0016)\n");
 	idr_init(&ehca_qp_idr);
 	idr_init(&ehca_cq_idr);
 	spin_lock_init(&ehca_qp_idr_lock);
diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c
index 260e82a..3fb46e6 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.c
+++ b/drivers/infiniband/hw/ehca/hcp_if.c
@@ -48,27 +48,27 @@ #include "hcp_phyp.h"
 #include "hipz_fns.h"
 #include "ipz_pt_fn.h"

-#define H_ALL_RES_QP_ENHANCED_OPS EHCA_BMASK_IBM(9,11)
-#define H_ALL_RES_QP_PTE_PIN EHCA_BMASK_IBM(12,12)
-#define H_ALL_RES_QP_SERVICE_TYPE EHCA_BMASK_IBM(13,15)
-#define H_ALL_RES_QP_LL_RQ_CQE_POSTING EHCA_BMASK_IBM(18,18)
-#define H_ALL_RES_QP_LL_SQ_CQE_POSTING EHCA_BMASK_IBM(19,21)
-#define H_ALL_RES_QP_SIGNALING_TYPE EHCA_BMASK_IBM(22,23)
-#define H_ALL_RES_QP_UD_AV_LKEY_CTRL EHCA_BMASK_IBM(31,31)
-#define H_ALL_RES_QP_RESOURCE_TYPE EHCA_BMASK_IBM(56,63)
-
-#define H_ALL_RES_QP_MAX_OUTST_SEND_WR EHCA_BMASK_IBM(0,15)
-#define H_ALL_RES_QP_MAX_OUTST_RECV_WR EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_MAX_SEND_SGE EHCA_BMASK_IBM(32,39)
-#define H_ALL_RES_QP_MAX_RECV_SGE EHCA_BMASK_IBM(40,47)
-
-#define H_ALL_RES_QP_ACT_OUTST_SEND_WR EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_ACT_OUTST_RECV_WR EHCA_BMASK_IBM(48,63)
-#define H_ALL_RES_QP_ACT_SEND_SGE EHCA_BMASK_IBM(8,15)
-#define H_ALL_RES_QP_ACT_RECV_SGE EHCA_BMASK_IBM(24,31)
-
-#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES EHCA_BMASK_IBM(0,31)
-#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES EHCA_BMASK_IBM(32,63)
+#define H_ALL_RES_QP_ENHANCED_OPS EHCA_BMASK_IBM(9, 11)
+#define H_ALL_RES_QP_PTE_PIN EHCA_BMASK_IBM(12, 12)
+#define H_ALL_RES_QP_SERVICE_TYPE EHCA_BMASK_IBM(13, 15)
+#define H_ALL_RES_QP_LL_RQ_CQE_POSTING EHCA_BMASK_IBM(18, 18)
+#define H_ALL_RES_QP_LL_SQ_CQE_POSTING EHCA_BMASK_IBM(19, 21)
+#define H_ALL_RES_QP_SIGNALING_TYPE EHCA_BMASK_IBM(22, 23)
+#define H_ALL_RES_QP_UD_AV_LKEY_CTRL EHCA_BMASK_IBM(31, 31)
+#define H_ALL_RES_QP_RESOURCE_TYPE EHCA_BMASK_IBM(56, 63)
+
+#define H_ALL_RES_QP_MAX_OUTST_SEND_WR EHCA_BMASK_IBM(0, 15)
+#define H_ALL_RES_QP_MAX_OUTST_RECV_WR EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_MAX_SEND_SGE EHCA_BMASK_IBM(32, 39)
+#define H_ALL_RES_QP_MAX_RECV_SGE EHCA_BMASK_IBM(40, 47)
+
+#define H_ALL_RES_QP_ACT_OUTST_SEND_WR EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_ACT_OUTST_RECV_WR EHCA_BMASK_IBM(48, 63)
+#define H_ALL_RES_QP_ACT_SEND_SGE EHCA_BMASK_IBM(8, 15)
+#define H_ALL_RES_QP_ACT_RECV_SGE EHCA_BMASK_IBM(24, 31)
+
+#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES EHCA_BMASK_IBM(0, 31)
+#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES EHCA_BMASK_IBM(32, 63)

 /* direct access qp controls */
 #define DAQP_CTRL_ENABLE 0x01
@@ -95,35 +95,25 @@ static u32 get_longbusy_msecs(int longbu
 	}
 }

-static long ehca_hcall_7arg_7ret(unsigned long opcode,
-				unsigned long arg1,
-				unsigned long arg2,
-				unsigned long arg3,
-				unsigned long arg4,
-				unsigned long arg5,
-				unsigned long arg6,
-				unsigned long arg7,
-				unsigned long *out1,
-				unsigned long *out2,
-				unsigned long *out3,
-				unsigned long *out4,
-				unsigned long *out5,
-				unsigned long *out6,
-				unsigned long *out7)
+static long ehca_plpar_hcall_norets(unsigned long opcode,
+				unsigned long arg1,
+				unsigned long arg2,
+				unsigned long arg3,
+				unsigned long arg4,
+				unsigned long arg5,
+				unsigned long arg6,
+				unsigned long arg7)
 {
 	long ret;
 	int i, sleep_msecs;

-	ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx "
-			"arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5,
-			arg6, arg7);
+	ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx "
+			"arg5=%lx arg6=%lx arg7=%lx",
+			opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7);

 	for (i = 0; i < 5; i++) {
-		ret = plpar_hcall_7arg_7ret(opcode,
-					arg1, arg2, arg3, arg4,
-					arg5, arg6, arg7,
-					out1, out2, out3, out4,
-					out5, out6,out7);
+		ret = plpar_hcall_norets(opcode, arg1, arg2, arg3, arg4,
+					arg5, arg6, arg7);

 		if (H_IS_LONG_BUSY(ret)) {
 			sleep_msecs = get_longbusy_msecs(ret);
@@ -134,44 +124,30 @@ static long ehca_hcall_7arg_7ret(unsigne
 		if (ret < H_SUCCESS)
 			ehca_gen_err("opcode=%lx ret=%lx"
 					" arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
-					" arg5=%lx arg6=%lx arg7=%lx"
-					" out1=%lx out2=%lx out3=%lx out4=%lx"
-					" out5=%lx out6=%lx out7=%lx",
+					" arg5=%lx arg6=%lx arg7=%lx ",
 					opcode, ret,
-					arg1, arg2, arg3, arg4,
-					arg5, arg6, arg7,
-					*out1, *out2, *out3, *out4,
-					*out5, *out6, *out7);
+					arg1, arg2, arg3, arg4, arg5,
+					arg6, arg7);

-		ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
-				"out4=%lx out5=%lx out6=%lx out7=%lx",
-				opcode, ret, *out1, *out2, *out3, *out4, *out5,
-				*out6, *out7);
+		ehca_gen_dbg("opcode=%lx ret=%lx", opcode, ret);
 		return ret;
+
 	}

 	return H_BUSY;
 }

-static long ehca_hcall_9arg_9ret(unsigned long opcode,
-				unsigned long arg1,
-				unsigned long arg2,
-				unsigned long arg3,
-				unsigned long arg4,
-				unsigned long arg5,
-				unsigned long arg6,
-				unsigned long arg7,
-				unsigned long arg8,
-				unsigned long arg9,
-				unsigned long *out1,
-				unsigned long *out2,
-				unsigned long *out3,
-				unsigned long *out4,
-				unsigned long *out5,
-				unsigned long *out6,
-				unsigned long *out7,
-				unsigned long *out8,
-				unsigned long *out9)
+static long ehca_plpar_hcall9(unsigned long opcode,
+				unsigned long *outs, /* array of 9 outputs */
+				unsigned long arg1,
+				unsigned long arg2,
+				unsigned long arg3,
+				unsigned long arg4,
+				unsigned long arg5,
+				unsigned long arg6,
+				unsigned long arg7,
+				unsigned long arg8,
+				unsigned long arg9)
 {
 	long ret;
 	int i, sleep_msecs;
@@ -182,13 +158,9 @@ static long ehca_hcall_9arg_9ret(unsigne
 			arg8, arg9);

 	for (i = 0; i < 5; i++) {
-		ret = plpar_hcall_9arg_9ret(opcode,
-					arg1, arg2, arg3, arg4,
-					arg5, arg6, arg7, arg8,
-					arg9,
-					out1, out2, out3, out4,
-					out5, out6, out7, out8,
-					out9);
+		ret = plpar_hcall9(opcode, outs,
+					arg1, arg2, arg3, arg4, arg5,
+					arg6, arg7, arg8, arg9);

 		if (H_IS_LONG_BUSY(ret)) {
 			sleep_msecs = get_longbusy_msecs(ret);
@@ -205,37 +177,35 @@ static long ehca_hcall_9arg_9ret(unsigne
 					" out5=%lx out6=%lx out7=%lx out8=%lx"
 					" out9=%lx",
 					opcode, ret,
-					arg1, arg2, arg3, arg4,
-					arg5, arg6, arg7, arg8,
-					arg9,
-					*out1, *out2, *out3, *out4,
-					*out5, *out6, *out7, *out8,
-					*out9);
+					arg1, arg2, arg3, arg4, arg5,
+					arg6, arg7, arg8, arg9,
+					outs[0], outs[1], outs[2], outs[3],
+					outs[4], outs[5], outs[6], outs[7],
+					outs[8]);

 		ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
 				"out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx "
-				"out9=%lx", opcode, ret,*out1, *out2, *out3, *out4,
-				*out5, *out6, *out7, *out8, *out9);
+				"out9=%lx",
+				opcode, ret, outs[0], outs[1], outs[2], outs[3],
+				outs[4], outs[5], outs[6], outs[7], outs[8]);
 		return ret;
 	}

 	return H_BUSY;
 }

-
 u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle,
 			struct ehca_pfeq *pfeq,
 			const u32 neq_control,
 			const u32 number_of_entries,
 			struct ipz_eq_handle *eq_handle,
-			u32 * act_nr_of_entries,
-			u32 * act_pages,
-			u32 * eq_ist)
+			u32 *act_nr_of_entries,
+			u32 *act_pages,
+			u32 *eq_ist)
 {
 	u64 ret;
-	u64 dummy;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 	u64 allocate_controls;
-	u64 act_nr_of_entries_out, act_pages_out, eq_ist_out;

 	/* resource type */
 	allocate_controls = 3ULL;
@@ -246,22 +216,15 @@ u64 hipz_h_alloc_resource_eq(const struc
 	else /* notification event queue */
 		allocate_controls = (1ULL << 63) | allocate_controls;

-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				allocate_controls, /* r5 */
-				number_of_entries, /* r6 */
-				0, 0, 0, 0,
-				&eq_handle->handle, /* r4 */
-				&dummy, /* r5 */
-				&dummy, /* r6 */
-				&act_nr_of_entries_out, /* r7 */
-				&act_pages_out, /* r8 */
-				&eq_ist_out, /* r8 */
-				&dummy);
-
-	*act_nr_of_entries = (u32)act_nr_of_entries_out;
-	*act_pages = (u32)act_pages_out;
-	*eq_ist = (u32)eq_ist_out;
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle, /* r4 */
+				allocate_controls, /* r5 */
+				number_of_entries, /* r6 */
+				0, 0, 0, 0, 0, 0);
+	eq_handle->handle = outs[0];
+	*act_nr_of_entries = (u32)outs[3];
+	*act_pages = (u32)outs[4];
+	*eq_ist = (u32)outs[5];

 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resource - ret=%lx ", ret);
@@ -273,20 +236,11 @@ u64 hipz_h_reset_event(const struct ipz_
 			struct ipz_eq_handle eq_handle,
 			const u64 event_mask)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_RESET_EVENTS,
-				adapter_handle.handle, /* r4 */
-				eq_handle.handle, /* r5 */
-				event_mask, /* r6 */
-				0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_RESET_EVENTS,
+				adapter_handle.handle, /* r4 */
+				eq_handle.handle, /* r5 */
+				event_mask, /* r6 */
+				0, 0, 0, 0);
 }

 u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle,
@@ -294,30 +248,21 @@ u64 hipz_h_alloc_resource_cq(const struc
 			struct ehca_alloc_cq_parms *param)
 {
 	u64 ret;
-	u64 dummy;
-	u64 act_nr_of_entries_out, act_pages_out;
-	u64 g_la_privileged_out, g_la_user_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				2, /* r5 */
-				param->eq_handle.handle, /* r6 */
-				cq->token, /* r7 */
-				param->nr_cqe, /* r8 */
-				0, 0,
-				&cq->ipz_cq_handle.handle, /* r4 */
-				&dummy, /* r5 */
-				&dummy, /* r6 */
-				&act_nr_of_entries_out, /* r7 */
-				&act_pages_out, /* r8 */
-				&g_la_privileged_out, /* r9 */
-				&g_la_user_out); /* r10 */
-
-	param->act_nr_of_entries = (u32)act_nr_of_entries_out;
-	param->act_pages = (u32)act_pages_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle, /* r4 */
+				2, /* r5 */
+				param->eq_handle.handle, /* r6 */
+				cq->token, /* r7 */
+				param->nr_cqe, /* r8 */
+				0, 0, 0, 0);
+	cq->ipz_cq_handle.handle = outs[0];
+	param->act_nr_of_entries = (u32)outs[3];
+	param->act_pages = (u32)outs[4];

 	if (ret == H_SUCCESS)
-		hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out);
+		hcp_galpas_ctor(&cq->galpas, outs[5], outs[6]);

 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -330,8 +275,9 @@ u64 hipz_h_alloc_resource_qp(const struc
 			struct ehca_alloc_qp_parms *parms)
 {
 	u64 ret;
-	u64 dummy, allocate_controls, max_r10_reg;
-	u64 qp_nr_out, r6_out, r7_out, r8_out, g_la_user_out, r11_out;
+	u64 allocate_controls;
+	u64 max_r10_reg;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 	u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1;
 	u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1;
 	int daqp_ctrl = parms->daqp_ctrl;
@@ -360,48 +306,36 @@ u64 hipz_h_alloc_resource_qp(const struc
 		| EHCA_BMASK_SET(H_ALL_RES_QP_MAX_RECV_SGE,
 				parms->max_recv_sge);

-
-	ret = ehca_hcall_9arg_9ret(H_ALLOC_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				allocate_controls, /* r5 */
-				qp->send_cq->ipz_cq_handle.handle,
-				qp->recv_cq->ipz_cq_handle.handle,
-				parms->ipz_eq_handle.handle,
-				((u64)qp->token << 32) | parms->pd.value,
-				max_r10_reg, /* r10 */
-				parms->ud_av_l_key_ctl, /* r11 */
-				0,
-				&qp->ipz_qp_handle.handle,
-				&qp_nr_out, /* r5 */
-				&r6_out, /* r6 */
-				&r7_out, /* r7 */
-				&r8_out, /* r8 */
-				&dummy, /* r9 */
-				&g_la_user_out, /* r10 */
-				&r11_out,
-				&dummy);
-
-	/* extract outputs */
-	qp->real_qp_num = (u32)qp_nr_out;
-
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle, /* r4 */
+				allocate_controls, /* r5 */
+				qp->send_cq->ipz_cq_handle.handle,
+				qp->recv_cq->ipz_cq_handle.handle,
+				parms->ipz_eq_handle.handle,
+				((u64)qp->token << 32) | parms->pd.value,
+				max_r10_reg, /* r10 */
+				parms->ud_av_l_key_ctl, /* r11 */
+				0);
+	qp->ipz_qp_handle.handle = outs[0];
+	qp->real_qp_num = (u32)outs[1];
 	parms->act_nr_send_sges =
-		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, r6_out);
+		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, outs[2]);
 	parms->act_nr_recv_wqes =
-		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, r6_out);
+		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, outs[2]);
 	parms->act_nr_send_sges =
-		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, r7_out);
+		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, outs[3]);
 	parms->act_nr_recv_sges =
-		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, r7_out);
+		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, outs[3]);
 	parms->nr_sq_pages =
-		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, r8_out);
+		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, outs[4]);
 	parms->nr_rq_pages =
-		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, r8_out);
+		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, outs[4]);

 	if (ret == H_SUCCESS)
-		hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out);
+		hcp_galpas_ctor(&qp->galpas, outs[6], outs[6]);

 	if (ret == H_NOT_ENOUGH_RESOURCES)
-		ehca_gen_err("Not enough resources. ret=%lx",ret);
+		ehca_gen_err("Not enough resources. ret=%lx", ret);

 	return ret;
 }
@@ -411,7 +345,6 @@ u64 hipz_h_query_port(const struct ipz_a
 			struct hipz_query_port *query_port_response_block)
 {
 	u64 ret;
-	u64 dummy;
 	u64 r_cb = virt_to_abs(query_port_response_block);

 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -419,18 +352,11 @@ u64 hipz_h_query_port(const struct ipz_a
 		return H_PARAMETER;
 	}

-	ret = ehca_hcall_7arg_7ret(H_QUERY_PORT,
-				adapter_handle.handle, /* r4 */
-				port_id, /* r5 */
-				r_cb, /* r6 */
-				0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	ret = ehca_plpar_hcall_norets(H_QUERY_PORT,
+				adapter_handle.handle, /* r4 */
+				port_id, /* r5 */
+				r_cb, /* r6 */
+				0, 0, 0, 0);

 	if (ehca_debug_level)
 		ehca_dmp(query_port_response_block, 64, "response_block");
@@ -441,7 +367,6 @@ u64 hipz_h_query_port(const struct ipz_a
 u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle,
 			struct hipz_query_hca *query_hca_rblock)
 {
-	u64 dummy;
 	u64 r_cb = virt_to_abs(query_hca_rblock);

 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -450,17 +375,10 @@ u64 hipz_h_query_hca(const struct ipz_ad
 		return H_PARAMETER;
 	}

-	return ehca_hcall_7arg_7ret(H_QUERY_HCA,
-				adapter_handle.handle, /* r4 */
-				r_cb, /* r5 */
-				0, 0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_QUERY_HCA,
+				adapter_handle.handle, /* r4 */
+				r_cb, /* r5 */
+				0, 0, 0, 0, 0);
 }

 u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle,
@@ -470,22 +388,13 @@ u64 hipz_h_register_rpage(const struct i
 			const u64 logical_address_of_page,
 			u64 count)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_REGISTER_RPAGES,
-				adapter_handle.handle, /* r4 */
-				queue_type | pagesize << 8, /* r5 */
-				resource_handle, /* r6 */
-				logical_address_of_page, /* r7 */
-				count, /* r8 */
-				0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_REGISTER_RPAGES,
+				adapter_handle.handle, /* r4 */
+				queue_type | pagesize << 8, /* r5 */
+				resource_handle, /* r6 */
+				logical_address_of_page, /* r7 */
+				count, /* r8 */
+				0, 0);
 }

 u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle,
@@ -507,23 +416,14 @@ u64 hipz_h_register_rpage_eq(const struc
 				logical_address_of_page, count);
 }

-u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
 			u32 ist)
 {
-	u32 ret;
-	u64 dummy;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE,
-				adapter_handle.handle, /* r4 */
-				ist, /* r5 */
-				0, 0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	u64 ret;
+	ret = ehca_plpar_hcall_norets(H_QUERY_INT_STATE,
+				adapter_handle.handle, /* r4 */
+				ist, /* r5 */
+				0, 0, 0, 0, 0);

 	if (ret != H_SUCCESS && ret != H_BUSY)
 		ehca_gen_err("Could not query interrupt state.");
@@ -576,25 +476,20 @@ u64 hipz_h_disable_and_get_wqe(const str
 			void **log_addr_next_rq_wqe2processed,
 			int dis_and_get_function_code)
 {
-	u64 dummy, dummy1, dummy2;
-
-	if (!log_addr_next_sq_wqe2processed)
-		log_addr_next_sq_wqe2processed = (void**)&dummy1;
-	if (!log_addr_next_rq_wqe2processed)
-		log_addr_next_rq_wqe2processed = (void**)&dummy2;
-
-	return ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-				adapter_handle.handle, /* r4 */
-				dis_and_get_function_code, /* r5 */
-				qp_handle.handle, /* r6 */
-				0, 0, 0, 0,
-				(void*)log_addr_next_sq_wqe2processed,
-				(void*)log_addr_next_rq_wqe2processed,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	u64 ret;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+				adapter_handle.handle, /* r4 */
+				dis_and_get_function_code, /* r5 */
+				qp_handle.handle, /* r6 */
+				0, 0, 0, 0, 0, 0);
+	if (log_addr_next_sq_wqe2processed)
+		*log_addr_next_sq_wqe2processed = (void*)outs[0];
+	if (log_addr_next_rq_wqe2processed)
+		*log_addr_next_rq_wqe2processed = (void*)outs[1];
+
+	return ret;
 }

 u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle,
@@ -605,22 +500,13 @@ u64 hipz_h_modify_qp(const struct ipz_ad
 			struct h_galpa gal)
 {
 	u64 ret;
-	u64 dummy;
-	u64 invalid_attribute_identifier, rc_attrib_mask;
-
-	ret = ehca_hcall_7arg_7ret(H_MODIFY_QP,
-				adapter_handle.handle, /* r4 */
-				qp_handle.handle, /* r5 */
-				update_mask, /* r6 */
-				virt_to_abs(mqpcb), /* r7 */
-				0, 0, 0,
-				&invalid_attribute_identifier, /* r4 */
-				&dummy, /* r5 */
-				&dummy, /* r6 */
-				&dummy, /* r7 */
-				&dummy, /* r8 */
-				&rc_attrib_mask, /* r9 */
-				&dummy);
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+	ret = ehca_plpar_hcall9(H_MODIFY_QP, outs,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle, /* r5 */
+				update_mask, /* r6 */
+				virt_to_abs(mqpcb), /* r7 */
+				0, 0, 0, 0, 0);

 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Insufficient resources ret=%lx", ret);
@@ -634,61 +520,37 @@ u64 hipz_h_query_qp(const struct ipz_ada
 			struct hcp_modify_qp_control_block *qqpcb,
 			struct h_galpa gal)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_QUERY_QP,
-				adapter_handle.handle, /* r4 */
-				qp_handle.handle, /* r5 */
-				virt_to_abs(qqpcb), /* r6 */
-				0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_QUERY_QP,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle, /* r5 */
+				virt_to_abs(qqpcb), /* r6 */
+				0, 0, 0, 0);
 }

 u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle,
 			struct ehca_qp *qp)
 {
 	u64 ret;
-	u64 dummy;
-	u64 ladr_next_sq_wqe_out, ladr_next_rq_wqe_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];

 	ret = hcp_galpas_dtor(&qp->galpas);
 	if (ret) {
 		ehca_gen_err("Could not destruct qp->galpas");
 		return H_RESOURCE;
 	}
-	ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-				adapter_handle.handle, /* r4 */
-				/* function code */
-				1, /* r5 */
-				qp->ipz_qp_handle.handle, /* r6 */
-				0, 0, 0, 0,
-				&ladr_next_sq_wqe_out, /* r4 */
-				&ladr_next_rq_wqe_out, /* r5 */
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+				adapter_handle.handle, /* r4 */
+				/* function code */
+				1, /* r5 */
+				qp->ipz_qp_handle.handle, /* r6 */
+				0, 0, 0, 0, 0, 0);
 	if (ret == H_HARDWARE)
 		ehca_gen_err("HCA not operational. ret=%lx", ret);

-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				qp->ipz_qp_handle.handle, /* r5 */
-				0, 0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				adapter_handle.handle, /* r4 */
+				qp->ipz_qp_handle.handle, /* r5 */
+				0, 0, 0, 0, 0);

 	if (ret == H_RESOURCE)
 		ehca_gen_err("Resource still in use. ret=%lx", ret);
@@ -701,20 +563,11 @@ u64 hipz_h_define_aqp0(const struct ipz_
 			struct h_galpa gal,
 			u32 port)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DEFINE_AQP0,
-				adapter_handle.handle, /* r4 */
-				qp_handle.handle, /* r5 */
-				port, /* r6 */
-				0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_DEFINE_AQP0,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle, /* r5 */
+				port, /* r6 */
+				0, 0, 0, 0);
 }

 u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle,
@@ -724,24 +577,15 @@ u64 hipz_h_define_aqp1(const struct ipz_
 			u32 * bma_qp_nr)
 {
 	u64 ret;
-	u64 dummy;
-	u64 pma_qp_nr_out, bma_qp_nr_out;
-
-	ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1,
-				adapter_handle.handle, /* r4 */
-				qp_handle.handle, /* r5 */
-				port, /* r6 */
-				0, 0, 0, 0,
-				&pma_qp_nr_out, /* r4 */
-				&bma_qp_nr_out, /* r5 */
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
-
-	*pma_qp_nr = (u32)pma_qp_nr_out;
-	*bma_qp_nr = (u32)bma_qp_nr_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_DEFINE_AQP1, outs,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle, /* r5 */
+				port, /* r6 */
+				0, 0, 0, 0, 0, 0);
+	*pma_qp_nr = (u32)outs[0];
+	*bma_qp_nr = (u32)outs[1];

 	if (ret == H_ALIAS_EXIST)
 		ehca_gen_err("AQP1 already exists. ret=%lx", ret);
@@ -756,22 +600,14 @@ u64 hipz_h_attach_mcqp(const struct ipz_
 			u64 subnet_prefix, u64 interface_id)
 {
 	u64 ret;
-	u64 dummy;
-
-	ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP,
-				adapter_handle.handle, /* r4 */
-				qp_handle.handle, /* r5 */
-				mcg_dlid, /* r6 */
-				interface_id, /* r7 */
-				subnet_prefix, /* r8 */
-				0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+
+	ret = ehca_plpar_hcall_norets(H_ATTACH_MCQP,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle, /* r5 */
+				mcg_dlid, /* r6 */
+				interface_id, /* r7 */
+				subnet_prefix, /* r8 */
+				0, 0);

 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -785,22 +621,13 @@ u64 hipz_h_detach_mcqp(const struct ipz_
 			u16 mcg_dlid,
 			u64 subnet_prefix, u64 interface_id)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DETACH_MCQP,
-				adapter_handle.handle, /* r4 */
-				qp_handle.handle, /* r5 */
-				mcg_dlid, /* r6 */
-				interface_id, /* r7 */
-				subnet_prefix, /* r8 */
-				0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_DETACH_MCQP,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle, /* r5 */
+				mcg_dlid, /* r6 */
+				interface_id, /* r7 */
+				subnet_prefix, /* r8 */
+				0, 0);
 }

 u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle,
@@ -808,7 +635,6 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 			u8 force_flag)
 {
 	u64 ret;
-	u64 dummy;

 	ret = hcp_galpas_dtor(&cq->galpas);
 	if (ret) {
@@ -816,18 +642,11 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 		return H_RESOURCE;
 	}

-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				cq->ipz_cq_handle.handle, /* r5 */
-				force_flag != 0 ? 1L : 0L, /* r6 */
-				0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				adapter_handle.handle, /* r4 */
+				cq->ipz_cq_handle.handle, /* r5 */
+				force_flag != 0 ? 1L : 0L, /* r6 */
+				0, 0, 0, 0);

 	if (ret == H_RESOURCE)
 		ehca_gen_err("H_FREE_RESOURCE failed ret=%lx ", ret);
@@ -839,7 +658,6 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 			struct ehca_eq *eq)
 {
 	u64 ret;
-	u64 dummy;

 	ret = hcp_galpas_dtor(&eq->galpas);
 	if (ret) {
@@ -847,18 +665,10 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 		return H_RESOURCE;
 	}

-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				eq->ipz_eq_handle.handle, /* r5 */
-				0, 0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
-
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				adapter_handle.handle, /* r4 */
+				eq->ipz_eq_handle.handle, /* r5 */
+				0, 0, 0, 0, 0);
 	if (ret == H_RESOURCE)
 		ehca_gen_err("Resource in use. ret=%lx ", ret);

@@ -875,27 +685,19 @@ u64 hipz_h_alloc_resource_mr(const struc
 			struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out;
-	u64 rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				5, /* r5 */
-				vaddr, /* r6 */
-				length, /* r7 */
-				(((u64)access_ctrl) << 32ULL), /* r8 */
-				pd.value, /* r9 */
-				0,
-				&(outparms->handle.handle), /* r4 */
-				&dummy, /* r5 */
-				&lkey_out, /* r6 */
-				&rkey_out, /* r7 */
-				&dummy,
-				&dummy,
-				&dummy);
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle, /* r4 */
+				5, /* r5 */
+				vaddr, /* r6 */
+				length, /* r7 */
+				(((u64)access_ctrl) << 32ULL), /* r8 */
+				pd.value, /* r9 */
+				0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];

 	return ret;
 }
@@ -923,7 +725,6 @@ u64 hipz_h_register_rpage_mr(const struc
 				queue_type,
 				mr->ipz_mr_handle.handle,
 				logical_address_of_page, count);
-
 	return ret;
 }

@@ -932,24 +733,17 @@ u64 hipz_h_query_mr(const struct ipz_ada
 			struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 remote_len_out, remote_vaddr_out, acc_ctrl_pd_out, r9_out;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_MR,
-				adapter_handle.handle, /* r4 */
-				mr->ipz_mr_handle.handle, /* r5 */
-				0, 0, 0, 0, 0,
-				&outparms->len, /* r4 */
-				&outparms->vaddr, /* r5 */
-				&remote_len_out, /* r6 */
-				&remote_vaddr_out, /* r7 */
-				&acc_ctrl_pd_out, /* r8 */
-				&r9_out,
-				&dummy);
-
-	outparms->acl = acc_ctrl_pd_out >> 32;
-	outparms->lkey = (u32)(r9_out >> 32);
-	outparms->rkey = (u32)(r9_out & (0xffffffff));
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_QUERY_MR, outs,
+				adapter_handle.handle, /* r4 */
+				mr->ipz_mr_handle.handle, /* r5 */
+				0, 0, 0, 0, 0, 0, 0);
+	outparms->len = outs[0];
+	outparms->vaddr = outs[1];
+	outparms->acl = outs[4] >> 32;
+	outparms->lkey = (u32)(outs[5] >> 32);
+	outparms->rkey = (u32)(outs[5] & (0xffffffff));

 	return ret;
 }
@@ -957,19 +751,10 @@ u64 hipz_h_query_mr(const struct ipz_ada
 u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle,
 			const struct ehca_mr *mr)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				adapter_handle.handle, /* r4 */
-				mr->ipz_mr_handle.handle, /* r5 */
-				0, 0, 0, 0, 0,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy,
-				&dummy);
+	return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				adapter_handle.handle, /* r4 */
+				mr->ipz_mr_handle.handle, /* r5 */
+				0, 0, 0, 0, 0);
 }

 u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle,
@@ -982,28 +767,20 @@ u64 hipz_h_reregister_pmr(const struct i
 			struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR,
-				adapter_handle.handle, /* r4 */
-				mr->ipz_mr_handle.handle, /* r5 */
-				vaddr_in, /* r6 */
-				length, /* r7 */
-				/* r8 */
-				((((u64)access_ctrl) << 32ULL) | pd.value),
-				mr_addr_cb, /* r9 */
-				0,
-				&dummy, /* r4 */
-				&outparms->vaddr, /* r5 */
-				&lkey_out, /* r6 */
-				&rkey_out, /* r7 */
-				&dummy,
-				&dummy,
-				&dummy);
-
-	outparms->lkey = (u32)lkey_out;
- outparms->rkey = (u32)rkey_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_REREGISTER_PMR, outs, + adapter_handle.handle, /* r4 */ + mr->ipz_mr_handle.handle, /* r5 */ + vaddr_in, /* r6 */ + length, /* r7 */ + /* r8 */ + ((((u64)access_ctrl) << 32ULL) | pd.value), + mr_addr_cb, /* r9 */ + 0, 0, 0); + outparms->vaddr = outs[1]; + outparms->lkey = (u32)outs[2]; + outparms->rkey = (u32)outs[3]; return ret; } @@ -1017,25 +794,18 @@ u64 hipz_h_register_smr(const struct ipz struct ehca_mr_hipzout_parms *outparms) { u64 ret; - u64 dummy; - u64 lkey_out, rkey_out; - - ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR, - adapter_handle.handle, /* r4 */ - orig_mr->ipz_mr_handle.handle, /* r5 */ - vaddr_in, /* r6 */ - (((u64)access_ctrl) << 32ULL), /* r7 */ - pd.value, /* r8 */ - 0, 0, - &(outparms->handle.handle), /* r4 */ - &dummy, /* r5 */ - &lkey_out, /* r6 */ - &rkey_out, /* r7 */ - &dummy, - &dummy, - &dummy); - outparms->lkey = (u32)lkey_out; - outparms->rkey = (u32)rkey_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_REGISTER_SMR, outs, + adapter_handle.handle, /* r4 */ + orig_mr->ipz_mr_handle.handle, /* r5 */ + vaddr_in, /* r6 */ + (((u64)access_ctrl) << 32ULL), /* r7 */ + pd.value, /* r8 */ + 0, 0, 0, 0); + outparms->handle.handle = outs[0]; + outparms->lkey = (u32)outs[2]; + outparms->rkey = (u32)outs[3]; return ret; } @@ -1046,23 +816,15 @@ u64 hipz_h_alloc_resource_mw(const struc struct ehca_mw_hipzout_parms *outparms) { u64 ret; - u64 dummy; - u64 rkey_out; - - ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, - adapter_handle.handle, /* r4 */ - 6, /* r5 */ - pd.value, /* r6 */ - 0, 0, 0, 0, - &(outparms->handle.handle), /* r4 */ - &dummy, /* r5 */ - &dummy, /* r6 */ - &rkey_out, /* r7 */ - &dummy, - &dummy, - &dummy); - - outparms->rkey = (u32)rkey_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, + adapter_handle.handle, /* r4 */ + 6, /* r5 */ + pd.value, /* r6 */ + 0, 0, 0, 0, 0, 0); + 
outparms->handle.handle = outs[0]; + outparms->rkey = (u32)outs[3]; return ret; } @@ -1072,21 +834,13 @@ u64 hipz_h_query_mw(const struct ipz_ada struct ehca_mw_hipzout_parms *outparms) { u64 ret; - u64 dummy; - u64 pd_out, rkey_out; - - ret = ehca_hcall_7arg_7ret(H_QUERY_MW, - adapter_handle.handle, /* r4 */ - mw->ipz_mw_handle.handle, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, /* r4 */ - &dummy, /* r5 */ - &dummy, /* r6 */ - &rkey_out, /* r7 */ - &pd_out, /* r8 */ - &dummy, - &dummy); - outparms->rkey = (u32)rkey_out; + u64 outs[PLPAR_HCALL9_BUFSIZE]; + + ret = ehca_plpar_hcall9(H_QUERY_MW, outs, + adapter_handle.handle, /* r4 */ + mw->ipz_mw_handle.handle, /* r5 */ + 0, 0, 0, 0, 0, 0, 0); + outparms->rkey = (u32)outs[3]; return ret; } @@ -1094,19 +848,10 @@ u64 hipz_h_query_mw(const struct ipz_ada u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle, const struct ehca_mw *mw) { - u64 dummy; - - return ehca_hcall_7arg_7ret(H_FREE_RESOURCE, - adapter_handle.handle, /* r4 */ - mw->ipz_mw_handle.handle, /* r5 */ - 0, 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + mw->ipz_mw_handle.handle, /* r5 */ + 0, 0, 0, 0, 0); } u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle, @@ -1114,7 +859,6 @@ u64 hipz_h_error_data(const struct ipz_a void *rblock, unsigned long *byte_count) { - u64 dummy; u64 r_cb = virt_to_abs(rblock); if (r_cb & (EHCA_PAGESIZE-1)) { @@ -1122,16 +866,9 @@ u64 hipz_h_error_data(const struct ipz_a return H_PARAMETER; } - return ehca_hcall_7arg_7ret(H_ERROR_DATA, - adapter_handle.handle, - ressource_handle, - r_cb, - 0, 0, 0, 0, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy, - &dummy); + return ehca_plpar_hcall_norets(H_ERROR_DATA, + adapter_handle.handle, + ressource_handle, + r_cb, + 0, 0, 0, 0); } diff --git a/drivers/infiniband/hw/ehca/hcp_if.h b/drivers/infiniband/hw/ehca/hcp_if.h index 
39956d8..587ebd4 100644 --- a/drivers/infiniband/hw/ehca/hcp_if.h +++ b/drivers/infiniband/hw/ehca/hcp_if.h @@ -107,7 +107,7 @@ u64 hipz_h_register_rpage_eq(const struc const u64 logical_address_of_page, const u64 count); -u32 hipz_h_query_int_state(const struct ipz_adapter_handle +u64 hipz_h_query_int_state(const struct ipz_adapter_handle hcp_adapter_handle, u32 ist); diff --git a/drivers/infiniband/hw/ehca/hipz_hw.h b/drivers/infiniband/hw/ehca/hipz_hw.h index f5f4871..3fc92b0 100644 --- a/drivers/infiniband/hw/ehca/hipz_hw.h +++ b/drivers/infiniband/hw/ehca/hipz_hw.h @@ -184,8 +184,6 @@ struct hipz_mrmwmm { }; -#define MRX_HCR_LPARID_VALID EHCA_BMASK_IBM(0,0) - #define MRMWMM_OFFSET(x) offsetof(struct hipz_mrmwmm,x) struct hipz_qpedmm { diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h index 7e55a31..2f13509 100644 --- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h +++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h @@ -226,10 +226,9 @@ static inline void *ipz_eqit_eq_get_inc_ { void *ret = ipz_qeit_get(queue); u32 qe = *(u8 *) ret; - if ((qe >> 7) == (queue->toggle_state & 1)) - ipz_qeit_eq_get_inc(queue); /* this is a good one */ - else - ret = NULL; + if ((qe >> 7) != (queue->toggle_state & 1)) + return NULL; + ipz_qeit_eq_get_inc(queue); /* this is a good one */ return ret; } From ardavis at ichips.intel.com Fri Sep 22 14:09:35 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Fri, 22 Sep 2006 14:09:35 -0700 Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ In-Reply-To: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com> References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com> Message-ID: <4514510F.3050400@ichips.intel.com> Sean Hefty wrote: >Currently a DREP is only sent in response to a DREQ if a connection >has been found matching the DREQ, and it is in the proper state. Once >a DREP is sent, the local connection moves into timewait. 
>Duplicate DREQs received while in this state result in re-sending the DREP.
>
>However, it's likely that the local connection will enter and exit
>timewait before the remote side times out a lost DREP and resends a DREQ.
>There are a couple possible solutions to this. One is to increase how
>long a connection remains in timewait, by multiplying its wait time by
>max_cm_retries. This can greatly increase the timewait state before a QP
>can be re-used when CM messages are not lost.
>
>An alternative is to send a DREP in response to a DREQ, even if a local
>connection is not found, which is what this patch does.
>
>

Would it be possible to get this fix in rc7? I am consistently seeing
this problem with Intel MPI on a 64 node cluster.

-arlin

From rdreier at cisco.com Fri Sep 22 15:37:22 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Sep 2006 15:37:22 -0700
Subject: [openib-general] [GIT PULL] please pull infiniband.git
Message-ID:

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree has:

 - Add support for iWARP (RDMA over IP)
 - Add amso1100 driver for Ammasso 1100 iWARP adapters
 - Add ehca driver for IBM GX InfiniBand adapters
 - ipath fixes
 - lots of other smaller stuff

Bryan O'Sullivan:
      IB/ipath: More changes to support InfiniPath on PowerPC 970 systems
      IB/ipath: lock resource limit counters correctly
      IB/ipath: fix for crash on module unload, if cfgports < portcnt
      IB/ipath: fix handling of kpiobufs
      IB/ipath: drop requirement that PIO buffers be mmaped write-only
      IB/ipath: merge ipath_core and ib_ipath drivers
      IB/ipath: simplify layering code
      IB/ipath: simplify debugging code after ipath_core and ib_ipath merger
      IB/ipath: remove stale references to userspace SMA
      IB/ipath: trivial cleanups
      IB/ipath: add new minor device to allow sending of diag packets
      IB/ipath: do not allow use of CQ entries with invalid counts
      IB/ipath: account for attached QPs correctly
      IB/ipath: support new QLogic product naming scheme
      IB/ipath: add serial number to hardware freeze error message
      IB/ipath: be more strict about testing the modify QP verb
      IB/ipath: validate path_mig_state properly
      IB/ipath: put a limit on the number of QPs that can be created
      IB/ipath: handle sq_sig_all field correctly
      IB/ipath: allow SMA to be disabled
      IB/ipath: fix return value from ipath_poll
      IB/ipath: control receive polarity inversion

Dotan Barak:
      IPoIB: Remove unused include of vmalloc.h

Eli Cohen:
      IPoIB: Rejoin all multicast groups after a port event
      IPoIB: Add some likely/unlikely annotations in hot path

Erez Zilber:
      IB/iser: fix a check of SG alignment for RDMA
      IB/iser: Limit the max size of a scsi command
      IB/iser: make FMR "page size" be 4K and not PAGE_SIZE
      IB/iser: fix some debug prints
      IB/iser: Do not use FMR for a single dma entry sg

Heiko J Schick:
      IB/ehca: Add driver for IBM eHCA InfiniBand adapters

Ishai Rabinovitz:
      IB/srp: Add port/device attributes

Jack Morgenstein:
      IB/mthca: Fix lid used for sending traps
      IB/mthca: Fix default static rate returned for Tavor in AV
      IB/mthca: Return port number for unconnected QPs in query_qp
      IB/mthca: Return correct number of bits for static rate in query_qp
      IB/mthca: Recover from catastrophic errors

James Lentini:
      IB/mthca: Include the header we really want
      IB/mad: Remove unused includes

Krishna Kumar:
      IB: Fix typo in kerneldoc for ib_set_client_data()

Michael S. Tsirkin:
      IB/mthca: Don't use privileged UAR for kernel access
      IB/ipoib: Fix flush/start xmit race (from code review)
      IB/sa: Require SA registration
      IB/cm: Do not track remote QPN in timewait state
      IB/sa: fix ib_sa_selector names

Or Gerlitz:
      RDMA/cma: Document rdma_destroy_id() function
      RDMA/cma: Document rdma_accept() error handling

Ralph Campbell:
      IB/uverbs: Allow resize CQ operation to return driver-specific data
      IB/uverbs: Pass userspace data to modify_srq and modify_qp methods
      IB/ipath: Performance improvements via mmap of queues

Roland Dreier:
      IB/uverbs: Use idr_read_cq() where appropriate
      IB/uverbs: Fix lockdep warning when QP is created with 2 CQs
      IB: Whitespace fixes
      IPoIB: Refactor completion handling
      IB/mthca: Simplify calls to mthca_cq_clean()
      IB/iser: INFINIBAND_ISER depends on INET
      IPoIB: Create MCGs with all attributes required by RFC

Sean Hefty:
      IB/cm: Enable atomics along with RDMA reads
      IB/cm: Use correct reject code for invalid GID
      IB/mad: Add support for dual-sided RMPP transfers.
      IB/cm: Randomize starting comm ID
      RDMA/cma: Protect against adding device during destruction

Tom Tucker:
      RDMA: iWARP Connection Manager.
      RDMA: iWARP Core Changes.
      RDMA/amso1100: Add driver for Ammasso 1100 RNIC

 MAINTAINERS | 16
 drivers/infiniband/Kconfig | 4
 drivers/infiniband/Makefile | 4
 drivers/infiniband/core/Makefile | 4
 drivers/infiniband/core/addr.c | 22
 drivers/infiniband/core/cache.c | 5
 drivers/infiniband/core/cm.c | 66 -
 drivers/infiniband/core/cma.c | 403 +++-
 drivers/infiniband/core/device.c | 6
 drivers/infiniband/core/iwcm.c | 1019 +++++++++
 drivers/infiniband/core/iwcm.h | 62 +
 drivers/infiniband/core/mad.c | 19
 drivers/infiniband/core/mad_priv.h | 1
 drivers/infiniband/core/mad_rmpp.c | 94 +
 drivers/infiniband/core/sa_query.c | 67 +
 drivers/infiniband/core/smi.c | 16
 drivers/infiniband/core/sysfs.c | 13
 drivers/infiniband/core/ucm.c | 9
 drivers/infiniband/core/user_mad.c | 7
 drivers/infiniband/core/uverbs_cmd.c | 64 -
 drivers/infiniband/core/verbs.c | 21
 drivers/infiniband/hw/amso1100/Kbuild | 8
 drivers/infiniband/hw/amso1100/Kconfig | 15
 drivers/infiniband/hw/amso1100/c2.c | 1255 ++++++++++++
 drivers/infiniband/hw/amso1100/c2.h | 551 +++++
 drivers/infiniband/hw/amso1100/c2_ae.c | 321 +++
 drivers/infiniband/hw/amso1100/c2_ae.h | 108 +
 drivers/infiniband/hw/amso1100/c2_alloc.c | 144 +
 drivers/infiniband/hw/amso1100/c2_cm.c | 452 ++++
 drivers/infiniband/hw/amso1100/c2_cq.c | 433 ++++
 drivers/infiniband/hw/amso1100/c2_intr.c | 209 ++
 drivers/infiniband/hw/amso1100/c2_mm.c | 375 +++
 drivers/infiniband/hw/amso1100/c2_mq.c | 174 ++
 drivers/infiniband/hw/amso1100/c2_mq.h | 106 +
 drivers/infiniband/hw/amso1100/c2_pd.c | 89 +
 drivers/infiniband/hw/amso1100/c2_provider.c | 869 ++++++++
 drivers/infiniband/hw/amso1100/c2_provider.h | 181 ++
 drivers/infiniband/hw/amso1100/c2_qp.c | 975 +++++++++
 drivers/infiniband/hw/amso1100/c2_rnic.c | 663 ++++++
 drivers/infiniband/hw/amso1100/c2_status.h | 158 +
 drivers/infiniband/hw/amso1100/c2_user.h | 82 +
 drivers/infiniband/hw/amso1100/c2_vq.c | 260 ++
 drivers/infiniband/hw/amso1100/c2_vq.h | 63 +
 drivers/infiniband/hw/amso1100/c2_wr.h | 1520 ++++++++++++++
 drivers/infiniband/hw/ehca/Kconfig | 16
 drivers/infiniband/hw/ehca/Makefile | 16
 drivers/infiniband/hw/ehca/ehca_av.c | 271 +++
 drivers/infiniband/hw/ehca/ehca_classes.h | 346 +++
 drivers/infiniband/hw/ehca/ehca_classes_pSeries.h | 236 ++
 drivers/infiniband/hw/ehca/ehca_cq.c | 427 ++++
 drivers/infiniband/hw/ehca/ehca_eq.c | 185 ++
 drivers/infiniband/hw/ehca/ehca_hca.c | 241 ++
 drivers/infiniband/hw/ehca/ehca_irq.c | 762 +++++++
 drivers/infiniband/hw/ehca/ehca_irq.h | 77 +
 drivers/infiniband/hw/ehca/ehca_iverbs.h | 182 ++
 drivers/infiniband/hw/ehca/ehca_main.c | 818 ++++++++
 drivers/infiniband/hw/ehca/ehca_mcast.c | 131 +
 drivers/infiniband/hw/ehca/ehca_mrmw.c | 2261 +++++++++++++++++++++
 drivers/infiniband/hw/ehca/ehca_mrmw.h | 140 +
 drivers/infiniband/hw/ehca/ehca_pd.c | 114 +
 drivers/infiniband/hw/ehca/ehca_qes.h | 259 ++
 drivers/infiniband/hw/ehca/ehca_qp.c | 1507 ++++++++++++++
 drivers/infiniband/hw/ehca/ehca_reqs.c | 653 ++++++
 drivers/infiniband/hw/ehca/ehca_sqp.c | 111 +
 drivers/infiniband/hw/ehca/ehca_tools.h | 172 ++
 drivers/infiniband/hw/ehca/ehca_uverbs.c | 392 ++++
 drivers/infiniband/hw/ehca/hcp_if.c | 874 ++++++++
 drivers/infiniband/hw/ehca/hcp_if.h | 261 ++
 drivers/infiniband/hw/ehca/hcp_phyp.c | 80 +
 drivers/infiniband/hw/ehca/hcp_phyp.h | 90 +
 drivers/infiniband/hw/ehca/hipz_fns.h | 68 +
 drivers/infiniband/hw/ehca/hipz_fns_core.h | 100 +
 drivers/infiniband/hw/ehca/hipz_hw.h | 388 ++++
 drivers/infiniband/hw/ehca/ipz_pt_fn.c | 149 +
 drivers/infiniband/hw/ehca/ipz_pt_fn.h | 247 ++
 drivers/infiniband/hw/ipath/Kconfig | 21
 drivers/infiniband/hw/ipath/Makefile | 29
 drivers/infiniband/hw/ipath/ipath_common.h | 19
 drivers/infiniband/hw/ipath/ipath_cq.c | 183 +-
 drivers/infiniband/hw/ipath/ipath_debug.h | 2
 drivers/infiniband/hw/ipath/ipath_diag.c | 154 +
 drivers/infiniband/hw/ipath/ipath_driver.c | 349 ++-
 drivers/infiniband/hw/ipath/ipath_file_ops.c | 35
 drivers/infiniband/hw/ipath/ipath_fs.c | 4
 drivers/infiniband/hw/ipath/ipath_ht400.c | 1603 ---------------
 drivers/infiniband/hw/ipath/ipath_iba6110.c | 1612 +++++++++++++++
 drivers/infiniband/hw/ipath/ipath_iba6120.c | 1264 ++++++++++++
 drivers/infiniband/hw/ipath/ipath_init_chip.c | 21
 drivers/infiniband/hw/ipath/ipath_intr.c | 24
 drivers/infiniband/hw/ipath/ipath_kernel.h | 57 -
 drivers/infiniband/hw/ipath/ipath_keys.c | 3
 drivers/infiniband/hw/ipath/ipath_layer.c | 1179 -----------
 drivers/infiniband/hw/ipath/ipath_layer.h | 115 -
 drivers/infiniband/hw/ipath/ipath_mad.c | 339 +++
 drivers/infiniband/hw/ipath/ipath_mmap.c | 122 +
 drivers/infiniband/hw/ipath/ipath_mr.c | 12
 drivers/infiniband/hw/ipath/ipath_pe800.c | 1254 ------------
 drivers/infiniband/hw/ipath/ipath_qp.c | 242 ++
 drivers/infiniband/hw/ipath/ipath_rc.c | 9
 drivers/infiniband/hw/ipath/ipath_registers.h | 7
 drivers/infiniband/hw/ipath/ipath_ruc.c | 160 +
 drivers/infiniband/hw/ipath/ipath_srq.c | 244 +-
 drivers/infiniband/hw/ipath/ipath_stats.c | 27
 drivers/infiniband/hw/ipath/ipath_sysfs.c | 41
 drivers/infiniband/hw/ipath/ipath_uc.c | 5
 drivers/infiniband/hw/ipath/ipath_ud.c | 182 +-
 drivers/infiniband/hw/ipath/ipath_verbs.c | 687 +++++-
 drivers/infiniband/hw/ipath/ipath_verbs.h | 252 ++
 drivers/infiniband/hw/ipath/ipath_verbs_mcast.c | 7
 drivers/infiniband/hw/ipath/ipath_wc_ppc64.c | 52
 drivers/infiniband/hw/ipath/verbs_debug.h | 108 -
 drivers/infiniband/hw/mthca/mthca_av.c | 2
 drivers/infiniband/hw/mthca/mthca_catas.c | 62 +
 drivers/infiniband/hw/mthca/mthca_cmd.c | 2
 drivers/infiniband/hw/mthca/mthca_cq.c | 10
 drivers/infiniband/hw/mthca/mthca_dev.h | 12
 drivers/infiniband/hw/mthca/mthca_mad.c | 2
 drivers/infiniband/hw/mthca/mthca_main.c | 88 +
 drivers/infiniband/hw/mthca/mthca_provider.c | 2
 drivers/infiniband/hw/mthca/mthca_qp.c | 20
 drivers/infiniband/hw/mthca/mthca_srq.c | 2
 drivers/infiniband/hw/mthca/mthca_uar.c | 2
 drivers/infiniband/ulp/ipoib/ipoib.h | 2
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 194 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 37
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 34
 drivers/infiniband/ulp/iser/Kconfig | 2
 drivers/infiniband/ulp/iser/iscsi_iser.c | 1
 drivers/infiniband/ulp/iser/iscsi_iser.h | 7
 drivers/infiniband/ulp/iser/iser_memory.c | 80 +
 drivers/infiniband/ulp/iser/iser_verbs.c | 10
 drivers/infiniband/ulp/srp/ib_srp.c | 43
 include/rdma/ib_addr.h | 17
 include/rdma/ib_sa.h | 45
 include/rdma/ib_user_verbs.h | 2
 include/rdma/ib_verbs.h | 31
 include/rdma/iw_cm.h | 258 ++
 include/rdma/rdma_cm.h | 12
 138 files changed, 28416 insertions(+), 5494 deletions(-)
 create mode 100644 drivers/infiniband/core/iwcm.c
 create mode 100644 drivers/infiniband/core/iwcm.h
 create mode 100644 drivers/infiniband/hw/amso1100/Kbuild
 create mode 100644 drivers/infiniband/hw/amso1100/Kconfig
 create mode 100644 drivers/infiniband/hw/amso1100/c2.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_ae.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_ae.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_alloc.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_cm.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_cq.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_intr.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_mm.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_mq.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_mq.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_pd.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_provider.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_provider.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_qp.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_rnic.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_status.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_user.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_vq.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_vq.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_wr.h
 create mode 100644 drivers/infiniband/hw/ehca/Kconfig
 create mode 100644 drivers/infiniband/hw/ehca/Makefile
 create mode 100644 drivers/infiniband/hw/ehca/ehca_av.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_classes.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_classes_pSeries.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_cq.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_eq.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_hca.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_irq.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_irq.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_iverbs.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_main.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_mcast.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_mrmw.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_mrmw.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_pd.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_qes.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_qp.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_reqs.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_sqp.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_tools.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_uverbs.c
 create mode 100644 drivers/infiniband/hw/ehca/hcp_if.c
 create mode 100644 drivers/infiniband/hw/ehca/hcp_if.h
 create mode 100644 drivers/infiniband/hw/ehca/hcp_phyp.c
 create mode 100644 drivers/infiniband/hw/ehca/hcp_phyp.h
 create mode 100644 drivers/infiniband/hw/ehca/hipz_fns.h
 create mode 100644 drivers/infiniband/hw/ehca/hipz_fns_core.h
 create mode 100644 drivers/infiniband/hw/ehca/hipz_hw.h
 create mode 100644 drivers/infiniband/hw/ehca/ipz_pt_fn.c
 create mode 100644 drivers/infiniband/hw/ehca/ipz_pt_fn.h
 delete mode 100644 drivers/infiniband/hw/ipath/ipath_ht400.c
 create mode 100644 drivers/infiniband/hw/ipath/ipath_iba6110.c
 create mode 100644 drivers/infiniband/hw/ipath/ipath_iba6120.c
 create mode 100644 drivers/infiniband/hw/ipath/ipath_mmap.c
 delete mode 100644 drivers/infiniband/hw/ipath/ipath_pe800.c
 create mode 100644 drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
 delete mode 100644 drivers/infiniband/hw/ipath/verbs_debug.h
 create mode 100644 include/rdma/iw_cm.h

From rdreier at cisco.com Fri Sep 22 18:21:47 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Sep 2006 18:21:47 -0700
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca firmware interface based on Anton Blanchard's new hvcall interface
In-Reply-To: <200609222200.12722.hnguyen@de.ibm.com> (Hoang-Nam Nguyen's message of "Fri, 22 Sep 2006 22:00:12 +0200")
References: <200609222200.12722.hnguyen@de.ibm.com>
Message-ID:

Thanks, I rolled this up in the ehca patch in my tree.

Anyway both Paul and I merged with Linus today, so the hcall cleanup
and ehca are both upstream. It would be great if you could do a quick
check to make sure that ehca works in Linus's current git tree.

Thanks,
  Roland

From delaitt at cpc.wmin.ac.uk Sat Sep 23 03:47:28 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Sat, 23 Sep 2006 11:47:28 +0100 (BST)
Subject: [openib-general] symbols missing on ib_cm and rdma_cm / ofed-1.1rc6 sles10
Message-ID:

Hi,

I'm using ofed-1.1 rc6 on sles10. IPoIB works OK, but I'm experiencing
problems using the Lustre NAL with RDMA. In particular, I cannot load
the following 2 IB modules. Any help would be appreciated.

It seems the problem is with OFED (1.1-rc6). It seems those 2 IB
modules cannot load and will hence prevent ko2iblnd from loading.

Sep 23 11:30:30 n32 kernel: ib_cm: Unknown symbol ib_init_ah_from_wc
Sep 23 11:30:30 n32 kernel: ib_cm: Unknown symbol ib_init_ah_from_path
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_cm_listen
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_destroy_cm_id
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_create_cm_id
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rep
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_cm_init_qp_attr
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_drep
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rtu
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_dreq
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_req
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_cm_establish
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rej

Cheers,

Thierry.

From HNGUYEN at de.ibm.com Sat Sep 23 13:45:28 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Sat, 23 Sep 2006 22:45:28 +0200
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca firmware interface based on Anton Blanchard's new hvcall interface
In-Reply-To:
Message-ID:

Hi Roland,
> Anyway both Paul and I merged with Linus today, so the hcall cleanup
> and ehca are both upstream. It would be great if you could do a quick
> check to make sure that ehca works in Linus's current git tree.
I compiled Linus's git tree and did some basic tests successfully
with ehca (ipoib, userspace, netpipe tcp/ib).
Thanks!
Nam Nguyen

From delaitt at cpc.wmin.ac.uk Sun Sep 24 02:57:32 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Sun, 24 Sep 2006 10:57:32 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
Message-ID:

I get the following when loading lustre o2ib module. I'm using ofed-1.1
rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
kernel i'm using and lustre too.
I don't understand why i get the
following as i only have one version of the ib modules ?

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 5725:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256

n32:~/lustre-1.5.95 # ls -l /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/
total 328
-rw-r--r-- 1 root root 13190 Sep 24 10:16 ib_addr.ko
-rw-r--r-- 1 root root 37875 Sep 24 10:16 ib_cm.ko
-rw-r--r-- 1 root root 57592 Sep 24 10:16 ib_core.ko
-rw-r--r-- 1 root root 42829 Sep 24 10:16 ib_mad.ko
-rw-r--r-- 1 root root 20095 Sep 24 10:16 ib_sa.ko
-rw-r--r-- 1 root root 22930 Sep 24 10:16 ib_ucm.ko
-rw-r--r-- 1 root root 21234 Sep 24 10:16 ib_umad.ko
-rw-r--r-- 1 root root 45057 Sep 24 10:16 ib_uverbs.ko
-rw-r--r-- 1 root root 29987 Sep 24 10:16 rdma_cm.ko
-rw-r--r-- 1 root root 17669 Sep 24 10:16 rdma_ucm.ko
n32:~/lustre-1.5.95 # ls -ld /usr/local/ofed
drwxr-xr-x 9 root root 328 Sep 24 10:18 /usr/local/ofed

Thierry.

From rjwalsh at pathscale.com Sun Sep 24 10:45:01 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Sun, 24 Sep 2006 10:45:01 -0700
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To:
References:
Message-ID: <4516C41D.1060301@pathscale.com>

Thierry Delaitre wrote:
> I get the following when loading lustre o2ib module.
> I'm using ofed-1.1 rc6 on sles10 and i'm sure the ib modules are the
> ones recompiled for the kernel i'm using and lustre too. I don't
> understand why i get the following as i only have one version of the
> ib modules ?

This explanation gets ugly :-)

The short description is: you can't build external modules that depend
on other external modules that you previously built.

The reason why is: the kernel devel stuff ships with a file called
Module.symvers, which contains all the version information for all the
symbols in the kernel and in all the modules built when the kernel was
built. When you build an external module, the kernel build stuff looks
in here to get the version information for any symbol referenced that it
can't find in the group of modules you're building. If you've replaced
some modules with newer ones (like what happens when you install
OFED-1.1, for example), then the symbol versions in the new modules will
not match what's in the Module.symvers file.

In your case, you installed a bunch of new modules (OFED-1.1) and then,
in a second step, installed another new module (Lustre). The OFED-1.1
build was OK because all external symbols that it referenced (all of
which are in vmlinux, I think) had properly matching version entries in
Module.symvers. The Lustre build, however, was pulling ib_* symbols
from the new OFED-1.1 modules that had mismatching symbol versions in
Module.symvers from the original kernel modules (I don't remember if the
kernel build warns about mismatching symbol versions at build time.) At
insmod time, the kernel checks that the symbol versions of
already-loaded modules match the expected versions in the to-be-loaded
module. In your case, they will not.

One solution is: extract the kernel sources from the OFED-1.1
distribution, patch them as the OFED build script would, add in the
Lustre bits and build the whole thing yourself manually. Another
solution is: update the Module.symvers file.
Neither is a terribly satisfactory solution. Regards, Robert. From jackm at dev.mellanox.co.il Sun Sep 24 23:40:47 2006 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Mon, 25 Sep 2006 09:40:47 +0300 Subject: [openib-general] problems with lustre o2ib module & ofed In-Reply-To: References: Message-ID: <200609250940.47936.jackm@dev.mellanox.co.il> Did you recompile Lustre following the installation of ofed-1.1? I'm not familiar with the Lustre installation procedure (i.e., if it gets compiled on the current host). If yes, you probably merely need to uninstall and reinstall Lustre o2ib. - Jack On Sunday 24 September 2006 12:57, Thierry Delaitre wrote: > > I get the following when loading lustre o2ib module. I'm using ofed-1.1 > rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the > kernel i'm using and lustre too. I don't understand why i get the > following as i only have one version of the ib modules ? > > ko2iblnd: disagrees about version of symbol ib_create_cq > ko2iblnd: Unknown symbol ib_create_cq > ko2iblnd: disagrees about version of symbol ib_dereg_mr > ko2iblnd: Unknown symbol ib_dereg_mr > ko2iblnd: disagrees about version of symbol ib_destroy_cq > ko2iblnd: Unknown symbol ib_destroy_cq > ko2iblnd: disagrees about version of symbol ib_get_dma_mr > ko2iblnd: Unknown symbol ib_get_dma_mr > ko2iblnd: disagrees about version of symbol ib_alloc_pd > ko2iblnd: Unknown symbol ib_alloc_pd > ko2iblnd: disagrees about version of symbol ib_modify_qp > ko2iblnd: Unknown symbol ib_modify_qp > ko2iblnd: disagrees about version of symbol ib_dealloc_pd > ko2iblnd: Unknown symbol ib_dealloc_pd > LustreError: 5725:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND > o2ib, module ko2iblnd, rc=256 > > > n32:~/lustre-1.5.95 # ls -l > /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ > total 328 > -rw-r--r-- 1 root root 13190 Sep 24 10:16 ib_addr.ko > -rw-r--r-- 1 root root 37875 Sep 24 10:16 ib_cm.ko > -rw-r--r-- 1 root root 
57592 Sep 24 10:16 ib_core.ko > -rw-r--r-- 1 root root 42829 Sep 24 10:16 ib_mad.ko > -rw-r--r-- 1 root root 20095 Sep 24 10:16 ib_sa.ko > -rw-r--r-- 1 root root 22930 Sep 24 10:16 ib_ucm.ko > -rw-r--r-- 1 root root 21234 Sep 24 10:16 ib_umad.ko > -rw-r--r-- 1 root root 45057 Sep 24 10:16 ib_uverbs.ko > -rw-r--r-- 1 root root 29987 Sep 24 10:16 rdma_cm.ko > -rw-r--r-- 1 root root 17669 Sep 24 10:16 rdma_ucm.ko > n32:~/lustre-1.5.95 # ls -ld /usr/local/ofed > drwxr-xr-x 9 root root 328 Sep 24 10:18 /usr/local/ofed > > Thierry. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ogerlitz at voltaire.com Sun Sep 24 23:48:04 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 25 Sep 2006 09:48:04 +0300 Subject: [openib-general] RDMA CM callback status In-Reply-To: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com> References: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com> Message-ID: <45177BA4.2090001@voltaire.com> Sean Hefty wrote: >> 2. /* handle error out-of-line */ above means I record failure in my connection >> data structure, start teardown and drop the callback's reference on it. >> When the last reference goes, the connection data structure is queued for >> final destruction (including rdma_destroy_id(cmid)). >> >> Given that this might race with the callback's caller is this OK? > > Yes - The RDMA CM holds a reference on the cmid while in a callback, and drops > it once the callback returns. rdma_destroy_id() will block until all references > are released on the cmid. Eric, Just to make sure, please be aware to the node in rdma_cm.h telling that you are not allowed to call rdma_destroy_id() from the **context** of the cma callback (since as Sean explained above in that case the cma will block on a ref which would never reach zero). Or. 
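
[Editorial sketch, not part of the original message.] The rule discussed above (queue the teardown from the callback, destroy later from another context) can be illustrated with a toy, single-threaded model. Everything here is hypothetical: `toy_cm_id` and the `toy_*` functions are invented for illustration and are not the rdma_cm API; the "blocking" destroy is modelled as a function that returns false where a real blocking call would wait forever.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a CM id; NOT the real struct rdma_cm_id. */
struct toy_cm_id {
    int  callback_refs;   /* references the CM holds while a callback runs */
    bool destroy_queued;  /* teardown requested, to be done out of line */
    bool destroyed;
};

/*
 * Toy model of a blocking destroy: it can only finish once no callback
 * reference is outstanding. Returns false where a real blocking call
 * would wait forever.
 */
bool toy_destroy_id(struct toy_cm_id *id)
{
    if (id->callback_refs > 0)
        return false;
    id->destroyed = true;
    return true;
}

/* Event handler: record the failure and queue teardown; never destroy here. */
void toy_event_handler(struct toy_cm_id *id)
{
    id->destroy_queued = true;
}

/* The CM takes a reference around the callback and drops it afterwards. */
void toy_deliver_event(struct toy_cm_id *id)
{
    id->callback_refs++;
    toy_event_handler(id);
    id->callback_refs--;
}

/* Runs later, outside any callback (for example from a work queue). */
bool toy_run_deferred_destroy(struct toy_cm_id *id)
{
    if (!id->destroy_queued)
        return false;
    id->destroy_queued = false;
    return toy_destroy_id(id);
}
```

In this model, calling `toy_destroy_id()` while `callback_refs` is non-zero can never succeed, which mirrors the "ref which would never reach zero" case; the deferred path succeeds because the callback reference has already been dropped.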

From ogerlitz at voltaire.com Sun Sep 24 23:58:20 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 25 Sep 2006 09:58:20 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <200609250940.47936.jackm@dev.mellanox.co.il>
References: <200609250940.47936.jackm@dev.mellanox.co.il>
Message-ID: <45177E0C.1040101@voltaire.com>

Jack Morgenstein wrote:
> Did you recompile Lustre following the installation of ofed-1.1?
> I'm not familiar with the Lustre installation procedure (i.e., if it
> gets compiled on the current host). If yes, you probably merely need
> to uninstall and reinstall Lustre o2ib.

OK, can we state clearly what the user needs to do with modules
directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and
hopefully more to come).

Is it recompile / uninstall / install ???

Or.

> On Sunday 24 September 2006 12:57, Thierry Delaitre wrote:
>> I get the following when loading lustre o2ib module. I'm using ofed-1.1
>> rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
>> kernel i'm using and lustre too. I don't understand why i get the
>> following as i only have one version of the ib modules ?

From jackm at dev.mellanox.co.il Sun Sep 24 23:58:11 2006
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Mon, 25 Sep 2006 09:58:11 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <4516C41D.1060301@pathscale.com>
References: <4516C41D.1060301@pathscale.com>
Message-ID: <200609250958.11476.jackm@dev.mellanox.co.il>

Robert,
We build "external modules that depend on other external modules that you
previously built" all the time in our regression testing -- and this runs
properly under lots of distributions and under lots of different linux
kernels. We do not experience the problem you describe.

We build kernel modules which exercise various installed OFED 1.1 kernel
modules (ib_verbs, ib_mad, etc etc). We then load these kernel modules
during our regression testing to verify the operation of the OFED 1.1
kernel modules.

If the explanation you provide below is correct, our kernel module testing
would not work at all. (We do not do any of the workarounds you described
below).

We have seen problems like the one described when either:
a. The dependent external modules were not rebuilt following OFED
   installation.
or
b. There were old .ko files lying around which were loaded instead of the
   installed OFED .ko files.

- Jack

On Sunday 24 September 2006 20:45, Robert Walsh wrote:
> This explanation gets ugly :-)
>
> The short description is: you can't build external modules that depend
> on other external modules that you previously built.
>
> The reason why is: the kernel devel stuff ships with a file called
> Module.symvers, which contains all the version information for all the
> symbols in the kernel and in all the modules built when the kernel was
> built. When you build an external module, the kernel build stuff looks
> in here to get the version information for any symbol referenced that it
> can't find in the group of modules you're building. If you've replaced
> some modules with newer ones (like what happens when you install
> OFED-1.1, for example), then the symbol versions in the new modules will
> not match what's in the Module.symvers file.
>
> In your case, you installed a bunch of new modules (OFED-1.1) and then,
> in a second step, installed another new module (Lustre). The OFED-1.1
> build was OK because all external symbols that it referenced (all of
> which are in vmlinux, I think) had properly matching version entries in
> Module.symvers. The Lustre build, however, was pulling ib_* symbols
> from the new OFED-1.1 modules that had mismatching symbol versions in
> Module.symvers from the original kernel modules (I don't remember if the
> kernel build warns about mismatching symbol versions at build time.)
>
> At insmod time, the kernel checks that the symbol versions of
> already-loaded modules match the expected versions in the to-be-loaded
> module. In your case, they will not.
>
> One solution is: extract the kernel sources from the OFED-1.1
> distribution, patch them as the OFED build script would, add in the
> Lustre bits and build the whole thing yourself manually.
>
> Another solution is: update the Module.symvers file.
>
> Neither is a terribly satisfactory solution.
>
> Regards,
> Robert.
>

From jackm at dev.mellanox.co.il Mon Sep 25 00:34:55 2006
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Mon, 25 Sep 2006 10:34:55 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <45177E0C.1040101@voltaire.com>
References: <200609250940.47936.jackm@dev.mellanox.co.il> <45177E0C.1040101@voltaire.com>
Message-ID: <200609251034.56218.jackm@dev.mellanox.co.il>

On Monday 25 September 2006 09:58, Or Gerlitz wrote:
> Jack Morgenstein wrote:
> > Did you recompile Lustre following the installation of ofed-1.1?
> > I'm not familiar with the Lustre installation procedure (i.e., if it
> > gets compiled on the current host). If yes, you probably merely need
> > to uninstall and reinstall Lustre o2ib.
>
> OK, can we state clearly what the user needs to do with modules
> directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and
> hopefully more to come).
>
> Is it recompile / uninstall / install ???

If possible:
- recompile (make) and reinstall to kernel (make install) Lustre o2ib

Otherwise:
- uninstall and reinstall onto the host Lustre o2ib (assuming that
  the Lustre installation compiles its modules on that host
  during the installation process and then installs them,
  rather than just copying over pre-compiled modules to
  /lib/modules//drivers/kernel/infiniband)
>
> > On Sunday 24 September 2006 12:57, Thierry Delaitre wrote:
> >> I get the following when loading lustre o2ib module. I'm using ofed-1.1
> >> rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
> >> kernel i'm using and lustre too. I don't understand why i get the
> >> following as i only have one version of the ib modules ?

This is a bit unclear. Was Lustre installed AFTER the OFED installation?

- Jack

From mst at mellanox.co.il Mon Sep 25 00:41:55 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 10:41:55 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To:
References: <1158850657.24776.158.camel@localhost> <62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il>
Message-ID: <20060925074155.GB21836@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [PATCH] IB/ipoib: NAPI
>
> > > So probably what we need is a feature bit in the struct ib_device
> > > to say whether the peek CQ is needed or whether req notify will
> > > generate events for existing CQEs.
>
> > Sounds good to me
>
> The biggest problem I have with this is that I don't know what to call
> the feature bit. Any suggestions?

Actually, the reason it is hard to come up with the name is that
what this enables is the natural poll/request notification order.
Maybe set bit for the lack of the feature?
REQUIRES_POLL_AFTER_ARM?
--
MST

From delaitt at cpc.wmin.ac.uk Mon Sep 25 00:42:59 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 08:42:59 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <45177E0C.1040101@voltaire.com>
References: <200609250940.47936.jackm@dev.mellanox.co.il> <45177E0C.1040101@voltaire.com>
Message-ID:

On Mon, 25 Sep 2006, Or Gerlitz wrote:

> Jack Morgenstein wrote:
> > Did you recompile Lustre following the installation of ofed-1.1?
> > I'm not familiar with the Lustre installation procedure (i.e., if it
> > gets compiled on the current host). If yes, you probably merely need
> > to uninstall and reinstall Lustre o2ib.
>
> OK, can we state clearly what the user needs to do with modules
> directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and
> hopefully more to come).
>
> Is it recompile / uninstall / install ???

The issue is about the installation of Lustre 1.5.95 o2ib with OFED-1.1rc6
for SLES10.

ofed-1.1-rc6 compiles nicely as shown below. The ib kernel modules all
reside under /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/
and do match the ones compiled by ofed. I have tried these steps several
times.

n32:~ # lsmod | grep ib
libcfs                103060  1 lnet
ib_ucm                 19332  0
ib_addr                10756  1 rdma_cm
ib_cm                  31968  2 ib_ucm,rdma_cm
ib_ipoib               48144  0
ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
ib_uverbs              38312  2 rdma_ucm,ib_ucm
ib_umad                17968  0
ib_mthca              116240  0
ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
ib_core                49024  9 ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad

I compiled lustre for the above kernel and ofed installation. I get the
following when doing a 'lctl network up' in lustre. I have modversion set
to on in the kernel. If i set it to 'n' then i get a null pointer
exception and the module crashes.

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 5725:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256

I have tried with ofed-1.1-rc5 and experienced the same issue.

Thierry.

> Or.
>
> > On Sunday 24 September 2006 12:57, Thierry Delaitre wrote:
> >> I get the following when loading lustre o2ib module. I'm using ofed-1.1
> >> rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
> >> kernel i'm using and lustre too. I don't understand why i get the
> >> following as i only have one version of the ib modules ?
>

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW
Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788
http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------
This e-mail and its attachments are intended for the above named only and
may be confidential.
If they have come to you in error you must not copy
or show them to anyone, nor should you take any action based on them,
other than to notify the error by replying to the sender.

From mst at mellanox.co.il Mon Sep 25 01:00:51 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 11:00:51 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To:
References: <200609250940.47936.jackm@dev.mellanox.co.il> <45177E0C.1040101@voltaire.com>
Message-ID: <20060925080051.GC21836@mellanox.co.il>

Quoting r. Thierry Delaitre :
> Subject: Re: problems with lustre o2ib module & ofed
>
>
> On Mon, 25 Sep 2006, Or Gerlitz wrote:
>
> > Jack Morgenstein wrote:
> > > Did you recompile Lustre following the installation of ofed-1.1?
> > > I'm not familiar with the Lustre installation procedure (i.e., if it
> > > gets compiled on the current host). If yes, you probably merely need
> > > to uninstall and reinstall Lustre o2ib.
> >
> > OK, can we state clearly what the user needs to do with modules
> > directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and
> > hopefully more to come).
> >
> > Is it recompile / uninstall / install ???
>
> The issue is about the installation of Lustre 1.5.95 o2ib with OFED-1.1rc6
> for SLES10.
>
> ofed-1.1-rc6 compiles nicely as shown below. The ib kernel modules all
> reside under /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/
> and do match the ones compiled by ofed. I have tried these steps several
> times.
>
> n32:~ # lsmod | grep ib
> libcfs                103060  1 lnet
> ib_ucm                 19332  0
> ib_addr                10756  1 rdma_cm
> ib_cm                  31968  2 ib_ucm,rdma_cm
> ib_ipoib               48144  0
> ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
> ib_uverbs              38312  2 rdma_ucm,ib_ucm
> ib_umad                17968  0
> ib_mthca              116240  0
> ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
> ib_core                49024  9
> ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
>
> I compiled lustre for the above kernel and ofed installation. I get the
> following when doing a 'lctl network up' in lustre. I have modversion set
> to on in the kernel. If i set it to 'n' then i get a null pointer
> exception and the module crashes.
>
> ko2iblnd: disagrees about version of symbol ib_create_cq
> ko2iblnd: Unknown symbol ib_create_cq

don't know anything about lustre, but note you must
point build to pick up headers from
/usr/local/ofed/src/openib/include/
*before* the built-in header includes.

replace /usr/local/ofed with the prefix you specified.

--
MST

From delaitt at cpc.wmin.ac.uk Mon Sep 25 01:12:01 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 09:12:01 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <20060925080051.GC21836@mellanox.co.il>
References: <200609250940.47936.jackm@dev.mellanox.co.il> <45177E0C.1040101@voltaire.com> <20060925080051.GC21836@mellanox.co.il>
Message-ID:

On Mon, 25 Sep 2006, Michael S. Tsirkin wrote:

> Quoting r. Thierry Delaitre :
> > Subject: Re: problems with lustre o2ib module & ofed
> >
> >
> > On Mon, 25 Sep 2006, Or Gerlitz wrote:
> >
> > > Jack Morgenstein wrote:
> > > > Did you recompile Lustre following the installation of ofed-1.1?
> > > > I'm not familiar with the Lustre installation procedure (i.e., if it
> > > > gets compiled on the current host). If yes, you probably merely need
> > > > to uninstall and reinstall Lustre o2ib.
> > >
> > > OK, can we state clearly what the user needs to do with modules
> > > directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and
> > > hopefully more to come).
> > >
> > > Is it recompile / uninstall / install ???
> >
> > The issue is about the installation of Lustre 1.5.95 o2ib with OFED-1.1rc6
> > for SLES10.
> >
> > ofed-1.1-rc6 compiles nicely as shown below. The ib kernel modules all
> > reside under /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/
> > and do match the ones compiled by ofed. I have tried these steps several
> > times.
> >
> > n32:~ # lsmod | grep ib
> > libcfs                103060  1 lnet
> > ib_ucm                 19332  0
> > ib_addr                10756  1 rdma_cm
> > ib_cm                  31968  2 ib_ucm,rdma_cm
> > ib_ipoib               48144  0
> > ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
> > ib_uverbs              38312  2 rdma_ucm,ib_ucm
> > ib_umad                17968  0
> > ib_mthca              116240  0
> > ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
> > ib_core                49024  9
> > ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
> >
> > I compiled lustre for the above kernel and ofed installation. I get the
> > following when doing a 'lctl network up' in lustre. I have modversion set
> > to on in the kernel. If i set it to 'n' then i get a null pointer
> > exception and the module crashes.
> >
> > ko2iblnd: disagrees about version of symbol ib_create_cq
> > ko2iblnd: Unknown symbol ib_create_cq
>
> don't know anything about lustre, but note you must
> point build to pick up headers from
> /usr/local/ofed/src/openib/include/
> *before* the built-in header includes.

I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the
lustre's configure line below. Lustre's configure script looks for a
driver/infiniband directory which only seems to exist under
/usr/local/ofed/src/openib-1.1

./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/

Thierry.

> replace /usr/local/ofed with the prefix you specified.
> --
> MST

From delaitt at cpc.wmin.ac.uk Mon Sep 25 01:16:04 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 09:16:04 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <200609251034.56218.jackm@dev.mellanox.co.il>
References: <200609250940.47936.jackm@dev.mellanox.co.il> <45177E0C.1040101@voltaire.com> <200609251034.56218.jackm@dev.mellanox.co.il>
Message-ID:

On Mon, 25 Sep 2006, Jack Morgenstein wrote:

> On Monday 25 September 2006 09:58, Or Gerlitz wrote:
> > Jack Morgenstein wrote:
> > > Did you recompile Lustre following the installation of ofed-1.1?
> > > I'm not familiar with the Lustre installation procedure (i.e., if it
> > > gets compiled on the current host). If yes, you probably merely need
> > > to uninstall and reinstall Lustre o2ib.
> >
> > OK, can we state clearly what the user needs to do with modules
> > directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and
> > hopefully more to come).
> >
> > Is it recompile / uninstall / install ???
>
> If possible:
> - recompile (make) and reinstall to kernel (make install) Lustre o2ib
>
> Otherwise:
> - uninstall and reinstall onto the host Lustre o2ib (assuming that
>   the Lustre installation compiles its modules on that host
>   during the installation process and then installs them,
>   rather than just copying over pre-compiled modules to
>   /lib/modules//drivers/kernel/infiniband)
> >
> > > On Sunday 24 September 2006 12:57, Thierry Delaitre wrote:
> > >> I get the following when loading lustre o2ib module. I'm using ofed-1.1
> > >> rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
> > >> kernel i'm using and lustre too. I don't understand why i get the
> > >> following as i only have one version of the ib modules ?
>
> This is a bit unclear. Was Lustre installed AFTER the OFED installation?

1) patch kernel with lustre patches and recompile/install kernel
2) boot with new kernel
3) make + install ofed-1.1-rc6
4) depmod -a
5) compile + install lustre (lustre was installed after ofed installation)
6) depmod -a
7) modprobe lnet
8) lctl network up.

in step (5), lustre was configured as follows:

./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/

Thierry.

From mst at mellanox.co.il Mon Sep 25 01:26:09 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 11:26:09 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To:
References: <200609250940.47936.jackm@dev.mellanox.co.il> <45177E0C.1040101@voltaire.com> <20060925080051.GC21836@mellanox.co.il>
Message-ID: <20060925082609.GE21836@mellanox.co.il>

Quoting r. Thierry Delaitre :
>
> I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the
> lustre's configure line below. Lustre's configure script looks for a
> driver/infiniband directory which only seems to exist under
> /usr/local/ofed/src/openib-1.1
>
> ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/
>
> Thierry.
>
> > replace /usr/local/ofed with the prefix you specified.

This looks wrong - openib-1.1 is the pristine sources.
openib/include is the exported interface and is what you should use
for dependent modules.
No idea why lustre would need drivers/infiniband.
Try creating a softlink:

mkdir /usr/local/ofed/src/openib/drivers/infiniband
ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband

--
MST

From delaitt at cpc.wmin.ac.uk Mon Sep 25 01:56:30 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 09:56:30 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <20060925082609.GE21836@mellanox.co.il>
References: <200609250940.47936.jackm@dev.mellanox.co.il> <45177E0C.1040101@voltaire.com> <20060925080051.GC21836@mellanox.co.il> <20060925082609.GE21836@mellanox.co.il>
Message-ID:

On Mon, 25 Sep 2006, Michael S. Tsirkin wrote:

> Quoting r. Thierry Delaitre :
> >
> > I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the
> > lustre's configure line below. Lustre's configure script looks for a
> > driver/infiniband directory which only seems to exist under
> > /usr/local/ofed/src/openib-1.1
> >
> > ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/
> >
> > Thierry.
> >
> > > replace /usr/local/ofed with the prefix you specified.
>
> This looks wrong - openib-1.1 is the pristine sources.
> openib/include is the exported interface and is what you should use
> for dependent modules.
> No idea why lustre would need drivers/infiniband.
> Try creating a softlink:
>
> mkdir /usr/local/ofed/src/openib/drivers/infiniband
> ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband

I untarred lustre 1.5.95, compiled it (./configure
--with-o2ib=/usr/local/ofed/src/openib), did a make install and depmod -a,
and still get the following:

my modprobe.conf is the following

options lnet ip2nets="o2ib0 161.74.83.[0-255]"

lctl network up
LNET configure error 100: Network is down

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 4177:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256

lsmod | grep ib
libcfs                103060  1 lnet
ib_ucm                 19332  0
ib_addr                10756  1 rdma_cm
ib_cm                  31968  2 ib_ucm,rdma_cm
ib_ipoib               48400  0
ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
ib_uverbs              38312  2 rdma_ucm,ib_ucm
ib_umad                17968  0
ib_mthca              116240  0
ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
ib_core                49024  9 ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad

nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_alloc_pd
d5dcb698 A __crc_ib_alloc_pd
0000001c r __kcrctab_ib_alloc_pd
0000006a r __kstrtab_ib_alloc_pd
00000038 r __ksymtab_ib_alloc_pd
00000c65 T ib_alloc_pd

from lustre's config.log:

configure:6500: checking whether to enable OpenIB gen2 support
configure:6586: cp conftest.c build && make modules CC=gcc -f /root/lustre-1.5.95/build/Makefile LUSTRE_LINUX_CONFIG=/usr/src/linux/.config -o tmp_include_depends -o scripts -o include/config/MARKER -C /usr/src/linux EXTRA_CFLAGS=-Werror-implicit-function-declaration -g -I/root/lustre-1.5.95/lnet/include -I/root/lustre-1.5.95/lustre/include -I/usr/local/ofed/src/openib/include M=/root/lustre-1.5.95/build
/root/lustre-1.5.95/build/conftest.c:42: warning: function declaration isn't a prototype
/root/lustre-1.5.95/build/conftest.c: In function 'main':
/root/lustre-1.5.95/build/conftest.c:49: warning: unused variable 'rej_reason'
/root/lustre-1.5.95/build/conftest.c:48: warning: unused variable 'pool_fmr'
/root/lustre-1.5.95/build/conftest.c:47: warning: unused variable 'qp_attr'
/root/lustre-1.5.95/build/conftest.c:46: warning: unused variable 'device_attr'
/root/lustre-1.5.95/build/conftest.c:45: warning: unused variable 'conn_param'
WARNING: "rdma_create_id" [/root/lustre-1.5.95/build/conftest.ko] undefined!
configure:6589: $? = 0
configure:6591: test -s build/conftest.o
configure:6594: $? = 0
configure:6597: result: yes

Thierry.

From ogerlitz at voltaire.com Mon Sep 25 03:16:40 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 25 Sep 2006 13:16:40 +0300
Subject: [openib-general] timer_pending kernel assertion while stopping IPoIB
In-Reply-To:
References:
Message-ID: <4517AC88.9080202@voltaire.com>

Roland Dreier wrote:
> Or> the kernel is net-2.6.19 git
>
> My first guess would be it's a bug introduced in the net-2.6.19 tree.
> Can you reproduce it with plain 2.6.18 and/or my for-2.6.19 branch?

OK, i will be able to test this with 2.6.18 later this week. As for doing
so with your for-2.6.19 branch, is it sufficient to do (assuming the
tree was cloned and now updated with git pull)

$ git checkout -f for-2.6.19

to have the sources "state" be as of that branch? For example, after
doing so i don't see the amso1100 directory below drivers/infiniband/hw

Or.

From delaitt at cpc.wmin.ac.uk Mon Sep 25 04:49:56 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 12:49:56 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To:
References: <200609250940.47936.jackm@dev.mellanox.co.il> <45177E0C.1040101@voltaire.com> <20060925080051.GC21836@mellanox.co.il> <20060925082609.GE21836@mellanox.co.il>
Message-ID:

It seems that lustre puts its modules in
/lib/modules/2.6.16.21-0.8-default despite the fact that my kernel is
2.6.16.21-0.8-smp !

uname -a
Linux n32 2.6.16.21-0.8-smp #4 SMP Sun Sep 24 08:47:30 BST 2006 i686 i686 i386 GNU/Linux

make[3]: Nothing to be done for `install-exec-am'.
/bin/sh ../../mkinstalldirs /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre
/usr/bin/install -c -m 644 lquota.ko /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre/lquota

I therefore end up with a /lib/modules/2.6.16.21-0.8-smp and
/lib/modules/2.6.16.21-0.8-default

i'm now searching why lustre thinks my kernel is 2.6.16.21-0.8-default and
not 2.6.16.21-0.8-smp

Thierry.

On Mon, 25 Sep 2006, Thierry Delaitre wrote:

>
> On Mon, 25 Sep 2006, Michael S. Tsirkin wrote:
>
> > Quoting r. Thierry Delaitre :
> > >
> > > I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the
> > > lustre's configure line below. Lustre's configure script looks for a
> > > driver/infiniband directory which only seems to exist under
> > > /usr/local/ofed/src/openib-1.1
> > >
> > > ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/
> > >
> > > Thierry.
> > >
> > > > replace /usr/local/ofed with the prefix you specified.
> >
> > This looks wrong - openib-1.1 is the pristine sources.
> > openib/include is the exported interface and is what you should use
> > for dependent modules.
> > No idea why lustre would need drivers/infiniband.
> > Try creating a softlink:
> >
> > mkdir /usr/local/ofed/src/openib/drivers/infiniband
> > ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband
>
> I untarred lustre 1.5.95, compiled it (./configure
> --with-o2ib=/usr/local/ofed/src/openib), did a make install and depmod -a,
> and still get the following:
>
> my modprobe.conf is the following
>
> options lnet ip2nets="o2ib0 161.74.83.[0-255]"
>
> lctl network up
> LNET configure error 100: Network is down
>
> ko2iblnd: disagrees about version of symbol ib_create_cq
> ko2iblnd: Unknown symbol ib_create_cq
> ko2iblnd: disagrees about version of symbol ib_dereg_mr
> ko2iblnd: Unknown symbol ib_dereg_mr
> ko2iblnd: disagrees about version of symbol ib_destroy_cq
> ko2iblnd: Unknown symbol ib_destroy_cq
> ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> ko2iblnd: Unknown symbol ib_get_dma_mr
> ko2iblnd: disagrees about version of symbol ib_alloc_pd
> ko2iblnd: Unknown symbol ib_alloc_pd
> ko2iblnd: disagrees about version of symbol ib_modify_qp
> ko2iblnd: Unknown symbol ib_modify_qp
> ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> ko2iblnd: Unknown symbol ib_dealloc_pd
> LustreError: 4177:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND
> o2ib, module ko2iblnd, rc=256
>
> lsmod | grep ib
> libcfs                103060  1 lnet
> ib_ucm                 19332  0
> ib_addr                10756  1 rdma_cm
> ib_cm                  31968  2 ib_ucm,rdma_cm
> ib_ipoib               48400  0
> ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
> ib_uverbs              38312  2 rdma_ucm,ib_ucm
> ib_umad                17968  0
> ib_mthca              116240  0
> ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
> ib_core                49024  9
> ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
>
> nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_alloc_pd
> d5dcb698 A __crc_ib_alloc_pd
> 0000001c r __kcrctab_ib_alloc_pd
> 0000006a r __kstrtab_ib_alloc_pd
> 00000038 r __ksymtab_ib_alloc_pd
> 00000c65 T ib_alloc_pd
>
> from lustre's config.log:
>
> configure:6500: checking whether to enable OpenIB gen2 support
> configure:6586: cp conftest.c build && make modules CC=gcc -f /root/lustre-1.5.95/build/Makefile LUSTRE_LINUX_CONFIG=/usr/src/linux/.config -o tmp_include_depends -o scripts -o include/config/MARKER -C /usr/src/linux EXTRA_CFLAGS=-Werror-implicit-function-declaration -g -I/root/lustre-1.5.95/lnet/include -I/root/lustre-1.5.95/lustre/include -I/usr/local/ofed/src/openib/include M=/root/lustre-1.5.95/build
> /root/lustre-1.5.95/build/conftest.c:42: warning: function declaration
> isn't a prototype
> /root/lustre-1.5.95/build/conftest.c: In function 'main':
> /root/lustre-1.5.95/build/conftest.c:49: warning: unused variable 'rej_reason'
> /root/lustre-1.5.95/build/conftest.c:48: warning: unused variable 'pool_fmr'
> /root/lustre-1.5.95/build/conftest.c:47: warning: unused variable 'qp_attr'
> /root/lustre-1.5.95/build/conftest.c:46: warning: unused variable 'device_attr'
> /root/lustre-1.5.95/build/conftest.c:45: warning: unused variable 'conn_param'
> WARNING: "rdma_create_id" [/root/lustre-1.5.95/build/conftest.ko] undefined!
> configure:6589: $? = 0
> configure:6591: test -s build/conftest.o
> configure:6594: $? = 0
> configure:6597: result: yes
>
>
> Thierry.
>

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW
Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788
http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------

From kliteyn at dev.mellanox.co.il Mon Sep 25 05:35:37 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 25 Sep 2006 15:35:37 +0300
Subject: [openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c: In osm_mcmr_rcv_create_new_mgrp, fix exactly selectors in response
In-Reply-To: <450F7D7E.8070408@mellanox.co.il>
References: <450F7D7E.8070408@mellanox.co.il>
Message-ID: <4517CD19.20700@dev.mellanox.co.il>

Hi Hal.

The patch looks ok.
A few remarks though:

It appears that the multicast group mtu/rate selectors are actually
not referenced by anyone - the SM/SA code implicitly assumes that they
should be 'exact', and acts accordingly.
Same goes for the response - the selectors that are filled in are
hard-coded to 'exact'.

This is the reason why the bug that this patch fixes has never appeared,
and why fixing it will not change the SM behavior.
But of course, it is better to have this fix anyway.
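
[Editorial sketch, not part of the original message.] The selector arithmetic in the patch quoted in this message can be checked in isolation. The two helpers below are hypothetical names introduced only for illustration; they mirror the `&= 0x3f` / `|= 2<<6` lines of the patch and the OR-only form it replaces, on the assumption (taken from the patch comment) that the top two bits of each field hold the selector and 2 means "exactly".

```c
#include <stdint.h>

/* Selector value the patch labels "exactly". */
#define SEL_EXACTLY 2u

/* Post-patch form: clear the top two (selector) bits, then set them to 2. */
uint8_t set_exact_selector(uint8_t field)
{
    field &= 0x3f;
    field |= SEL_EXACTLY << 6;
    return field;
}

/* Pre-patch form: OR only, so selector bits already present survive. */
uint8_t or_exact_selector(uint8_t field)
{
    return (uint8_t)(field | (SEL_EXACTLY << 6));
}
```

With a made-up incoming value of 0x44 (selector 1, low bits 0x04), the masked form yields 0x84, while OR alone yields 0xC4, i.e. selector 3 rather than 2. For a field whose selector bits are already zero the two forms agree.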
-- Yevgeny > Subject: > [openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c: In > osm_mcmr_rcv_create_new_mgrp, fix exactly selectors in response > From: > "Hal Rosenstock" > Date: > 18 Sep 2006 20:30:37 -0400 > To: > openib-general at openib.org > > To: > openib-general at openib.org > CC: > "Roland Dreier" > > > OpenSM/osm_sa_mcmember_record.c: In osm_mcmr_rcv_create_new_mgrp, set > exactly selectors after rather than before mgrp is initialized > > Pointed out by: Roland Dreier > > Signed-off-by: Hal Rosenstock > > Index: opensm/osm_sa_mcmember_record.c > =================================================================== > --- opensm/osm_sa_mcmember_record.c (revision 9347) > +++ opensm/osm_sa_mcmember_record.c (working copy) > @@ -1337,15 +1337,18 @@ osm_mcmr_rcv_create_new_mgrp( > goto Exit; > } > > - /* the mcmember_record should have mtu_sel, rate_sel and pkt_lifetime_sel = 2 */ > - (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */ > - (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */ > - (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */ > - > /* Initialize the mgrp */ > (*pp_mgrp)->mcmember_rec = mcm_rec; > (*pp_mgrp)->mcmember_rec.mlid = mlid; > > + /* the mcmember_record should have mtu_sel, rate_sel, and pkt_lifetime_sel = 2 */ > + (*pp_mgrp)->mcmember_rec.mtu &= 0x3f; > + (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */ > + (*pp_mgrp)->mcmember_rec.rate &= 0x3f; > + (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */ > + (*pp_mgrp)->mcmember_rec.pkt_life &= 0x3f; > + (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */ > + > /* Insert the new group in the data base */ > > /* since we might have an old group by that mlid > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From eli at dev.mellanox.co.il Mon Sep 25 05:58:30 2006 From: eli 
at dev.mellanox.co.il (Eli cohen)
Date: Mon, 25 Sep 2006 15:58:30 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To:
References: <1158850657.24776.158.camel@localhost> <62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il>
Message-ID: <1159189110.26523.12.camel@localhost>

I experimented with your patch and could not find any noticeable change
in BW or interrupt rate, so I guess we can use your ideas. Still, note
that NAPI_howto.txt does not read *budget to limit the number of polls,
and the code below from kernel 2.6.17.7 takes into account that budget
can become negative.

static void net_rx_action(struct softirq_action *h)
{
        struct softnet_data *queue = &__get_cpu_var(softnet_data);
        unsigned long start_time = jiffies;
        int budget = netdev_budget;
        void *have;

        local_irq_disable();

        while (!list_empty(&queue->poll_list)) {
                struct net_device *dev;

                if (budget <= 0 || jiffies - start_time > 1)
                        goto softnet_break;
        ...
}

>I took a stab at implementing this myself, and it
>
> You might want to respin your patch against my for-2.6.19 branch

Do you think I should work on this, or do you plan to push your code?

>> The biggest problem I have with this is that I don't know what to call
>> the feature bit. Any suggestions?

>Maybe set bit for the lack of the feature? REQUIRES_POLL_AFTER_ARM?

We can take Michael's suggestion or use NOT_REQUIRES_POLL_AFTER_ARM so
that we can implement this for mthca without touching ipath or ehca as
a first step.

From dotanb at dev.mellanox.co.il Mon Sep 25 06:06:17 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 25 Sep 2006 16:06:17 +0300
Subject: [openib-general] max message size for IB_WR_SEND
In-Reply-To:
References: <4512244D.4040404@dev.mellanox.co.il>
Message-ID: <4517D449.60502@dev.mellanox.co.il>

amit byron wrote:
> Dotan Barak dev.mellanox.co.il> writes:
>
>
>> Hi.
>>
>> amit byron wrote:
>>
>>> hi,
>>>
>>> if i evoke/call ib_post_send(IB_WR_SEND) with message
>>> size 512 bytes, the message gets received on the
>>> peer (second) node. the 2 nodes are connected
>>> point-to-point.
>>>
>>> but if message size is increased to 4096 bytes then
>>> second node receives the message; but message content
>>> is missing (empty).
>>>
>>> won't infiniband stack break down message in smaller
>>> chunks and assemble on peer node?
>>>
>>> thanks,
>>> Amit.
>>>
>>>
>> Which transport type are you using?
>> if you are using a UD QP, then the answer is no.
>> for any other transport type, the answer is yes (the message is
>> broken down into packets of the MTU size specified in the QP context).
>>
>> maybe you have a different problem in your code. did you check the
>> completion status on both of the nodes?
>>
>> Dotan
>>
>>
>>
>
> i'm using RC connection. the issue seems to occur only when
> running in xen's domain 0 (xen0). on core linux kernel, the
> code works -- i'm able to do both send message and perform
> rdma write with size greater than 4096.
>
> i don't see any errors reported while sending a message with
> size greater than 4096 (same holds true for rdma write).
>
> i'm able to send message (greater than 4096 bytes) from code
> running in core linux kernel to peer node code that is
> running in xen's domain 0.
>
> this suggests that there is some hard limit that prevents
> infiniband from sending the message; but no errors are reported
> from infiniband stack.
>
> any suggestions on how to enable tracing in hca driver?
>
> thanks,
> Amit.
>

1) You can use perfquery on the sender and receiver hosts to find out
how much data and how many packets were sent and received.

2) Why is the number 4096 so important?
Maybe the problem happens when the message size is greater than the MTU ...
Which MTU do you use in the QP?
Maybe you should try to send a message of MTU + 1 bytes and check the
result ...
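As a rough illustration of the segmentation mentioned above, here is a small sketch of the arithmetic, assuming that a message on a connected transport is split into packets carrying at most MTU bytes of payload each. The function name is made up for this example, headers are ignored, and the handling of a zero-length message is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of packets needed to carry a message of
 * len payload bytes when each packet carries at most mtu bytes.
 * Assumption: a zero-length message still occupies one packet.
 */
static size_t packets_for_message(size_t len, size_t mtu)
{
    assert(mtu > 0);
    if (len == 0)
        return 1;
    /* Round up: a partially filled last packet still counts. */
    return (len + mtu - 1) / mtu;
}
```

With a path MTU of 2048 bytes, for example, a 2048-byte message fits in one packet and a 2049-byte message (MTU + 1) needs two, so comparing those two sizes separates a multi-packet problem from a limit tied to the number 4096 itself.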

Dotan

From eli at dev.mellanox.co.il Mon Sep 25 06:16:39 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Mon, 25 Sep 2006 16:16:39 +0300
Subject: [openib-general] heads-up - ipoib NAPI
In-Reply-To:
References: <1158850592.24776.156.camel@localhost> <61399.85.250.167.59.1158860279.squirrel@dev.mellanox.co.il>
Message-ID: <1159190199.26523.18.camel@localhost>

On Thu, 2006-09-21 at 19:31 -0700, harish wrote:

> How did the CPU utilizations compare for the NAPI vs. no NAPI case?
> What are your thoughts on what bottleneck you are hitting?
>

The CPU utilization reported by netperf is not accurate since it is not
reported on a per cpu basis and I don't have one number to reliably
describe utilization.
I think the bottleneck is that the CPU that handles the softirq that
handles the rx sk_buffs is 100% utilized and it dictates the limit.

From kliteyn at dev.mellanox.co.il Mon Sep 25 06:12:00 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 25 Sep 2006 16:12:00 +0300
Subject: [openib-general] [PATCH] osm: cosmetic changes in osmtest multicast flow
Message-ID:

Hi Hal

This patch is all about cosmetics - it improves the osmtest
log readability, and it also has some cosmetic additions
in the code.
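The patch below uses EXPECTING_ERRORS_START and EXPECTING_ERRORS_END without showing their definitions. As a minimal sketch of how such markers combine with the log prefix, assuming they are plain string-literal macros (the marker text here is invented for illustration and is not the osmtest definition):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical marker text; the real definitions are not in the patch. */
#define EXPECTING_ERRORS_START "[ expecting errors: start"
#define EXPECTING_ERRORS_END   "  expecting errors: end ]"

/* Adjacent string literals are concatenated at compile time. */
static const char *start_banner(void)
{
    return "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n";
}

static const char *end_banner(void)
{
    return "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n";
}

/* Sanity check: the two banners must be distinguishable in the log. */
static int banners_differ(void)
{
    return strcmp(start_banner(), end_banner()) != 0;
}
```

Keeping the marker text in one place means every "expected error" region in the log is bracketed by the same two strings, which makes the regions easy to search for or filter out.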
Yevgeny Signed-off-by: Yevgeny Kliteynik Index: osmtest/osmt_multicast.c =================================================================== --- osmtest/osmt_multicast.c (revision 9622) +++ osmtest/osmt_multicast.c (working copy) @@ -54,6 +54,9 @@ #include #include "osmtest.h" +/********************************************************************** + **********************************************************************/ + static cl_status_t __match_mgids( @@ -76,6 +79,9 @@ __match_mgids( } +/********************************************************************** + **********************************************************************/ + ib_api_status_t osmt_query_mcast( IN osmtest_t * const p_osmt ) { ib_api_status_t status = IB_SUCCESS; @@ -219,6 +225,9 @@ osmt_query_mcast( IN osmtest_t * const p return ( status ); } +/********************************************************************** + **********************************************************************/ + /* given a multicast request send and wait for response. 
*/ ib_api_status_t osmt_send_mcast_request( IN osmtest_t * const p_osmt, @@ -334,6 +343,9 @@ osmt_send_mcast_request( IN osmtest_t * } +/********************************************************************** + **********************************************************************/ + void osmt_init_mc_query_rec(IN osmtest_t * const p_osmt, IN OUT ib_member_rec_t *p_mc_req) { @@ -702,9 +714,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); + /* no MGID */ memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); /* Request Join */ @@ -727,9 +738,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status != IB_REMOTE_ERROR || (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) != IB_SA_MAD_STATUS_INSUF_COMPS) { @@ -749,9 +759,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); + /* no MGID */ memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); /* Request Join */ @@ -774,9 +783,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status != IB_REMOTE_ERROR || (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) != IB_SA_MAD_STATUS_INSUF_COMPS) { @@ -803,10 +811,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Join with insufficient comp mask - flow label (o15.0.1.3)...\n" ); - osm_log(
&p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); @@ -828,9 +834,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); if (status != IB_REMOTE_ERROR || (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) != IB_SA_MAD_STATUS_INSUF_COMPS) { @@ -854,9 +858,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER) ; @@ -878,9 +880,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); if (status != IB_REMOTE_ERROR || (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) != @@ -905,9 +905,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); + /* no MGID */ /* memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); */ /* Request Join */ @@ -930,9 +929,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " 
EXPECTING_ERRORS_END "\n" ); if (status != IB_REMOTE_ERROR || (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) != @@ -1228,9 +1225,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Create given MGID=0 skip service level (o15.0.1.4)...\n" ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); osmt_init_mc_query_rec(p_osmt, &mc_req_rec); @@ -1258,9 +1253,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); if (status != IB_REMOTE_ERROR || (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) != @@ -1311,9 +1304,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); osmt_init_mc_query_rec(p_osmt, &mc_req_rec); @@ -1342,9 +1333,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); if (status != IB_REMOTE_ERROR || (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) != @@ -1368,9 +1357,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Create given MGID=0 skip TClass (o15.0.1.4)...\n" ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); osmt_init_mc_query_rec(p_osmt, &mc_req_rec); @@ -1400,9 +1387,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - 
"osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); if (status != IB_REMOTE_ERROR || (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) != @@ -1887,18 +1872,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); + mc_req_rec.mgid.raw[0] = 0xFA; status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { @@ -1919,9 +1901,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); mc_req_rec.mgid.raw[0] = 0xFF; mc_req_rec.mgid.raw[3] = 0x1B; comp_mask = comp_mask | IB_MCR_COMPMASK_SCOPE; @@ -1932,9 +1912,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -1955,9 +1934,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); mc_req_rec.mgid = good_mgid; @@ -1969,9 +1946,8 @@ osmt_run_mcast_flow( IN osmtest_t * 
cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -2034,9 +2010,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); mc_req_rec.mgid = good_mgid; @@ -2048,9 +2022,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { @@ -2112,9 +2084,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); mc_req_rec.mgid = good_mgid; mc_req_rec.mgid.raw[12] = 0xFF; mc_req_rec.scope_state = 0x22; /* link-local scope */ @@ -2124,9 +2094,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -2171,9 +2140,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " 
EXPECTING_ERRORS_START "\n" ); /* We have created a new MCG so now we need different mgid when cresting group otherwise it will be counted as join request .*/ mc_req_rec.mgid = good_mgid; mc_req_rec.mgid.raw[12] = 0xFC; @@ -2185,9 +2152,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -2462,9 +2428,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking BAD RATE when connecting to existing MGID (o15.0.1.13)...\n" ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); mc_req_rec.mgid = good_mgid; mc_req_rec.rate = @@ -2487,9 +2451,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -2509,9 +2472,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "existing MGID (o15.0.1.13)...\n" ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); mc_req_rec.mgid = osm_ipoib_mgid; mc_req_rec.mtu = @@ -2534,9 +2495,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " 
EXPECTING_ERRORS_END "\n" ); + if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -2556,9 +2516,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "to existing MGID (o15.0.1.13)...\n" ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); mc_req_rec.mgid = osm_ipoib_mgid; mc_req_rec.mtu = @@ -2581,9 +2539,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -2697,18 +2654,16 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Delete by trying to Join deleted group (o15.0.1.14)...\n" ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); + mc_req_rec.scope_state = 0x22; /* use non member - so if no group fail */ status = osmt_send_mcast_request( p_osmt, 1, /* join */ &mc_req_rec, comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status != IB_REMOTE_ERROR) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -2727,9 +2682,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking BAD Delete of Mgid membership (no prev join) (o15.0.1.15)...\n" ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); 
mc_req_rec.mgid = osm_ipoib_mgid; mc_req_rec.rate = @@ -2742,9 +2695,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -2821,9 +2773,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)...\n" ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); mc_req_rec.mgid = osm_ipoib_mgid; mc_req_rec.rate = @@ -2836,9 +2786,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if ((status != IB_REMOTE_ERROR) || (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -2896,9 +2845,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* impossible requested mtu always greater than exist in MCG */ osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" - ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); + mc_req_rec.mtu = IB_MTU_LEN_4096 | IB_PATH_SELECTOR_GREATER_THAN << 6; memcpy(&mc_req_rec.mgid,&tmp_mgid,sizeof(ib_gid_t)); ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); @@ -2914,9 +2862,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " - " Expected Errors: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n" - ); + "osmt_run_mcast_flow: " 
EXPECTING_ERRORS_END "\n" ); + if (status == IB_SUCCESS) { osm_log( &p_osmt->log, OSM_LOG_ERROR, From delaitt at cpc.wmin.ac.uk Mon Sep 25 07:01:33 2006 From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre) Date: Mon, 25 Sep 2006 15:01:33 +0100 (BST) Subject: [openib-general] [Lustre-discuss] Re: problems with lustre o2ib module & ofed In-Reply-To: References: <200609250940.47936.jackm@dev.mellanox.co.il> <45177E0C.1040101@voltaire.com> <20060925080051.GC21836@mellanox.co.il> <20060925082609.GE21836@mellanox.co.il> Message-ID: On Mon, 25 Sep 2006, Thierry Delaitre wrote: > > It seems that lustre puts its modules in /lib/modules/2.6.16.21-0.8-default > despite the fact that my kernel is 2.6.16.21-0.8-smp ! > > uname -a > Linux n32 2.6.16.21-0.8-smp #4 SMP Sun Sep 24 08:47:30 BST 2006 i686 i686 i386 GNU/Linux > > make[3]: Nothing to be done for `install-exec-am'. > /bin/sh ../../mkinstalldirs /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre > /usr/bin/install -c -m 644 lquota.ko /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre/lquota > > I therefore ends up with a /lib/modules/2.6.16.21-0.8-smp and > /lib/modules/2.6.16.21-0.8-default > > i'm now searching why lustre thinks my kernel is 2.6.16.21-0.8-default and > not 2.6.16.21-0.8-smp I've updated the UTS_RELEASE string in /usr/src/linux-2.6.16.21-0.8/include/linux/version.h from default to smp and deleted my /lib/modules/ lustre now installs in /lib/modules/2.6.16.21-0.8-smp/kernel along with ofed ib drivers. 

i recompiled the kernel, ofed and lustre and still get this:

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 7430:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256

nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_create_cq
3cfe7afa A __crc_ib_create_cq
00000060 r __kcrctab_ib_create_cq
0000015f r __kstrtab_ib_create_cq
000000c0 r __ksymtab_ib_create_cq
00000d50 T ib_create_cq

i'm a bit stuck!

Thierry.

> Thierry.
>
> On Mon, 25 Sep 2006, Thierry Delaitre wrote:
>
> >
> > On Mon, 25 Sep 2006, Michael S. Tsirkin wrote:
> >
> > > Quoting r. Thierry Delaitre :
> > > >
> > > > I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the
> > > > lustre's configure line below. Lustre's configure script looks for a
> > > > driver/infiniband directory which only seems to exist under
> > > > /usr/local/ofed/src/openib-1.1
> > > >
> > > > ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/
> > > >
> > > > Thierry.
> > >
> > >
> > > replace /usr/local/ofed with the prefix you specified.
> > >
> > > This looks wrong - openib-1.1 is the pristine sources.
> > > openib/include is the exported interface and is what you should use
> > > for dependent modules.
> > > No idea why would lustre need drivers/infiniband.
> > > Try creating a softlink: > > > > > > mkdir /usr/local/ofed/src/openib/drivers/infiniband > > > ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband > > > > I untarred lustre 1.5.95, compiled it (./configure > > --with-o2ib=/usr/local/ofed/src/openib) . did a make install, depmod -a > > and still get the following: > > > > my modprobe.conf is the following > > > > options lnet ip2nets="o2ib0 161.74.83.[0-255]" > > > > lctl network up > > LNET configure error 100: Network is down > > > > ko2iblnd: disagrees about version of symbol ib_create_cq > > ko2iblnd: Unknown symbol ib_create_cq > > ko2iblnd: disagrees about version of symbol ib_dereg_mr > > ko2iblnd: Unknown symbol ib_dereg_mr > > ko2iblnd: disagrees about version of symbol ib_destroy_cq > > ko2iblnd: Unknown symbol ib_destroy_cq > > ko2iblnd: disagrees about version of symbol ib_get_dma_mr > > ko2iblnd: Unknown symbol ib_get_dma_mr > > ko2iblnd: disagrees about version of symbol ib_alloc_pd > > ko2iblnd: Unknown symbol ib_alloc_pd > > ko2iblnd: disagrees about version of symbol ib_modify_qp > > ko2iblnd: Unknown symbol ib_modify_qp > > ko2iblnd: disagrees about version of symbol ib_dealloc_pd > > ko2iblnd: Unknown symbol ib_dealloc_pd > > LustreError: 4177:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND > > o2ib, module ko2iblnd, rc=256 > > > > lsmod | grep ib > > libcfs 103060 1 lnet > > ib_ucm 19332 0 > > ib_addr 10756 1 rdma_cm > > ib_cm 31968 2 ib_ucm,rdma_cm > > ib_ipoib 48400 0 > > ib_sa 16652 3 rdma_cm,ib_cm,ib_ipoib > > ib_uverbs 38312 2 rdma_ucm,ib_ucm > > ib_umad 17968 0 > > ib_mthca 116240 0 > > ib_mad 36116 4 ib_cm,ib_sa,ib_umad,ib_mthca > > ib_core 49024 9 > > ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad > > > > nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_alloc_pd > > d5dcb698 A __crc_ib_alloc_pd > > 0000001c r __kcrctab_ib_alloc_pd > > 0000006a r __kstrtab_ib_alloc_pd > > 00000038 r 
__ksymtab_ib_alloc_pd > > 00000c65 T ib_alloc_pd > > > > from lustre's config.log: > > > > configure:6500: checking whether to enable OpenIB gen2 support > > configure:6586: cp conftest.c build && make modules CC=gcc -f > > /root/lustre-1.5.95/build/Makefile LUSTRE_LINUX > > _CONFIG=/usr/src/linux/.config -o tmp_include_depends -o scripts -o include/config/MARKER -C /usr/src/linux EXTRA_CFLAGS=-Werror-implicit-function-declaration -g -I/root/lustre-1.5.95/lnet/include -I/root/lustre-1.5.95/lustre/include -I/usr/local/ofed/src/openib/include M=/root/lustre-1.5.95/build > > /root/lustre-1.5.95/build/conftest.c:42: warning: function declaration > > isn't a prototype > > /root/lustre-1.5.95/build/conftest.c: In function 'main': > > /root/lustre-1.5.95/build/conftest.c:49: warning: unused variable 'rej_reason' > > /root/lustre-1.5.95/build/conftest.c:48: warning: unused variable 'pool_fmr' > > /root/lustre-1.5.95/build/conftest.c:47: warning: unused variable 'qp_attr' > > /root/lustre-1.5.95/build/conftest.c:46: warning: unused variable 'device_attr' > > /root/lustre-1.5.95/build/conftest.c:45: warning: unused variable 'conn_param' > > WARNING: "rdma_create_id" [/root/lustre-1.5.95/build/conftest.ko] undefined! > > configure:6589: $? = 0 > > configure:6591: test -s build/conftest.o > > configure:6594: $? = 0 > > configure:6597: result: yes > > > > > > Thierry. 
> > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > > ---------------------------------------- > Dr Thierry DELAITRE > Systems and Services Manager, CSCS > University of Westminster > 115 New Cavendish Street, London W1W 6UW > > Tel: 020 7911 5000 ext: 3586 > Fax: 020 7911 5089 > Mobile short dial code 1788 > > http://www.cscs.wmin.ac.uk/~delaitt > ---------------------------------------- > > This e-mail and its attachments are intended for the above named only > and may be confidential. If they have come to you in error you must > not copy or show them to anyone, nor should you take any action based > on them, other than to notify the error by replying to the sender. > > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss at clusterfs.com > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss > > ---------------------------------------- Dr Thierry DELAITRE Systems and Services Manager, CSCS University of Westminster 115 New Cavendish Street, London W1W 6UW Tel: 020 7911 5000 ext: 3586 Fax: 020 7911 5089 Mobile short dial code 1788 http://www.cscs.wmin.ac.uk/~delaitt ---------------------------------------- This e-mail and its attachments are intended for the above named only and may be confidential. If they have come to you in error you must not copy or show them to anyone, nor should you take any action based on them, other than to notify the error by replying to the sender. From dotanb at dev.mellanox.co.il Mon Sep 25 07:04:50 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Mon, 25 Sep 2006 17:04:50 +0300 Subject: [openib-general] [cma] the private data length that arrives with the event RDMA_CM_EVENT_CONNECT_REQUEST is false Message-ID: <4517E202.8080509@dev.mellanox.co.il> Hi Sean. 
I'm using the following configuration:

*************************************************************
Host Architecture : x86_64
Linux Distribution: Red Hat Enterprise Linux AS release 4 (Nahant Update 3)
Kernel Version    : 2.6.9-34.ELsmp
GCC Version       : gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2)
Memory size       : 4039892 kB
Driver Version    : gen2_linux-20060922-1700 (REV=9611)
HCA ID(s)         : mthca0
HCA model(s)      : 25208
FW version(s)     : 4.7.927
Board(s)          : MT_00A0010001
*************************************************************

I have 2 sides:
The first side calls rdma_connect() with private data (and private_data_len != 0).
The second side waits for the RDMA_CM_EVENT_CONNECT_REQUEST event and checks the private_data_len.

The problem is that the private_data_len in the second side (receiver) is not equal to the sent data (length).

Can you please check this issue?

Thanks
Dotan

From rdreier at cisco.com Mon Sep 25 07:11:52 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 07:11:52 -0700
Subject: [openib-general] [cma] the private data length that arrives with the event RDMA_CM_EVENT_CONNECT_REQUEST is false
In-Reply-To: <4517E202.8080509@dev.mellanox.co.il> (Dotan Barak's message of "Mon, 25 Sep 2006 17:04:50 +0300")
References: <4517E202.8080509@dev.mellanox.co.il>
Message-ID:

    Dotan> The problem is that the private_data_len in the second side
    Dotan> (receiver) is not equal to the sent data (length).

How do you expect the private data length to be passed from one side to the other? There is no such field in the CM protocol.

The only thing the RDMA CM can do is pass the maximum possible private data length to the passive consumer.

 - R.

From rdreier at cisco.com Mon Sep 25 07:29:53 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 07:29:53 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060925074155.GB21836@mellanox.co.il> (Michael S.
Tsirkin's message of "Mon, 25 Sep 2006 10:41:55 +0300")
References: <1158850657.24776.158.camel@localhost> <62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il> <20060925074155.GB21836@mellanox.co.il>
Message-ID:

    Michael> Actually, the reason it is hard to come up with the name
    Michael> is that what this enables is the natural poll/request
    Michael> notification order.

Over the weekend I thought about this and came up with an idea I kind of like, inspired by Todd Rimmer's comments about poll-and-notify.

We could change ib_req_notify_cq() to have an extra parameter:

	static inline int ib_req_notify_cq(struct ib_cq *cq,
					   enum ib_cq_notify cq_notify,
					   int *lost_event_possible)

and if non-NULL is passed in for lost_event_possible, then req_notify_cq should do the equivalent of a CQ peek after arming the CQ event. Of course mthca would just set *lost_event_possible to 0 without needing to do any check.

(The reason I make it a pointer parameter rather than just using the return value is so that consumers don't need to take the potential cost of a CQ peek on devices where arming a CQ is cheap but peeking in a CQ might require an extra lock or something).

What do you think?

 - R.

From dotanb at dev.mellanox.co.il Mon Sep 25 07:36:09 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 25 Sep 2006 17:36:09 +0300
Subject: [openib-general] [cma] the private data length that arrives with the event RDMA_CM_EVENT_CONNECT_REQUEST is false
In-Reply-To:
References: <4517E202.8080509@dev.mellanox.co.il>
Message-ID: <4517E959.8010306@dev.mellanox.co.il>

Roland Dreier wrote:
> Dotan> The problem is that the private_data_len in the second side
> Dotan> (receiver) is not equal to the sent data (length).
>
> How do you expect the private data length to be passed from one side
> to the other? There is no such field in the CM protocol.
>
> The only thing the RDMA CM can do is pass the maximum possible private
> data length to the passive consumer.
>
> - R.
> You are right, the CM should support only private data (according to the IB spec chapter 12). The CMA implementation in the gen2 have the attribute private_data_len in the rdma_cm_event structure. So, what is the purpose of private_data_len (in the event structure)? thanks Dotan From swise at opengridcomputing.com Mon Sep 25 07:46:41 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 25 Sep 2006 09:46:41 -0500 Subject: [openib-general] [cma] the private data length that arrives with the event RDMA_CM_EVENT_CONNECT_REQUEST is false In-Reply-To: <4517E959.8010306@dev.mellanox.co.il> References: <4517E202.8080509@dev.mellanox.co.il> <4517E959.8010306@dev.mellanox.co.il> Message-ID: <1159195602.7283.2.camel@stevo-desktop> For iWARP, the private data length is a field in the MPA startup packets and thus can be passed up to the consumer in connect request events and connect reply events. On Mon, 2006-09-25 at 17:36 +0300, Dotan Barak wrote: > Roland Dreier wrote: > > Dotan> The problem is that the private_data_len in the second side > > Dotan> (receiver) is not equal to the sent data (length). > > > > How do you expect the private data length to be passed from one side > > to the other? There is no such field in the CM protocol. > > > > The only thing the RDMA CM can do is pass the maximum possible private > > data length to the passive consumer. > > > > - R. > > > You are right, the CM should support only private data (according to the > IB spec chapter 12). > The CMA implementation in the gen2 have the attribute private_data_len > in the rdma_cm_event structure. > > So, what is the purpose of private_data_len (in the event structure)? 
> > thanks > Dotan > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at mellanox.co.il Mon Sep 25 07:54:27 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 25 Sep 2006 17:54:27 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: Message-ID: <20060925145427.GB23882@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: NAPI > > Michael> Actually, the reason it is hard to come up with the name > Michael> is that what this enables is the natural poll/request > Michael> notification order. > > Over the weekend and I thought about this and I came up with an idea I > kind of like, inspired by Todd Rimmer's comments about poll-and-notify. > > We could change ib_req_notify_cq() to have an extra parameter: > > static inline int ib_req_notify_cq(struct ib_cq *cq, > enum ib_cq_notify cq_notify, > int *lost_event_possible) > > and if non-NULL is passed in for lost_event_possible, then > req_notify_cq should do the equivalent of a CQ peek after arming the > CQ event. I thought about this too. But this has a disadvantage over the device-wide flag: when flag is device-wide, we can just have 2 polling routines - with and without peek - and select the correct one at device open depending on the hardware capabilities. Thus we can avoid a conditional branch on the fast path, which I think is nice. So I think if we want to enable mthca-specific optimization, the righ tway is with device flags. On a separate note - ib_req_notify_cq is also testing the lost_event_possible flag - so now we have 2 conditional branches on fast path, and this hurts all ULPs. Ugh. 
If we extend the interface, I would rather make a new call ib_req_notify_and_peek_cq(truct ib_cq *cq, enum ib_cq_notify cq_notify) that returns 0 on empty CQ, 1 on non-empty and negative on error. -- MST From trimmer at silverstorm.com Mon Sep 25 09:27:23 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 25 Sep 2006 12:27:23 -0400 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060925145427.GB23882@mellanox.co.il> Message-ID: > From: Michael S. Tsirkin > Sent: Monday, September 25, 2006 10:54 AM > To: Roland Dreier > Cc: openib-general at openib.org > Subject: Re: [openib-general] [PATCH] IB/ipoib: NAPI > > Quoting r. Roland Dreier : > > Subject: Re: [PATCH] IB/ipoib: NAPI > > > > Michael> Actually, the reason it is hard to come up with the name > > Michael> is that what this enables is the natural poll/request > > Michael> notification order. > > > > Over the weekend and I thought about this and I came up with an idea I > > kind of like, inspired by Todd Rimmer's comments about poll-and-notify. > > > > We could change ib_req_notify_cq() to have an extra parameter: > > > > static inline int ib_req_notify_cq(struct ib_cq *cq, > > enum ib_cq_notify cq_notify, > > int *lost_event_possible) > > > > and if non-NULL is passed in for lost_event_possible, then > > req_notify_cq should do the equivalent of a CQ peek after arming the > > CQ event. > > I thought about this too. > > But this has a disadvantage over the device-wide flag: when flag is > device-wide, > we can just have 2 polling routines - with and without peek - and select > the > correct one at device open depending on the hardware capabilities. > Thus we can avoid a conditional branch on the fast path, > which I think is nice. > > So I think if we want to enable mthca-specific optimization, > the righ tway is with device flags. 
>
> On a separate note - ib_req_notify_cq is also testing the
> lost_event_possible flag - so now we have 2 conditional branches
> on fast path, and this hurts all ULPs. Ugh.
>
> If we extend the interface, I would rather make a new call
> ib_req_notify_and_peek_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify)
> that returns 0 on empty CQ, 1 on non-empty and negative on error.
>
> --
> MST

It's inefficient to peek the CQ if the next operation is likely to then be a poll. Performing the poll_and_notify in one call is more efficient. Then if you use poll_and_notify instead of poll_cq in the polling loops, you can also be equally efficient for all HCA models without needing a hardware capability flag and 2 polling algorithms in each ULP. Instead the HCA driver naturally provides the most efficient approach and all callers use the same algorithm.

In the examples below, let's assume 2 CQEs are returned, then the CQ is rearmed and is still empty afterward.

For example on Mellanox HCAs the actual sequence would be:
	poll_and_notify returns a CQE, tells caller to call it again
	poll_and_notify returns a CQE, tells caller to call it again
	poll_and_notify finds CQ empty, rearms CQ, tells caller it's done
		[note no peek needed]
3 driver calls, 3 CQE accesses, 1 rearm

For other HCAs the actual sequence would be:
	poll_and_notify returns a CQE, tells caller to call it again
	poll_and_notify returns a CQE, tells caller to call it again
	poll_and_notify finds CQ empty, rearms CQ, peeks CQ
		if CQ empty, tells caller it's done [for this example, it's true]
		if CQ not empty, tells caller to loop on poll_cq
3 driver calls, 4 CQE accesses, 1 rearm

In comparison the present code (or with a device capability flag) is:
	poll_cq returns a CQE
	poll_cq returns a CQE
	poll_cq finds CQ empty
	notify_cq rearms CQ
	if non-Mellanox HCA
		poll_cq - finds CQ empty
4-5 driver calls, 3-4 CQE accesses, 1 rearm

With notify with an internal peek (lost event flag approach) it's:
	poll_cq returns a CQE
	poll_cq returns a CQE
	poll_cq finds CQ empty
	notify_cq rearms CQ, for non-Mellanox HCA, peeks CQ - finds CQ empty
	if lost events indicated [for this example it's false]
		poll_cq until empty
4-5 driver calls, 3-4 CQE accesses, 1 rearm

Hence for all HCA models, the poll_and_notify approach has fewer driver calls (3 in the above example, compared to 4 for the other approaches). In general driver calls are going to be the expensive factor in this comparison. The main difference in all the above examples will be the spin_lock for the CQ. Depending on HCA design, the poll_cq and/or notify_cq and/or peek_cq operations may also incur an expensive PCI bus read or write. However, with the exception of the notify w/peek approach, those costs are the same for all the above examples.

In the case (not shown above) where there was 1 additional CQE found after the rearm [applicable only to non-Mellanox HCAs], the poll_and_notify approach will also save 1 CQE access as compared to the notify w/internal peek approach.

Todd Rimmer

From rdreier at cisco.com Mon Sep 25 09:41:26 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 09:41:26 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: (Todd Rimmer's message of "Mon, 25 Sep 2006 12:27:23 -0400")
References:
Message-ID:

    Todd> It's inefficient to peek the CQ if the next operation is
    Todd> likely to then be a poll. Performing the poll_and_notify in
    Todd> one call is more efficient.

Yes, but if you think carefully about how to implement NAPI for IPoIB, you'll see that poll-and-notify is not a useful operation. If a device does not support the "exact" Mellanox CQ notify semantics, then there is no way around using peek CQ somehow.

 - R.

From rdreier at cisco.com Mon Sep 25 09:45:58 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 09:45:58 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060925145427.GB23882@mellanox.co.il> (Michael S.
Tsirkin's message of "Mon, 25 Sep 2006 17:54:27 +0300") References: <20060925145427.GB23882@mellanox.co.il> Message-ID: Michael> But this has a disadvantage over the device-wide flag: Michael> when flag is device-wide, we can just have 2 polling Michael> routines - with and without peek - and select the correct Michael> one at device open depending on the hardware Michael> capabilities. Thus we can avoid a conditional branch on Michael> the fast path, which I think is nice. Yeah, but I can't make up my mind whether two polling routines is a good thing or a bad thing. We get a very specific optimization, but we have two copies of the same code then. Michael> On a separate note - ib_req_notify_cq is also testing the Michael> lost_event_possible flag - so now we have 2 conditional Michael> branches on fast path, and this hurts all ULPs. Ugh. I suspect that the cost here is minimal -- lost_event_possible is going to be in a register, etc. Michael> If we extend the interface, I would rather make a new Michael> call ib_req_notify_and_peek_cq(truct ib_cq *cq, enum Michael> ib_cq_notify cq_notify) that returns 0 on empty CQ, 1 on Michael> non-empty and negative on error. And again, I don't want to make the interface too fat... There are a few of tradeoffs here: microoptimization vs. maintainability, IPoIB & NAPI vs. all other ULPs... - R. From mst at mellanox.co.il Mon Sep 25 10:58:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 25 Sep 2006 20:58:54 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: Message-ID: <20060925175854.GA25001@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: NAPI > > Michael> But this has a disadvantage over the device-wide flag: > Michael> when flag is device-wide, we can just have 2 polling > Michael> routines - with and without peek - and select the correct > Michael> one at device open depending on the hardware > Michael> capabilities. 
Thus we can avoid a conditional branch on > Michael> the fast path, which I think is nice. > > Yeah, but I can't make up my mind whether two polling routines is a > good thing or a bad thing. We get a very specific optimization, but > we have two copies of the same code then. Well, with a flag the ULP can decide what it wants to do, we are not forcing anything here. > Michael> On a separate note - ib_req_notify_cq is also testing the > Michael> lost_event_possible flag - so now we have 2 conditional > Michael> branches on fast path, and this hurts all ULPs. Ugh. > > I suspect that the cost here is minimal -- lost_event_possible is > going to be in a register, etc. Hmm, since we are passing it by pointer to a function called through a pointer, I don't see how can gcc move it out of memory into register. Am I wrong? > Michael> If we extend the interface, I would rather make a new > Michael> call ib_req_notify_and_peek_cq(truct ib_cq *cq, enum > Michael> ib_cq_notify cq_notify) that returns 0 on empty CQ, 1 on > Michael> non-empty and negative on error. > > And again, I don't want to make the interface too fat... Well, lots of flags that you are required to implement amounts to the same thing from low level driver developer perspective, isn't that right? > There are a few of tradeoffs here: microoptimization > vs. maintainability, IPoIB & NAPI vs. all other ULPs... I just find a flag + conditional peek a much simpler approach. Since all our testing is done on mthca anyway, almost all approaches amount to doing a NOP in various ways for us. So I would suggest - get Eli's patch with simple flag into shape & working on all hardware, push into git. - people interested in specific hardware test performance and propose patches to improve it even further. Does this sound good? 
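
As a rough illustration of the flag plus conditional peek idea suggested above, here is a minimal user-space sketch. It is only a model under stated assumptions: the sim_* names and the SIM_DEV_EXACT_NOTIFY flag are made up for this example and are not part of the IB verbs API, and the CQ is reduced to a counter.

```c
#include <stdbool.h>

/* Hypothetical device capability bit: the hardware raises a completion
 * event even for entries that arrived before the CQ was armed. */
enum { SIM_DEV_EXACT_NOTIFY = 1 << 0 };

/* Toy completion queue: just a count of pending entries. */
struct sim_cq {
	int  pending;	/* completions waiting in the queue */
	bool armed;	/* true once a notification was requested */
};

struct sim_device {
	unsigned int  flags;
	struct sim_cq cq;
};

/* Stand-in for requesting a completion notification. */
void sim_req_notify_cq(struct sim_cq *cq)
{
	cq->armed = true;
}

/* Stand-in for a CQ peek: reports how many entries are queued. */
int sim_peek_cq(const struct sim_cq *cq)
{
	return cq->pending;
}

/* Arm the CQ, then peek only on devices that do not set the flag.
 * Returns true when the caller has to reschedule its poll routine. */
bool sim_rearm_needs_reschedule(struct sim_device *dev)
{
	sim_req_notify_cq(&dev->cq);
	if (dev->flags & SIM_DEV_EXACT_NOTIFY)
		return false;	/* the hardware will deliver the event */
	return sim_peek_cq(&dev->cq) > 0;
}
```

On a device that sets the flag the peek is skipped entirely, which matches the remark above that on mthca most variants boil down to a NOP.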
-- MST From ardavis at ichips.intel.com Mon Sep 25 11:37:53 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Mon, 25 Sep 2006 11:37:53 -0700 Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ In-Reply-To: <4514510F.3050400@ichips.intel.com> References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com> <4514510F.3050400@ichips.intel.com> Message-ID: <45182201.4000105@ichips.intel.com> Arlin Davis wrote: >Sean Hefty wrote: > > > >>Currently a DREP is only sent in response to a DREQ if a connection >>has been found matching the DREQ, and it is in the proper state. Once >>a DREP is sent, the local connection moves into timewait. Duplicate >>DREQs received while in this state result in re-sending the DREP. >> >>However, it's likely that the local connection will enter and exit >>timewait before the remote side times out a lost DREP and resends a DREQ. >>There are a couple possible solutions to this. One is to increase how >>long a connection remains in timewait, by multiplying its wait time by >>max_cm_retries. This can greatly increase the timewait state before a QP >>can be re-used when CM messages are not lost. >> >>An alternative is to send a DREP in response to a DREQ, even if a local >>connection is not found, which is what this patch does. >> >> >> >> > >Would it be possible to get this fix in rc7? I am consistently seeing >this problem with Intel MPI on a 64 node cluster. > >-arlin > > Aviram? Is there an rc7 and could this get in? 
>_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From trimmer at silverstorm.com Mon Sep 25 11:50:19 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 25 Sep 2006 14:50:19 -0400 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: Message-ID: > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Monday, September 25, 2006 12:41 PM > To: Rimmer, Todd > Cc: Michael S. Tsirkin; openib-general at openib.org > Subject: Re: [openib-general] [PATCH] IB/ipoib: NAPI > > Todd> Its inefficient to peek the CQ if the next operation is > Todd> likely to then be a poll. Performing the poll_and_notify in > Todd> one call is more efficient. > > Yes, but if you think carefully about how to implement NAPI for IPoIB, > you'll see that poll-and-notify is not a useful operation. If a > device does not support the "exact" Mellanox CQ notify semantics, then > there is no way around using peek CQ somehow. > > - R. Roland, What were your thoughts on how to handle this part of Eli's proposed code: ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); /* TODO we need peek_cq here for hw devices that could would not generate interrupts for completions arriving between end of polling till request notify */ return 0; On a non-Mellanox HCA, if the CQ is not empty here, isn't this required to poll it til empty and process all the CQEs (otherwise we may not get another interrupt). If instead we return 1 from the dev->poll routine here, we could be scheduled for a future poll and a future interrupt (which might be bad). 
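
To make the window in question concrete, here is a small user-space model of the two notification behaviours as they are described in this thread. The sim_* names are hypothetical, and the "exact" behaviour (an event is raised when arming finds completions already queued) is an assumption taken from the discussion, not from any hardware manual.

```c
#include <stdbool.h>

/* Toy CQ with a switch selecting the notification behaviour. */
struct sim_cq {
	int  pending;	/* queued completions */
	bool armed;	/* notification requested, not yet fired */
	bool exact;	/* assumed "exact" notify semantics */
	int  events;	/* completion events delivered so far */
};

/* A new completion arrives; an armed CQ fires once and disarms. */
void sim_complete(struct sim_cq *cq)
{
	cq->pending++;
	if (cq->armed) {
		cq->armed = false;
		cq->events++;
	}
}

/* Request notification. With exact semantics, entries that are
 * already queued trigger the event right away. */
void sim_arm(struct sim_cq *cq)
{
	if (cq->exact && cq->pending > 0) {
		cq->events++;
		return;
	}
	cq->armed = true;
}

/* Poll until empty; returns the number of entries consumed. */
int sim_drain(struct sim_cq *cq)
{
	int n = cq->pending;

	cq->pending = 0;
	return n;
}

/* Replay the race: drain, one completion sneaks in, then arm.
 * Returns how many events the consumer ends up receiving. */
int sim_race(bool exact)
{
	struct sim_cq cq = { .pending = 2, .exact = exact };

	sim_drain(&cq);
	sim_complete(&cq);	/* after polling ended, before arming */
	sim_arm(&cq);
	return cq.events;
}
```

In this model sim_race(true) delivers one event, while sim_race(false) delivers none and leaves an entry sitting in the queue, which is the case where something has to look at the CQ again after arming.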
Todd From rdreier at cisco.com Mon Sep 25 11:54:30 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 25 Sep 2006 11:54:30 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: (Todd Rimmer's message of "Mon, 25 Sep 2006 14:50:19 -0400") References: Message-ID: > What were your thoughts on how to handle this part of Eli's proposed > code: > ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); > /* TODO we need peek_cq here for hw devices that > could would not generate interrupts for completions > arriving between end of polling till request notify */ > > return 0; > > On a non-Mellanox HCA, if the CQ is not empty here, isn't this required > to poll it til empty and process all the CQEs (otherwise we may not get > another interrupt). If instead we return 1 from the dev->poll routine > here, we could be scheduled for a future poll and a future interrupt > (which might be bad). That's exactly where we need peek CQ. We can't repoll the CQ, because netif_rx_complete() has already been called, so the poll routine might already be running on another CPU. The only thing I can see to do is peek in the CQ, and if it's not empty, then go through the whole netif_rx_reschedule() song and dance. - R. From rdreier at cisco.com Mon Sep 25 11:58:43 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 25 Sep 2006 11:58:43 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060925175854.GA25001@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 25 Sep 2006 20:58:54 +0300") References: <20060925175854.GA25001@mellanox.co.il> Message-ID: > I just find a flag + conditional peek a much simpler approach. > Since all our testing is done on mthca anyway, almost > all approaches amount to doing a NOP in various ways for us. Umm, that's a pretty parochial attitude... > So I would suggest > - get Eli's patch with simple flag into shape & working on all hardware, > push into git. 
> - people interested in specific hardware test performance and propose patches > to improve it even further. What is a 'simple flag'? Who is going to implement peek_cq() for ehca and ipath? I'm not really that interested in the most micro-optimized approach. I'd rather have something simple and easy to maintain -- in other words, I don't want 2 or 3 completion handling paths in IPoIB. And from that perspective, extending req_notify_cq() looks pretty good to me. - R. From rdreier at cisco.com Mon Sep 25 12:02:35 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 25 Sep 2006 12:02:35 -0700 Subject: [openib-general] Question about ehca CQ handling Message-ID: While looking over the ehca driver from the perspective of adding a "peek CQ" operation, I noticed some code that looked funny. In hipz_set_cqx_n0() and hipz_set_cqx_n1(), what is the point of the calls to hipz_galpa_load_cq()? The return value is discarded. I see that hipz_galpa_load_cq() dereferences a volatile pointer internally, so I'm guessing this is some sort of ordering constraint. But would it be just as good to do "barrier()" there? - R. From rdreier at cisco.com Mon Sep 25 12:08:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 25 Sep 2006 12:08:05 -0700 Subject: [openib-general] timer_pending kernel assertion while stopping IPoIB In-Reply-To: <4517AC88.9080202@voltaire.com> (Or Gerlitz's message of "Mon, 25 Sep 2006 13:16:40 +0300") References: <4517AC88.9080202@voltaire.com> Message-ID: Or> OK, i will be able to test this with 2.6.18 later this week, Or> as for doing so with your for-2.6.19 branch, is it sufficient Or> to do (assuming the tree was cloned and now updated with git Or> pull) Or> $ git checkout -f for-2.6.19 Or> to have the sources "state" be as of that branch? for example Or> following doing so i don't see the amso1100 directory below Or> drivers/infiniband/hw Well, I guess it's too late now, since both Dave M. and I have merged upstream with Linus. 
But still it would be worth reproducing with Linus's latest git tree. - R. From trimmer at silverstorm.com Mon Sep 25 12:11:24 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 25 Sep 2006 15:11:24 -0400 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: Message-ID: > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Monday, September 25, 2006 2:55 PM > To: Rimmer, Todd > Cc: Michael S. Tsirkin; openib-general at openib.org > Subject: Re: [openib-general] [PATCH] IB/ipoib: NAPI > > > What were your thoughts on how to handle this part of Eli's proposed > > code: > > ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); > > /* TODO we need peek_cq here for hw devices that > > could would not generate interrupts for completions > > arriving between end of polling till request notify */ > > > > return 0; > > > > On a non-Mellanox HCA, if the CQ is not empty here, isn't this required > > to poll it til empty and process all the CQEs (otherwise we may not get > > another interrupt). If instead we return 1 from the dev->poll routine > > here, we could be scheduled for a future poll and a future interrupt > > (which might be bad). > > That's exactly where we need peek CQ. We can't repoll the CQ, because > netif_rx_complete() has already been called, so the poll routine might > already be running on another CPU. The only thing I can see to do is > peek in the CQ, and if it's not empty, then go through the whole > netif_rx_reschedule() song and dance. > > - R. I agree. This would also mean the ipoib_warn in ipoib_ib_completion would go away (would be a valid situation). I'm going to keep thinking about this, seems like we can't call req_notify until after netif_rx_complete, otherwise we have a different race. That leads to the req_notify and peek approach. It's a shame, because for all other ULPs, the poll_and_notify approach works well. I too would prefer not to see dual algorithms and a device flag as it could quickly lead to a lot of redundant code. 
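
The driver call counts given earlier in this thread for the two-CQE case can be reproduced with a toy user-space model. This is only a sketch: the sim_* functions are hypothetical stand-ins, a CQ is just a counter, and the rearm inside sim_poll_and_notify is assumed to cost no extra driver call.

```c
/* Toy CQ that counts how often the "driver" is entered. */
struct sim_cq {
	int pending;		/* queued completions */
	int driver_calls;	/* number of calls into the driver */
};

/* Separate poll: one driver call, returns 1 if a CQE was consumed. */
int sim_poll_cq(struct sim_cq *cq)
{
	cq->driver_calls++;
	if (cq->pending > 0) {
		cq->pending--;
		return 1;
	}
	return 0;
}

/* Separate rearm: one more driver call. */
void sim_notify_cq(struct sim_cq *cq)
{
	cq->driver_calls++;
}

/* Combined call: consumes a CQE if there is one, otherwise rearms
 * inside the same driver call. Returns 1 if the caller should call
 * again, 0 once the CQ was found empty and rearmed. */
int sim_poll_and_notify(struct sim_cq *cq)
{
	cq->driver_calls++;
	if (cq->pending > 0) {
		cq->pending--;
		return 1;
	}
	return 0;
}

/* Driver calls needed with the combined operation. */
int sim_calls_poll_and_notify(int cqes)
{
	struct sim_cq cq = { cqes, 0 };

	while (sim_poll_and_notify(&cq))
		;
	return cq.driver_calls;
}

/* Driver calls needed with separate poll and notify; repoll_after_arm
 * models hardware that needs one more poll after arming. */
int sim_calls_separate(int cqes, int repoll_after_arm)
{
	struct sim_cq cq = { cqes, 0 };

	while (sim_poll_cq(&cq))
		;
	sim_notify_cq(&cq);
	if (repoll_after_arm)
		sim_poll_cq(&cq);
	return cq.driver_calls;
}
```

For two queued CQEs the combined call needs 3 driver calls and the separate calls need 4 or 5, in line with the figures quoted in the thread. The model says nothing about the NAPI ordering constraint discussed here, which is why it only applies to the other ULPs.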
Todd Rimmer From rdreier at cisco.com Mon Sep 25 12:16:51 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 25 Sep 2006 12:16:51 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: (Todd Rimmer's message of "Mon, 25 Sep 2006 15:11:24 -0400") References: Message-ID: Todd> I agree. This would also mean the ipoib_warn in Todd> ipoib_ib_completion would go away (would be a valid Todd> situation). Which warning? I don't see anything that would change, and I don't see any warnings at all in ipoib_ib_completion(). Todd> I'm going to keep thinking about this, seems like we can't Todd> call req_notify until after netif_rx_complete, otherwise we Todd> have a different race. That leads to the req_notify and Todd> peek approach. Yes, that's right. Doing req_notify before netif_rx_complete risks triggering the event before netif_rx_complete, which leads to the poll routine never getting scheduled at all. - R. From mst at mellanox.co.il Mon Sep 25 12:17:33 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 25 Sep 2006 22:17:33 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: Message-ID: <20060925191733.GD25001@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: NAPI > > > I just find a flag + conditional peek a much simpler approach. > > Since all our testing is done on mthca anyway, almost > > all approaches amount to doing a NOP in various ways for us. > > Umm, that's a pretty parochial attitude... But what do you suggest? I don't have all IB hardware. All I was saying was that on mthca ib_req_notify(&lost) if (lost) { reschedule } and ib_req_notify if (dev->flags & NEED) && peek_cq() { reschedule } never calls reschedule, so they are equivalent from that POV. > > So I would suggest > > - get Eli's patch with simple flag into shape & working on all hardware, > > push into git. 
> > - people interested in specific hardware test performance and propose patches > > to improve it even further. > > What is a 'simple flag'? Something like IB_DEVICE_IMMEDIATE_COMPETION_EVENT, set in mthca. ib_req_notify if (!(dev->flags & IB_DEVICE_IMMEDIATE_COMPETION_EVENT) && peek_cq() { reschedule } > Who is going to implement peek_cq() for ehca > and ipath? I can do that - it's easy. But it's still true I only really test mthca. > I'm not really that interested in the most micro-optimized approach. I'm still not convinced an extra peek cq is *that* expensive. Maybe we can just always peek without any clever tricks? > I'd rather have something simple and easy to maintain -- in other > words, I don't want 2 or 3 completion handling paths in IPoIB. And > from that perspective, extending req_notify_cq() looks pretty good to > me. Hmm, OK I guess. So, can you post the patch updating req_notify_cq? Eli can then respin the IPoIB patch to use that. -- MST From trimmer at silverstorm.com Mon Sep 25 12:32:42 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 25 Sep 2006 15:32:42 -0400 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: Message-ID: > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Monday, September 25, 2006 3:17 PM > To: Rimmer, Todd > Cc: Michael S. Tsirkin; openib-general at openib.org > Subject: Re: [openib-general] [PATCH] IB/ipoib: NAPI > > Todd> I agree. This would also mean the ipoib_warn in > Todd> ipoib_ib_completion would go away (would be a valid > Todd> situation). > > Which warning? I don't see anything that would change, and I don't > see any warnings at all in ipoib_ib_completion(). 
>

In Eli's proposed code:

+void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)
 {
+	struct net_device *dev = (struct net_device *) dev_ptr;
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	/* tell the network layer we have packts */
+	if (netif_rx_schedule_prep(dev))
+		__netif_rx_schedule(dev);
+	else {
+		ipoib_warn(priv, "received interupt while in polling\n");
+	}
 }

The call to ipoib_warn should be removed if you go with a peek and reschedule approach, because after notify has been called, we could get a completion callback after the peek has decided to reschedule the poll.

Todd Rimmer

From ishai at dev.mellanox.co.il Mon Sep 25 12:31:13 2006
From: ishai at dev.mellanox.co.il (ishai at dev.mellanox.co.il)
Date: Mon, 25 Sep 2006 22:31:13 +0300 (IDT)
Subject: [openib-general] High Availability status in OFED (was Re: [openfabrics-ewg] Mellanox/Voltaire/QLogic/IBM SQA results for OFED 1.1?)
In-Reply-To:
References:
Message-ID: <1439.89.1.166.254.1159212673.squirrel@dev.mellanox.co.il>

Hi Scott,

The IPoIB HA (High Availability) solution in OFED 1.1 is a short term solution. (There is ongoing work on a full solution that uses bonding.)

This short term solution for IPoIB HA uses the command "ip monitor link" to find out when a link goes down, and then updates the IP address of the other port.

Apparently RHEL4 uses an old version of the iproute package (iproute-2.6.9-3 with ip utility, iproute2-ss040831 in RHEL4.0 U4) in which there is no unique indication when a port goes down. (It gives the same indication when a port goes up or down.) In SLES10 there is a newer version of iproute and our solution works well with this version.

In order to solve the problem, the next RC will also include an installation of a version of iproute (iproute2-2.6.16-060323 with ip utility, iproute2-ss060323). This version will be installed only for OFED installations that include the IPoIB HA option and only on RHEL4.
The package will be installed in a private directory inside the OFED directory (It will not replace the iproute version of the distribution) and will be accessed by the IPoIB scripts using the exact path. As for SRP HA: SRP HA is currently available only for SLES10. The reason is that SRP HA uses the device-mapper multipath that needs high version of udev (>050). RHEL4 uses udev 039. Ishai >> As for the HA it works on SuSE but not on RH. Ishai will >> issue a report. > > This will be fixed for 1.1, right? > > Scott > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > From mst at mellanox.co.il Mon Sep 25 12:35:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 25 Sep 2006 22:35:58 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: Message-ID: <20060925193558.GF25001@mellanox.co.il> Quoting r. Rimmer, Todd : > In Eli's proposed code: > +void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) { > + struct net_device *dev = (struct net_device *) dev_ptr; > + struct ipoib_dev_priv *priv = netdev_priv(dev); > + > + /* tell the network layer we have packts */ > + if (netif_rx_schedule_prep(dev)) > + __netif_rx_schedule(dev); > + else { > + ipoib_warn(priv, "received interupt while in > polling\n"); > + } > } > > The call to ipoib_warn should be removed if you go with a peek and > reschedule approach, because after notify has been called, we could get > a completion callback after the peek has decided to reschedule the poll. right. 
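
A minimal user-space sketch of a completion handler without the warning, assuming the scheduling step behaves like a test-and-set (schedule only if a poll is not already pending). The sim_* names are hypothetical stand-ins and not the kernel netdev API.

```c
#include <stdbool.h>

/* Toy net device: tracks whether a poll is already scheduled. */
struct sim_netdev {
	bool poll_scheduled;
	int  schedule_count;	/* how often a poll was really queued */
};

/* Stand-in for "prepare and schedule": queues a poll only when none
 * is pending, and reports whether it did so. */
bool sim_rx_schedule(struct sim_netdev *dev)
{
	if (dev->poll_scheduled)
		return false;
	dev->poll_scheduled = true;
	dev->schedule_count++;
	return true;
}

/* Completion handler: an event that arrives while a poll is already
 * scheduled is a normal case here, so it is ignored silently. */
void sim_completion_handler(struct sim_netdev *dev)
{
	(void)sim_rx_schedule(dev);
}
```

Calling the handler twice in a row schedules a single poll, so a late completion callback after a reschedule is harmless in this model.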
-- MST From rdreier at cisco.com Mon Sep 25 12:36:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 25 Sep 2006 12:36:08 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: (Todd Rimmer's message of "Mon, 25 Sep 2006 15:32:42 -0400") References: Message-ID: Todd> The call to ipoib_warn should be removed if you go with a Todd> peek and reschedule approach, because after notify has been Todd> called, we could get a completion callback after the peek Todd> has decided to reschedule the poll. Right. In fact I would just make ipoib_ib_completion() do nothing but call netif_rx_schedule() (which encapsulates all this logic anyway). - R. From mst at mellanox.co.il Mon Sep 25 12:46:14 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 25 Sep 2006 22:46:14 +0300 Subject: [openib-general] openib.org svn upgrade to 1.4.0 In-Reply-To: <1158642390.7167.298.camel@localhost> References: <1158642390.7167.298.camel@localhost> Message-ID: <20060925194614.GG25001@mellanox.co.il> Would it be possible to update openib.org svn server to 1.4.0? It is still running 1.0.3 which is quite dated. There were a lot of speed improvements in svn since then. This would also enable svnsync - the ability to replicate history from one repository to another, which is extremely important for people like me that often work over a slow connection or offline. -- MST From sean.hefty at intel.com Mon Sep 25 17:03:36 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 25 Sep 2006 17:03:36 -0700 Subject: [openib-general] [RFC] determining which changes in svn to merge upstream or remove Message-ID: <000001c6e0ff$37474de0$a440e984@amr.corp.intel.com> Now that changes from the iWarp branch have been merged upstream, I wanted to get feedback about migrating existing changes in svn upstream, or removing features from svn. 
Specifically, the following features are in svn only: * RDMA CM: - userspace support - multicast support - UD QP support (required for multicast) - IB specific options (set paths, CM timeouts) * Local SA cache * IB multicast module Additionally, there have been patches to provide userspace SA support, which requires additional discussion. (Roland sent a proposal to run a daemon on top of the existing usermad interface.) Feedback on the userspace API, and user/kernel API are welcome. The main intent behind most of these changes is providing userspace multicast support through both the RDMA CM, as well as raw IB. I would like to get a better understanding of which features in svn are in use, so that I can create patches for their inclusion for 2.6.20, and identify if any features can be removed from svn. Userspace support for any feature will target 2.6.20 as well, but may go into the -mm branch if people feel that it still needs to mature. My hope is that the differences between svn and the kernel can be limited primarily to new modules (e.g. the local SA). - Sean From mst at mellanox.co.il Mon Sep 25 22:51:10 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Sep 2006 08:51:10 +0300 Subject: [openib-general] [RFC] determining which changes in svn to merge upstream or remove In-Reply-To: <000001c6e0ff$37474de0$a440e984@amr.corp.intel.com> References: <000001c6e0ff$37474de0$a440e984@amr.corp.intel.com> Message-ID: <20060926055109.GB21085@mellanox.co.il> Quoting r. Sean Hefty : > Subject: [RFC] determining which changes in svn to merge upstream or remove > > Now that changes from the iWarp branch have been merged upstream, I wanted to > get feedback about migrating existing changes in svn upstream, or removing > features from svn. Specifically, the following features are in svn only: > > * RDMA CM: BTW, there was a set of bugfix patches for CMA posted that didn't get acked or nacked yet. 
They looked sane and I took them into ofed - could you take the time to review please? Should I repost? It might make sense to put stability fixes in before adding more features. > - userspace support I think we agreed that this will use timewait support in core/low level drivers to handle timewait/stale packets correctly. Is that right? If so, I really need to find the time to do this. > - multicast support > - UD QP support (required for multicast) > - IB specific options (set paths, CM timeouts) I think that at some point we agreed that at least the option to set retry count can be made generic (with a limit of 15 retries). This kind of makes sense since TCP sockets have a SYN retry option. Wrt CM timeouts, asking the ULP to guess the timeout does not make much sense to me - how does the ULP know? IMO we need to implement a smarter heuristic that will set them automatically somehow. Is RDMA CM using all data from the path record query already? How about implementing exponential backoff? Other ideas? > * Local SA cache This is supposed to reduce the load on SM, but personally, I am still not convinced this is actually necessary - we are seeing gen2 based clusters running just fine without these tricks. What is more, this seems to break the model of IB network as a centrally managed fabric, and a look at the code gives me the feeling no one thought through how this will interact with SM features such as QoS, balancing, tavor MTU optimizations etc. > * IB multicast module Last time I tested this, there still were crashes with IPoIB. If there's a patch that adds just this change, I might be able to test it. OTOH, I'm still not sure why we are touching IPoIB at all since it seems unlikely any other ULP will want to share in the IPoIB mcast group. -- MST From mst at mellanox.co.il Mon Sep 25 22:56:59 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 26 Sep 2006 08:56:59 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: <1158850657.24776.158.camel@localhost> Message-ID: <20060926055659.GC21085@mellanox.co.il> Quoting r. Roland Dreier : > I took a stab at implementing this myself BTW, are you taking over this work? Just implementing peek/request for notification enhancements? A couple of your comments sounded like you are - a little coordination won't hurt here. -- MST From sean.hefty at intel.com Tue Sep 26 01:12:17 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 26 Sep 2006 01:12:17 -0700 Subject: [openib-general] [RFC] determining which changes in svn to merge upstream or remove In-Reply-To: <20060926055109.GB21085@mellanox.co.il> Message-ID: <000001c6e143$7c010d60$0f74e984@amr.corp.intel.com> >BTW, there was a set of bugfix patches for CMA posted that didn't get acked or >nacked yet. They looked sane and I took them into ofed - could you take the >time to review please? Should I repost? It might make sense to put stability >fixes in before adding more features. I've actually been on vacation for 2 of the last 3 weeks, so haven't had a chance to review recent patches. I should get to them by the end of this week. I want to ensure that whatever ends up being submitted upstream makes the most sense, including pushing fixes before other changes. This adds more work, which is why I'd like to get svn more in sync with the kernel. >> - userspace support > >I think we agreed that this will use timewait support in >core/low level drivers to handle timewait/stale packets right. >Is that right? If so, I really need to fid the time to do this. I think so. I don't see a clean alternative. I would also propose that we discourage connecting QPs outside of the IB CM to allow for detecting duplicate connections. We don't necessarily need to enforce this in code, but changing test programs to at least comment that connecting over sockets is discouraged may help. 
>> - multicast support >> - UD QP support (required for multicast) >> - IB specific options (set paths, CM timeouts) > >I think that at some point we agreed that at least the option to set >retry count can be made generic (with a limit of 15 retries). >This kind of makes sense since TCP sockets have SYN retry option. > >Wrt CM timeouts, asking the ULP to guess the timeout >does not make much sense to me - how does the ULP know? >IMO we need to implement a smarter heuristic that will set them >automatically somehow. Is RDMA CM using all data from the path record >query already? How about implementing exponential backoff? Other ideas? My thoughts on the options are to try to hold off merging them upstream. The option to get/set path records needs reworked. Getting paths should be done outside of the RDMA CM, such as through a userspace SA, and the user to kernel interface should pass the attribute values as defined by the spec (i.e. in network order) to avoid marshalling issues. The CM timeout/retry options are used by uDAPL, but the fix to increase the default retry count to the maximum may help. The RDMA CM uses the data from the path record, but the ULP has the most data about how long the remote side might take to respond to a CM REQ message (remote_cm_response_timeout). We might be able to have the RDMA CM make clever use of the MRA to avoid the issue, and even in the short term, dumb use of the MRA may help. (The issue is that as connections are formed, they begin being used, which can greatly affect how quickly new connections can be created. We've seen them take up to 60 seconds.) > >> * Local SA cache > >This is supposed to reduce the load on SM, but personally, I am still not >convinced this is actually necessary - we are seeing gen2 based clusters >running >just fine without these tricks. 
> >What is more, this seems to break the model of IB network as a centrally >managed >fabric, and a look at the code gives me the feeling no one thought through how >this will interact with SM features such as QoS, balancing, tavor MTU >optimizations etc. This is a module that I'm not inclined to submit upstream. It was requested as part of the Path Forward work, but I haven't seen any feedback on its use or performance. >> * IB multicast module > >Last time I tested this, there still were crashes with the IPoIB. >If there's a patch that adds just this change, I might be able to test it. >OTOH, I'm still not sure why are we touching IPoIB at all since >it seems unlikely any other ULP will want to share in the IPoIB mcast group. Personally, I think ipoib should use it to reduce code duplication. Declining the change because there _might_ be a bug in the new module doesn't seem like the right approach to take. (Why accept changes in the HCA driver then?) Plus we continue to find bugs in the ipoib multicast code. The main reason I modified ipoib was so that ib_multicast had a real user that I could test with, but there's nothing architecturally that prevents a process from joining the ipoib multicast group (maybe to snoop traffic for some reason...). I haven't seen any crashes with ipoib using ib_multicast, and since my last fix, I haven't seen any bug reports. The patches for ipoib need to be regenerated because of changes since the svn checkin. - Sean From mst at mellanox.co.il Tue Sep 26 02:34:35 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Sep 2006 12:34:35 +0300 Subject: [openib-general] [RFC] determining which changes in svn to merge upstream or remove In-Reply-To: <000001c6e143$7c010d60$0f74e984@amr.corp.intel.com> References: <000001c6e143$7c010d60$0f74e984@amr.corp.intel.com> Message-ID: <20060926093435.GA21473@mellanox.co.il> Quoting r. 
Sean Hefty : > The CM timeout/retry options are used by uDAPL, but the fix to increase the > default retry count to the maximum may help. The RDMA CM uses the data from the > path record, but the ULP has the most data about how long the remote side might > take to respond to a CM REQ message (remote_cm_response_timeout). We might be > able to have the RDMA CM make clever use of the MRA to avoid the issue, and even > in the short term, dumb use of the MRA may help. (The issue is that as > connections are formed, they begin being used, which can greatly affect how > quickly new connections can be created. We've seen them take up to 60 seconds.) Connections taking 60 sec to create is an issue. Can you please explain how the fact that some connections are used affects the time it takes to send the response? Why would sending MRA be faster than sending the response? > >> * IB multicast module > > > >Last time I tested this, there still were crashes with the IPoIB. > >If there's a patch that adds just this change, I might be able to test it. > >OTOH, I'm still not sure why are we touching IPoIB at all since > >it seems unlikely any other ULP will want to share in the IPoIB mcast group. > > Personally, I think ipoib should use it to reduce code duplication. I did not notice any significant reduction in ipoib LOC, but maybe I'm mistaken. Let's see the updated patch. -- MST From mst at mellanox.co.il Tue Sep 26 02:51:22 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Sep 2006 12:51:22 +0300 Subject: [openib-general] [RFC] determining which changes in svn to merge upstream or remove In-Reply-To: <000001c6e143$7c010d60$0f74e984@amr.corp.intel.com> References: <000001c6e143$7c010d60$0f74e984@amr.corp.intel.com> Message-ID: <20060926095122.GB21473@mellanox.co.il> Quoting r.
Sean Hefty : > Subject: RE: [RFC] determining which changes in svn to merge upstream or remove > > >BTW, there was a set of bugfix patches for CMA posted that didn't get acked or > >nacked yet. They looked sane and I took them into ofed - could you take the > >time to review please? Should I repost? It might make sense to put stability > >fixes in before adding more features. > > I've actually been on vacation for 2 of the last 3 weeks, so haven't had a > chance to review recent patches. I should get to them by the end of this week. You can get the full list of stuff we apply in ofed here: git://www.mellanox.co.il/~git/infiniband ofed_addons look in directory kernel_patches/fixes Here's a list: cma_list_init.patch cma_mem_leak.patch cma_race_fix.patch cma_tavor_quirk.patch and your own: sean_cma_establish.patch The following needs some discussion: sean_cm_drep_on_not_found.patch -- MST From ishai at mellanox.co.il Tue Sep 26 06:28:50 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Tue, 26 Sep 2006 16:28:50 +0300 Subject: [openib-general] [PATCH] IB/SRP identify QP in error state Message-ID: <20060926132850.GA17342@mellanox.co.il> There is a bug in mthca low level driver. A call to ib_post_send that tries to post to a QP that is in error state does not return immediately with error. It terminates with errors after a timeout. This causes SRP to wait a long time to reconnect. (Each abort call and each reset_device call performs post_send and waits for the timeout). The following patch solves this problem by identifying the failure and returning an immediate error code. 
Signed-off-by: Ishai Rabinovitz --- Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-09-25 13:51:47.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-09-25 15:40:04.000000000 +0300 @@ -543,6 +543,7 @@ static int srp_reconnect_target(struct s target->tx_head = 0; target->tx_tail = 0; + target->need_reset = 0; ret = srp_connect_target(target); if (ret) goto err; @@ -858,6 +859,7 @@ static void srp_completion(struct ib_cq printk(KERN_ERR PFX "failed %s status %d\n", wc.wr_id & SRP_OP_RECV ? "receive" : "send", wc.status); + target->need_reset = 1; break; } @@ -1313,6 +1315,8 @@ static int srp_abort(struct scsi_cmnd *s printk(KERN_ERR "SRP abort called\n"); + if (target->need_reset) + return FAILED; if (srp_find_req(target, scmnd, &req)) return FAILED; if (srp_send_tsk_mgmt(target, req, SRP_TSK_ABORT_TASK)) @@ -1341,6 +1345,8 @@ static int srp_reset_device(struct scsi_ printk(KERN_ERR "SRP reset_device called\n"); + if (target->need_reset) + return FAILED; if (srp_find_req(target, scmnd, &req)) return FAILED; if (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET)) @@ -1750,6 +1756,7 @@ static ssize_t srp_create_target(struct goto err_free; } + target->need_reset = 0; ret = srp_connect_target(target); if (ret) { printk(KERN_ERR PFX "Connection failed\n"); Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.h =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.h 2006-09-25 13:51:47.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.h 2006-09-25 14:00:36.000000000 +0300 @@ -158,6 +158,7 @@ struct srp_target_port { struct completion done; int status; enum srp_target_state state; + int need_reset; }; struct srp_iu { -- Ishai Rabinovitz From jackm at dev.mellanox.co.il Tue Sep 26 07:07:37 2006 From: jackm at dev.mellanox.co.il (Jack 
Morgenstein) Date: Tue, 26 Sep 2006 17:07:37 +0300 Subject: [openib-general] [Lustre-discuss] Re: problems with lustre o2ib module & ofed In-Reply-To: References: Message-ID: <200609261707.37720.jackm@dev.mellanox.co.il> On Monday 25 September 2006 17:01, Thierry Delaitre wrote: I noticed in the Lustre configure file the following --with-linux=path set path to Linux source (default=/usr/src/linux) Where does /usr/src/linux link to? You might consider explicitly specifying the following options as well in the Lustre ./configure step: --with-linux=path set path to Linux source (default=/usr/src/linux) --with-linux-obj=path set path to Linux objects dir (default=$LINUX) --with-linux-config=path set path to Linux .conf (default=$LINUX_OBJ/.config) - Jack > > On Mon, 25 Sep 2006, Thierry Delaitre wrote: > > > > > It seems that lustre puts its modules in /lib/modules/2.6.16.21-0.8-default > > despite the fact that my kernel is 2.6.16.21-0.8-smp ! > > > > uname -a > > Linux n32 2.6.16.21-0.8-smp #4 SMP Sun Sep 24 08:47:30 BST 2006 i686 i686 i386 GNU/Linux > > > > make[3]: Nothing to be done for `install-exec-am'. > > /bin/sh ../../mkinstalldirs /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre > > /usr/bin/install -c -m 644 lquota.ko /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre/lquota > > > > I therefore ends up with a /lib/modules/2.6.16.21-0.8-smp and > > /lib/modules/2.6.16.21-0.8-default > > > > i'm now searching why lustre thinks my kernel is 2.6.16.21-0.8-default and > > not 2.6.16.21-0.8-smp > > I've updated the UTS_RELEASE string in > /usr/src/linux-2.6.16.21-0.8/include/linux/version.h from default to smp > and deleted my /lib/modules/ > lustre now installs in /lib/modules/2.6.16.21-0.8-smp/kernel along with > ofed ib drivers. 
i recompiled the kernel, ofed and lustre and still gets > this: > > ko2iblnd: disagrees about version of symbol ib_create_cq > ko2iblnd: Unknown symbol ib_create_cq > ko2iblnd: disagrees about version of symbol ib_dereg_mr > ko2iblnd: Unknown symbol ib_dereg_mr > ko2iblnd: disagrees about version of symbol ib_destroy_cq > ko2iblnd: Unknown symbol ib_destroy_cq > ko2iblnd: disagrees about version of symbol ib_get_dma_mr > ko2iblnd: Unknown symbol ib_get_dma_mr > ko2iblnd: disagrees about version of symbol ib_alloc_pd > ko2iblnd: Unknown symbol ib_alloc_pd > ko2iblnd: disagrees about version of symbol ib_modify_qp > ko2iblnd: Unknown symbol ib_modify_qp > ko2iblnd: disagrees about version of symbol ib_dealloc_pd > ko2iblnd: Unknown symbol ib_dealloc_pd > LustreError: 7430:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND > o2ib, module ko2iblnd, rc=256 > > nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_create_cq > 3cfe7afa A __crc_ib_create_cq > 00000060 r __kcrctab_ib_create_cq > 0000015f r __kstrtab_ib_create_cq > 000000c0 r __ksymtab_ib_create_cq > 00000d50 T ib_create_cq > > i'm a bit stuck! > > Thierry. > > > Thierry. > > > > On Mon, 25 Sep 2006, Thierry Delaitre wrote: > > > > > > > > On Mon, 25 Sep 2006, Michael S. Tsirkin wrote: > > > > > > > Quoting r. Thierry Delaitre : > > > > > > > > > > I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the > > > > > lustre's configure line below. Lustre's configure script looks for a > > > > > driver/infiniband directory which only seems to exist under > > > > > /usr/local/ofed/src/openib-1.1 > > > > > > > > > > ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/ > > > > > > > > > > Thierry. > > > > > > > > > > > replace /usr/local/ofed with the prefix you specified. > > > > > > > > This looks wrong - openib-1.1 is the pristine sources. > > > > openib/include is the exported interface and is what you should use > > > > for dependent modules. 
> > > > No idea why would lustre need drivers/infiniband. > > > > Try creating a softlink: > > > > > > > > mkdir /usr/local/ofed/src/openib/drivers/infiniband > > > > ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband > > > > > > I untarred lustre 1.5.95, compiled it (./configure > > > --with-o2ib=/usr/local/ofed/src/openib) . did a make install, depmod -a > > > and still get the following: > > > > > > my modprobe.conf is the following > > > > > > options lnet ip2nets="o2ib0 161.74.83.[0-255]" > > > > > > lctl network up > > > LNET configure error 100: Network is down > > > > > > ko2iblnd: disagrees about version of symbol ib_create_cq > > > ko2iblnd: Unknown symbol ib_create_cq > > > ko2iblnd: disagrees about version of symbol ib_dereg_mr > > > ko2iblnd: Unknown symbol ib_dereg_mr > > > ko2iblnd: disagrees about version of symbol ib_destroy_cq > > > ko2iblnd: Unknown symbol ib_destroy_cq > > > ko2iblnd: disagrees about version of symbol ib_get_dma_mr > > > ko2iblnd: Unknown symbol ib_get_dma_mr > > > ko2iblnd: disagrees about version of symbol ib_alloc_pd > > > ko2iblnd: Unknown symbol ib_alloc_pd > > > ko2iblnd: disagrees about version of symbol ib_modify_qp > > > ko2iblnd: Unknown symbol ib_modify_qp > > > ko2iblnd: disagrees about version of symbol ib_dealloc_pd > > > ko2iblnd: Unknown symbol ib_dealloc_pd > > > LustreError: 4177:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND > > > o2ib, module ko2iblnd, rc=256 > > > > > > lsmod | grep ib > > > libcfs 103060 1 lnet > > > ib_ucm 19332 0 > > > ib_addr 10756 1 rdma_cm > > > ib_cm 31968 2 ib_ucm,rdma_cm > > > ib_ipoib 48400 0 > > > ib_sa 16652 3 rdma_cm,ib_cm,ib_ipoib > > > ib_uverbs 38312 2 rdma_ucm,ib_ucm > > > ib_umad 17968 0 > > > ib_mthca 116240 0 > > > ib_mad 36116 4 ib_cm,ib_sa,ib_umad,ib_mthca > > > ib_core 49024 9 > > > ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad > > > > > > nm 
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_alloc_pd > > > d5dcb698 A __crc_ib_alloc_pd > > > 0000001c r __kcrctab_ib_alloc_pd > > > 0000006a r __kstrtab_ib_alloc_pd > > > 00000038 r __ksymtab_ib_alloc_pd > > > 00000c65 T ib_alloc_pd > > > > > > from lustre's config.log: > > > > > > configure:6500: checking whether to enable OpenIB gen2 support > > > configure:6586: cp conftest.c build && make modules CC=gcc -f > > > /root/lustre-1.5.95/build/Makefile LUSTRE_LINUX > > > _CONFIG=/usr/src/linux/.config -o tmp_include_depends -o scripts -o include/config/MARKER -C /usr/src/linux EXTRA_CFLAGS=-Werror-implicit-function-declaration -g -I/root/lustre-1.5.95/lnet/include -I/root/lustre-1.5.95/lustre/include -I/usr/local/ofed/src/openib/include M=/root/lustre-1.5.95/build > > > /root/lustre-1.5.95/build/conftest.c:42: warning: function declaration > > > isn't a prototype > > > /root/lustre-1.5.95/build/conftest.c: In function 'main': > > > /root/lustre-1.5.95/build/conftest.c:49: warning: unused variable 'rej_reason' > > > /root/lustre-1.5.95/build/conftest.c:48: warning: unused variable 'pool_fmr' > > > /root/lustre-1.5.95/build/conftest.c:47: warning: unused variable 'qp_attr' > > > /root/lustre-1.5.95/build/conftest.c:46: warning: unused variable 'device_attr' > > > /root/lustre-1.5.95/build/conftest.c:45: warning: unused variable 'conn_param' > > > WARNING: "rdma_create_id" [/root/lustre-1.5.95/build/conftest.ko] undefined! > > > configure:6589: $? = 0 > > > configure:6591: test -s build/conftest.o > > > configure:6594: $? = 0 > > > configure:6597: result: yes > > > > > > > > > Thierry. 
> > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > ---------------------------------------- > > Dr Thierry DELAITRE > > Systems and Services Manager, CSCS > > University of Westminster > > 115 New Cavendish Street, London W1W 6UW > > > > Tel: 020 7911 5000 ext: 3586 > > Fax: 020 7911 5089 > > Mobile short dial code 1788 > > > > http://www.cscs.wmin.ac.uk/~delaitt > > ---------------------------------------- > > > > This e-mail and its attachments are intended for the above named only > > and may be confidential. If they have come to you in error you must > > not copy or show them to anyone, nor should you take any action based > > on them, other than to notify the error by replying to the sender. > > > > _______________________________________________ > > Lustre-discuss mailing list > > Lustre-discuss at clusterfs.com > > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss > >
> > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at mellanox.co.il Tue Sep 26 07:24:00 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Sep 2006 17:24:00 +0300 Subject: [openib-general] [PATCH] IB/SRP identify QP in error state In-Reply-To: <20060926132850.GA17342@mellanox.co.il> References: <20060926132850.GA17342@mellanox.co.il> Message-ID: <20060926142400.GF21473@mellanox.co.il> Quoting r. Ishai Rabinovitz : > Subject: [PATCH] IB/SRP identify QP in error state > > There is a bug in mthca low level driver. > A call to ib_post_send that tries to post to a QP that is in error state does > not return immediately with error. It terminates with errors after a timeout. Let me rephrase: after post send/receive to QP in error state in mthca, a completion with error might never get generated. SRP will then timeout. To fix mthca, we'd need to change QP state on completion with error and on modify to error, and add actual code where it now says /* XXX check that state is OK to post send */ /* XXX check that state is OK to post receive */ I guess the reason we never fixed this was because it did not seem to actually hurt any real ULPs, and testing QP state will affect fast path performance. However, IB spec is quite explicit on this point, and fixing a low level drivers seems a better approach than adding work-arounds in ULPs. Roland, what do you think? 
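As a rough illustration of the driver-side check described above, here is a userspace sketch in C. It is not mthca code: all sim_* names are hypothetical, the set of states is simplified, and rejecting the post with -EINVAL whenever the cached state is not RTS is an assumption made for the sketch (what exactly the driver should do for a QP in the error state is a separate question). The sketch keeps a cached QP state that is updated on modify and on a completion with error, and consults it in the post path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical set of QP states for the sketch. */
enum sim_qp_state {
    SIM_QPS_RESET,
    SIM_QPS_INIT,
    SIM_QPS_RTR,
    SIM_QPS_RTS,
    SIM_QPS_ERR
};

struct sim_qp {
    enum sim_qp_state cached_state; /* what the driver believes */
    int posted;                     /* work requests handed to "hardware" */
};

/* Explicit state change requested by the consumer (modify QP). */
static void sim_modify_qp(struct sim_qp *qp, enum sim_qp_state new_state)
{
    assert(qp != NULL);
    qp->cached_state = new_state;
}

/* Completion handling: a completion with error moves the cache to ERR. */
static void sim_handle_completion(struct sim_qp *qp, bool with_error)
{
    assert(qp != NULL);
    if (with_error)
        qp->cached_state = SIM_QPS_ERR;
}

/* Post a send; the extra branch is the fast-path cost of the check. */
static int sim_post_send(struct sim_qp *qp)
{
    assert(qp != NULL);
    if (qp->cached_state != SIM_QPS_RTS)
        return -EINVAL;  /* assumption: reject rather than queue */
    qp->posted++;
    return 0;
}
```

Note that the cached state only changes when the driver processes a modify call or an error completion, so it can lag behind a transition that the hardware makes on its own.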
-- MST From ishai at mellanox.co.il Tue Sep 26 07:45:41 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Tue, 26 Sep 2006 17:45:41 +0300 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel Message-ID: <20060926144541.GA17938@mellanox.co.il> Hi Roland, SRP High Availability needs an initiator to connect to the same target several times, e.g., once from each IB port of the target (this way we can use device mapper multipath for failover). Note that both connections are actually active, e.g. multipath is issuing commands to get the remote scsi id. Since multiple channel operation is currently disabled in connection requests, each new connection request will cause the target to disconnect the existing connection, which forces us to bounce a lot between the two channels. This patch enables multiple channel operation in connection requests, to avoid getting disconnects when multiple connections are active. There does not seem to be any harm in doing this even when multipath is not used. Signed-off-by: Ishai Rabinovitz --- Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-09-26 09:22:13.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-09-26 14:54:35.000000000 +0300 @@ -329,6 +329,7 @@ static int srp_send_req(struct srp_targe req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len); req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT); + req->priv.req_flags = SRP_MULTICHAN_MULTI; /* * In the published SRP specification (draft rev. 16a), the * port identifier format is 8 bytes of ID extension followed -- Ishai Rabinovitz From thomas.bub at thomson.net Tue Sep 26 08:00:07 2006 From: thomas.bub at thomson.net (Bub Thomas) Date: Tue, 26 Sep 2006 17:00:07 +0200 Subject: [openib-general] How to register, query and delete a service_id?
Message-ID: Hi, as I'm porting my gen1 application to gen2, my last task is to port the service_id registration, query and deletion to gen2. With the help of Mellanox I got it running under gen1 using read/write of MAD messages on the device "/dev/ts_ua0". I browsed through the ofed sources and got lost in there. Is there some good and simple example that can help me find my way? I assume I have to use the ibmad and/or ibumad library? Thanks Thomas Bub ............................................................ Thomas Bub Grass Valley Germany GmbH Brunnenweg 9 64331 Weiterstadt, Germany Tel: +49 6150 104 147 Fax: +49 6150 104 656 Email: Thomas.Bub at thomson.net www.GrassValley.com ............................................................ From sashak at voltaire.com Tue Sep 26 08:10:00 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 26 Sep 2006 18:10:00 +0300 Subject: [openib-general] [PATCH] osm: cosmetic changes in osmtest multicast flow In-Reply-To: References: Message-ID: <20060926151000.GA8949@sashak.voltaire.com> On 16:12 Mon 25 Sep , Yevgeny Kliteynik wrote: > Hi Hal > > This patch is all about cosmetics - it improves > the osmtest log readability, and it also has some > cosmetic additions in the code. > > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Applied (to trunk). Thanks. Sasha From rdreier at cisco.com Tue Sep 26 08:07:45 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Sep 2006 08:07:45 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060926055659.GC21085@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 26 Sep 2006 08:56:59 +0300") References: <1158850657.24776.158.camel@localhost> <20060926055659.GC21085@mellanox.co.il> Message-ID: Michael> BTW, are you taking over this work? Just implementing Michael> peek/request for notification enhancements?
A couple of Michael> your comments sounded like you are - a little Michael> coordination won't hurt here. I think I can handle everything from here -- I will post patches based on my current approach soon. - R. From rdreier at cisco.com Tue Sep 26 08:12:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Sep 2006 08:12:08 -0700 Subject: [openib-general] [PATCH] IB/SRP identify QP in error state In-Reply-To: <20060926142400.GF21473@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 26 Sep 2006 17:24:00 +0300") References: <20060926132850.GA17342@mellanox.co.il> <20060926142400.GF21473@mellanox.co.il> Message-ID: Michael> Let me rephrase: after post send/receive to QP in error Michael> state in mthca, a completion with error might never get Michael> generated. Won't a flush error be generated for every request posted to a QP in the error state? Michael> To fix mthca, we'd need to change QP state on completion Michael> with error and on modify to error, and add actual code Michael> where it now says Michael> /* XXX check that state is OK to post send */ /* XXX Michael> check that state is OK to post receive */ Michael> I guess the reason we never fixed this was because it did Michael> not seem to actually hurt any real ULPs, and testing QP Michael> state will affect fast path performance. Michael> However, IB spec is quite explicit on this point, and Michael> fixing a low level drivers seems a better approach than Michael> adding work-arounds in ULPs. Michael> Roland, what do you think? Yes, that was something I just never got around to implementing. Of course since transition to error state may be done asynchronously by hardware, we still have the case where the consumer tries to post a work request to a QP in error but the low-level driver still thinks the QP is in RTS. - R.
From aviram at mellanox.co.il Tue Sep 26 08:39:37 2006 From: aviram at mellanox.co.il (Aviram Gutman) Date: Tue, 26 Sep 2006 18:39:37 +0300 Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ Message-ID: <2D5DEE3C6A0E0244B0133244731D4C4B04D147@mtlexch01.mtl.com> -----Original Message----- From: Arlin Davis [mailto:ardavis at ichips.intel.com] Sent: Monday, September 25, 2006 9:38 PM To: Arlin Davis Cc: Sean Hefty; openib-general at openib.org; Aviram Gutman Subject: Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ Arlin Davis wrote: >Sean Hefty wrote: > > > >>Currently a DREP is only sent in response to a DREQ if a connection >>has been found matching the DREQ, and it is in the proper state. Once >>a DREP is sent, the local connection moves into timewait. Duplicate >>DREQs received while in this state result in re-sending the DREP. >> >>However, it's likely that the local connection will enter and exit >>timewait before the remote side times out a lost DREP and resends a DREQ. >>There are a couple possible solutions to this. One is to increase how >>long a connection remains in timewait, by multiplying its wait time by >>max_cm_retries. This can greatly increase the timewait state before a >>QP can be re-used when CM messages are not lost. >> >>An alternative is to send a DREP in response to a DREQ, even if a >>local connection is not found, which is what this patch does. >> >> >> >> > >Would it be possible to get this fix in rc7? I am consistently seeing >this problem with Intel MPI on a 64 node cluster. > >-arlin > > > Aviram? Is there an rc7 and could this get in?
>_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general > > > Yes, Michael Tsirkin added it. From sean.hefty at intel.com Tue Sep 26 08:48:11 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 26 Sep 2006 08:48:11 -0700 Subject: [openib-general] [RFC] determining which changes in svn to merge upstream or remove In-Reply-To: <20060926093435.GA21473@mellanox.co.il> Message-ID: <000001c6e183$2db6adf0$5f78e984@amr.corp.intel.com> >Connections taking 60 sec to create is an issue. >Can you please explain how the fact that some connections are used affects >the time it takes to send the response? This is in userspace, and IMO, an application issue. Threads using established connections simply begin consuming all processor time. This is while running under heavy load and trying to scale up the application. >Why would sending MRA be faster than sending the response? An MRA could be sent directly by the RDMA CM in the kernel in a REQ callback, whereas the response requires the userspace application to poll the REQ and generate a REP. - Sean From aviram at dev.mellanox.co.il Tue Sep 26 09:00:34 2006 From: aviram at dev.mellanox.co.il (Aviram Gutman) Date: Tue, 26 Sep 2006 19:00:34 +0300 Subject: [openib-general] OFED Status Message-ID: <45194EA2.6070202@dev.mellanox.co.il> Hi, OFED 1.1 RC6 was released on Thursday. The issues that have been resolved since are: 1) OpenIB Diags build on SLES10 ppc - Solved by Moshe Katzir from Voltaire 2) iSER build on SLES10 needs root privilege - Voltaire fixed it 3) Bug #233 SDP crash on ipath - I believe MST fixed it. Betsy, please confirm. 4) Fix IBDM to allow multiple devices on the same machine - Eitan Zahavi fixed 5) SRP HA - Fixed by Ishai 6) IPoIB HA on RH - Vlad made progress, but the issue is still not solved.
7) The CM fix that Arlin asked for - In. Pending a fix for the IPoIB HA issue, we would like to issue RC7, which is supposed to be final. Is everyone OK with this approach? Aviram From bos at pathscale.com Tue Sep 26 09:58:27 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 26 Sep 2006 09:58:27 -0700 Subject: [openib-general] [openfabrics-ewg] OFED Status In-Reply-To: <45194EA2.6070202@dev.mellanox.co.il> References: <45194EA2.6070202@dev.mellanox.co.il> Message-ID: <1159289907.9652.18.camel@chalcedony.pathscale.com> On Tue, 2006-09-26 at 19:00 +0300, Aviram Gutman wrote: > 3) Bug #233 SDP crash on ipath - I believe MST fixed. Betsy please confirm. Yes, this seems to be fixed. References: <200609261707.37720.jackm@dev.mellanox.co.il> Message-ID: On Tue, 26 Sep 2006, Jack Morgenstein wrote: > On Monday 25 September 2006 17:01, Thierry Delaitre wrote: > > I noticed in the Lustre configure file the following > --with-linux=path set path to Linux source (default=/usr/src/linux) > > Where does /usr/src/linux link to? > > You might consider explicitly specifying the following options as well in the > Lustre ./configure step: > > --with-linux=path set path to Linux source (default=/usr/src/linux) > --with-linux-obj=path set path to Linux objects dir (default=$LINUX) > --with-linux-config=path > set path to Linux .conf (default=$LINUX_OBJ/.config) I specified the whole string and the result is still the same. ./configure --with-o2ib=/usr/local/ofed/src/openib --with-linux=/usr/src/linux-2.6.16.21-0.8 --with-linux-obj=/usr/src/linux-2.6.16.21-0.8 --with-linux-config=/usr/src/linux-2.6.16.21-0.8/.config Thierry. > - Jack > > > > On Mon, 25 Sep 2006, Thierry Delaitre wrote: > > > > > > > > It seems that lustre puts its modules in /lib/modules/2.6.16.21-0.8-default > > > despite the fact that my kernel is 2.6.16.21-0.8-smp !
> > > > > > uname -a > > > Linux n32 2.6.16.21-0.8-smp #4 SMP Sun Sep 24 08:47:30 BST 2006 i686 i686 i386 GNU/Linux > > > > > > make[3]: Nothing to be done for `install-exec-am'. > > > /bin/sh ../../mkinstalldirs /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre > > > /usr/bin/install -c -m 644 lquota.ko /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre/lquota > > > > > > I therefore ends up with a /lib/modules/2.6.16.21-0.8-smp and > > > /lib/modules/2.6.16.21-0.8-default > > > > > > i'm now searching why lustre thinks my kernel is 2.6.16.21-0.8-default and > > > not 2.6.16.21-0.8-smp > > > > I've updated the UTS_RELEASE string in > > /usr/src/linux-2.6.16.21-0.8/include/linux/version.h from default to smp > > and deleted my /lib/modules/ > > lustre now installs in /lib/modules/2.6.16.21-0.8-smp/kernel along with > > ofed ib drivers. i recompiled the kernel, ofed and lustre and still gets > > this: > > > > ko2iblnd: disagrees about version of symbol ib_create_cq > > ko2iblnd: Unknown symbol ib_create_cq > > ko2iblnd: disagrees about version of symbol ib_dereg_mr > > ko2iblnd: Unknown symbol ib_dereg_mr > > ko2iblnd: disagrees about version of symbol ib_destroy_cq > > ko2iblnd: Unknown symbol ib_destroy_cq > > ko2iblnd: disagrees about version of symbol ib_get_dma_mr > > ko2iblnd: Unknown symbol ib_get_dma_mr > > ko2iblnd: disagrees about version of symbol ib_alloc_pd > > ko2iblnd: Unknown symbol ib_alloc_pd > > ko2iblnd: disagrees about version of symbol ib_modify_qp > > ko2iblnd: Unknown symbol ib_modify_qp > > ko2iblnd: disagrees about version of symbol ib_dealloc_pd > > ko2iblnd: Unknown symbol ib_dealloc_pd > > LustreError: 7430:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND > > o2ib, module ko2iblnd, rc=256 > > > > nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_create_cq > > 3cfe7afa A __crc_ib_create_cq > > 00000060 r __kcrctab_ib_create_cq > > 0000015f r __kstrtab_ib_create_cq > > 000000c0 r 
__ksymtab_ib_create_cq > > 00000d50 T ib_create_cq > > > > i'm a bit stuck! > > > > Thierry. > > > > > Thierry. > > > > > > On Mon, 25 Sep 2006, Thierry Delaitre wrote: > > > > > > > > > > > On Mon, 25 Sep 2006, Michael S. Tsirkin wrote: > > > > > > > > > Quoting r. Thierry Delaitre : > > > > > > > > > > > > I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the > > > > > > lustre's configure line below. Lustre's configure script looks for a > > > > > > driver/infiniband directory which only seems to exist under > > > > > > /usr/local/ofed/src/openib-1.1 > > > > > > > > > > > > ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/ > > > > > > > > > > > > Thierry. > > > > > > > > > > > > > replace /usr/local/ofed with the prefix you specified. > > > > > > > > > > This looks wrong - openib-1.1 is the pristine sources. > > > > > openib/include is the exported interface and is what you should use > > > > > for dependent modules. > > > > > No idea why would lustre need drivers/infiniband. > > > > > Try creating a softlink: > > > > > > > > > > mkdir /usr/local/ofed/src/openib/drivers/infiniband > > > > > ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband > > > > > > > > I untarred lustre 1.5.95, compiled it (./configure > > > > --with-o2ib=/usr/local/ofed/src/openib) . 
did a make install, depmod -a > > > > and still get the following: > > > > > > > > my modprobe.conf is the following > > > > > > > > options lnet ip2nets="o2ib0 161.74.83.[0-255]" > > > > > > > > lctl network up > > > > LNET configure error 100: Network is down > > > > > > > > ko2iblnd: disagrees about version of symbol ib_create_cq > > > > ko2iblnd: Unknown symbol ib_create_cq > > > > ko2iblnd: disagrees about version of symbol ib_dereg_mr > > > > ko2iblnd: Unknown symbol ib_dereg_mr > > > > ko2iblnd: disagrees about version of symbol ib_destroy_cq > > > > ko2iblnd: Unknown symbol ib_destroy_cq > > > > ko2iblnd: disagrees about version of symbol ib_get_dma_mr > > > > ko2iblnd: Unknown symbol ib_get_dma_mr > > > > ko2iblnd: disagrees about version of symbol ib_alloc_pd > > > > ko2iblnd: Unknown symbol ib_alloc_pd > > > > ko2iblnd: disagrees about version of symbol ib_modify_qp > > > > ko2iblnd: Unknown symbol ib_modify_qp > > > > ko2iblnd: disagrees about version of symbol ib_dealloc_pd > > > > ko2iblnd: Unknown symbol ib_dealloc_pd > > > > LustreError: 4177:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND > > > > o2ib, module ko2iblnd, rc=256 > > > > > > > > lsmod | grep ib > > > > libcfs 103060 1 lnet > > > > ib_ucm 19332 0 > > > > ib_addr 10756 1 rdma_cm > > > > ib_cm 31968 2 ib_ucm,rdma_cm > > > > ib_ipoib 48400 0 > > > > ib_sa 16652 3 rdma_cm,ib_cm,ib_ipoib > > > > ib_uverbs 38312 2 rdma_ucm,ib_ucm > > > > ib_umad 17968 0 > > > > ib_mthca 116240 0 > > > > ib_mad 36116 4 ib_cm,ib_sa,ib_umad,ib_mthca > > > > ib_core 49024 9 > > > > ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad > > > > > > > > nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_alloc_pd > > > > d5dcb698 A __crc_ib_alloc_pd > > > > 0000001c r __kcrctab_ib_alloc_pd > > > > 0000006a r __kstrtab_ib_alloc_pd > > > > 00000038 r __ksymtab_ib_alloc_pd > > > > 00000c65 T ib_alloc_pd > > > > > > > > from lustre's config.log: > > > > > > > > 
configure:6500: checking whether to enable OpenIB gen2 support > > > > configure:6586: cp conftest.c build && make modules CC=gcc -f > > > > /root/lustre-1.5.95/build/Makefile LUSTRE_LINUX > > > > _CONFIG=/usr/src/linux/.config -o tmp_include_depends -o scripts -o include/config/MARKER -C /usr/src/linux EXTRA_CFLAGS=-Werror-implicit-function-declaration -g -I/root/lustre-1.5.95/lnet/include -I/root/lustre-1.5.95/lustre/include -I/usr/local/ofed/src/openib/include M=/root/lustre-1.5.95/build > > > > /root/lustre-1.5.95/build/conftest.c:42: warning: function declaration > > > > isn't a prototype > > > > /root/lustre-1.5.95/build/conftest.c: In function 'main': > > > > /root/lustre-1.5.95/build/conftest.c:49: warning: unused variable 'rej_reason' > > > > /root/lustre-1.5.95/build/conftest.c:48: warning: unused variable 'pool_fmr' > > > > /root/lustre-1.5.95/build/conftest.c:47: warning: unused variable 'qp_attr' > > > > /root/lustre-1.5.95/build/conftest.c:46: warning: unused variable 'device_attr' > > > > /root/lustre-1.5.95/build/conftest.c:45: warning: unused variable 'conn_param' > > > > WARNING: "rdma_create_id" [/root/lustre-1.5.95/build/conftest.ko] undefined! > > > > configure:6589: $? = 0 > > > > configure:6591: test -s build/conftest.o > > > > configure:6594: $? = 0 > > > > configure:6597: result: yes > > > > > > > > > > > > Thierry. 
---------------------------------------- Dr Thierry DELAITRE Systems and Services Manager, CSCS University of Westminster 115 New Cavendish Street, London W1W 6UW Tel: 020 7911 5000 ext: 3586 Fax: 020 7911 5089 Mobile short dial code 1788 http://www.cscs.wmin.ac.uk/~delaitt ---------------------------------------- This e-mail and its attachments are intended for the above named only and may be confidential. If they have come to you in error you must not copy or show them to anyone, nor should you take any action based on them, other than to notify the error by replying to the sender. From xma at us.ibm.com Tue Sep 26 11:14:53 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 11:14:53 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: Message-ID: Roland, We had a simple version of a NAPI patch. We saw a performance improvement on mthca but not on ehca. We will test this NAPI patch on ehca when it's available to see how it performs. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From xma at us.ibm.com Tue Sep 26 11:21:14 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 11:21:14 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <1158850657.24776.158.camel@localhost> Message-ID: > This patch implements NAPI for ipoib. It is a draft implementation. > I would like your opinion on whether we need a module parameter > to control if NAPI should be activated or not. It can be a configuration option to enable/disable NAPI, just like for other network devices.
Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From xma at us.ibm.com Tue Sep 26 11:28:49 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 11:28:49 -0700 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: Message-ID: Since linux 2.6.18 supports GSO, I have patched IPoIB to enable GSO, but haven't tested the performance yet. Has anyone tried already? Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlleinin at hpcn.ca.sandia.gov Tue Sep 26 11:44:13 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Tue, 26 Sep 2006 11:44:13 -0700 Subject: [openib-general] OpenFabrics IBTA DevCon 2006 presentations Message-ID: <1159296253.15009.57.camel@localhost> Most of the presentations from the OpenFabrics IBTA DevCon 2006 in San Francisco yesterday have been posted online at http://openfabrics.org/conference/sep2006devcon/ and http://www.infinibandta.org/events/DevCon2006_presentations Thanks to everyone who helped set up this event and to those that participated. - Matt From mst at mellanox.co.il Tue Sep 26 12:20:07 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Sep 2006 22:20:07 +0300 Subject: [openib-general] [RFC] determining which changes in svn to merge upstream or remove In-Reply-To: <000001c6e183$2db6adf0$5f78e984@amr.corp.intel.com> References: <000001c6e183$2db6adf0$5f78e984@amr.corp.intel.com> Message-ID: <20060926192007.GA24009@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [RFC] determining which changes in svn to merge upstream or remove > > >Connections taking 60 sec to create is an issue. 
> >Can you please explain how the fact that some connections are used affect > >the time it takes to send the response? > > This is in userspace, and IMO, an application issue. Threads using established > connections simply begin consuming all processor time. This is while running > under heavy load and trying to scale up the application. > > >Why would sending MRA be faster than sending the response? > > An MRA could be sent directly by the RDMA CM in the kernel in a REQ callback, > whereas the response requires the userspace application to poll the REQ and > generate a REP. I see. So it actually does look like for userspace clients, CMA should send MRA immediately and then let userspace send REP in its own good time. -- MST From mst at mellanox.co.il Tue Sep 26 12:28:22 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Sep 2006 22:28:22 +0300 Subject: [openib-general] [Lustre-discuss] Re: problems with lustre o2ib module & ofed In-Reply-To: References: <200609261707.37720.jackm@dev.mellanox.co.il> Message-ID: <20060926192822.GD24009@mellanox.co.il> Quoting r. Thierry Delaitre : > Subject: Re: [Lustre-discuss] Re: problems with lustre o2ib module & ofed > > > On Tue, 26 Sep 2006, Jack Morgenstein wrote: > > > On Monday 25 September 2006 17:01, Thierry Delaitre wrote: > > > > I noticed in the Lustre configure file the following > > --with-linux=path set path to Linux source (default=/usr/src/linux) > > > > Where does /usr/src/linux link to? > > > > You might consider explicitly specifying the following options as well in the > > Lustre ./configure step: > > > > --with-linux=path set path to Linux source (default=/usr/src/linux) > > --with-linux-obj=path set path to Linux objects dir (default=$LINUX) > > --with-linux-config=path > > set path to Linux .conf (default=$LINUX_OBJ/.config) > > I specified the whole string and still the same. 
> > ./configure --with-o2ib=/usr/local/ofed/src/openib --with-linux=/usr/src/linux-2.6.16.21-0.8 --with-linux-obj=/usr/src/linux-2.6.16.21-0.8 --with-linux-config=/usr/src/linux-2.6.16.21-0.8/.config > > Thierry. 1. Did you reboot after rebuilding everything? 2. Try to check the compiler command line used for building lustre. You must make sure the gen2 include path comes before the Linux kernel's in the -I flag list. -- MST From narravul at cse.ohio-state.edu Tue Sep 26 12:38:34 2006 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Tue, 26 Sep 2006 15:38:34 -0400 (EDT) Subject: [openib-general] Port reuse issue for rdma_cm/iwarp Message-ID: Hi, We are facing a problem while running back-to-back applications using the same port number for rdma_cm over iwarp (Ammasso). The port seems to be busy for about 60 seconds after each disconnect. The first execution finishes without any problems or errors. When the execution is repeated immediately, we see an RDMA_CM_EVENT_REJECTED event on the active connect side. However, if we use a different port or if we include a delay of more than 60 seconds between the runs, we do not see this problem. Is this a known issue? Is there any way to force an immediate reuse of the port? Thanks, --Sundeep. From swise at opengridcomputing.com Tue Sep 26 12:50:51 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 26 Sep 2006 14:50:51 -0500 Subject: [openib-general] 2.6.18 kernel support in the main trunk. Message-ID: <1159300251.11549.6.camel@stevo-desktop> Roland, What's the status with the main trunk kernel code and 2.6.18? I noticed that it doesn't build and needs something like this. I haven't tested this yet...
Signed-off-by: Steve Wise Index: uverbs_main.c =================================================================== --- uverbs_main.c (revision 9632) +++ uverbs_main.c (working copy) @@ -820,11 +820,12 @@ kref_put(&uverbs_dev->ref, ib_uverbs_release_dev); } -static struct super_block *uverbs_event_get_sb(struct file_system_type *fs_type, int flags, - const char *dev_name, void *data)+static int uverbs_event_get_sb(struct file_system_type *fs_type, int flags, + const char *dev_name, void *data, + struct vfsmount *mnt) { return get_sb_pseudo(fs_type, "infinibandevent:", NULL, - INFINIBANDEVENTFS_MAGIC); + INFINIBANDEVENTFS_MAGIC, mnt); } static struct file_system_type uverbs_event_fs = { From swise at opengridcomputing.com Tue Sep 26 13:01:34 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 26 Sep 2006 15:01:34 -0500 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159300251.11549.6.camel@stevo-desktop> References: <1159300251.11549.6.camel@stevo-desktop> Message-ID: <1159300894.11549.11.camel@stevo-desktop> On Tue, 2006-09-26 at 14:50 -0500, Steve Wise wrote: > Roland, > > Whats the status with the main trunk kernel code and 2.6.18? > > I noticed that it doesn't build and needs something like this. I haven't > tested this yet... > > Signed-off-by: Steve Wise Oops, that patch was mangled. 
Try this:

Index: uverbs_main.c
===================================================================
--- uverbs_main.c (revision 9632)
+++ uverbs_main.c (working copy)
@@ -820,11 +820,12 @@
 	kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
 }
 
-static struct super_block *uverbs_event_get_sb(struct file_system_type *fs_type, int flags,
-			       const char *dev_name, void *data)
+static int uverbs_event_get_sb(struct file_system_type *fs_type, int flags,
+			       const char *dev_name, void *data,
+			       struct vfsmount *mnt)
 {
 	return get_sb_pseudo(fs_type, "infinibandevent:", NULL,
-			     INFINIBANDEVENTFS_MAGIC);
+			     INFINIBANDEVENTFS_MAGIC, mnt);
 }
 
 static struct file_system_type uverbs_event_fs = {

From xma at us.ibm.com Tue Sep 26 13:04:09 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 13:04:09 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159300251.11549.6.camel@stevo-desktop> Message-ID: > Whats the status with the main trunk kernel code and 2.6.18? > > I noticed that it doesn't build and needs something like this. I haven't > tested this yet... Yes. You need this patch, and you also need to change dev->xmit_lock to dev->_xmit_lock in ipoib_multicast.c, to build the trunk on 2.6.18. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From xma at us.ibm.com Tue Sep 26 13:12:04 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 13:12:04 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: Message-ID: We ran a quick test on the ehca driver and saw some performance drop. I strongly recommend making NAPI a configurable option in ipoib, so customers can turn it on or off based on their configurations.
Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris_youb at yahoo.ca Tue Sep 26 13:21:14 2006 From: chris_youb at yahoo.ca (chris_youb at yahoo.ca) Date: Wed, 27 Sep 2006 04:21:14 +0800 Subject: [openib-general] OpenSM -> 'open /dev/infiniband/umad1 failed' Message-ID: <6169134.1159302074231.JavaMail.websites@opensubscriber> I'm trying to setup OpenSM on one of our boxes. I've installed the RPMs from ofed-1.0-sles10-rpms_i686.tar.gz and updated the firmware on our Mellanox card. When I try to start opensm I get the following error message: 'umad_open_port: open /dev/infiniband/umad1 failed'. Any suggestions of what I can try next? ******** Setup ******** H/W: Dell 1550 O/S: Suse 10.0 (linux 2.6.13-15.12-default) HBC: Mellanox MT23108 rev 3.5.000 S/W: ofed-1.0-sles10-rpms_i686.tar.gz ******** OpenSM ******** linux:/usr/local/ofed/bin # ./opensm -V -d5 ------------------------------------------------- OpenSM Rev:openib-1.2.1 Based on OpenIB svn Exported revision Command Line Arguments: Big V selected d level = 0x5 Log File: /var/log/osm.log ------------------------------------------------- OpenSM Rev:openib-1.2.1 OpenIB svn Exported revision ibwarn: [6860] umad_init: ibwarn: [6860] umad_get_cas_names: max 32 ibwarn: [6860] umad_get_cas_names: return 1 cas ibwarn: [6860] umad_get_ca_portguids: ca name mthca0 max port guids 64 ibwarn: [6860] umad_get_ca: ca_name mthca0 ibwarn: [6860] umad_get_ca: opened mthca0 ibwarn: [6860] umad_get_ca_portguids: mthca0: 3 ports ibwarn: [6860] umad_get_ca: ca_name mthca0 ibwarn: [6860] umad_get_ca: opened mthca0 ibwarn: [6860] umad_get_port: ca_name (null) portnum 0 ibwarn: [6860] umad_get_cas_names: max 20 ibwarn: [6860] umad_get_cas_names: return 1 cas ibwarn: [6860] resolve_ca_name: checking ca 'mthca0' ibwarn: [6860] resolve_ca_port: checking ca 'mthca0' ibwarn: [6860] 
umad_get_ca: ca_name mthca0 ibwarn: [6860] umad_get_ca: opened mthca0 ibwarn: [6860] resolve_ca_port: checking port 0 ibwarn: [6860] resolve_ca_port: checking port 1 ibwarn: [6860] resolve_ca_port: checking port 2 ibwarn: [6860] resolve_ca_name: found ca mthca0 with port 2 type 0 ibwarn: [6860] resolve_ca_name: phys found 0 on mthca0 port 2 ibwarn: [6860] umad_release_port: port mthca0:2 ibwarn: [6860] umad_release_port: releasing mthca0:2 Using default GUID 0x2c90107fbfcf2 ibwarn: [6860] umad_get_ca_portguids: ca name mthca0 max port guids 32 ibwarn: [6860] umad_get_ca: ca_name mthca0 ibwarn: [6860] umad_get_ca: opened mthca0 ibwarn: [6860] umad_get_ca_portguids: mthca0: 3 ports ibwarn: [6860] umad_get_ca: ca_name mthca0 ibwarn: [6860] umad_get_ca: opened mthca0 ibwarn: [6860] umad_get_port: ca_name mthca0 portnum 2 ibwarn: [6860] umad_open_port: ca mthca0 port 2 ibwarn: [6860] umad_open_port: opening mthca0 port 2 ibwarn: [6860] dev_to_umad_id: mapped mthca0 2 to 1 ibwarn: [6860] umad_open_port: open /dev/infiniband/umad1 failed Error from osm_opensm_bind (0x2A) Exiting SM ibwarn: [6860] umad_done: ******** Drivers ******** ib_mthca 97692 0 ib_mad 34324 2 ib_umad,ib_mthca ib_core 39680 3 ib_umad,ib_mthca,ib_mad ******** Logs ******** linux:/usr/local/ofed/bin # tail -f /var/log/osm.log Jan 28 14:35:41 017194 [4018DFE0] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000 Jan 28 14:35:41 017349 [4018DFE0] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000 Jan 28 14:35:41 025501 [4018DFE0] -> osm_vendor_bind: Binding to port 0x2c90107fbfcf2 Jan 28 14:35:41 030909 [4018DFE0] -> osm_vendor_open_port: ERR 542C: umad_open_port() failed Jan 28 14:35:41 030986 [4018DFE0] -> osm_vendor_bind: ERR 5424: Unable to Open Port 0x2c90107fbfcf2 Jan 28 14:35:41 031015 [4018DFE0] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed Jan 28 
14:35:41 031228 [4018DFE0] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR) Jan 28 14:35:41 031742 [4018DFE0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind Jan 28 14:35:41 032313 [0000] -> Exiting SM -- This message was sent on behalf of chris_youb at yahoo.ca at openSubscriber.com http://www.opensubscriber.com/messages/openib-general at openib.org/topic.html From halr at voltaire.com Tue Sep 26 13:32:42 2006 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 26 Sep 2006 23:32:42 +0300 Subject: [openib-general] OpenSM -> 'open /dev/infiniband/umad1 failed' References: <6169134.1159302074231.JavaMail.websites@opensubscriber> Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589ADBC@taurus.voltaire.com> Hi, Do you have udev installed and configured ? You may want to refer to the wiki (https://openib.org/tiki/tiki-index.php) for more troubleshooting info. There's some info in the cheat sheet (https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet) which may help. -- Hal ________________________________ From: openib-general-bounces at openib.org on behalf of chris_youb at yahoo.ca Sent: Tue 9/26/2006 4:21 PM To: openib-general at openib.org Subject: [openib-general] OpenSM -> 'open /dev/infiniband/umad1 failed' I'm trying to setup OpenSM on one of our boxes. I've installed the RPMs from ofed-1.0-sles10-rpms_i686.tar.gz and updated the firmware on our Mellanox card. When I try to start opensm I get the following error message: 'umad_open_port: open /dev/infiniband/umad1 failed'. Any suggestions of what I can try next? 
******** Setup ******** H/W: Dell 1550 O/S: Suse 10.0 (linux 2.6.13-15.12-default) HBC: Mellanox MT23108 rev 3.5.000 S/W: ofed-1.0-sles10-rpms_i686.tar.gz ******** OpenSM ******** linux:/usr/local/ofed/bin # ./opensm -V -d5 ------------------------------------------------- OpenSM Rev:openib-1.2.1 Based on OpenIB svn Exported revision Command Line Arguments: Big V selected d level = 0x5 Log File: /var/log/osm.log ------------------------------------------------- OpenSM Rev:openib-1.2.1 OpenIB svn Exported revision ibwarn: [6860] umad_init: ibwarn: [6860] umad_get_cas_names: max 32 ibwarn: [6860] umad_get_cas_names: return 1 cas ibwarn: [6860] umad_get_ca_portguids: ca name mthca0 max port guids 64 ibwarn: [6860] umad_get_ca: ca_name mthca0 ibwarn: [6860] umad_get_ca: opened mthca0 ibwarn: [6860] umad_get_ca_portguids: mthca0: 3 ports ibwarn: [6860] umad_get_ca: ca_name mthca0 ibwarn: [6860] umad_get_ca: opened mthca0 ibwarn: [6860] umad_get_port: ca_name (null) portnum 0 ibwarn: [6860] umad_get_cas_names: max 20 ibwarn: [6860] umad_get_cas_names: return 1 cas ibwarn: [6860] resolve_ca_name: checking ca 'mthca0' ibwarn: [6860] resolve_ca_port: checking ca 'mthca0' ibwarn: [6860] umad_get_ca: ca_name mthca0 ibwarn: [6860] umad_get_ca: opened mthca0 ibwarn: [6860] resolve_ca_port: checking port 0 ibwarn: [6860] resolve_ca_port: checking port 1 ibwarn: [6860] resolve_ca_port: checking port 2 ibwarn: [6860] resolve_ca_name: found ca mthca0 with port 2 type 0 ibwarn: [6860] resolve_ca_name: phys found 0 on mthca0 port 2 ibwarn: [6860] umad_release_port: port mthca0:2 ibwarn: [6860] umad_release_port: releasing mthca0:2 Using default GUID 0x2c90107fbfcf2 ibwarn: [6860] umad_get_ca_portguids: ca name mthca0 max port guids 32 ibwarn: [6860] umad_get_ca: ca_name mthca0 ibwarn: [6860] umad_get_ca: opened mthca0 ibwarn: [6860] umad_get_ca_portguids: mthca0: 3 ports ibwarn: [6860] umad_get_ca: ca_name mthca0 ibwarn: [6860] umad_get_ca: opened mthca0 ibwarn: [6860] umad_get_port: 
ca_name mthca0 portnum 2 ibwarn: [6860] umad_open_port: ca mthca0 port 2 ibwarn: [6860] umad_open_port: opening mthca0 port 2 ibwarn: [6860] dev_to_umad_id: mapped mthca0 2 to 1 ibwarn: [6860] umad_open_port: open /dev/infiniband/umad1 failed Error from osm_opensm_bind (0x2A) Exiting SM ibwarn: [6860] umad_done: ******** Drivers ******** ib_mthca 97692 0 ib_mad 34324 2 ib_umad,ib_mthca ib_core 39680 3 ib_umad,ib_mthca,ib_mad ******** Logs ******** linux:/usr/local/ofed/bin # tail -f /var/log/osm.log Jan 28 14:35:41 017194 [4018DFE0] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000 Jan 28 14:35:41 017349 [4018DFE0] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000 Jan 28 14:35:41 025501 [4018DFE0] -> osm_vendor_bind: Binding to port 0x2c90107fbfcf2 Jan 28 14:35:41 030909 [4018DFE0] -> osm_vendor_open_port: ERR 542C: umad_open_port() failed Jan 28 14:35:41 030986 [4018DFE0] -> osm_vendor_bind: ERR 5424: Unable to Open Port 0x2c90107fbfcf2 Jan 28 14:35:41 031015 [4018DFE0] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed Jan 28 14:35:41 031228 [4018DFE0] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR) Jan 28 14:35:41 031742 [4018DFE0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind Jan 28 14:35:41 032313 [0000] -> Exiting SM -- This message was sent on behalf of chris_youb at yahoo.ca at openSubscriber.com http://www.opensubscriber.com/messages/openib-general at openib.org/topic.html _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From caitlinb at broadcom.com Tue Sep 26 13:36:43 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Tue, 26 Sep 2006 13:36:43 -0700 Subject: [openib-general] Port reuse 
issue for rdma_cm/iwarp In-Reply-To: Message-ID: <54AD0F12E08D1541B826BE97C98F99F1A31348@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > Hi, > We are facing a problem while running back-to-back > applications using the same port number for rdma_cm over > iwarp (Ammasso). The port seems to be busy for about 60 > seconds after each disconnect. > > The first execution finishes without any problems or errors. > When the execution is repeated immediately, we see a > RDMA_CM_EVENT_REJECTED event on the active connect side. > However, if we use a different port or if we include a delay > of more than 60 seconds between the runs, we do not see this problem. > > Is this a known issue? Is there anyway to force a immediate > reuse of the port? > TCP restricts prompt re-use of the same Source/Destination Address/Port pair while old traffic could still be in-flight. This is generally not an issue because prompt re-use of the exact four tuple is rare. Is there a special reason why your application needs to reuse the same port from the active side? If the port number is being used to identify the rank, could private data be used instead? From sashak at voltaire.com Tue Sep 26 13:44:08 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 26 Sep 2006 23:44:08 +0300 Subject: [openib-general] OpenSM -> 'open /dev/infiniband/umad1 failed' In-Reply-To: <6169134.1159302074231.JavaMail.websites@opensubscriber> References: <6169134.1159302074231.JavaMail.websites@opensubscriber> Message-ID: <20060926204408.GA23096@sashak.voltaire.com> Hi, On 04:21 Wed 27 Sep , chris_youb at yahoo.ca wrote: > I'm trying to setup OpenSM on one of our boxes. I've installed the RPMs from ofed-1.0-sles10-rpms_i686.tar.gz and updated the firmware on our Mellanox card. > When I try to start opensm I get the following error message: 'umad_open_port: open /dev/infiniband/umad1 failed'. Any suggestions of what I can try next? 
Be sure that device node '/dev/infiniband/umad1' exists and you have permission to access it for read/write. Sasha > > ******** Setup ******** > H/W: Dell 1550 > O/S: Suse 10.0 (linux 2.6.13-15.12-default) > HBC: Mellanox MT23108 rev 3.5.000 > S/W: ofed-1.0-sles10-rpms_i686.tar.gz > > ******** OpenSM ******** > linux:/usr/local/ofed/bin # ./opensm -V -d5 > ------------------------------------------------- > OpenSM Rev:openib-1.2.1 > Based on OpenIB svn Exported revision > Command Line Arguments: > Big V selected > d level = 0x5 > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.2.1 OpenIB svn Exported revision > > ibwarn: [6860] umad_init: > ibwarn: [6860] umad_get_cas_names: max 32 > ibwarn: [6860] umad_get_cas_names: return 1 cas > ibwarn: [6860] umad_get_ca_portguids: ca name mthca0 max port guids 64 > ibwarn: [6860] umad_get_ca: ca_name mthca0 > ibwarn: [6860] umad_get_ca: opened mthca0 > ibwarn: [6860] umad_get_ca_portguids: mthca0: 3 ports > ibwarn: [6860] umad_get_ca: ca_name mthca0 > ibwarn: [6860] umad_get_ca: opened mthca0 > ibwarn: [6860] umad_get_port: ca_name (null) portnum 0 > ibwarn: [6860] umad_get_cas_names: max 20 > ibwarn: [6860] umad_get_cas_names: return 1 cas > ibwarn: [6860] resolve_ca_name: checking ca 'mthca0' > ibwarn: [6860] resolve_ca_port: checking ca 'mthca0' > ibwarn: [6860] umad_get_ca: ca_name mthca0 > ibwarn: [6860] umad_get_ca: opened mthca0 > ibwarn: [6860] resolve_ca_port: checking port 0 > ibwarn: [6860] resolve_ca_port: checking port 1 > ibwarn: [6860] resolve_ca_port: checking port 2 > ibwarn: [6860] resolve_ca_name: found ca mthca0 with port 2 type 0 > ibwarn: [6860] resolve_ca_name: phys found 0 on mthca0 port 2 > ibwarn: [6860] umad_release_port: port mthca0:2 > ibwarn: [6860] umad_release_port: releasing mthca0:2 > Using default GUID 0x2c90107fbfcf2 > ibwarn: [6860] umad_get_ca_portguids: ca name mthca0 max port guids 32 > ibwarn: [6860] umad_get_ca: ca_name mthca0 > 
ibwarn: [6860] umad_get_ca: opened mthca0 > ibwarn: [6860] umad_get_ca_portguids: mthca0: 3 ports > ibwarn: [6860] umad_get_ca: ca_name mthca0 > ibwarn: [6860] umad_get_ca: opened mthca0 > ibwarn: [6860] umad_get_port: ca_name mthca0 portnum 2 > ibwarn: [6860] umad_open_port: ca mthca0 port 2 > ibwarn: [6860] umad_open_port: opening mthca0 port 2 > ibwarn: [6860] dev_to_umad_id: mapped mthca0 2 to 1 > ibwarn: [6860] umad_open_port: open /dev/infiniband/umad1 failed > > Error from osm_opensm_bind (0x2A) > Exiting SM > > ibwarn: [6860] umad_done: > > ******** Drivers ******** > ib_mthca 97692 0 > ib_mad 34324 2 ib_umad,ib_mthca > ib_core 39680 3 ib_umad,ib_mthca,ib_mad > > ******** Logs ******** > linux:/usr/local/ofed/bin # tail -f /var/log/osm.log > Jan 28 14:35:41 017194 [4018DFE0] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000 > Jan 28 14:35:41 017349 [4018DFE0] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000 > Jan 28 14:35:41 025501 [4018DFE0] -> osm_vendor_bind: Binding to port 0x2c90107fbfcf2 > Jan 28 14:35:41 030909 [4018DFE0] -> osm_vendor_open_port: ERR 542C: umad_open_port() failed > Jan 28 14:35:41 030986 [4018DFE0] -> osm_vendor_bind: ERR 5424: Unable to Open Port 0x2c90107fbfcf2 > Jan 28 14:35:41 031015 [4018DFE0] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed > Jan 28 14:35:41 031228 [4018DFE0] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR) > Jan 28 14:35:41 031742 [4018DFE0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind > Jan 28 14:35:41 032313 [0000] -> Exiting SM > > > -- > This message was sent on behalf of chris_youb at yahoo.ca at openSubscriber.com > http://www.opensubscriber.com/messages/openib-general at openib.org/topic.html > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > 
http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From sashak at voltaire.com Tue Sep 26 13:51:30 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 26 Sep 2006 23:51:30 +0300 Subject: [openib-general] [PATCH TRIVIAL] opensm: libibumad: show open()'s errno string. Message-ID: <20060926205130.GB23096@sashak.voltaire.com> Show errno string when open() fails. Signed-off-by: Sasha Khapyorsky --- libibumad/src/umad.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c index cb9eef6..7bf0048 100644 --- a/libibumad/src/umad.c +++ b/libibumad/src/umad.c @@ -575,7 +575,7 @@ umad_open_port(char *ca_name, int portnu UMAD_DEV_DIR , umad_id); if ((port->dev_fd = open(port->dev_file, O_RDWR|O_NONBLOCK)) < 0) { - DEBUG("open %s failed", port->dev_file); + DEBUG("open %s failed: %s", port->dev_file, strerror(errno)); return -EIO; } From shemminger at osdl.org Tue Sep 26 13:51:14 2006 From: shemminger at osdl.org (Stephen Hemminger) Date: Tue, 26 Sep 2006 13:51:14 -0700 Subject: [openib-general] Compile warnings (cross build) Message-ID: <20060926135114.1da96c1b@freekitty> Hello, At OSDL we have been running automated cross-compiles on the scsi-misc and scsi-rc-fixes trees and I thought it might be helpful to post the warnings and errors which appear compared to the tree it is based on. SCSI is clean, so mostly these are warnings. We do allmodconfig or defconfig on arm, I386, ia64, powerpc, ppc, sparc64, x86_64. If there were no additional warnings, then that architecture is not in the output. So, here are the _additional_ warnings from the linux-2.6.18-scsi-misc1 compile outputs versus the linux-2.6.18 compile outputs. Let me know if this is useful, or how it could be better. 
WKR, Judith Lebzelter OSDL *********ia64********* > drivers/infiniband/hw/amso1100/c2_provider.c: In function `c2_reg_phys_mr': > drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: long long unsigned int format, long unsigned int arg (arg 6) > drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: long long unsigned int format, long unsigned int arg (arg 7) > drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: long long unsigned int format, long unsigned int arg (arg 8) > drivers/infiniband/hw/amso1100/c2_rnic.c: In function `c2_rnic_init': > drivers/infiniband/hw/amso1100/c2_rnic.c:529: warning: long long unsigned int format, dma_addr_t arg (arg 4) > drivers/infiniband/hw/amso1100/c2_rnic.c:552: warning: long long unsigned int format, dma_addr_t arg (arg 4) > drivers/infiniband/hw/amso1100/c2_alloc.c: In function `c2_alloc_mqsp': > drivers/infiniband/hw/amso1100/c2_alloc.c:117: warning: long long unsigned int format, long unsigned int arg (arg 4) > drivers/infiniband/hw/amso1100/c2_alloc.c:94: warning: 'mqsp' might be used uninitialized in this function > drivers/infiniband/hw/amso1100/c2_ae.c: In function `c2_ae_event': > drivers/infiniband/hw/amso1100/c2_ae.c:195: warning: long long unsigned int format, long unsigned int arg (arg 4) *********powerpc********* > drivers/infiniband/hw/amso1100/c2_provider.c: In function 'c2_reg_phys_mr': > drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: format '%llx' expects type 'long long unsigned int', but argument 6 has type 'u64' > drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64' > drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64' > drivers/infiniband/hw/amso1100/c2_rnic.c: In function 'c2_rnic_init': > drivers/infiniband/hw/amso1100/c2_rnic.c:529: warning: format '%llx' expects type 'long long unsigned int', but 
argument 4 has type 'dma_addr_t' > drivers/infiniband/hw/amso1100/c2_rnic.c:552: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'dma_addr_t' > drivers/infiniband/hw/amso1100/c2_alloc.c: In function 'c2_alloc_mqsp': > drivers/infiniband/hw/amso1100/c2_alloc.c:117: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'dma_addr_t' > drivers/infiniband/hw/amso1100/c2_ae.c: In function 'c2_ae_event': > drivers/infiniband/hw/amso1100/c2_ae.c:195: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' -- Stephen Hemminger From rdreier at cisco.com Tue Sep 26 14:09:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Sep 2006 14:09:47 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: (Shirley Ma's message of "Tue, 26 Sep 2006 11:21:14 -0700") References: Message-ID: Shirley> It can be a configuration option to enable/disable NAPI, Shirley> just like other network device. But is there any reason to keep the non-NAPI code around? I hate to have two codepaths to maintain. From rdreier at cisco.com Tue Sep 26 14:11:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Sep 2006 14:11:05 -0700 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: (Shirley Ma's message of "Tue, 26 Sep 2006 11:28:49 -0700") References: Message-ID: Shirley> Since linux 2.6.18 supports GSO, I have patched IPoIB to Shirley> enable GSO, but haven't tested the performance yet. Has Shirley> anyone tried already? No, I don't think anyone looked at that yet. Could you post your patch? What is required? Supporting gather/scatter? - R. 
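Returning to the amso1100 `%llx` warnings in the cross-build report above: on some 64-bit architectures of that era (ia64 and powerpc among them) `u64` and `dma_addr_t` were `unsigned long`, which is why the same source compiles quietly elsewhere. One common remedy is to cast to `unsigned long long` at the call site. The following is only a userspace sketch of that idea, not the fix that was actually merged; `toy_u64` and `format_addr` are hypothetical names introduced for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's u64 on an architecture
 * where it is defined as unsigned long rather than unsigned long long. */
typedef unsigned long toy_u64;

/* Format an address in hex. The explicit cast makes the argument type
 * match the %llx conversion on every architecture, so the compiler has
 * nothing to warn about. Returns what snprintf returns. */
static int format_addr(char *buf, size_t len, toy_u64 addr)
{
	return snprintf(buf, len, "%llx", (unsigned long long)addr);
}
```

The cast costs nothing at run time where the two types are the same width, and it keeps one format string valid across all the architectures in the report.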
From rdreier at cisco.com Tue Sep 26 14:13:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Sep 2006 14:13:32 -0700 Subject: [openib-general] Compile warnings (cross build) In-Reply-To: <20060926135114.1da96c1b@freekitty> (Stephen Hemminger's message of "Tue, 26 Sep 2006 13:51:14 -0700") References: <20060926135114.1da96c1b@freekitty> Message-ID: > At OSDL we have been running automated cross-compiles on the > scsi-misc and scsi-rc-fixes trees and I thought it might be > helpful to post the warnings and errors which appear compared > to the tree it is based on. SCSI is clean, so mostly these > are warnings. We do allmodconfig or defconfig on arm, I386, > ia64, powerpc, ppc, sparc64, x86_64. If there were no > additional warnings, then that architecture is not in the output. I assume you mean my infiniband.git tree? (Probably cut-and-paste from another email ;) > Let me know if this is useful, or how it could be better. This is super-useful! Please continue to post these reports. I'll fix up these warnings and merge upstream. Thanks, Roland From rdreier at cisco.com Tue Sep 26 14:15:52 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Sep 2006 14:15:52 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: (Roland Dreier's message of "Tue, 26 Sep 2006 14:09:47 -0700") References: Message-ID: Roland> But is there any reason to keep the non-NAPI code around? Roland> I hate to have two codepaths to maintain. I see you said that ehca showed a performance drop with NAPI. What approach did you use to handle "rotting packets" (race between poll CQ and request notification on a CQ)? Do you know how ehca behaves? Does it have that race? ie what happens in this situation: poll CQ -> CQ is empty (new completion is added to CQ) request notify on CQ (no more completions are added) Mellanox HCAs will generate a CQ event in this case, although it's not strictly required by the IB spec. How will ehca behave? - R. 
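The window Roland describes can be modelled in a few lines of plain C. This is a toy simulation under stated assumptions, not the ib_verbs API: `struct toy_cq`, `toy_complete`, `toy_poll`, `toy_req_notify` and `toy_drain_and_arm` are hypothetical names, and the `event_on_arm_if_nonempty` flag stands in for the Mellanox behaviour he mentions. It shows that a consumer which polls until empty and then arms can strand a completion, and that polling once more after arming closes the window without relying on the hardware.

```c
#include <stdbool.h>

/* Toy completion queue; every name here is hypothetical. */
struct toy_cq {
	int  pending;                  /* completions not yet polled */
	bool armed;                    /* next new completion raises an event */
	int  events;                   /* events delivered so far */
	bool event_on_arm_if_nonempty; /* models the behaviour described for mthca */
};

/* Hardware side: a completion is added to the CQ. */
static void toy_complete(struct toy_cq *cq)
{
	cq->pending++;
	if (cq->armed) {
		cq->armed = false;
		cq->events++;
	}
}

/* Consumer side: take one completion; returns 1 if one was found. */
static int toy_poll(struct toy_cq *cq)
{
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/* Consumer side: request notification for the next completion. */
static void toy_req_notify(struct toy_cq *cq)
{
	if (cq->event_on_arm_if_nonempty && cq->pending > 0) {
		cq->events++;          /* event fires at once, CQ not left armed */
		return;
	}
	cq->armed = true;
}

/* Drain the CQ, arm it, and optionally poll again after arming.
 * inject_race adds one completion between the last empty poll and
 * the arm, which is exactly the window in question.
 * Returns the number of completions handled. */
static int toy_drain_and_arm(struct toy_cq *cq, bool inject_race, bool repoll)
{
	int handled = 0;

	for (;;) {
		while (toy_poll(cq))
			handled++;
		if (inject_race) {
			toy_complete(cq);
			inject_race = false;
		}
		toy_req_notify(cq);
		if (!repoll || !toy_poll(cq))
			break;
		handled++;             /* caught the late completion; go round again */
	}
	return handled;
}
```

With `repoll` false and no event-on-arm, the injected completion stays in the queue with no event pending, which is the "rotting packet". Either the hardware flag or the extra poll recovers it.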
From ttelford.groups at gmail.com Tue Sep 26 14:19:02 2006 From: ttelford.groups at gmail.com (Troy Telford) Date: Tue, 26 Sep 2006 15:19:02 -0600 Subject: [openib-general] DAPL setup/config help Message-ID: I've never set up dapl before; however, I now have a reason to try... The problem is, I can't seem to find any documentation on how to set it up. I've tried the sample /etc/dat.conf (modified for the IPoIB address on the system), but I'm not sure I've been successful. I've: * compiled from OFED 1.0 * verified the library paths listed in /etc/dat.conf are correct * I do know that things like IP over IB, MVAPICH, Open MPI, etc. work fine; but they're not using DAPL * tried the 'dapltest' and 'dtest' programs. In both cases, I receive an error to the effect of: DAT_PROVIDER_NOT_FOUND DAT_NAME_NOT_REGISTERED Can anybody point me in the right direction (so I can RTFM and get on with life)? From wombat2 at us.ibm.com Tue Sep 26 14:35:34 2006 From: wombat2 at us.ibm.com (Bernard King-Smith) Date: Tue, 26 Sep 2006 17:35:34 -0400 Subject: [openib-general] [PATCH] IB/ipoib: NAPI Message-ID: Eli and Roland, Has anyone run the RR test in Netperf to look at latency? What 1 byte RR rates did you see before and after applying the patch? Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner From xma at us.ibm.com Tue Sep 26 14:46:53 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 14:46:53 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: Message-ID: Roland, > Do you know how ehca behaves? Does it have that race? 
ie what > happens in this situation: > > poll CQ -> CQ is empty > (new completion is added to CQ) > request notify on CQ > (no more completions are added) > > Mellanox HCAs will generate a CQ event in this case, although it's not > strictly required by the IB spec. How will ehca behave? > > - R. That could be the reason. I did see mthca poll an empty entry, but not ehca. I will confirm this with the ehca team. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From narravul at cse.ohio-state.edu Tue Sep 26 14:53:36 2006 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Tue, 26 Sep 2006 17:53:36 -0400 (EDT) Subject: [openib-general] Port reuse issue for rdma_cm/iwarp In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1A31348@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: > TCP restricts prompt re-use of the same Source/Destination > Address/Port pair while old traffic could still be in-flight. > This is generally not an issue because prompt re-use of the > exact four tuple is rare. > > Is there a special reason why your application needs to > reuse the same port from the active side? If the port number > is being used to identify the rank, could private data be > used instead? Our application is primarily an invocation of multiple independent parallel jobs which all need to connect to each other on each invocation. Since this is a TCP limitation, is there any interface similar to setsockopt with TCP_NODELAY? We probably need to use different ports otherwise. Thanks, --Sundeep. > > > From mst at mellanox.co.il Tue Sep 26 12:20:07 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 26 Sep 2006 22:20:07 +0300 Subject: [openib-general] [RFC] determining which changes in svn to merge upstream or remove In-Reply-To: <000001c6e183$2db6adf0$5f78e984@amr.corp.intel.com> References: <000001c6e183$2db6adf0$5f78e984@amr.corp.intel.com> Message-ID: <20060926192007.GA24009@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [RFC] determining which changes in svn to merge upstream or remove > > >Connections taking 60 sec to create is an issue. > >Can you please explain how the fact that some connections are used affect > >the time it takes to send the response? > > This is in userspace, and IMO, an application issue. Threads using established > connections simply begin consuming all processor time. This is while running > under heavy load and trying to scale up the application. > > >Why would sending MRA be faster than sending the response? > > An MRA could be sent directly by the RDMA CM in the kernel in a REQ callback, > whereas the response requires the userspace application to poll the REQ and > generate a REP. I see. So it actually does look like for userspace clients, CMA should send MRA immediately and then let userspace send REP in its own good time. -- MST From ardavis at ichips.intel.com Tue Sep 26 15:15:58 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Tue, 26 Sep 2006 15:15:58 -0700 Subject: [openib-general] DAPL setup/config help In-Reply-To: References: Message-ID: <4519A69E.1070008@ichips.intel.com> Troy Telford wrote: >I've never set up dapl before, however I now have a reason to try... > >The problem is, I can't seem to find any documentation on how to set it >up. I've tried the sample /etc/dat.conf (modified for the IPoIB address >on the system), but I'm not sure I've been sucessful. > >I've: >* compiled from OFED 1.0 >* verified the library paths listed in /etc/dat.conf are correct >* I do know that things like IP over IB, MVAPICH, Open MPI, etc. 
work >fine; but they're not using DAPL >* tried the 'dapltest' and 'dtest' programs. > >In both cases, I receive an error to the extent of: >DAT_PROVIDER_NOT_FOUND DAT_NAME_NOT_REGISTERED > > The dapl provider name that your application uses for the open must match the ia_name entry in dat.conf. sample dat.conf: # Each entry should have the following fields: # # \ # OpenIB-cma u1.2 nonthreadsafe default /usr/lib/libdaplcma.so mv_dapl.1.2 "ib0 0" "" The dtest makefile with OFED 1.0 should use OpenIB-cma as the provider name instead of OpenIB-cma-ip. The default configuration was fixed in OFED 1.1. For dapltest you must pass this dat.conf name as an argument to all scripts. For example "./srv.sh OpenIB-cma" -arlin >Can anybody point me in the right direction (so I can RTFM and get on with >life?) > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From xma at us.ibm.com Tue Sep 26 14:53:53 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 14:53:53 -0700 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: Message-ID: > Shirley> Since linux 2.6.18 supports GSO, I have patched IPoIB to > Shirley> enable GSO, but haven't tested the performance yet. Has > Shirley> anyone tried already? > > No, I don't think anyone looked at that yet. Could you post your > patch? What is required? Supporting gather/scatter? > > - R. Don't need to. GSO only improves sender side performance. It allows ULPs to send large packets, which are segmented in the interface layer just before driver xmit. GSO is normally enabled through ethtool. Since ipoib doesn't support ethtool, I simply added a module parameter to set the interface GSO flag when loading the module. My next step is to enable gather/scatter in ipoib send to chain multiple packets together for one doorbell. 
Thanks Shirley Ma From ttelford.groups at gmail.com Tue Sep 26 16:05:05 2006 From: ttelford.groups at gmail.com (Troy Telford) Date: Tue, 26 Sep 2006 17:05:05 -0600 Subject: [openib-general] DAPL setup/config help In-Reply-To: <4519A69E.1070008@ichips.intel.com> References: <4519A69E.1070008@ichips.intel.com> Message-ID: So far, so good. Thanks! > The dapl provider name that your application uses for the open must > match the ia_name entry in dat.conf. > > sample dat.conf: > > # Each entry should have the following fields: > # > # \ > # > OpenIB-cma u1.2 nonthreadsafe default /usr/lib/libdaplcma.so mv_dapl.1.2 > "ib0 0" "" > > > The dtest makefile with OFED 1.0 should use OpenIB-cma as the provider > name instead of OpenIB-cma-ip. The default configuration was fixed in > OFED 1.1. > > For dapltest you must pass this dat.conf name as an argument to all > scripts. For example "./srv.sh OpenIB-cma" > > -arlin -- Troy From geneing at gmail.com Tue Sep 26 16:32:23 2006 From: geneing at gmail.com (EI) Date: Tue, 26 Sep 2006 16:32:23 -0700 Subject: [openib-general] 90-ib.rules incorrect? Message-ID: Isn't the format of 90-ib.rules in https://openfabrics.org/svn/gen2/trunk/ofed/openib/scripts/90-ib.rules incorrect? We have KERNEL="umad*", NAME="infiniband/%k", which should be KERNEL=="umad*", NAME="infiniband/%k" Am I missing something? Eugene -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vuhuong at mellanox.com Tue Sep 26 19:56:45 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Tue, 26 Sep 2006 19:56:45 -0700 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: <20060926144541.GA17938@mellanox.co.il> References: <20060926144541.GA17938@mellanox.co.il> Message-ID: <4519E86D.9030508@mellanox.com> Ishai Rabinovitz wrote: > Hi Roland, > > SRP High Availability needs an initiator to connect to the same target > several times, e.g., once from each IB port of the target (this way we can use > device mapper multipath for failover). Note that both connections are actually > active, e.g. multipath is issuing commands to get the remote scsi id. > It depends on how you define a path. A target-port-ID & initiator-port-ID tuple will form a path. The current srp implementation uses the port GID as initiator_port_ID, and target ioc_guid + id_ext as target_port_ID. With this implementation, a physical host port & physical target port will form a path. The multipath driver will see the same scsi_id of a LUN through multiple paths. > Since multiple channel operation is currently disabled in connection request, > each new connection request will cause the target to disconnect > the existing connection which forces us to bounce a lot between the two channels. Either you can use multiple channels or derive a different initiator_port_ID in the login req to have multiple paths on the same physical port. Most of the srp targets that I tested don't support multiple channels. -vu From mst at mellanox.co.il Tue Sep 26 20:48:51 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 27 Sep 2006 06:48:51 +0300 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: <4519E86D.9030508@mellanox.com> References: <20060926144541.GA17938@mellanox.co.il> <4519E86D.9030508@mellanox.com> Message-ID: <20060927034851.GH24009@mellanox.co.il> Quoting r. Vu Pham : > Most of the srp targets that I tested don't support multiple > channels. 
Which are these? And what happens when you ask for multichannel support? -- MST From mst at mellanox.co.il Tue Sep 26 21:07:45 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 27 Sep 2006 07:07:45 +0300 Subject: [openib-general] backporting fixes Message-ID: <20060927040745.GI24009@mellanox.co.il> Hi! Now that 2.6.18 (with an additional patch) I looked at backporting bugfixes to older kernels. The main problem I see is that the neighbour destructor interface change is not in 2.6.16, so IPoIB crashes randomly. So the approaches are - Try to push the change into 2.6.16 by netdev - Use the all-neighbour list as done by ofed - Abandon the whole project Ideas? -- MST From xma at us.ibm.com Tue Sep 26 21:34:22 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 21:34:22 -0700 Subject: [openib-general] heads-up - ipoib NAPI In-Reply-To: <1158850592.24776.156.camel@localhost> Message-ID: Hi, Eli, > Hi, > I have a draft implementation of NAPI in ipoib and got the following > results: > System descriptions > =================== > Quad CPU E64T 2.4 Ghz > 4 GB RAM > MT25204 Sinai HCA > I used netperf for benchmarking, the BW test ran for 600 seconds with 8 > clients and 8 servers. > > The results I received are below: > > netperf TCP_STREAM: > BW [MByte/sec] clients side [irqs/sec] server side [irqs/sec] > -------------- ----------------------- ---------------------- > without NAPI: 506 86441 66311 > with NAPI: 550 6830 13600 > > > netperf TCP_RR: > rate [tran/sec] > --------------- > without NAPI: 39600 > with NAPI: 39470 > > > > Please note this is still under work and we plan to do more tests and > measure on other devices. The NAPI patch moves ipoib poll from hardware interrupt context to softirq context. It would reduce the hardware interrupts, reduce hardware latency and induce some network latency. It might reduce cpu utilization. But I still question the BW improvement. 
I did see varying performance with the same test under the same conditions. Have you tested this patch with different message sizes, different socket sizes? Are these results consistently better? Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From mst at mellanox.co.il Tue Sep 26 21:44:33 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 27 Sep 2006 07:44:33 +0300 Subject: [openib-general] heads-up - ipoib NAPI In-Reply-To: References: <1158850592.24776.156.camel@localhost> Message-ID: <20060927044433.GK24009@mellanox.co.il> Quoting r. Shirley Ma : > It might reduce cpu utilization. But I still question the BW > improvement. Well, since (with enough sockets) we are CPU-bound, here's your answer why BW would go up with NAPI. -- MST From mst at mellanox.co.il Tue Sep 26 21:59:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 27 Sep 2006 07:59:30 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: Message-ID: <20060927045929.GL24009@mellanox.co.il> Quoting r. Shirley Ma : > Subject: Re: [PATCH] IB/ipoib: NAPI > > We did some touch test on ehca driver, we saw performance drop somehow. Hmm, it seems ehca still defers the completion event to a tasklet. It always seemed weird to me. So that could be the reason - with NAPI you now get 2 tasklet schedules, as you are actually doing part of what NAPI does, inside the low level driver. Try ripping that out and calling the event handler directly, and see what it does to performance with NAPI. > I strongly recommand NAPI as a configurable option in ipoib. So customers can turn on/off based on their configurations. I still hope ehca NAPI performance can be fixed. But if not, maybe we should have the low level driver set a disable_napi flag rather than have users play with module options. 
-- MST From xma at us.ibm.com Tue Sep 26 22:08:35 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 22:08:35 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: Message-ID: Hi, Roland, > Shirley> It can be a configuration option to enable/disable NAPI, > Shirley> just like other network device. > > But is there any reason to keep the non-NAPI code around? I hate to > have two codepaths to maintain. If you would like to maintain one code path only, then we need to compare the NAPI patch with the thread-context polling mode patch. I did see a big performance improvement with the thread-context polling mode patch I have been working on. (I used to split the CQ. I am trying without splitting the CQ now.) And I think it would improve multiple-link performance when the links share one EQ. thanks Shirley Ma From xma at us.ibm.com Tue Sep 26 22:55:11 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 22:55:11 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060927045929.GL24009@mellanox.co.il> Message-ID: "Michael S. Tsirkin" wrote on 09/26/2006 09:59:30 PM: > Quoting r. Shirley Ma : > > Subject: Re: [PATCH] IB/ipoib: NAPI > > > > We did some touch test on ehca driver, we saw performance drop somehow. > > Hmm, it seems ehca still defers the completion event to a tasklet. It always > seemed weird to me. So that could be the reason - with NAPI you now get 2 > tasklet schedules, as you are actually doing part of what NAPI does, inside the > low level driver. Try ripping that out and calling the event > handler directly, > and see what it does to performance with NAPI. The reason for this ehca implementation is that two ports/links share one EQ. We are implementing multiple EQ support for one adapter now. If that works, then we can modify the ehca code to match mthca. Actually mthca has the same problem as ehca over two links on the same adapter. 
Performance with two links on the same adapter is very bad; it does not scale at all. > > I strongly recommend NAPI as a configurable option in ipoib. So > customers can turn on/off based on their configurations. > > I still hope ehca NAPI performance can be fixed. But if not, maybe we should > have the low level driver set a disable_napi flag rather than have users play > with module options. > > -- > MST We have been working on this issue for some time. That's the reason we didn't post our NAPI patch. Hopefully we can fix it. If we can show that NAPI performance (latency, BW, cpu utilization) is better in all cases (UP vs. SMP, one socket vs. multiple sockets, one link vs. multiple links, different message sizes, different socket sizes....) I will agree to turn on NAPI by default. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From xma at us.ibm.com Tue Sep 26 23:05:55 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 23:05:55 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060927045929.GL24009@mellanox.co.il> Message-ID: "Michael S. Tsirkin" wrote on 09/26/2006 09:59:30 PM: > I still hope ehca NAPI performance can be fixed. But if not, maybe we should > have the low level driver set a disable_napi flag rather than have users play > with module options. > > -- > MST I forgot to mention that these NAPI parameters, like dev->weight, should be tunable for different device drivers, or set up in the lower driver. thanks Shirley Ma
From delaitt at cpc.wmin.ac.uk Tue Sep 26 23:16:02 2006 From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre) Date: Wed, 27 Sep 2006 07:16:02 +0100 (BST) Subject: [openib-general] [Lustre-discuss] Re: problems with lustre o2ib module & ofed In-Reply-To: <20060926192822.GD24009@mellanox.co.il> References: <200609261707.37720.jackm@dev.mellanox.co.il> <20060926192822.GD24009@mellanox.co.il> Message-ID: On Tue, 26 Sep 2006, Michael S. Tsirkin wrote: > Quoting r. Thierry Delaitre : > > Subject: Re: [Lustre-discuss] Re: problems with lustre o2ib module & ofed > > > > > > On Tue, 26 Sep 2006, Jack Morgenstein wrote: > > > > > On Monday 25 September 2006 17:01, Thierry Delaitre wrote: > > > > > > I noticed in the Lustre configure file the following > > > --with-linux=path set path to Linux source (default=/usr/src/linux) > > > > > > Where does /usr/src/linux link to? > > > > > > You might consider explicitly specifying the following options as well in the > > > Lustre ./configure step: > > > > > > --with-linux=path set path to Linux source (default=/usr/src/linux) > > > --with-linux-obj=path set path to Linux objects dir (default=$LINUX) > > > --with-linux-config=path > > > set path to Linux .conf (default=$LINUX_OBJ/.config) > > > > I specified the whole string and still the same. > > > > ./configure --with-o2ib=/usr/local/ofed/src/openib --with-linux=/usr/src/linux-2.6.16.21-0.8 --with-linux-obj=/usr/src/linux-2.6.16.21-0.8 --with-linux-config=/usr/src/linux-2.6.16.21-0.8/.config > > > > Thierry. > > 1. Did you reboot after rebuilding everything? > > 2. Try to check the compiler command line used for building lustre. > You must make sure gen2 is before linux kernel in -I flag list. I've managed to solve the problem by deleting /usr/src/linux/include/rdma and pointing /usr/src/linux/driver/infiniband to /usr/local/ofed/src/openib/driver/infiniband Thanks to all for your help. Thierry.
From mst at mellanox.co.il Tue Sep 26 23:23:16 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 27 Sep 2006 09:23:16 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: Message-ID: <20060927062316.GO24009@mellanox.co.il> Quoting r. Shirley Ma : > We > are implementing multiple EQs support for one adapter now. I think with MSI we can have a per-interface EQ in mthca. Main reason I'm not doing this is because I haven't figured out the right interface to pass this information to the low level driver yet. Maybe we should just assign EQs to CQs in a round-robin fashion for now, and just hope typical use allocates CQs sequentially. Worst case, we are back to where we are now, performance-wise. Roland, how does this sound? > If that works, then > we can modify the ehca code as mthca. Actually mthca has the same problem as > ehca over two links on the same adapter. OK, but if as you point out the issue is not device-specific - that's a good reason not to do tricks in low-level driver to try and work around this, but address this at ULP level. -- MST From mst at mellanox.co.il Tue Sep 26 23:28:22 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 27 Sep 2006 09:28:22 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: Message-ID: <20060927062822.GQ24009@mellanox.co.il> Quoting r. Shirley Ma : > Subject: Re: [PATCH] IB/ipoib: NAPI > > "Michael S. Tsirkin" wrote on 09/26/2006 09:59:30 PM: > > I still hope ehca NAPI performance can be fixed. But if not, maybe we should > > have the low level driver set a disable_napi flag rather than have users play > > with module options. > > I forgot to mention these NAPI parameters should be tunable for different device drivers, like dev->weight, or set up in lower driver. So we need something like poll_weight in struct ib_device, to give a hint on how expensive an interrupt is versus poll? Seems to make sense, and actually might be useful for other ULPs.
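The poll_weight hint discussed above could look roughly like the sketch below. It is a userspace model, not kernel code: struct model_ib_device, its poll_weight field, pick_napi_weight() and the default of 64 are all hypothetical and used only for illustration. The idea is that a ULP takes the low level driver's hint when one is set and falls back to its own default otherwise.

```c
/* Userspace model only; none of these names exist in the kernel tree. */

/* Hypothetical stand-in for struct ib_device carrying the proposed hint. */
struct model_ib_device {
    int poll_weight;    /* 0 means the low level driver gave no hint */
};

/* Assumed ULP default, used when the driver does not provide a hint. */
#define MODEL_DEFAULT_WEIGHT 64

/* Return the NAPI weight a ULP such as IPoIB would use for this device. */
int pick_napi_weight(const struct model_ib_device *dev)
{
    if (dev->poll_weight > 0)
        return dev->poll_weight;
    return MODEL_DEFAULT_WEIGHT;
}
```

With this shape a driver that finds interrupts expensive relative to polling would set its own value, and users would not have to touch module options.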
Roland, what do you think? -- MST From eli at dev.mellanox.co.il Tue Sep 26 23:35:26 2006 From: eli at dev.mellanox.co.il (Eli cohen) Date: Wed, 27 Sep 2006 09:35:26 +0300 Subject: [openib-general] heads-up - ipoib NAPI In-Reply-To: References: Message-ID: <1159338926.27719.17.camel@localhost> On Tue, 2006-09-26 at 21:34 -0700, Shirley Ma wrote: > NAPI patch moves ipoib poll from hardware interrupt context to softirq > context. It would reduce the hardware interrupts, reduce hardware > latency and induce some network latency. It might reduce cpu > utilization. But I still question about the BW improvement. I did see > various performance with the same test under the same condition. > When you open just one connection you can see around 10% variation in the BW measurements. But then you don't utilize all the CPU power you have and you don't get to the threshold where NAPI becomes effective. Using multiple connections utilizes all CPUs in the system, increases the send rate, and increases the chances that the receiver polls CQEs up to its quota and is scheduled again without re-enabling interrupts. > Have you tested this patch with different message sizes, different > socket sizes? Are these results consistent better? > I used large socket sizes, but I think that with multiple connections this parameter does not have a significant effect. From eli at dev.mellanox.co.il Tue Sep 26 23:38:31 2006 From: eli at dev.mellanox.co.il (Eli cohen) Date: Wed, 27 Sep 2006 09:38:31 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: Message-ID: <1159339111.27719.21.camel@localhost> On Tue, 2006-09-26 at 17:35 -0400, Bernard King-Smith wrote: > Has anyone run the RR test in Netperf to look at latency? What 1 byte > RR rates did you see before and after applying the patch. > netperf TCP_RR: rate [tran/sec] --------------- without NAPI: 39600 with NAPI: 39470 As you can see there is only a minor difference.
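To put the two TCP_RR rates above in proportion, the relative change can be worked out as in this small sketch. The function name relative_change_percent() is made up for illustration; the inputs in the usage note are the two rates quoted above.

```c
/* Relative change of `after` with respect to `before`, in percent.
 * A negative result means the rate dropped. */
double relative_change_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Calling relative_change_percent(39600.0, 39470.0) gives roughly -0.33, i.e. a drop of about a third of a percent in transactions per second.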
From xma at us.ibm.com Tue Sep 26 23:39:37 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 26 Sep 2006 23:39:37 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060927062316.GO24009@mellanox.co.il> Message-ID: "Michael S. Tsirkin" wrote on 09/26/2006 11:23:16 PM: > Quoting r. Shirley Ma : > > We > > are implementing multiple EQs support for one adapter now. > > I think with MSI we can have a per-interface EQ in mthca. > Main reason I'm not doing this is because I haven't figured out > the right interface to pass this information to the low level driver yet. > > Maybe we should just assign EQs to CQs in a round-robin fashion > for now, and just hope typical use allocates CQs sequentially. > Worst case, we are back to where we are now, performance-wise. > Roland, how does this sound? > > > If that works, then > > we can modify the ehca code as mthca. Actually mthca has the same problem as > > ehca over two links on the same adapter. > > OK, but if as you point out the issue is not device-specific - > that's a good reason not to do tricks in low-level driver to try > and work around this, but address this at ULP level. > > -- > MST Yes. That's what we are working on: defining the right APIs to pass this information to the low level driver. For now we are trying one EQ per interface; then we will extend the work to an N(CQ):M(EQ) mapping. ehca could support up to 127 EQs; I would suggest using a hash. Thanks Shirley Ma From vuhuong at mellanox.com Tue Sep 26 23:45:50 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Tue, 26 Sep 2006 23:45:50 -0700 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: <20060927034851.GH24009@mellanox.co.il> References: <20060926144541.GA17938@mellanox.co.il> <4519E86D.9030508@mellanox.com> <20060927034851.GH24009@mellanox.co.il> Message-ID: <451A1E1E.80203@mellanox.com> Michael S. Tsirkin wrote: > Quoting r.
Vu Pham : > >>Most of srp targets that I tested don't support multiple >>channels. > > > Which are these? Mellanox referenced srp target, Texas Memory System's SSD, Engenio. > And what happens when you ask for multichannel support? > For Texas' SSD the login req is rejected. For the Mellanox srp target the new multi channel/connection is established; however, if the host is in error recovery and does a target reset, the host should terminate all outstanding channels/connections; otherwise the target has outstanding I/Os dangling on the multi-channel/connection and tries to complete them. This violates SCSI task management. From mst at mellanox.co.il Wed Sep 27 00:10:59 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 27 Sep 2006 10:10:59 +0300 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: <4519E86D.9030508@mellanox.com> References: <20060926144541.GA17938@mellanox.co.il> <4519E86D.9030508@mellanox.com> Message-ID: <20060927071059.GA21509@mellanox.co.il> Quoting r. Vu Pham : > Either you can use multiple channels or derive different > initiator_port_ID in the login req to have multiple paths on > the same physical port So how about we just stick a pointer inside the identifier extension instead of enabling multichannel? -- MST From sweitzen at cisco.com Wed Sep 27 00:20:35 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 27 Sep 2006 00:20:35 -0700 Subject: [openib-general] [openfabrics-ewg] OFED Status Message-ID: Yes, this is fine with me. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openfabrics-ewg-bounces at openib.org > [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Aviram Gutman > Sent: Tuesday, September 26, 2006 9:01 AM > To: EWG; Openib-General at Openib.Org > Subject: [openfabrics-ewg] OFED Status > > Hi, > > OFED 1.1 RC6 was released on Thu.
> > The issues that were resolved since are: > > 1) OpenIB Diags build on SLES10 ppc - Solved by Moshe Katzir > from Voltaire > 2) iSER build on SLES10 needs root privilege - Voltaire fixed it > 3) Bug #233 SDP crash on ipath - I believe MST fixed. Betsy > please confirm. > 4) Fix IBDM to allow multiple devices on the same machine - > Eitan Zahavi > fixed > 5) SRP HA - Fixed by Ishai > 6) IPoIB HA on RH - Vlad made progess, issue is still not solved. > 7) The CM fix that Arlin asked - In > > Pending that IPoIB HA is solved would like to issue RC7 that > suppose to > be final. Is everyone OK with this approach? > > > Aviram > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > From erezz at voltaire.com Wed Sep 27 00:55:42 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 27 Sep 2006 10:55:42 +0300 Subject: [openib-general] oops after rmmod ib_cm when stopping iSER Message-ID: <451A2E7E.8050504@voltaire.com> Sean, When stopping iSER, we run 'modprobe -r ib_iser'. Then, we see an oops (below). In order to check which module caused that oops, I replaced the 'modprobe -r' call with rmmod for each module: rmmod ib_iser rmmod libiscsi rmmod scsi_transport_iscsi rmmod rdma_cm rmmod ib_addr rmmod ib_cm If I wait a few seconds before the removal of ib_cm, everything is ok. thyme login: Sep 27 09:50:08 thyme kernel: iser: iscsi_iser_ep_disconnect:ib conn ffff81005e426000 state 2 Sep 27 09:50:08 thyme kernel: iser: iser_cq_tasklet_fn:comp w. 
error op 0 status 5 Sep 27 09:50:08 thyme last message repeated 3 times Sep 27 09:50:08 thyme kernel: iser: iser_cma_handler:event 10 conn ffff81005e426000 id ffff81006c304a00 Sep 27 09:50:08 thyme kernel: iser: iser_free_ib_conn_res:freeing conn ffff81005e426000 cma_id ffff81006c304a00 fmr pool ffff8100560f2e40 qp f0 Sep 27 09:50:08 thyme kernel: iser: iser_device_try_release:device ffff8100796037c0 refcount 0 Sep 27 09:50:09 thyme kernel: cma_cleanup: entry Sep 27 09:50:09 thyme kernel: cma_cleanup: calling destroy_workqueue Sep 27 09:50:09 thyme kernel: cma_cleanup: calling idr_destroy(&sdp_ps) Sep 27 09:50:09 thyme kernel: cma_cleanup: calling idr_destroy(&tcp_ps) Sep 27 09:50:09 thyme kernel: cma_cleanup: exit Sep 27 09:50:09 thyme kernel: ib_cm_cleanup: entry Sep 27 09:50:09 thyme kernel: ib_cm_cleanup: calling ib_unregister_client Sep 27 09:50:09 thyme kernel: ib_cm_cleanup: calling idr_destroy Sep 27 09:50:09 thyme kernel: ib_cm_cleanup: exit Unable to handle kernel paging request at ffffffff8b02e017 RIP: [] delayed_work_timer_fn+0x2c/0x40 PGD 203027 PUD 205027 PMD 0 Oops: 0000 [1] SMP CPU 3 Modules linked in: ib_uverbs ib_ipoib ib_sa autofs usbserial parport_pc lp parport edd cpufreq_userspace acpi_cpufreq thermal processor fan bud Pid: 0, comm: swapper Not tainted 2.6.18-rc4-ga2d9f966-dirty #1 RIP: 0010:[] [] delayed_work_timer_fn+0x2c/0x40 RSP: 0018:ffff81007e36fef8 EFLAGS: 00010246 RAX: ffffffff8b02dfff RBX: 0000000000000100 RCX: ffff81006b152d20 RDX: 0000000000000003 RSI: ffff810068576a00 RDI: ffff810068576a00 RBP: ffff81007e340000 R08: efe9331445cb91ec R09: ffff81007e3a8008 R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff80241310 R13: ffff81007e36ff00 R14: 000000000000000a R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff81007e344b40(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: ffffffff8b02e017 CR3: 0000000060a44000 CR4: 00000000000006e0 Process swapper (pid: 0, threadinfo ffff81007e36a000, task 
ffff81007e347080) Stack: ffffffff80239826 ffff81007e36ff00 ffff81007e36ff00 ffff81000102ac20 0000000000000000 0000000000000096 0000000000000011 ffffffff8065a110 ffffffff806b2b20 0000000000000003 ffffffff80235d0b ffff81007e36ff48 Call Trace: [] run_timer_softirq+0x156/0x1e0 [] __do_softirq+0x6b/0xe0 [] call_softirq+0x1c/0x34 [] do_softirq+0x2c/0x90 [] mwait_idle+0x0/0x50 [] apic_timer_interrupt+0x66/0x6c [] mwait_idle+0x36/0x50 [] cpu_idle+0x6a/0x90 [] start_secondary+0x499/0x4b0 Code: 48 8b 3c d0 e9 4b ff ff ff 66 66 66 90 66 66 66 90 66 66 90 RIP [] delayed_work_timer_fn+0x2c/0x40 RSP CR2: ffffffff8b02e017 <0>Kernel panic - not syncing: Aiee, killing interrupt handler! -- ____________________________________________________________ Erez Zilber | 972-9-971-7689 Software Engineer, Storage Team Voltaire – _The Grid Backbone_ __ www.voltaire.com From mst at mellanox.co.il Wed Sep 27 01:30:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 27 Sep 2006 11:30:03 +0300 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: References: Message-ID: <20060927083003.GB22263@mellanox.co.il> Quoting r. Shirley Ma : > Subject: Re: enable GSO over IPoIB > > > Shirley> Since linux 2.6.18 supports GSO, I have patched IPoIB to > > Shirley> enable GSO, but haven't tested the performance yet. Has > > Shirley> anyone tried already? > > > > No, I don't think anyone looked at that yet. Could you post your > > patch? What is required? Supporting gather/scatter? > > > > - R. > > Don't need too. GSO only improves sender side performance. It allows large packet send in ULPs, and segments this packet in interface layer before driver xmit. The GSO enablement is through ethtool. Since ipoib doesn't support ethtool, i just simply added a module parameter to set the interface GSO flag when loading the module. Any idea what does ethtool do that IPoIB can't support? 
-- MST From xma at us.ibm.com Wed Sep 27 01:37:04 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 27 Sep 2006 01:37:04 -0700 Subject: [openib-general] heads-up - ipoib NAPI In-Reply-To: <1159338926.27719.17.camel@localhost> Message-ID: Hi, Eli, Eli cohen wrote on 09/26/2006 11:35:26 PM: > On Tue, 2006-09-26 at 21:34 -0700, Shirley Ma wrote: > > > NAPI patch moves ipoib poll from hardware interrupt context to softirq > > context. It would reduce the hardware interrupts, reduce hardware > > latency and induce some network latency. It might reduce cpu > > utilization. But I still question about the BW improvement. I did see > > various performance with the same test under the same condition. > > > When you open just one connection you can see around 10% of variations > in BW measure. But then you don't utilize all the CPU power you have and > you don't get to the threshold where NAPI becomes effective. > Using multiple connections utilizes all CPUs in the system, increases > send rate, and increases the chances of the receiver to poll CQEs up to > its quota and be scheduled again without re-enabling interrupts. The send rate shouldn't be limited by one connection. The cpu is much faster than the link speed. I don't think the send rate with multiple connections is higher than with one connection. Do you have any data to show that? When I monitored the CQEs, I didn't see many CQEs in the CQ per notification, and I don't think moving the ipoib poll from hardware interrupt context to softirq context would increase that number. Or the added latency might increase the number; I did see that number and the performance increase with some udelay in hardware interrupt polling mode. If you saw the number of packets increase, how many packets did you see in one hardware interrupt poll and in one NAPI poll? Your NAPI poll is driven either by the receiver quota or by any send CQE in the CQ. Have you tested UDP performance? Any difference?
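The quota behaviour under discussion can be modelled in userspace as below. This is only a sketch of the general NAPI idea (handle at most a quota of completions per poll, and re-enable interrupts only when fewer than the quota were found), not the IPoIB patch itself; struct model_cq and model_poll() are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical stand-in for a completion queue: only counts pending CQEs. */
struct model_cq {
    size_t pending;     /* completions waiting to be polled */
    int irq_enabled;    /* 1 if the next completion raises an interrupt */
};

/* One poll pass: consume up to `quota` completions.
 * If the quota was exhausted, leave interrupts off so the poller is
 * scheduled again; otherwise the queue was drained and interrupts
 * are re-enabled. Returns the number of completions handled. */
size_t model_poll(struct model_cq *cq, size_t quota)
{
    size_t done = cq->pending < quota ? cq->pending : quota;

    cq->pending -= done;
    cq->irq_enabled = (done < quota);
    return done;
}
```

With 100 pending completions and a quota of 64, the first pass handles 64 and stays in polling mode; the second handles the remaining 36 and re-enables interrupts. A single slow connection that never fills the quota would re-enable interrupts on every pass, which is the case where the model predicts little benefit.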
Thanks Shirley Ma From mst at mellanox.co.il Wed Sep 27 01:45:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 27 Sep 2006 11:45:25 +0300 Subject: [openib-general] heads-up - ipoib NAPI In-Reply-To: References: <1159338926.27719.17.camel@localhost> Message-ID: <20060927084525.GB22364@mellanox.co.il> Quoting r. Shirley Ma : > You NAPI poll is driven either by receiver quota or any send CQE in CQ. Have you tested UDP performance? any difference? The thing to do currently is probably to wait for Roland to post an updated patch, then test it. -- MST From xma at us.ibm.com Wed Sep 27 01:45:49 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 27 Sep 2006 01:45:49 -0700 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: <20060927083003.GB22263@mellanox.co.il> Message-ID: "Michael S. Tsirkin" wrote on 09/27/2006 01:30:03 AM: >Any idea what does ethtool do that IPoIB can't support? ethtool is an ethernet device tool. It's OK to partially implement ethtool operations in IPoIB. We also need to patch the userlevel utility to support the ibX interface. Now it only supports ethX. thanks Shirley Ma IBM Linux Technology Center From dennis at osc.edu Wed Sep 27 02:44:22 2006 From: dennis at osc.edu (Dennis Dalessandro) Date: Wed, 27 Sep 2006 11:44:22 +0200 Subject: [openib-general] Port reuse issue for rdma_cm/iwarp In-Reply-To: References: Message-ID: <1159350262.2785.3.camel@barney> This has to do with the socket going into the TIME_WAIT state, where it waits for any packets that may still be in flight, as Caitlin said. From what I was told, there is not really any option to get around this with the Ammasso card. This was back when they were still in business though, and for their ccil driver. Probably better off to use different ports.
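One way to use different ports is to let the stack pick the local port on each invocation. The sketch below shows this with plain BSD sockets on loopback; it is an analogue for illustration only, not rdma_cm or iWARP code, and bind_ephemeral_port() is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1 with port 0 so the kernel chooses an
 * unused local port. On success, store the socket in *fd_out and return
 * the chosen port in host byte order; return -1 on failure. */
int bind_ephemeral_port(int *fd_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);   /* 0 asks the kernel to pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *fd_out = fd;
    return ntohs(addr.sin_port);
}
```

Because the four-tuple then differs between invocations, a connection left in TIME_WAIT by the previous job does not block the next one. The caller owns the returned descriptor and should close() it when done.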
-Dennis On Tue, 2006-09-26 at 17:53 -0400, Sundeep Narravula wrote: > > TCP restricts prompt re-use of the same Source/Destination > > Address/Port pair while old traffic could still be in-flight. > > This is generally not an issue because prompt re-use of the > > exact four tuple is rare. > > > > Is there a special reason why your application needs to > > reuse the same port from the active side? If the port number > > is being used to identify the rank, could private data be > > used instead? > > Our application is primarily an invocation of multiple independent parallel > jobs which all need to connect or each other on each invocation. Since > this is a TCP limitation, is there any interface similar to setsockopt > with TCP_NODELAY. We probably need to use different ports otherwise. > > Thanks, > --Sundeep. > > > > > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From erezz at voltaire.com Wed Sep 27 05:21:35 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 27 Sep 2006 15:21:35 +0300 (IDT) Subject: [openib-general] [PATCH 0/3] IB/iser: bug fixes for 2.6.19 rc1 Message-ID: Roland, Here is a series of patches for iSER. Most of them are bug fixes. I hope that they can be added to rc1. Thanks Erez From erezz at voltaire.com Wed Sep 27 05:27:10 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 27 Sep 2006 15:27:10 +0300 (IDT) Subject: [openib-general] [PATCH 1/3] IB/iser: have iSER data transaction object pointing to iSER conn In-Reply-To: Message-ID: iSER uses a data transaction object (struct iser_dto) as part of its IB data descriptors (struct iser_desc) management. It also uses a hierarchy of connection structures pointing to each other. 
A DTO may exist even after the iscsi_iser connection pointed by it is destructed (eg one that is bounded to post receive buffer which was flushed by the IB HW). Hence DTOs need point to the lowest connection, which is struct iser_conn. Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iscsi_iser.c | 2 ++ drivers/infiniband/ulp/iser/iscsi_iser.h | 2 +- drivers/infiniband/ulp/iser/iser_initiator.c | 11 ++++++----- drivers/infiniband/ulp/iser/iser_verbs.c | 8 +++++--- 4 files changed, 14 insertions(+), 9 deletions(-) 57b132002a5e3bf3ba0ae362f174404e29c69449 diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 101e407..b37f429 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -317,6 +317,8 @@ iscsi_iser_conn_destroy(struct iscsi_cls struct iscsi_iser_conn *iser_conn = conn->dd_data; iscsi_conn_teardown(cls_conn); + if (iser_conn->ib_conn) + iser_conn->ib_conn->iser_conn = NULL; kfree(iser_conn); } diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 7c3d0c9..7f44636 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -187,7 +187,7 @@ struct iser_regd_buf { struct iser_dto { struct iscsi_iser_cmd_task *ctask; - struct iscsi_iser_conn *conn; + struct iser_conn *ib_conn; int notify_enable; /* vector of registered buffers */ diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index ccf56f6..14ae61e 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -249,7 +249,7 @@ static int iser_post_receive_control(str } recv_dto = &rx_desc->dto; - recv_dto->conn = iser_conn; + recv_dto->ib_conn = iser_conn->ib_conn; recv_dto->regd_vector_len = 0; regd_hdr = &rx_desc->hdr_regd_buf; @@ -296,7 +296,7 @@ static void iser_create_send_desc(struct regd_hdr->virt_addr = 
tx_desc; /* == &tx_desc->iser_header */ regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN; - send_dto->conn = iser_conn; + send_dto->ib_conn = iser_conn->ib_conn; send_dto->notify_enable = 1; send_dto->regd_vector_len = 0; @@ -588,7 +588,7 @@ void iser_rcv_completion(struct iser_des unsigned long dto_xfer_len) { struct iser_dto *dto = &rx_desc->dto; - struct iscsi_iser_conn *conn = dto->conn; + struct iscsi_iser_conn *conn = dto->ib_conn->iser_conn; struct iscsi_session *session = conn->iscsi_conn->session; struct iscsi_cmd_task *ctask; struct iscsi_iser_cmd_task *iser_ctask; @@ -641,7 +641,8 @@ void iser_rcv_completion(struct iser_des void iser_snd_completion(struct iser_desc *tx_desc) { struct iser_dto *dto = &tx_desc->dto; - struct iscsi_iser_conn *iser_conn = dto->conn; + struct iser_conn *ib_conn = dto->ib_conn; + struct iscsi_iser_conn *iser_conn = ib_conn->iser_conn; struct iscsi_conn *conn = iser_conn->iscsi_conn; struct iscsi_mgmt_task *mtask; @@ -652,7 +653,7 @@ void iser_snd_completion(struct iser_des if (tx_desc->type == ISCSI_TX_DATAOUT) kmem_cache_free(ig.desc_cache, tx_desc); - atomic_dec(&iser_conn->ib_conn->post_send_buf_count); + atomic_dec(&ib_conn->post_send_buf_count); write_lock(conn->recv_lock); if (conn->suspend_tx) { diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 72febf1..11d4e87 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -570,6 +570,8 @@ void iser_conn_release(struct iser_conn /* on EVENT_ADDR_ERROR there's no device yet for this conn */ if (device != NULL) iser_device_try_release(device); + if (ib_conn->iser_conn) + ib_conn->iser_conn->ib_conn = NULL; kfree(ib_conn); } @@ -692,7 +694,7 @@ int iser_post_recv(struct iser_desc *rx_ struct iser_dto *recv_dto = &rx_desc->dto; /* Retrieve conn */ - ib_conn = recv_dto->conn->ib_conn; + ib_conn = recv_dto->ib_conn; iser_dto_to_iov(recv_dto, iov, 2); @@ -725,7 +727,7 @@ int 
iser_post_send(struct iser_desc *tx_ struct iser_conn *ib_conn; struct iser_dto *dto = &tx_desc->dto; - ib_conn = dto->conn->ib_conn; + ib_conn = dto->ib_conn; iser_dto_to_iov(dto, iov, MAX_REGD_BUF_VECTOR_LEN); @@ -772,7 +774,7 @@ static void iser_comp_error_worker(void static void iser_handle_comp_error(struct iser_desc *desc) { struct iser_dto *dto = &desc->dto; - struct iser_conn *ib_conn = dto->conn->ib_conn; + struct iser_conn *ib_conn = dto->ib_conn; iser_dto_buffs_release(dto); -- 1.2.6 From erezz at voltaire.com Wed Sep 27 06:43:06 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 27 Sep 2006 16:43:06 +0300 (IDT) Subject: [openib-general] [PATCH 2/3] IB/iser: dma unmap an unaligned for rdma data before touching it In-Reply-To: Message-ID: iSER uses the dma mapping api to map the page holding the scsi command data to the hca dma address space. When the command data is not aligned for rdma, the data is copied to/from an allocated buffer which in turn is used for executing this command. The pages associated with the command must be unmapped before being are touched. 
Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iscsi_iser.h | 7 ++++ drivers/infiniband/ulp/iser/iser_initiator.c | 49 +++++--------------------- drivers/infiniband/ulp/iser/iser_memory.c | 42 ++++++++++++++++++++++ 3 files changed, 59 insertions(+), 39 deletions(-) 78a237418bd3547cfeb49828a8b857ac5241749f diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 7f44636..4a7069f 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -350,4 +350,11 @@ int iser_post_send(struct iser_desc *tx int iser_conn_state_comp(struct iser_conn *ib_conn, enum iser_ib_conn_state comp); + +int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask, + struct iser_data_buf *data, + enum iser_data_dir iser_dir, + enum dma_data_direction dma_dir); + +void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask); #endif diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 14ae61e..9b3d79c 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -66,42 +66,6 @@ static void iser_dto_add_regd_buff(struc dto->regd_vector_len++; } -static int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask, - struct iser_data_buf *data, - enum iser_data_dir iser_dir, - enum dma_data_direction dma_dir) -{ - struct device *dma_device; - - iser_ctask->dir[iser_dir] = 1; - dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; - - data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir); - if (data->dma_nents == 0) { - iser_err("dma_map_sg failed!!!\n"); - return -EINVAL; - } - return 0; -} - -static void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask) -{ - struct device *dma_device; - struct iser_data_buf *data; - - dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; - - if 
(iser_ctask->dir[ISER_DIR_IN]) { - data = &iser_ctask->data[ISER_DIR_IN]; - dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE); - } - - if (iser_ctask->dir[ISER_DIR_OUT]) { - data = &iser_ctask->data[ISER_DIR_OUT]; - dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE); - } -} - /* Register user buffer memory and initialize passive rdma * dto descriptor. Total data size is stored in * iser_ctask->data[ISER_DIR_IN].data_len @@ -699,14 +663,19 @@ void iser_ctask_rdma_init(struct iscsi_i void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *iser_ctask) { int deferred; + int is_rdma_aligned = 1; /* if we were reading, copy back to unaligned sglist, * anyway dma_unmap and free the copy */ - if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL) + if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL) { + is_rdma_aligned = 0; iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_IN); - if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL) + } + if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL) { + is_rdma_aligned = 0; iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_OUT); + } if (iser_ctask->dir[ISER_DIR_IN]) { deferred = iser_regd_buff_release @@ -726,7 +695,9 @@ void iser_ctask_rdma_finalize(struct isc } } - iser_dma_unmap_task_data(iser_ctask); + /* if the data was unaligned, it was already unmapped and then copied */ + if (is_rdma_aligned) + iser_dma_unmap_task_data(iser_ctask); } void iser_dto_buffs_release(struct iser_dto *dto) diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 31950a5..0f87163 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -360,6 +360,44 @@ static void iser_page_vec_build(struct i } } +int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask, + struct iser_data_buf *data, + enum iser_data_dir iser_dir, + enum dma_data_direction dma_dir) +{ + struct device *dma_device; + + 
iser_ctask->dir[iser_dir] = 1; + dma_device = + iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir); + if (data->dma_nents == 0) { + iser_err("dma_map_sg failed!!!\n"); + return -EINVAL; + } + return 0; +} + +void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask) +{ + struct device *dma_device; + struct iser_data_buf *data; + + dma_device = + iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + if (iser_ctask->dir[ISER_DIR_IN]) { + data = &iser_ctask->data[ISER_DIR_IN]; + dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE); + } + + if (iser_ctask->dir[ISER_DIR_OUT]) { + data = &iser_ctask->data[ISER_DIR_OUT]; + dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE); + } +} + /** * iser_reg_rdma_mem - Registers memory intended for RDMA, * obtaining rkey and va @@ -382,6 +420,10 @@ int iser_reg_rdma_mem(struct iscsi_iser_ iser_err("rdma alignment violation %d/%d aligned\n", aligned_len, mem->size); iser_data_buf_dump(mem); + + /* unmap the command data before accessing it */ + iser_dma_unmap_task_data(iser_ctask); + /* allocate copy buf, if we are writing, copy the */ /* unaligned scatterlist, dma map the copy */ if (iser_start_rdma_unaligned_sg(iser_ctask, cmd_dir) != 0) -- 1.2.6 From erezz at voltaire.com Wed Sep 27 06:48:57 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 27 Sep 2006 16:48:57 +0300 (IDT) Subject: [openib-general] [PATCH 3/3] IB/iser: fix the description of iSER in Kconfig In-Reply-To: Message-ID: fix the description of iSER in Kconfig. It is not accurate. 
Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/Kconfig | 11 ++++++----- 1 files changed, 6 insertions(+), 5 deletions(-) e6a8887cad4e2270c5173451e8b706b907b88133 diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig index fead87d..80f6716 100644 --- a/drivers/infiniband/ulp/iser/Kconfig +++ b/drivers/infiniband/ulp/iser/Kconfig @@ -1,11 +1,12 @@ config INFINIBAND_ISER - tristate "ISCSI RDMA Protocol" + tristate "iSCSI Extensions for RDMA (iSER)" depends on INFINIBAND && SCSI select SCSI_ISCSI_ATTRS ---help--- - Support for the ISCSI RDMA Protocol over InfiniBand. This - allows you to access storage devices that speak ISER/ISCSI + Support for the iSCSI Extensions for RDMA (iSER) Protocol over InfiniBand. This + allows you to access storage devices that speak iSCSI over iSER over InfiniBand. - The ISER protocol is defined by IETF. - See . + The iSER protocol is defined by IETF. + See + and -- 1.2.6 From erezz at voltaire.com Wed Sep 27 06:58:02 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 27 Sep 2006 16:58:02 +0300 (IDT) Subject: [openib-general] [PATCH 1/3] IB/iser: have iSER data transaction object pointing to iSER conn In-Reply-To: Message-ID: (This patch may be a duplicate. Something went wrong with my previous mail.) iSER uses a data transaction object (struct iser_dto) as part of its IB data descriptors (struct iser_desc) management. It also uses a hierarchy of connection structures pointing to each other. A DTO may exist even after the iscsi_iser connection pointed by it is destructed (eg one that is bounded to post receive buffer which was flushed by the IB HW). Hence DTOs need point to the lowest connection, which is struct iser_conn. 
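The pointer arrangement described above can be pictured with a small user-space sketch. This is only an illustration under stated assumptions, not the driver code: `upper_conn`, `lower_conn`, `dto`, `upper_conn_destroy` and `dto_upper_conn` are made-up names standing in for struct iscsi_iser_conn, struct iser_conn, struct iser_dto and the teardown/lookup paths. The DTO holds only the lower connection; the upper connection is reached through a back pointer that is cleared when the upper connection goes away.

```c
#include <assert.h>
#include <stddef.h>

struct upper_conn;                  /* stand-in for struct iscsi_iser_conn */

struct lower_conn {                 /* stand-in for struct iser_conn */
	struct upper_conn *upper;   /* back pointer, NULL once upper is gone */
};

struct upper_conn {
	struct lower_conn *lower;
};

struct dto {                        /* stand-in for struct iser_dto */
	struct lower_conn *lower;   /* the DTO references only the lower conn */
};

/* Hypothetical teardown: detach the upper connection from the lower one
 * so that nothing reachable from a DTO still points at it. */
static void upper_conn_destroy(struct upper_conn *uc)
{
	if (uc->lower)
		uc->lower->upper = NULL;
	uc->lower = NULL;
}

/* Hypothetical lookup: returns NULL if the upper connection is gone. */
static struct upper_conn *dto_upper_conn(const struct dto *d)
{
	assert(d != NULL && d->lower != NULL);
	return d->lower->upper;
}
```

In this sketch a DTO that outlives the upper connection still has a valid `lower` pointer, and a caller can tell from a NULL result that the upper connection no longer exists.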
Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iscsi_iser.c | 2 ++ drivers/infiniband/ulp/iser/iscsi_iser.h | 2 +- drivers/infiniband/ulp/iser/iser_initiator.c | 11 ++++++----- drivers/infiniband/ulp/iser/iser_verbs.c | 8 +++++--- 4 files changed, 14 insertions(+), 9 deletions(-) 57b132002a5e3bf3ba0ae362f174404e29c69449 diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 101e407..b37f429 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -317,6 +317,8 @@ iscsi_iser_conn_destroy(struct iscsi_cls struct iscsi_iser_conn *iser_conn = conn->dd_data; iscsi_conn_teardown(cls_conn); + if (iser_conn->ib_conn) + iser_conn->ib_conn->iser_conn = NULL; kfree(iser_conn); } diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 7c3d0c9..7f44636 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -187,7 +187,7 @@ struct iser_regd_buf { struct iser_dto { struct iscsi_iser_cmd_task *ctask; - struct iscsi_iser_conn *conn; + struct iser_conn *ib_conn; int notify_enable; /* vector of registered buffers */ diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index ccf56f6..14ae61e 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -249,7 +249,7 @@ static int iser_post_receive_control(str } recv_dto = &rx_desc->dto; - recv_dto->conn = iser_conn; + recv_dto->ib_conn = iser_conn->ib_conn; recv_dto->regd_vector_len = 0; regd_hdr = &rx_desc->hdr_regd_buf; @@ -296,7 +296,7 @@ static void iser_create_send_desc(struct regd_hdr->virt_addr = tx_desc; /* == &tx_desc->iser_header */ regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN; - send_dto->conn = iser_conn; + send_dto->ib_conn = iser_conn->ib_conn; send_dto->notify_enable = 1; send_dto->regd_vector_len = 0; @@ -588,7 +588,7 
@@ void iser_rcv_completion(struct iser_des unsigned long dto_xfer_len) { struct iser_dto *dto = &rx_desc->dto; - struct iscsi_iser_conn *conn = dto->conn; + struct iscsi_iser_conn *conn = dto->ib_conn->iser_conn; struct iscsi_session *session = conn->iscsi_conn->session; struct iscsi_cmd_task *ctask; struct iscsi_iser_cmd_task *iser_ctask; @@ -641,7 +641,8 @@ void iser_rcv_completion(struct iser_des void iser_snd_completion(struct iser_desc *tx_desc) { struct iser_dto *dto = &tx_desc->dto; - struct iscsi_iser_conn *iser_conn = dto->conn; + struct iser_conn *ib_conn = dto->ib_conn; + struct iscsi_iser_conn *iser_conn = ib_conn->iser_conn; struct iscsi_conn *conn = iser_conn->iscsi_conn; struct iscsi_mgmt_task *mtask; @@ -652,7 +653,7 @@ void iser_snd_completion(struct iser_des if (tx_desc->type == ISCSI_TX_DATAOUT) kmem_cache_free(ig.desc_cache, tx_desc); - atomic_dec(&iser_conn->ib_conn->post_send_buf_count); + atomic_dec(&ib_conn->post_send_buf_count); write_lock(conn->recv_lock); if (conn->suspend_tx) { diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 72febf1..11d4e87 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -570,6 +570,8 @@ void iser_conn_release(struct iser_conn /* on EVENT_ADDR_ERROR there's no device yet for this conn */ if (device != NULL) iser_device_try_release(device); + if (ib_conn->iser_conn) + ib_conn->iser_conn->ib_conn = NULL; kfree(ib_conn); } @@ -692,7 +694,7 @@ int iser_post_recv(struct iser_desc *rx_ struct iser_dto *recv_dto = &rx_desc->dto; /* Retrieve conn */ - ib_conn = recv_dto->conn->ib_conn; + ib_conn = recv_dto->ib_conn; iser_dto_to_iov(recv_dto, iov, 2); @@ -725,7 +727,7 @@ int iser_post_send(struct iser_desc *tx_ struct iser_conn *ib_conn; struct iser_dto *dto = &tx_desc->dto; - ib_conn = dto->conn->ib_conn; + ib_conn = dto->ib_conn; iser_dto_to_iov(dto, iov, MAX_REGD_BUF_VECTOR_LEN); @@ -772,7 +774,7 @@ static void 
iser_comp_error_worker(void static void iser_handle_comp_error(struct iser_desc *desc) { struct iser_dto *dto = &desc->dto; - struct iser_conn *ib_conn = dto->conn->ib_conn; + struct iser_conn *ib_conn = dto->ib_conn; iser_dto_buffs_release(dto); -- 1.2.6 From erezz at voltaire.com Wed Sep 27 07:02:49 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 27 Sep 2006 17:02:49 +0300 (IDT) Subject: [openib-general] [PATCH 2/3] IB/iser: dma unmap an unaligned for rdma data before touching it In-Reply-To: Message-ID: (This patch may be a duplicate. Something went wrong with my previous mail.) iSER uses the dma mapping api to map the page holding the scsi command data to the hca dma address space. When the command data is not aligned for rdma, the data is copied to/from an allocated buffer which in turn is used for executing this command. The pages associated with the command must be unmapped before being touched. Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iscsi_iser.h | 7 ++++ drivers/infiniband/ulp/iser/iser_initiator.c | 49 +++++--------------------- drivers/infiniband/ulp/iser/iser_memory.c | 42 ++++++++++++++++++++++ 3 files changed, 59 insertions(+), 39 deletions(-) 78a237418bd3547cfeb49828a8b857ac5241749f diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 7f44636..4a7069f 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -350,4 +350,11 @@ int iser_post_send(struct iser_desc *tx int iser_conn_state_comp(struct iser_conn *ib_conn, enum iser_ib_conn_state comp); + +int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask, + struct iser_data_buf *data, + enum iser_data_dir iser_dir, + enum dma_data_direction dma_dir); + +void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask); #endif diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 14ae61e..9b3d79c 100644 --- 
a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -66,42 +66,6 @@ static void iser_dto_add_regd_buff(struc dto->regd_vector_len++; } -static int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask, - struct iser_data_buf *data, - enum iser_data_dir iser_dir, - enum dma_data_direction dma_dir) -{ - struct device *dma_device; - - iser_ctask->dir[iser_dir] = 1; - dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; - - data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir); - if (data->dma_nents == 0) { - iser_err("dma_map_sg failed!!!\n"); - return -EINVAL; - } - return 0; -} - -static void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask) -{ - struct device *dma_device; - struct iser_data_buf *data; - - dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; - - if (iser_ctask->dir[ISER_DIR_IN]) { - data = &iser_ctask->data[ISER_DIR_IN]; - dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE); - } - - if (iser_ctask->dir[ISER_DIR_OUT]) { - data = &iser_ctask->data[ISER_DIR_OUT]; - dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE); - } -} - /* Register user buffer memory and initialize passive rdma * dto descriptor. 
Total data size is stored in * iser_ctask->data[ISER_DIR_IN].data_len @@ -699,14 +663,19 @@ void iser_ctask_rdma_init(struct iscsi_i void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *iser_ctask) { int deferred; + int is_rdma_aligned = 1; /* if we were reading, copy back to unaligned sglist, * anyway dma_unmap and free the copy */ - if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL) + if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL) { + is_rdma_aligned = 0; iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_IN); - if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL) + } + if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL) { + is_rdma_aligned = 0; iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_OUT); + } if (iser_ctask->dir[ISER_DIR_IN]) { deferred = iser_regd_buff_release @@ -726,7 +695,9 @@ void iser_ctask_rdma_finalize(struct isc } } - iser_dma_unmap_task_data(iser_ctask); + /* if the data was unaligned, it was already unmapped and then copied */ + if (is_rdma_aligned) + iser_dma_unmap_task_data(iser_ctask); } void iser_dto_buffs_release(struct iser_dto *dto) diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 31950a5..0f87163 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -360,6 +360,44 @@ static void iser_page_vec_build(struct i } } +int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask, + struct iser_data_buf *data, + enum iser_data_dir iser_dir, + enum dma_data_direction dma_dir) +{ + struct device *dma_device; + + iser_ctask->dir[iser_dir] = 1; + dma_device = + iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir); + if (data->dma_nents == 0) { + iser_err("dma_map_sg failed!!!\n"); + return -EINVAL; + } + return 0; +} + +void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask) +{ + struct device 
*dma_device; + struct iser_data_buf *data; + + dma_device = + iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + if (iser_ctask->dir[ISER_DIR_IN]) { + data = &iser_ctask->data[ISER_DIR_IN]; + dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE); + } + + if (iser_ctask->dir[ISER_DIR_OUT]) { + data = &iser_ctask->data[ISER_DIR_OUT]; + dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE); + } +} + /** * iser_reg_rdma_mem - Registers memory intended for RDMA, * obtaining rkey and va @@ -382,6 +420,10 @@ int iser_reg_rdma_mem(struct iscsi_iser_ iser_err("rdma alignment violation %d/%d aligned\n", aligned_len, mem->size); iser_data_buf_dump(mem); + + /* unmap the command data before accessing it */ + iser_dma_unmap_task_data(iser_ctask); + /* allocate copy buf, if we are writing, copy the */ /* unaligned scatterlist, dma map the copy */ if (iser_start_rdma_unaligned_sg(iser_ctask, cmd_dir) != 0) -- 1.2.6 From RAISCH at de.ibm.com Wed Sep 27 07:18:23 2006 From: RAISCH at de.ibm.com (Christoph Raisch) Date: Wed, 27 Sep 2006 16:18:23 +0200 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: Message-ID: > Roland, > > > Do you know how ehca behaves? Does it have that race? ie what > > happens in this situation: > > > > poll CQ -> CQ is empty > > (new completion is added to CQ) > > request notify on CQ > > (no more completions are added) > > > > Mellanox HCAs will generate a CQ event in this case, although it's not > > strictly required by the IB spec. How will ehca behave? > > > > - R. > > That could be the reason. I did see mthca poll empty entry, but not > on ehca. I will confirm this with ehca team. > > Thanks > Shirley Ma It's possible that a race will happen between the interrupt handler, the poll routine and the hardware. By doing a poll CQ -> CQ is empty (new completion is added to CQ) request notify on CQ (no more completions are added) poll one more time you should be on the safe side. Gruss / Regards . . . 
Christoph Raisch From moshek at voltaire.com Wed Sep 27 07:42:41 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Wed, 27 Sep 2006 17:42:41 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 and whendriver is not loaded on AMD Message-ID: Michael, Frank's new version was tested once more in Voltaire and is working o.k. I tested `./mstflint -d q` when drivers are loaded and when drivers are not loaded. In all cases it worked o.k. The test was performed on the following environments: - IBM js21 ppc64 sles10 PCI-E - IBM js21 ppc64 sles9 sp3 PCI-E - IBM hs21 em64t redhat as 4 u3 PCI-E - IBM hs21 em64t sles 9 sp3 PCI-E - x86_64 sles10 PCI-E - MAC ppc64 sles10 PCI-X - MAC ppc64 sles10 PCI-E Please consider inserting the patch to OFED. Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Tseng-hui Lin [mailto:tsenglin at us.ibm.com] Sent: Monday, September 25, 2006 7:49 AM To: Moshe Kazir Subject: Re: FW: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD Moshe: This version should take care of the problem of ppc64 and x86_64 not working without the mthca driver. I have tried it on my JS21 and PowerMAC G5 with PCIe HCAs and an x86_64 blade with a PCI-X HCA. My change rewrites the mopen() function. Even though the change is not really big, it makes the OpenFabrics maintainer Michael uncomfortable integrating my change into OFED. I have made it easy to disable my changes on machines we don't test in this version. Hopefully Michael will pick the change up this time. The attached patch is made against the mstflint in OFED-1.1r6. I also attached the tar-gz in case you don't like to use patch. Please try it on the x86_64 machines that failed previously. Thanks.
(See attached file: mstflint.patch)(See attached file: mstflint.tgz) Tseng-Hui (Frank) Lin, PhD Dept 7UEA, LTC Linux OS - Yellow, eServer I/O (Go Orange!) Building 902-6B007, 11501 Burnet Road, Austin, TX 78758 Phone: (512)838-8312 T/L: 678-8312 FAX: (512)838-8858 T/L: 678-8358 e-mail: tsenglin at us.ibm.com "Moshe Kazir" "Moshe Kazir" 09/19/2006 01:28 AM To Tseng-hui Lin/Austin/IBM at IBMUS cc Subject FW: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD Frank, 1. I sent you the last OFED 1.0 mstflint and as I know it did not changed. 2. You may download the last OFED 1.1 release ->OFED-1.1-RC5 (see attached message) . The most update mstflint directory is located in SOURCE/openib-1.1.tgz Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S. Tsirkin [ mailto:mst at mellanox.co.il ] Sent: Monday, September 18, 2006 8:22 PM To: Tseng-Hui (Frank) Lin Cc: Moshe Kazir; Or Gerlitz; Tseng-hui Lin; openib-general at openib.org Subject: Re: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD Quoting r. Tseng-Hui (Frank) Lin : > You mentioned "your version" of mstflint. Is that a different one > from the one in OFED-1.0? If it is, would you mind sending me a copy > of your version so that I can play with it as well? Thanks. Jut the one in svn trunk/OFED 1.1 RC. 
-- MST ----- Message from on Thu, 14 Sep 2006 19:39:16 +0300 ----- To: cc: Subject: [openfabrics-ewg] OFED-1.1-RC5 is ready Hi, OFED-1.1-rc5 is available on https://openib.org/svn/gen2/branches/1.1/ofed/releases/ File: OFED-1.1-rc5.tgz Please report any issues in bugzilla http://openib.org/bugzilla/ Release details: ================ Build_id: OFED-1.1-rc5 openib-1.1 (REV=9485) # User space https://openib.org/svn/gen2/branches/1.1/src/userspace Git: git://www.mellanox.co.il/~git/infinibandref: refs/heads/ofed_1_1 commit 18c1cb87c4b16f1a1577807077bbdcba3f446f09 # MPI mpi_osu-0.9.7-mlx2.2.0.tgz openmpi-1.1.1-1.src.rpm mpitests-2.0-0.src.rpm OS support: =========== Novell: - SLES 9.0 SP3 - SLES10 Redhat: - Redhat EL4 up3 - Redhat EL4 up4 kernel.org: - Kernel 2.6.17 Bug fixes from OFED-1.1-rc4: ========================== 1. ISER compilation fixed on SLES10 2. Fixed build on SLES9 PPC64 3. Updated libehca 4. OpenSM fixes 5. Added tavor_quirk option to rdma_cm module (disabled by default): Tavor performance quirk: limit MTU to 1K if > 0 (int) Known issues: ============= libipathverbs compilation fails on SLES10 (Bug:204) OFED-1.1-rc6 (hopefully the last one) planned to be released on Monday or Tuesday. Regards, Vladimir > Hi, > > The plan is to issue OFED RC5 on Thursday 9/14 and final release next > week. I am aware of the following issues: > > > 1) Compilation on SLES9 on PPC - Jack Morgenstein > 2) Huge pages on PPC - Eli Cohen > 3) libipathverbs: - Qlogic > a) libipathverbs ABI issue > b) libipathverbs build on SLES10 > 4) SDP performance on Tavor - Michael Tsirkin > 5) iSER issue on SLES10 - Voltaire > > > In order to meet tomorrow's RC5 release all owners please send your > patches by end of today. 
> > > Regards, > > Aviram > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mstflint.patch Type: application/octet-stream Size: 6903 bytes Desc: mstflint.patch URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mstflint.tgz Type: application/octet-stream Size: 47588 bytes Desc: mstflint.tgz URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 20576761.gif Type: image/gif Size: 105 bytes Desc: 20576761.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 20283974.gif Type: image/gif Size: 45 bytes Desc: 20283974.gif URL: From vuhuong at mellanox.com Wed Sep 27 08:16:37 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 27 Sep 2006 08:16:37 -0700 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: <20060927071059.GA21509@mellanox.co.il> References: <20060926144541.GA17938@mellanox.co.il> <4519E86D.9030508@mellanox.com> <20060927071059.GA21509@mellanox.co.il> Message-ID: <451A95D5.7060409@mellanox.com> Michael S. Tsirkin wrote: > Quoting r. Vu Pham : > >>Either you can use multiple channels or derive different >>initiator_port_ID in the login req to have multiple paths on >>the same physical port > > > So how about we just stick a pointer inside the indentifier extension > instead of enabling multichannel? > That's the simple change. 
Besides that, you have to maintain a list of connections/channels connected to the same target, manage and clean up the resources associated with these connections, and handle error recovery, especially target reset and host reset... What is the advantage of having multiple connections/qps on the same physical port to the same target? The disadvantages are wasted resources, instability, no fail-over on physical port error... From mst at mellanox.co.il Wed Sep 27 08:19:16 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 27 Sep 2006 18:19:16 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 and whendriver is not loaded on AMD In-Reply-To: References: Message-ID: <20060927151916.GB26351@mellanox.co.il> Quoting r. Moshe Kazir : > Subject: FW: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD > > Michael, > > Frank new version was tested once more in Voltaire and is working o.k. . > I tested `./mstflint -d q` when drivers are loaded and when drivers are not loaded. in all cases it worked o.k. Thanks for testing, but I'd like to get a handle on what's going on first. First, I'm pretty sure that when the driver is loaded things work OK on all systems. When the driver is not loaded - could you please answer whether using /sys/bus/pci/devices/0000\:03\:00.0/resource0 works for you (on systems that have resource0)? > > Test was ferformed on the following environments : > > - IBM js21 ppc64 sles10 PCI-E > - IBM js21 ppc64 sles9 sp3 PCI-E > - IBM hs21 em64t redhat as 4 u3 PCI-E > - IBM hs21 em64t sles 9 sp3 PCI-E > - x86_64 sles10 PCI-E > - MAC ppc64 sles10 PCI-X > - MAC ppc64 sles10 PCI-E > > Please consider inserting the patch to OFED . > > Moshe Since I don't consider this a critical fix (there's no reason the driver won't go up, and if it does not, there's a simple workaround by specifying the /proc interface, which is slower but works), I don't think this should go into OFED 1.1.
Unfortunately, I never got a small bugfix patch against the latest mstflint - the patch I saw posted touches all kind of things all over the code - so I can't insert it in trunk, either. -- MST From michaelc at cs.wisc.edu Wed Sep 27 09:13:50 2006 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 27 Sep 2006 11:13:50 -0500 Subject: [openib-general] [PATCH 1/3] IB/iser: have iSER data transaction object pointing to iSER conn In-Reply-To: References: Message-ID: <451AA33E.2050009@cs.wisc.edu> Erez Zilber wrote: > iSER uses a data transaction object (struct iser_dto) as part > of its IB data descriptors (struct iser_desc) management. > It also uses a hierarchy of connection structures pointing to > each other. A DTO may exist even after the iscsi_iser connection > pointed by it is destructed (eg one that is bounded to post > receive buffer which was flushed by the IB HW). Hence DTOs need > point to the lowest connection, which is struct iser_conn. > > Signed-off-by: Erez Zilber > Both look fine to me. One question not really related to your patches. How much work would you guys have to do to iscsi_iser to support bi directional commands? From vuhuong at mellanox.com Wed Sep 27 09:21:01 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 27 Sep 2006 09:21:01 -0700 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: <451A95D5.7060409@mellanox.com> References: <20060926144541.GA17938@mellanox.co.il> <4519E86D.9030508@mellanox.com> <20060927071059.GA21509@mellanox.co.il> <451A95D5.7060409@mellanox.com> Message-ID: <451AA4ED.7010501@mellanox.com> Vu Pham wrote: > Michael S. Tsirkin wrote: > >>Quoting r. Vu Pham : >> >> >>>Either you can use multiple channels or derive different >>>initiator_port_ID in the login req to have multiple paths on >>>the same physical port >> >> >>So how about we just stick a pointer inside the indentifier extension >>instead of enabling multichannel? >> > > > That's the simple change. 
Beside that you have to maintain a > list of connections/channels connected to the same target, > to manage/clean-up resource associated with these > connections, how to handle error recovery especially target > reset and host reset... > > What is the advantage to have multiple connections/qps on > the same physical port to the same target? The disavantages > are wasting resources, instability, no fail-over on physical > port error... > I see the limitation of current srp implementation. If we have the following topoloty host port 1 -- target port 1 host port 1 -- target port 2 the current srp implementation will use the same initiator_port_id for both login requests and the target will reject the second login if you don't turn on SUPPORT_MULTI_CHANNEL Another way to solve this is to use different initiator_port_id for the logins ie. path 1: initiator_port_ID{target_port1_GUID, initiator_port1_GUID} and target_port_ID{id_ext, ioc_guid} path 2: initiator_port_ID{target_port2_GUID,initiator_port1_GUID} and target_port_ID This also will guarantee the uniqueness of initiator_port_id in the fabric From ishai at dev.mellanox.co.il Wed Sep 27 09:42:33 2006 From: ishai at dev.mellanox.co.il (ishai at dev.mellanox.co.il) Date: Wed, 27 Sep 2006 19:42:33 +0300 (IDT) Subject: [openib-general] 90-ib.rules incorrect? In-Reply-To: References: Message-ID: <16725.194.90.237.34.1159375353.squirrel@dev.mellanox.co.il> In early versions of udev the syntax was different. The syntax used (=) and not (==). RHEL4 for example is still using such old version of udev. Apparently the new udev versions (used for example in SLES10) still supports the old syntax. So this way we can have one file that suits both udev versions. Ishai > Isn't the format of 90-ib.rules in > https://openfabrics.org/svn/gen2/trunk/ofed/openib/scripts/90-ib.rulesincorrect. > > We have > > KERNEL="umad*", NAME="infiniband/%k", which should be > KERNEL=="umad*", NAME="infiniband/%k" > > Am I missing something? 
> > Eugene > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From thlin at us.ibm.com Wed Sep 27 09:45:59 2006 From: thlin at us.ibm.com (Tseng-Hui (Frank) Lin) Date: Wed, 27 Sep 2006 11:45:59 -0500 Subject: [openib-general] FW: Mstflint - not working on ppc64 and whendriver is not loaded on AMD In-Reply-To: <20060927151916.GB26351@mellanox.co.il> References: <20060927151916.GB26351@mellanox.co.il> Message-ID: <1159375559.21249.60.camel@flin.austin.ibm.com> On Wed, 2006-09-27 at 18:19 +0300, Michael S. Tsirkin wrote: > Quoting r. Moshe Kazir : > > Subject: FW: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD > > > > Michael, > > > > Frank new version was tested once more in Voltaire and is working o.k. . > > I tested `./mstflint -d q` when drivers are loaded and when drivers are not loaded. in all cases it worked o.k. > > Thanks for testing, but I'd like to get a handle on what's going on first. > > First, I'm pretty sure when driver is loaded things work OK on all systems. > When driver is not loaded - could you please answer whether using > /sys/bus/pci/devices/0000\:03\:00.0/resource0 > works for you (on systems that have resource0)? > It doesn't work. > > > > Test was ferformed on the following environments : > > > > - IBM js21 ppc64 sles10 PCI-E > > - IBM js21 ppc64 sles9 sp3 PCI-E > > - IBM hs21 em64t redhat as 4 u3 PCI-E > > - IBM hs21 em64t sles 9 sp3 PCI-E > > - x86_64 sles10 PCI-E > > - MAC ppc64 sles10 PCI-X > > - MAC ppc64 sles10 PCI-E > > > > Please consider inserting the patch to OFED . 
> > > > Moshe > > Since I don't consider this a critical fix (there's no reason driver won't go > up, and if it does not, there's a simple workaround by specifying the /proc > interface, that is slower but works), I don't think this should go into OFED 1.1. > > Unfortunately, I never got a small bugfix patch against the latest mstflint - > the patch I saw posted touches all kind of things all over the code - > so I can't insert it in trunk, either. > I agree this is not critical. The patch changes nothing but the way of opening the device. On some ppc64 and x86_64 machines, the I/O memory mapped by mmap() is not accessible (reads return 0xFFFFFFFF) unless kernel code (usually the device driver) does an ioremap. This is why mmap of resource0 does not work on these machines. I am not aware of any way to do an ioremap from user-space code like mstflint. The only thing I can think of is to fall back to using the config space file in /proc/bus/pci/. The (big) patch I made checks whether the faster way (mmap resource0) works. If it doesn't, the patch tries other, slower ways and uses the fastest working way it can find. That's all the patch does. It is not a big fix. It just saves users the trouble of trying all possible ways of opening a device manually. I understand that applying a big patch is risky unless it can be thoroughly tested. Unfortunately, no one has all the machines needed to test the patch. Moshe and I have tested the patch on Power MAC, Squadrons, JS20, and JS21 (almost all living ppc64 machines) as well as a few x86_64 machines. We believe this patch is safe for these machines. The patch can be enabled by defining CONFIG_MOPEN_FALL_BACK to 1. CONFIG_MOPEN_FALL_BACK is defined to 1 for ppc64 and x86_64 and 0 for others. We can enable this patch on other machines once people who have these machines have tested the patch. I agree this is not a critical patch, but it is a useful one.
Moreover, it is well tested on the machines with the patch enabled and change nothing on the machines with the patch disabled. I believe this is a safe patch. Please re-consider adding it. Thanks. From ishai at mellanox.co.il Wed Sep 27 10:00:06 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Wed, 27 Sep 2006 20:00:06 +0300 Subject: [openib-general] [PATCH] IB/SRP: allowing multiple connections from taregt to initiator Message-ID: <20060927170006.GC32010@mellanox.co.il> SRP High Availability should enable an initiator to connect to the same target several times, e.g., once from each IB port of the target. Some targets do not support multichannel. In order to work with them as well we will use another identifier_extension to the initiator port for each target connection. Signed-off-by: Ishai Rabinovitz --- I think this is the best solution. It allows users to use all four physical connections from the initiator to target. It also allows users to have several logical connections on one physical connection (If they want connection with different attributes - for example different max_cmd_per_lun). It is SRP spec compliant. I also added a module param, so it is possible to turn this option off. 
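As a rough user-space illustration of the idea (a sketch under stated assumptions, not the kernel code in the patch): the 16-byte initiator port ID is treated here as 8 bytes of identifier extension followed by 8 bytes of GUID, and the extension is varied per target connection so that each login looks like it comes from a distinct initiator port. The helper name `build_initiator_port_id` and the `conn_token` parameter are made up for this example; any value that is unique per connection would do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SRP_PORT_ID_LEN 16

/* Hypothetical helper: fill a 16-byte initiator port ID laid out as
 * 8 bytes of identifier extension followed by 8 bytes of GUID.
 * conn_token is an assumed per-connection unique value. */
static void build_initiator_port_id(uint8_t id[SRP_PORT_ID_LEN],
				    uint64_t conn_token,
				    const uint8_t guid[8])
{
	assert(id != NULL && guid != NULL);

	memset(id, 0, SRP_PORT_ID_LEN);
	/* identifier extension: differs for every connection */
	memcpy(id, &conn_token, sizeof conn_token);
	/* GUID part: identical for all connections from this initiator */
	memcpy(id + 8, guid, 8);
}
```

Two calls with different `conn_token` values and the same GUID yield IDs that differ only in their first 8 bytes, which is what would let a target without multichannel support accept both logins.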
Index: latest/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- latest.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-09-27 10:36:13.000000000 +0300 +++ latest/drivers/infiniband/ulp/srp/ib_srp.c 2006-09-27 16:48:12.000000000 +0300 @@ -85,6 +85,13 @@ MODULE_PARM_DESC(mellanox_workarounds, static const u8 mellanox_oui[3] = { 0x00, 0x02, 0xc9 }; +static int variable_identifier_extension = 1; + +module_param(variable_identifier_extension, int, 0444); +MODULE_PARM_DESC(variable_identifier_extension, + "Use another identifier_extension on each connection to target" + ", allows multichannel connection on all targets if != 0"); + static void srp_add_one(struct ib_device *device); static void srp_remove_one(struct ib_device *device); static void srp_completion(struct ib_cq *cq, void *target_ptr); @@ -329,6 +336,7 @@ static int srp_send_req(struct srp_targe req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len); req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT); + /* * In the published SRP specification (draft rev. 
16a), the * port identifier format is 8 bytes of ID extension followed @@ -341,13 +349,23 @@ static int srp_send_req(struct srp_targe if (target->io_class == SRP_REV10_IB_IO_CLASS) { memcpy(req->priv.initiator_port_id, target->srp_host->initiator_port_id + 8, 8); - memcpy(req->priv.initiator_port_id + 8, - target->srp_host->initiator_port_id, 8); + if (variable_identifier_extension) + memcpy(req->priv.initiator_port_id + 8, + &target, sizeof target); + else + memcpy(req->priv.initiator_port_id + 8, + target->srp_host->initiator_port_id, 8); memcpy(req->priv.target_port_id, &target->ioc_guid, 8); memcpy(req->priv.target_port_id + 8, &target->id_ext, 8); } else { - memcpy(req->priv.initiator_port_id, - target->srp_host->initiator_port_id, 16); + if (variable_identifier_extension) + memcpy(req->priv.initiator_port_id, + &target, sizeof target); + else + memcpy(req->priv.initiator_port_id, + target->srp_host->initiator_port_id, 8); + memcpy(req->priv.initiator_port_id + 8, + target->srp_host->initiator_port_id + 8, 8); memcpy(req->priv.target_port_id, &target->id_ext, 8); memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); } @@ -1823,7 +1841,8 @@ static struct srp_host *srp_add_port(str host->dev = device; host->port = port; - host->initiator_port_id[7] = port; + if (!variable_identifier_extension) + host->initiator_port_id[7] = port; memcpy(host->initiator_port_id + 8, &device->dev->node_guid, 8); host->class_dev.class = &srp_class; -- Ishai Rabinovitz From mshefty at ichips.intel.com Wed Sep 27 10:04:46 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Sep 2006 10:04:46 -0700 Subject: [openib-general] oops after rmmod ib_cm when stopping iSER In-Reply-To: <451A2E7E.8050504@voltaire.com> References: <451A2E7E.8050504@voltaire.com> Message-ID: <451AAF2E.1060602@ichips.intel.com> Erez Zilber wrote: > When stopping iSER, we run 'modprobe -r ib_iser'. Then, we see an oops > (below). 
In order to check which module caused that oops, I replaced the > 'modprobe -r' call with rmmod for each module: > > rmmod ib_iser > rmmod libiscsi > rmmod scsi_transport_iscsi > rmmod rdma_cm > rmmod ib_addr > rmmod ib_cm > > If I wait a few seconds before the removal of ib_cm, everything is ok. Thanks for the info. My guess is that the cm_id's are not taking a reference on the cm devices, which is allowing the module unload to proceed while cm_id's still remain in timewait. I will look at this in more detail and work on a patch. How reproducible is this? - Sean From geneing at gmail.com Wed Sep 27 10:48:56 2006 From: geneing at gmail.com (EI) Date: Wed, 27 Sep 2006 10:48:56 -0700 Subject: [openib-general] 90-ib.rules incorrect? In-Reply-To: <16725.194.90.237.34.1159375353.squirrel@dev.mellanox.co.il> References: <16725.194.90.237.34.1159375353.squirrel@dev.mellanox.co.il> Message-ID: Ishai, udev in OpenSuSE 10.2 alpha gives an error with the current rules file, which uses (=). Eugene On 9/27/06, ishai at dev.mellanox.co.il wrote: > > In early versions of udev the syntax was different. The syntax used (=) > and not (==). > RHEL4 for example is still using such old version of udev. > > Apparently the new udev versions (used for example in SLES10) still > supports the old syntax. > > So this way we can have one file that suits both udev versions. > > Ishai > > > > Isn't the format of 90-ib.rules in > > > https://openfabrics.org/svn/gen2/trunk/ofed/openib/scripts/90-ib.rules incorrect? > > > > We have > > > > KERNEL="umad*", NAME="infiniband/%k", which should be > > KERNEL=="umad*", NAME="infiniband/%k" > > > > Am I missing something? 
> > > > Eugene From weiny2 at llnl.gov Wed Sep 27 10:54:57 2006 From: weiny2 at llnl.gov (Ira Weiny) Date: Wed, 27 Sep 2006 10:54:57 -0700 Subject: [openib-general] 5 new diag tools. Message-ID: <20060927105457.7c147e0e.weiny2@llnl.gov> The included patch is for 5 new diag tools which I have written in perl. I started out building this to combine the information of the other tools to give more information about the port counters. Information like the specific source and destination port, link speed, etc. In the process I found that it was also beneficial to limit the amount of information reported as some of the lower level diags report so much information that some people simply dismiss the tools as useless. I hope they are useful. Thanks, Ira Weiny weiny2 at llnl.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: new-ib-diags.patch Type: application/octet-stream Size: 43797 bytes Desc: not available URL: From bgreen at nas.nasa.gov Wed Sep 27 10:58:48 2006 From: bgreen at nas.nasa.gov (Bryan Green) Date: Wed, 27 Sep 2006 10:58:48 -0700 Subject: [openib-general] mvapich2-gen2 svn - vapi <--> gen2 ?? Message-ID: <200609271758.k8RHwm1h000536@ece06.nas.nasa.gov> Hello, Regarding mvapich2-gen2 in the openib svn, can an mvapich2 vapi build on one machine communicate with a gen2 build on another? -bryan From mshefty at ichips.intel.com Wed Sep 27 11:02:32 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Sep 2006 11:02:32 -0700 Subject: [openib-general] Different byte order between gen1 CM and gen2 CM ->RE: How to connect gen2 CM to gen1 IBGD CM? 
In-Reply-To: <45097457.5020007@ichips.intel.com> References: <45097457.5020007@ichips.intel.com> Message-ID: <451ABCB8.2010605@ichips.intel.com> Sean Hefty wrote: > The byte ordering in the kernel APIs are fairly clear about this, but that > documentation didn't carry up to userspace everywhere. I will update the > userspace documentation, but it may take me a few weeks to get to this. I've added some additional comments next to structure fields that are specified in network-byte order. Hopefully this will help others avoid running into similar issues. - Sean From mshefty at ichips.intel.com Wed Sep 27 11:12:28 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Sep 2006 11:12:28 -0700 Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ In-Reply-To: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com> References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com> Message-ID: <451ABF0C.90607@ichips.intel.com> Sean Hefty wrote: > Currently a DREP is only sent in response to a DREQ if a connection > has been found matching the DREQ, and it is in the proper state. Once > a DREP is sent, the local connection moves into timewait. Duplicate > DREQs received while in this state result in re-sending the DREP. > > However, it's likely that the local connection will enter and exit > timewait before the remote side times out a lost DREP and resends a DREQ. > There are a couple possible solutions to this. One is to increase how > long a connection remains in timewait, by multiplying its wait time by > max_cm_retries. This can greatly increase the timewait state before a QP > can be re-used when CM messages are not lost. > > An alternative is to send a DREP in response to a DREQ, even if a local > connection is not found, which is what this patch does. If there are no objections, I will commit this patch to svn, and submit for inclusion upstream. 
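The DREQ handling discussed in this message can be sketched roughly as follows. This is a minimal illustration, not the ib_cm code: the enum names, the handle_dreq() helper and the reply_to_unmatched flag are hypothetical, and "proper state" is assumed here to mean an established connection.

```c
#include <stddef.h>

/* Hypothetical, simplified connection states (not the ib_cm state machine). */
enum conn_state { CONN_ESTABLISHED, CONN_TIMEWAIT, CONN_OTHER };

/* What the receiver of a DREQ does, in this simplified model. */
enum dreq_action {
    DREQ_IGNORE,                   /* drop the DREQ silently */
    DREQ_SEND_DREP,                /* reply although no connection matched */
    DREQ_SEND_DREP_ENTER_TIMEWAIT, /* first DREQ on a live connection */
    DREQ_RESEND_DREP               /* duplicate DREQ while in timewait */
};

/*
 * conn is NULL when no local connection matches the DREQ.
 * reply_to_unmatched selects the behaviour proposed in the patch
 * (non-zero) or the behaviour described as current (zero).
 */
static enum dreq_action handle_dreq(const enum conn_state *conn,
                                    int reply_to_unmatched)
{
    if (conn == NULL)
        return reply_to_unmatched ? DREQ_SEND_DREP : DREQ_IGNORE;

    switch (*conn) {
    case CONN_ESTABLISHED:
        return DREQ_SEND_DREP_ENTER_TIMEWAIT;
    case CONN_TIMEWAIT:
        return DREQ_RESEND_DREP;
    default:
        return DREQ_IGNORE;
    }
}
```

In this model the only behavioural difference the patch introduces is the conn == NULL case: a DREQ that is retransmitted after the local side has already left timewait still gets a DREP instead of being dropped.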
- Sean From robert.j.woodruff at intel.com Wed Sep 27 11:37:35 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Wed, 27 Sep 2006 11:37:35 -0700 Subject: [openib-general] [openfabrics-ewg] OFED Status Message-ID: Aviram wrote, >Pending that IPoIB HA is solved would like to issue RC7 that suppose to >be final. Is everyone OK with this approach? >Aviram Sounds good, What is the target date for RC7 ? From mshefty at ichips.intel.com Wed Sep 27 11:44:22 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Sep 2006 11:44:22 -0700 Subject: [openib-general] [PATCH] ucma : Encapsulate duplicate code to common routine In-Reply-To: <20060919070206.5476.64107.sendpatchset@localhost.localdomain> References: <20060919070206.5476.64107.sendpatchset@localhost.localdomain> Message-ID: <451AC686.4040703@ichips.intel.com> Krishna Kumar wrote: > Encapsulate duplicate code to common routine - avoid checking same > errors in multiple places. I went back and forth on this, but ended up committing it, since it does slightly simplify maintenance. - Sean From mshefty at ichips.intel.com Wed Sep 27 11:48:03 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Sep 2006 11:48:03 -0700 Subject: [openib-general] [PATCH] id_priv_list->list is not initialized sometimes In-Reply-To: <20060919070210.5476.68607.sendpatchset@localhost.localdomain> References: <20060919070210.5476.68607.sendpatchset@localhost.localdomain> Message-ID: <451AC763.208@ichips.intel.com> Krishna Kumar wrote: > rdma_listen could be called from a context where id_priv->list > is not initialized. Then at a later stage, a cma_cancel_listen > does a list_del() which could oops since this element is not > on any list. > > Eg, in rdma_listen(), if id->device is !NULL, it calls > cma_ib_listen() which doesn't add this id to any list. A > cma_cancel_listen() will do a list_del. I don't think this is needed. 
cma_cancel_listens() is only called if the id is listening across multiple devices (and id->device is NULL). See cma_cancel_operation(). - Sean From xma at us.ibm.com Wed Sep 27 12:34:05 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 27 Sep 2006 12:34:05 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: Message-ID: I have created a patch to monitor CQ. That wasn't the reason for performance drop. I couldn't see any race from the output. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From rdreier at cisco.com Wed Sep 27 13:53:04 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 27 Sep 2006 13:53:04 -0700 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: <451AA4ED.7010501@mellanox.com> (Vu Pham's message of "Wed, 27 Sep 2006 09:21:01 -0700") References: <20060926144541.GA17938@mellanox.co.il> <4519E86D.9030508@mellanox.com> <20060927071059.GA21509@mellanox.co.il> <451A95D5.7060409@mellanox.com> <451AA4ED.7010501@mellanox.com> Message-ID: Maybe we should just use the port GUID instead of the node GUID to form the initiator ID? That would solve this pretty cleanly I think. From mshefty at ichips.intel.com Wed Sep 27 14:02:36 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Sep 2006 14:02:36 -0700 Subject: [openib-general] [PATCH] Fix freed mem deref race in cma_process_remove/cma_req_handler In-Reply-To: <20060918073545.26067.41763.sendpatchset@localhost.localdomain> References: <20060918073545.26067.41763.sendpatchset@localhost.localdomain> Message-ID: <451AE6EC.3000301@ichips.intel.com> Good catch. Thanks - committed. 
- Sean From ishai at dev.mellanox.co.il Wed Sep 27 14:08:11 2006 From: ishai at dev.mellanox.co.il (Ishai Rabinovitz) Date: Thu, 28 Sep 2006 00:08:11 +0300 (IDT) Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: References: <20060926144541.GA17938@mellanox.co.il> <4519E86D.9030508@mellanox.com> <20060927071059.GA21509@mellanox.co.il> <451A95D5.7060409@mellanox.com> <451AA4ED.7010501@mellanox.com> Message-ID: <1106.89.1.173.135.1159391291.squirrel@dev.mellanox.co.il> Roland Dreier wrote: > Maybe we should just use the port GUID instead of the node GUID to > form the initiator ID? That would solve this pretty cleanly I think. This is also Vu's idea. There are two issues: 1) My patch allows a sophisticated user to have two logical connections on the same physical solution. He can have different connection parameters (e.g., MAX_CMD_PER_LUN) according to the application needs. Do you think there is such need? 2) In the current implementation there is a problem when there are two connections on the same physical connection - when the second connection sends REQ to the target, the target sends a DREQ to the first connection, but when someone tries to access the first scsi_host, ib_srp tries to reconnect the first connection and then the second connection gets a DREQ - and so the ping pong goes. And if there is a multipath daemon that checks the status of the connections this ping pong can be for ever. We need to find a way to eliminate this behavior. 
Ishai From mshefty at ichips.intel.com Wed Sep 27 14:10:21 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Sep 2006 14:10:21 -0700 Subject: [openib-general] [PATCH] fix cma_leave_mc_groups In-Reply-To: <20060919070203.5476.17650.sendpatchset@localhost.localdomain> References: <20060919070203.5476.17650.sendpatchset@localhost.localdomain> Message-ID: <451AE8BD.9050203@ichips.intel.com> Krishna Kumar wrote: > - cma_leave_mc_groups can race with other routines updating > or reading the mclist, so use lock. Eg while doing a > rdma_destroy_id(), other processes could be looking at > this id and de-referencing mclist. I don't think that there's an issue here. The mc_list is only accessed by other direct API calls. For example, rdma_join_multicast() or rdma_leave_multicast(). A user cannot call rdma_destroy_id() with other API calls. - Sean From rdreier at cisco.com Wed Sep 27 14:20:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 27 Sep 2006 14:20:21 -0700 Subject: [openib-general] backporting fixes In-Reply-To: <20060927040745.GI24009@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 27 Sep 2006 07:07:45 +0300") References: <20060927040745.GI24009@mellanox.co.il> Message-ID: > Now that 2.6.18 (with an additional patch) I looked at backporting bugfixes to > older kernels. The main problem I see is that the neighbour destructor > interface change is not in 2.6.16, so IPoIB crashes randomly. > > So approaches are > - Try to push the change into 2.6.16 by netdev > - Use the all-neighbour list as done by ofed > - Abandon the whole project > > Ideas? Unfortunately I don't think this bug is very amenable to being fixed in a 2.6.16/-stable tree. So the third solution is probably the best we can do at this point. - R. From rdreier at cisco.com Wed Sep 27 14:24:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 27 Sep 2006 14:24:13 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. 
In-Reply-To: <1159300894.11549.11.camel@stevo-desktop> (Steve Wise's message of "Tue, 26 Sep 2006 15:01:34 -0500") References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> Message-ID: Do we have to keep the kernel modules in svn limping along? As time goes on, I have less and less patience for double maintenance. Oh well, since you provided the patch I'll apply it. - R. From mshefty at ichips.intel.com Wed Sep 27 14:39:57 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 27 Sep 2006 14:39:57 -0700 Subject: [openib-general] RDMA CM callback status In-Reply-To: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com> References: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com> Message-ID: <451AEFAD.4000708@ichips.intel.com> Sean Hefty wrote: >>1. Should I even be looking at event->status or does the event type tell me >> everything I need to know? I've had a report that the assertion >> (event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR. > > It sounds like (and looks like from reading the code) that you've hit a bug with > the ROUTE_ERROR event. The failure status isn't being propagated up to the > user. I've committed a patch to svn which will set the event status correctly when a route error occurs. - Sean From bos at pathscale.com Wed Sep 27 14:46:18 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 27 Sep 2006 14:46:18 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> Message-ID: <1159393578.21086.16.camel@chalcedony.pathscale.com> On Wed, 2006-09-27 at 14:24 -0700, Roland Dreier wrote: > Do we have to keep the kernel modules in svn limping along? As time > goes on, I have less and less patience for double maintenance. I'm still all in favour of nuking them... 
(Stephen Hemminger's message of "Tue, 26 Sep 2006 13:51:14 -0700") References: <20060926135114.1da96c1b@freekitty> Message-ID: OK, this is what I just came up with to fix these. Look OK to you Tom? diff --git a/drivers/infiniband/hw/amso1100/c2_ae.c b/drivers/infiniband/hw/amso1100/c2_ae.c index 08f46c8..3aae497 100644 --- a/drivers/infiniband/hw/amso1100/c2_ae.c +++ b/drivers/infiniband/hw/amso1100/c2_ae.c @@ -197,7 +197,7 @@ void c2_ae_event(struct c2_dev *c2dev, u "resource=%x, qp_state=%s\n", __FUNCTION__, to_event_str(event_id), - be64_to_cpu(wr->ae.ae_generic.user_context), + (unsigned long long) be64_to_cpu(wr->ae.ae_generic.user_context), be32_to_cpu(wr->ae.ae_generic.resource_type), be32_to_cpu(wr->ae.ae_generic.resource), to_qp_state_str(be32_to_cpu(wr->ae.ae_generic.qp_state))); diff --git a/drivers/infiniband/hw/amso1100/c2_alloc.c b/drivers/infiniband/hw/amso1100/c2_alloc.c index 1d25299..028a60b 100644 --- a/drivers/infiniband/hw/amso1100/c2_alloc.c +++ b/drivers/infiniband/hw/amso1100/c2_alloc.c @@ -115,7 +115,7 @@ u16 *c2_alloc_mqsp(struct c2_dev *c2dev, ((unsigned long) &(head->shared_ptr[mqsp]) - (unsigned long) head); pr_debug("%s addr %p dma_addr %llx\n", __FUNCTION__, - &(head->shared_ptr[mqsp]), (u64)*dma_addr); + &(head->shared_ptr[mqsp]), (unsigned long long) *dma_addr); return &(head->shared_ptr[mqsp]); } return NULL; diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c index dd6af55..622d6f1 100644 --- a/drivers/infiniband/hw/amso1100/c2_provider.c +++ b/drivers/infiniband/hw/amso1100/c2_provider.c @@ -397,7 +397,9 @@ static struct ib_mr *c2_reg_phys_mr(stru pr_debug("%s - page shift %d, pbl_depth %d, total_len %u, " "*iova_start %llx, first pa %llx, last pa %llx\n", __FUNCTION__, page_shift, pbl_depth, total_len, - *iova_start, page_list[0], page_list[pbl_depth-1]); + (unsigned long long) *iova_start, + (unsigned long long) page_list[0], + (unsigned long long) page_list[pbl_depth-1]); 
err = c2_nsmr_register_phys_kern(to_c2dev(ib_pd->device), page_list, (1 << page_shift), pbl_depth, total_len, 0, iova_start, diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c index f49a32b..e37c568 100644 --- a/drivers/infiniband/hw/amso1100/c2_rnic.c +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -527,7 +527,7 @@ int c2_rnic_init(struct c2_dev *c2dev) DMA_FROM_DEVICE); pci_unmap_addr_set(&c2dev->rep_vq, mapping, c2dev->rep_vq.host_dma); pr_debug("%s rep_vq va %p dma %llx\n", __FUNCTION__, q1_pages, - (u64)c2dev->rep_vq.host_dma); + (unsigned long long) c2dev->rep_vq.host_dma); c2_mq_rep_init(&c2dev->rep_vq, 1, qsize, @@ -550,7 +550,7 @@ int c2_rnic_init(struct c2_dev *c2dev) DMA_FROM_DEVICE); pci_unmap_addr_set(&c2dev->aeq, mapping, c2dev->aeq.host_dma); pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q1_pages, - (u64)c2dev->rep_vq.host_dma); + (unsigned long long) c2dev->rep_vq.host_dma); c2_mq_rep_init(&c2dev->aeq, 2, qsize, From rdreier at cisco.com Wed Sep 27 14:52:29 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 27 Sep 2006 14:52:29 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060927062822.GQ24009@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 27 Sep 2006 09:28:22 +0300") References: <20060927062822.GQ24009@mellanox.co.il> Message-ID: Shirley> I forgot to mention these NAPI parameters should be Shirley> tunable for different device drivers, like dev->weight, Shirley> or set up in lower driver. Michael> So we need something like poll_weight in struct Michael> ib_device, to give a hint on how expensive an interrupt Michael> is versus poll? Seems to make sense, and actually might Michael> be useful for other ULPs. Roland, what do you think? How could a low-level driver possibly know the cost of an interrupt vs polling a CQ? It depends on the particular CPU/cache/chipset details of the system and it might not even be the same from one PCI slot to another. 
If this value makes a real difference in practice, we can make it tunable but I would like to see some hard benchmarks that show it making a big difference one way or another. But we have too many knobs as it is so I'm inclined to just pick a value that works OK. - R. From rdreier at cisco.com Wed Sep 27 14:54:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 27 Sep 2006 14:54:05 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060927062316.GO24009@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 27 Sep 2006 09:23:16 +0300") References: <20060927062316.GO24009@mellanox.co.il> Message-ID: Michael> Maybe we should just assign EQs to CQs in a round-robin Michael> fashion for now, and just hope typical use allocates CQs Michael> sequentially. Worst case, we are back to where we are Michael> now, performance-wise. Roland, how does this sound? I think what we should do is follow the IB verbs extensions and expose multiple CQ event vectors, and let the consumer pick which one to use when creating a CQ. If IPoIB wants to go round robin itself, that would be fine. This is what I tried to set the userspace API up for. Nothing in userspace would have to change for this -- the kernel just needs to add multiple EQ support. - R. From tom at opengridcomputing.com Wed Sep 27 20:57:03 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Wed, 27 Sep 2006 22:57:03 -0500 Subject: [openib-general] Compile warnings (cross build) In-Reply-To: Message-ID: This all looks good to me. Thanks, Tom On 9/27/06 4:47 PM, "Roland Dreier" wrote: > OK, this is what I just came up with to fix these. > > Look OK to you Tom? 
> > diff --git a/drivers/infiniband/hw/amso1100/c2_ae.c > b/drivers/infiniband/hw/amso1100/c2_ae.c > index 08f46c8..3aae497 100644 > --- a/drivers/infiniband/hw/amso1100/c2_ae.c > +++ b/drivers/infiniband/hw/amso1100/c2_ae.c > @@ -197,7 +197,7 @@ void c2_ae_event(struct c2_dev *c2dev, u > "resource=%x, qp_state=%s\n", > __FUNCTION__, > to_event_str(event_id), > - be64_to_cpu(wr->ae.ae_generic.user_context), > + (unsigned long long) be64_to_cpu(wr->ae.ae_generic.user_context), > be32_to_cpu(wr->ae.ae_generic.resource_type), > be32_to_cpu(wr->ae.ae_generic.resource), > to_qp_state_str(be32_to_cpu(wr->ae.ae_generic.qp_state))); > diff --git a/drivers/infiniband/hw/amso1100/c2_alloc.c > b/drivers/infiniband/hw/amso1100/c2_alloc.c > index 1d25299..028a60b 100644 > --- a/drivers/infiniband/hw/amso1100/c2_alloc.c > +++ b/drivers/infiniband/hw/amso1100/c2_alloc.c > @@ -115,7 +115,7 @@ u16 *c2_alloc_mqsp(struct c2_dev *c2dev, > ((unsigned long) &(head->shared_ptr[mqsp]) - > (unsigned long) head); > pr_debug("%s addr %p dma_addr %llx\n", __FUNCTION__, > - &(head->shared_ptr[mqsp]), (u64)*dma_addr); > + &(head->shared_ptr[mqsp]), (unsigned long long) *dma_addr); > return &(head->shared_ptr[mqsp]); > } > return NULL; > diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c > b/drivers/infiniband/hw/amso1100/c2_provider.c > index dd6af55..622d6f1 100644 > --- a/drivers/infiniband/hw/amso1100/c2_provider.c > +++ b/drivers/infiniband/hw/amso1100/c2_provider.c > @@ -397,7 +397,9 @@ static struct ib_mr *c2_reg_phys_mr(stru > pr_debug("%s - page shift %d, pbl_depth %d, total_len %u, " > "*iova_start %llx, first pa %llx, last pa %llx\n", > __FUNCTION__, page_shift, pbl_depth, total_len, > - *iova_start, page_list[0], page_list[pbl_depth-1]); > + (unsigned long long) *iova_start, > + (unsigned long long) page_list[0], > + (unsigned long long) page_list[pbl_depth-1]); > err = c2_nsmr_register_phys_kern(to_c2dev(ib_pd->device), page_list, > (1 << page_shift), pbl_depth, > total_len, 
0, iova_start, > diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c > b/drivers/infiniband/hw/amso1100/c2_rnic.c > index f49a32b..e37c568 100644 > --- a/drivers/infiniband/hw/amso1100/c2_rnic.c > +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c > @@ -527,7 +527,7 @@ int c2_rnic_init(struct c2_dev *c2dev) > DMA_FROM_DEVICE); > pci_unmap_addr_set(&c2dev->rep_vq, mapping, c2dev->rep_vq.host_dma); > pr_debug("%s rep_vq va %p dma %llx\n", __FUNCTION__, q1_pages, > - (u64)c2dev->rep_vq.host_dma); > + (unsigned long long) c2dev->rep_vq.host_dma); > c2_mq_rep_init(&c2dev->rep_vq, > 1, > qsize, > @@ -550,7 +550,7 @@ int c2_rnic_init(struct c2_dev *c2dev) > DMA_FROM_DEVICE); > pci_unmap_addr_set(&c2dev->aeq, mapping, c2dev->aeq.host_dma); > pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q1_pages, > - (u64)c2dev->rep_vq.host_dma); > + (unsigned long long) c2dev->rep_vq.host_dma); > c2_mq_rep_init(&c2dev->aeq, > 2, > qsize, From eitan at mellanox.co.il Wed Sep 27 21:40:11 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 28 Sep 2006 07:40:11 +0300 Subject: [openib-general] [PATCH] osm_vendor_mlx_sa.c - missing status on timeout SA query Message-ID: <868xk4zjro.fsf@mtl066.yok.mtl.com> Hi Hal Similar to the bug discovered by Yevgeny on the osm_vendor_ibumad_sa.c the very same bug happens on osm_vendor_mlx_sa.c which fails osmtest. The issue is that the status of the result of the query is not returned as the result of the SA query. 
Eitan Signed-off-by: Eitan Zahavi Index: libvendor/osm_vendor_mlx_sa.c =================================================================== --- libvendor/osm_vendor_mlx_sa.c (revision 9642) +++ libvendor/osm_vendor_mlx_sa.c (working copy) @@ -219,7 +219,8 @@ __osmv_sa_mad_err_cb( query_res.status = IB_TIMEOUT; query_res.result_cnt = 0; - + query_res.p_result_madw->status = IB_TIMEOUT; + p_madw->status = IB_TIMEOUT; query_res.query_type = p_query_req_copy->query_type; p_query_req_copy->pfn_query_cb( &query_res ); @@ -611,6 +612,7 @@ __osmv_send_sa_req( "Waiting for async event.\n" ); cl_event_wait_on( &p_bind->sync_event, EVENT_NO_TIMEOUT, FALSE ); cl_event_reset(&p_bind->sync_event); + status = p_madw->status; } Exit: From erezz at voltaire.com Wed Sep 27 22:25:09 2006 From: erezz at voltaire.com (Erez Zilber) Date: Thu, 28 Sep 2006 08:25:09 +0300 Subject: [openib-general] oops after rmmod ib_cm when stopping iSER In-Reply-To: <451AAF2E.1060602@ichips.intel.com> References: <451A2E7E.8050504@voltaire.com> <451AAF2E.1060602@ichips.intel.com> Message-ID: <451B5CB5.8090407@voltaire.com> Sean Hefty wrote: > Erez Zilber wrote: >> When stopping iSER, we run 'modprobe -r ib_iser'. Then, we see an >> oops (below). In order to check which module caused that oops, I >> replaced the 'modprobe -r' call with rmmod for each module: >> >> rmmod ib_iser >> rmmod libiscsi >> rmmod scsi_transport_iscsi >> rmmod rdma_cm >> rmmod ib_addr >> rmmod ib_cm >> >> If I wait a few seconds before the removal of ib_cm, everything is ok. > > Thanks for the info. My guess is that the cm_id's are not taking a > reference on the cm devices, which is allowing the module unload to > proceed while cm_id's still remain in timewait. I will look at this > in more detail and work on a patch. How reproducible is this? > > - Sean 100% reproducible. It happens every time. 
Erez From moshek at voltaire.com Wed Sep 27 22:38:31 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 28 Sep 2006 08:38:31 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Message-ID: Michael wrote : > Since I don't consider this a critical fix (there's no reason driver > won't go up, and if it does not, there's a simple workaround by Michael , The mstflint operated in the "classic way" in OFED-1.1 is not working on PPC64 sles10 !!! Telling the customer to use a workaround (open /proc...) if their platform is PPC64 is not nice !! We need to fix the bug in the code ! Frank wrote : > The patch can be enabled by defining CONFIG_MOPEN_FALL_BACK to 1. CONFIG_MOPEN_FALL_BACK is defined to 1 for ppc64 and x86_64 and 0 for others This define keeps the program from being damaged when running on other platforms. Can you have a look at the code once more and write how you want us (me and Frank ) to refine it ? It's o.k. for us if the fix goes into OFED-1.2 but we need it in the code ! Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Tseng-Hui (Frank) Lin [mailto:thlin at us.ibm.com] Sent: Wednesday, September 27, 2006 7:46 PM To: Michael S. Tsirkin Cc: Moshe Kazir; Tseng-hui Lin; openib-general at openib.org Subject: Re: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD On Wed, 2006-09-27 at 18:19 +0300, Michael S. Tsirkin wrote: > Quoting r. Moshe Kazir : > > Subject: FW: [openib-general] Mstflint - not working on ppc64 and > > whendriver is not loaded on AMD > > > > Michael, > > > > Frank new version was tested once more in Voltaire and is working > > o.k. . I tested `./mstflint -d q` when drivers are > > loaded and when drivers are not loaded. in all cases it worked o.k. 
> > Thanks for testing, but I'd like to get a handle on what's going on > first. > > First, I'm pretty sure when driver is loaded things work OK on all > systems. When driver is not loaded - could you please answer whether > using /sys/bus/pci/devices/0000\:03\:00.0/resource0 > works for you (on systems that have resource0)? > It doesn't work. > > > > Test was performed on the following environments : > > > > - IBM js21 ppc64 sles10 PCI-E > > - IBM js21 ppc64 sles9 sp3 PCI-E > > - IBM hs21 em64t redhat as 4 u3 PCI-E > > - IBM hs21 em64t sles 9 sp3 PCI-E > > - x86_64 sles10 PCI-E > > - MAC ppc64 sles10 PCI-X > > - MAC ppc64 sles10 PCI-E > > > > Please consider inserting the patch to OFED . > > > > Moshe > > Since I don't consider this a critical fix (there's no reason driver > won't go up, and if it does not, there's a simple workaround by > specifying the /proc interface, that is slower but works), I don't > think this should go into OFED 1.1. > > Unfortunately, I never got a small bugfix patch against the latest > mstflint - the patch I saw posted touches all kind of things all over > the code - so I can't insert it in trunk, either. > I agree this is not critical. The patch changes nothing but the way of opening the device. On some ppc64 and x86_64 machines, the I/O memory mapped by mmap() is not accessible (reads return 0xFFFFFFFF) unless kernel code (usually the device driver) does an ioremap. This is why mmap of resource0 does not work on these machines. I am not aware of any way to do an ioremap from user space code like mstflint. The only thing I can think of is to fall back to using the config space file in /proc/bus/pci/. The (big) patch I made checks whether the faster way (mmap resource0) works. If it doesn't, the patch tries other, slower ways and uses the fastest working way it can find. That's all the patch does. It does not make a big fix. It just saves users the trouble of trying all possible ways of opening a device manually. 
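The fallback idea described in the paragraph above can be sketched as below. This is only an illustration under stated assumptions, not the mstflint patch: the access_method structure, the probe callbacks and pick_access_method() are hypothetical names, and the two probe functions are stand-ins that return fixed values instead of touching hardware. It also assumes that the probed register never legitimately reads as all ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe callback: returns 0 on success and stores one
 * register value read through the given access method. */
typedef int (*probe_fn)(uint32_t *value);

struct access_method {
    const char *name; /* label only, e.g. "mmap resource0" */
    probe_fn probe;   /* reads a known register via this method */
};

/*
 * Walk the methods in order (fastest first) and return the first one
 * whose probe succeeds and does not read back 0xFFFFFFFF, the value
 * reported in this thread for an inaccessible mapping.
 * Returns NULL if no method works.
 */
static const struct access_method *
pick_access_method(const struct access_method *methods, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t value = 0;

        if (methods[i].probe(&value) != 0)
            continue; /* method could not be opened or read */
        if (value == 0xFFFFFFFFu)
            continue; /* mapping exists but is not accessible */
        return &methods[i];
    }
    return NULL;
}

/* Stand-in probes for illustration; real ones would mmap resource0
 * or read the PCI config space file. The second value is arbitrary. */
static int probe_mmap_unmapped(uint32_t *value)
{
    *value = 0xFFFFFFFFu;
    return 0;
}

static int probe_config_space(uint32_t *value)
{
    *value = 0x00A00190u;
    return 0;
}
```

With the two stand-in probes listed in the order { mmap, config space }, pick_access_method() skips the mmap entry and returns the config space entry, which mirrors the "use the fastest working way" behaviour described in the message.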
I understand applying a big patch is risky unless it can be thoroughly tested. Unfortunately, no one has all the machines to test the patch. Moshe and I have tested the patch on Power MAC, Squadrons, JS20, and JS21 (almost all living ppc64 machines) as well as a few x86_64 machines. We believe this patch is safe for these machines. The patch can be enabled by defining CONFIG_MOPEN_FALL_BACK to 1. CONFIG_MOPEN_FALL_BACK is defined to 1 for ppc64 and x86_64 and 0 for others. We can enable this patch on other machines once people who have those machines have tested it. I agree this is not a critical patch, but it is a useful one. Moreover, it is well tested on the machines with the patch enabled and changes nothing on the machines with the patch disabled. I believe this is a safe patch. Please re-consider adding it. Thanks. From mst at mellanox.co.il Wed Sep 27 23:03:48 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 09:03:48 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: <1159375559.21249.60.camel@flin.austin.ibm.com> References: <1159375559.21249.60.camel@flin.austin.ibm.com> Message-ID: <20060928060348.GB23828@mellanox.co.il> Quoting r. Tseng-Hui (Frank) Lin : > On some ppc64 and x86_64 machines, the I/O memory mapped by mmap() is > not accessible (return 0xFFFFFFFF) unless the kernel code (usually the > device driver) does an ioremap. This is why mmap resource0 does not work > on these machines. Let's be exact here: ioremap *only* does not work if driver is not loaded. Is that right? If yes, the typical and safe thing for the user is to have driver loaded and do -d /sys/class/infiniband/mthca0/device/resource0 without playing with lspci and other low level hacks, and I would rather you told users to do *that* (by the way, would it help if you could use "-d mthca0")? > There is no way I am aware of can do ioremap from > user space code like mstflint. 
The only thing I can think of is to fall > back to use the config space file in /proc/bus/pci/. How about write/read to/from resource0? Does that work? > The (big) patch I made checks if the faster way (mmap resource0) works. > It it doesn't, the patch tries other slower ways and use the fastest > working way it can find. That's all the patch does. It does not make big > fix. It just save the users trouble of trying all possible ways of > opening a devices manually. I don't reject that approach, not on principle. This is absolutely something we can consider for trunk. But let's fist try to make memory access work, even if it's not with mmap. -- MST From mst at mellanox.co.il Wed Sep 27 23:08:17 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 09:08:17 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: Message-ID: <20060928060817.GD23828@mellanox.co.il> Quoting r. Roland Dreier : > If this value makes a real difference in practice, we can make it > tunable but I would like to see some hard benchmarks that show it > making a big difference one way or another. But we have too many > knobs as it is so I'm inclined to just pick a value that works OK. Fair enough, let's start simple. BTW, are you going to post the rewritten NAPI patch for testing soon? -- MST From ogerlitz at voltaire.com Wed Sep 27 23:18:45 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 28 Sep 2006 09:18:45 +0300 Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ In-Reply-To: <451ABF0C.90607@ichips.intel.com> References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com> <451ABF0C.90607@ichips.intel.com> Message-ID: <451B6945.1050707@voltaire.com> Sean Hefty wrote: > Sean Hefty wrote: >> An alternative is to send a DREP in response to a DREQ, even if a local >> connection is not found, which is what this patch does. 
> If there are no objections, I will commit this patch to svn, and submit for > inclusion upstream. Sean, My understanding is that without this patch the side that sends the DREQ would do few DREQ resends as of the "firsts" DREPs being lost and no DREPs sent once the id at the peer side left the timewait state, correct? Arlin, Can you please share what were the implications with intel MPI running a 64 nodes (128 ranks?) job? was the issue here just making the ***job termination time*** bigger? I don't have an objection for merging it, i just think it can be nice if we understand better what problem this patch comes to solve in terms of this use case that has driven the fix. Or. From mst at mellanox.co.il Wed Sep 27 23:26:14 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 09:26:14 +0300 Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ In-Reply-To: <451ABF0C.90607@ichips.intel.com> References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com> <451ABF0C.90607@ichips.intel.com> Message-ID: <20060928062614.GF23828@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ > > Sean Hefty wrote: > > Currently a DREP is only sent in response to a DREQ if a connection > > has been found matching the DREQ, and it is in the proper state. Once > > a DREP is sent, the local connection moves into timewait. Duplicate > > DREQs received while in this state result in re-sending the DREP. > > > > However, it's likely that the local connection will enter and exit > > timewait before the remote side times out a lost DREP and resends a DREQ. > > There are a couple possible solutions to this. One is to increase how > > long a connection remains in timewait, by multiplying its wait time by > > max_cm_retries. This can greatly increase the timewait state before a QP > > can be re-used when CM messages are not lost. 
> > > > An alternative is to send a DREP in response to a DREQ, even if a local > > connection is not found, which is what this patch does. > > If there are no objections, I will commit this patch to svn, and submit for > inclusion upstream. I'm OK with this change. -- MST From mst at mellanox.co.il Wed Sep 27 23:27:23 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 09:27:23 +0300 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159393578.21086.16.camel@chalcedony.pathscale.com> References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> Message-ID: <20060928062723.GG23828@mellanox.co.il> Quoting r. Bryan O'Sullivan : > Subject: Re: 2.6.18 kernel support in the main trunk. > > On Wed, 2006-09-27 at 14:24 -0700, Roland Dreier wrote: > > Do we have to keep the kernel modules in svn limping along? As time > > goes on, I have less and less patience for double maintenance. > > I'm still all in favour of nuking them... Me too. -- MST From mst at mellanox.co.il Wed Sep 27 23:29:19 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 09:29:19 +0300 Subject: [openib-general] Compile warnings (cross build) In-Reply-To: References: <20060926135114.1da96c1b@freekitty> Message-ID: <20060928062919.GH23828@mellanox.co.il> Quoting r. Roland Dreier : > - (u64)c2dev->rep_vq.host_dma); > + (unsigned long long) c2dev->rep_vq.host_dma); BTW, is there some printk format to print u64 type? -- MST From mst at mellanox.co.il Wed Sep 27 23:31:33 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 09:31:33 +0300 Subject: [openib-general] RDMA CM callback status In-Reply-To: <451AEFAD.4000708@ichips.intel.com> References: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com> <451AEFAD.4000708@ichips.intel.com> Message-ID: <20060928063133.GI23828@mellanox.co.il> Quoting r. 
Sean Hefty : > Subject: Re: RDMA CM callback status > > Sean Hefty wrote: > >>1. Should I even be looking at event->status or does the event type tell me > >> everything I need to know? I've had a report that the assertion > >> (event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR. > > > > It sounds like (and looks like from reading the code) that you've hit a bug with > > the ROUTE_ERROR event. The failure status isn't being propagated up to the > > user. > > I've committed a patch to svn which will set the event status correctly when a > route error occurs. Can you post a patch pls? -- MST From mst at mellanox.co.il Wed Sep 27 23:33:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 09:33:01 +0300 Subject: [openib-general] backporting fixes In-Reply-To: References: <20060927040745.GI24009@mellanox.co.il> Message-ID: <20060928063301.GJ23828@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: backporting fixes > > > Now that 2.6.18 (with an additional patch) I looked at backporting bugfixes to > > older kernels. The main problem I see is that the neighbour destructor > > interface change is not in 2.6.16, so IPoIB crashes randomly. > > > > So approaches are > > - Try to push the change into 2.6.16 by netdev > > - Use the all-neighbour list as done by ofed > > - Abandon the whole project > > > > Ideas? > > Unfortunately I don't think this bug is very amenable to being fixed > in a 2.6.16/-stable tree. So the third solution is probably the best > we can do at this point. OK. How about 2.6.17.y? I'm somewhat confused whether someone is still maintaining these. -- MST From mst at mellanox.co.il Wed Sep 27 23:35:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 09:35:06 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: References: Message-ID: <20060928063506.GK23828@mellanox.co.il> Quoting r. 
Moshe Kazir : > The mstflint operated in the "classic way" in OFED-1.1 is not working > on PPC64 sles10 !!! I consider the classic way to be -d /sys/class/infiniband/mthca0/device/resource0 It does seem a bit verbose now that you mention this - would a shortcut to allow just -d mthca0 help a lot? -- MST From ogerlitz at voltaire.com Wed Sep 27 23:43:04 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 28 Sep 2006 09:43:04 +0300 Subject: [openib-general] is IB/cm: Randomize starting comm ID fix missing in OFED 1.1 ?! Message-ID: Michael, I understand that OFED 1.1 is based on the IB code of 2.6.18-rc6, however, this patch which was pushed to 2.6.19-rc1 solves a real problem which was reported from a Lustre field install and can be easily reproduced in the lab. Can it go into rc7? http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f06d26537559113207e4b73af6a22eaa5c5e9dc3 Or. From gil at mellanox.co.il Thu Sep 28 00:16:00 2006 From: gil at mellanox.co.il (Gil Bloch) Date: Thu, 28 Sep 2006 10:16:00 +0300 Subject: [openib-general] mvapich2-gen2 svn - vapi <--> gen2 ?? Message-ID: <6C2C79E72C305246B504CBA17B5500C9059AD6@mtlexch01.mtl.com> Bryan, As far as I know, the mvapich2 libraries are not intended for heterogeneous IB installations. We (and I think it's the same with OSU) do not test them in such a configuration. For more details you might want to contact the mvapich team: http://nowlab.cse.ohio-state.edu/projects/mpi-iba/ Regards, Gil Bloch Mellanox Technologies > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Bryan Green > Sent: Wednesday, September 27, 2006 8:59 PM > To: openib-general at openib.org > Subject: [openib-general] mvapich2-gen2 svn - vapi <--> gen2 ?? > > Hello, > Regarding mvapich2-gen2 in the openib svn, > can an mvapich2 vapi build on one machine > communicate with a gen2 build on another?
> > -bryan > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general From mst at mellanox.co.il Thu Sep 28 01:21:02 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 11:21:02 +0300 Subject: [openib-general] is IB/cm: Randomize starting comm ID fix missing in OFED 1.1 ?! In-Reply-To: References: Message-ID: <20060928082102.GD25010@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: is IB/cm: Randomize starting comm ID fix missing in OFED 1.1 ?! > > Michael, > > I understand that OFED 1.1 is based on the IB code of 2.6.18-rc6, however, > this patch which was pushed to 2.6.19-rc1 solves a real problem which was > reported from a Lustre field install and can be easily reproducable in the lab. > > Can it go into rc7? > > http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f06d26537559113207e4b73af6a22eaa5c5e9dc3 > > Or. > Looks safe enough. OK. -- MST From mst at mellanox.co.il Thu Sep 28 01:39:09 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 11:39:09 +0300 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: <20060927062316.GO24009@mellanox.co.il> Message-ID: <20060928083909.GF25010@mellanox.co.il> Quoting r. Roland Dreier : > I think what we should do is follow the IB verbs extensions and expose > multiple CQ event vectors, and let the consumer pick which one to use > when creating a CQ. If IPoIB wants to go round robin itself, that > would be fine. > > This is what I tried to set the userspace API up for. Nothing in > userspace would have to change for this -- the kernel just needs to > add multiple EQ support. Sounds good. Fancy taking it up now, or should I look into this? 
-- MST From moshek at voltaire.com Thu Sep 28 02:00:10 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 28 Sep 2006 12:00:10 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Message-ID: I prefer the "mstflint -d 0c:00.0 q" format, as it enables writing a script that extracts the lspci info and gets results ->
# mstflint -d `lspci | grep Mellanox |grep -v Bridge | cut -f1 -d" "` q
Image type: Failsafe
I.S. Version: 1
Chip Revision: A0
GUID Des: Node Port1 Port2 Sys image
GUIDs: 0008f1040398047c 0008f1040398047d 0008f1040398047e 0008f1040398047f
Board ID: (0TLV00700003)
VSD:
PSID: 0TLV00700003
#
The format "mstflint -d mthca0" is good but not sufficient. When the HCA is old/wrong/damaged, insmod may fail. In this case we'll need mstflint to fix problems. We must have a way to operate mstflint when the driver is not loaded. > When mthca is loaded, what does > mstflint -d /sys/class/infiniband/mthca0/device/resource0 q > do on PPC?
On PPC64 sles10 with Frank's last fix
# ./mstflint -d /sys/class/infiniband/mthca0/device/resource0 q
*** ERROR *** Can not open /sys/class/infiniband/mthca0/device/resource0: Invalid argument
*** ERROR *** Can not get flash type using device /sys/class/infiniband/mthca0/device/resource0
#
On PPC64 with OFED-1.1 rc6 original sources
# ./mstflint -d /sys/class/infiniband/mthca0/device/resource0 q
*** ERROR *** Can not open /sys/class/infiniband/mthca0/device/resource0: Invalid argument
*** ERROR *** Can not get flash type using device /sys/class/infiniband/mthca0/device/resource0
#
Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S.
Tsirkin [mailto:mst at mellanox.co.il] Sent: Thursday, September 28, 2006 9:35 AM To: Moshe Kazir Cc: Tseng-Hui (Frank) Lin; openfabrics-ewg at openib.org; openib-general at openib.org Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Quoting r. Moshe Kazir : > The mstflint operated in the "classic way" in OFED-1.1 is not working > on PPC64 sles10 !!! I consider the classic way to be -d /sys/class/infiniband/mthca0/device/resource0 It does seem a bit verbose now that you mention this - would a shortcut to allow just -d mthca0 help a lot? -- MST From mst at mellanox.co.il Thu Sep 28 02:48:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 12:48:18 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: References: Message-ID: <20060928094818.GH25010@mellanox.co.il> Quoting r. Moshe Kazir : > > When mthca is loaded, what does > > mstflint -d /sys/class/infiniband/mthca0/device/resource0 q > > do on PPC? > > > On PPC64 sles10 with Frank's last fix > # ./mstflint -d /sys/class/infiniband/mthca0/device/resource0 q > *** ERROR *** Can not open Does /sys/class/infiniband/mthca0/device/resource0 exist on this system? Pls send output of ls /sys/class/infiniband/mthca0/device/ -- MST From mst at mellanox.co.il Thu Sep 28 02:53:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 12:53:08 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: References: Message-ID: <20060928095308.GI25010@mellanox.co.il> Quoting r.
Moshe Kazir : > Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD > > I prefer the "mstflint -d 0c:00.0 q " format BTW, this won't work on systems with multiple domains - you must add the domain as well: mstflint -d 0000:0c:00.0 q -- MST From ogerlitz at voltaire.com Thu Sep 28 04:05:52 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 28 Sep 2006 14:05:52 +0300 Subject: [openib-general] is IB/cm: Randomize starting comm ID fix missing in OFED 1.1 ?! In-Reply-To: <20060928082102.GD25010@mellanox.co.il> References: <20060928082102.GD25010@mellanox.co.il> Message-ID: <451BAC90.5020002@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >> Subject: is IB/cm: Randomize starting comm ID fix missing in OFED 1.1 ?! >> >> Michael, >> >> I understand that OFED 1.1 is based on the IB code of 2.6.18-rc6, however, >> this patch which was pushed to 2.6.19-rc1 solves a real problem which was >> reported from a Lustre field install and can be easily reproducable in the lab. >> >> Can it go into rc7? >> >> http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f06d26537559113207e4b73af6a22eaa5c5e9dc3 >> >> Or. >> > > Looks safe enough. OK. cool, thanks. Or. From moshek at voltaire.com Thu Sep 28 04:16:26 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 28 Sep 2006 14:16:26 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Message-ID: O.k. mstflint -d `lspci | grep Mellanox |grep -v Bridge | cut -f1 -d" "` q Will do the job . Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S. 
Tsirkin [mailto:mst at mellanox.co.il] Sent: Thursday, September 28, 2006 12:53 PM To: Moshe Kazir Cc: openib-general at openib.org; openfabrics-ewg at openib.org Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Quoting r. Moshe Kazir : > Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not > loaded on AMD > > I prefer the "mstflint -d 0c:00.0 q " format BTW, this won't work on systems with multiple domains - you must add the domain as well: mstflint -d 0000:0c:00.0 q -- MST From moshek at voltaire.com Thu Sep 28 04:25:32 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 28 Sep 2006 14:25:32 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Message-ID: # ls /sys/class/infiniband/mthca0/device/resource0 /sys/class/infiniband/mthca0/device/resource0 # ls -ald /sys/class/infiniband/mthca0/device/* lrwxrwxrwx 1 root root 0 Sep 27 11:33 /sys/class/infiniband/mthca0/device/bus -> ../../../../bus/pci -r--r--r-- 1 root root 4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/class -rw-r--r-- 1 root root 256 Sep 28 14:17 /sys/class/infiniband/mthca0/device/config -r--r--r-- 1 root root 4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/device -r--r--r-- 1 root root 4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/devspec lrwxrwxrwx 1 root root 0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/driver -> ../../../../bus/pci/drivers/ib_mthca lrwxrwxrwx 1 root root 0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband:mthca0 -> ../../../../class/infiniband/mthca0 lrwxrwxrwx 1 root root 0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband_mad:issm0 -> ../../../../class/infiniband_mad/issm0 lrwxrwxrwx 1 root root 0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband_mad:issm1 -> ../../../../class/infiniband_mad/issm1 lrwxrwxrwx 1 root root 0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband_mad:umad0 -> 
../../../../class/infiniband_mad/umad0 lrwxrwxrwx 1 root root 0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband_mad:umad1 -> ../../../../class/infiniband_mad/umad1 lrwxrwxrwx 1 root root 0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband_verbs:uverbs0 -> ../../../../class/infiniband_verbs/uverbs0 -r--r--r-- 1 root root 4096 Sep 28 14:17 /sys/class/infiniband/mthca0/device/irq -r--r--r-- 1 root root 4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/local_cpus -r--r--r-- 1 root root 4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/modalias lrwxrwxrwx 1 root root 0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/net:ib0 -> ../../../../class/net/ib0 lrwxrwxrwx 1 root root 0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/net:ib1 -> ../../../../class/net/ib1 -r--r--r-- 1 root root 4096 Sep 28 11:43 /sys/class/infiniband/mthca0/device/pools -r--r--r-- 1 root root 4096 Sep 28 14:17 /sys/class/infiniband/mthca0/device/resource -rw------- 1 root root 1048576 Sep 28 14:17 /sys/class/infiniband/mthca0/device/resource0 -rw------- 1 root root 8388608 Sep 27 11:33 /sys/class/infiniband/mthca0/device/resource2 -rw------- 1 root root 134217728 Sep 27 11:33 /sys/class/infiniband/mthca0/device/resource4 -r--r--r-- 1 root root 4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/subsystem_device -r--r--r-- 1 root root 4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/subsystem_vendor --w------- 1 root root 4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/uevent -r--r--r-- 1 root root 4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/vendor # Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S. 
Tsirkin [mailto:mst at mellanox.co.il] Sent: Thursday, September 28, 2006 12:48 PM To: Moshe Kazir Cc: Tseng-Hui (Frank) Lin; openfabrics-ewg at openib.org; openib-general at openib.org Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Quoting r. Moshe Kazir : > > When mthca is loaded, what does > > mstflint -d /sys/class/infiniband/mthca0/device/resource0 q do on > > PPC? > > > On PPC64 sles10 with Franks last fix > # ./mstflint -d /sys/class/infiniband/mthca0/device/resource0 q > *** ERROR *** Can not open Does /sys/class/infiniband/mthca0/device/resource0 exist on this system? Pls send output of ls /sys/class/infiniband/mthca0/device/ -- MST From mst at mellanox.co.il Thu Sep 28 04:41:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 14:41:03 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: References: Message-ID: <20060928114103.GA26457@mellanox.co.il> Quoting r. Moshe Kazir : > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD > > > # ls /sys/class/infiniband/mthca0/device/resource0 > /sys/class/infiniband/mthca0/device/resource0 OK, so can you try this please: strace -f -v -o log mstflint -d /sys/class/infiniband/mthca0/device/resource0 q cat log -- MST From moshek at voltaire.com Thu Sep 28 04:59:04 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 28 Sep 2006 14:59:04 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Message-ID: See attached files. Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S. 
Tsirkin [mailto:mst at mellanox.co.il] Sent: Thursday, September 28, 2006 2:41 PM To: Moshe Kazir Cc: Tseng-Hui (Frank) Lin; openfabrics-ewg at openib.org; openib-general at openib.org Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Quoting r. Moshe Kazir : > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is not > loaded on AMD > > > # ls /sys/class/infiniband/mthca0/device/resource0 > /sys/class/infiniband/mthca0/device/resource0 OK, so can you try this please: strace -f -v -o log mstflint -d /sys/class/infiniband/mthca0/device/resource0 q cat log -- MST -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Franks.mstflint.trace.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: OFED-1.1-orig.mstflint.trace.txt URL: From mst at mellanox.co.il Thu Sep 28 05:12:00 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 15:12:00 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: References: Message-ID: <20060928121200.GB26457@mellanox.co.il> Quoting r. Moshe Kazir : > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD > > > See attached files. OK, so we can open the file, but can't mmap it. Let's see if we can read it. 
Pls compile the following test and run with strace:
>strace -f -x -v -o log a.out
>cat log

#define _XOPEN_SOURCE 500
#define _FILE_OFFSET_BITS 64
/* The header names were lost in the archived copy; these are the ones this test needs. */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main()
{
	int fd, rc;
	unsigned value = 0;

	fd = open("/sys/class/infiniband/mthca0/device/resource0", O_RDWR | O_SYNC);
	/* Read 4 bytes at offset 0xf0014; any failure also shows up in the strace log. */
	rc = pread(fd, &value, 4, 0xf0014);
	if (rc != 4)
		perror("pread");
	printf("0x%x\n", value);
	return 0;
}

-- MST From mlakshmanan at silverstorm.com Thu Sep 28 05:36:50 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Thu, 28 Sep 2006 08:36:50 -0400 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: <1106.89.1.173.135.1159391291.squirrel@dev.mellanox.co.il> Message-ID: >> Roland Dreier wrote: >> Maybe we should just use the port GUID instead of the node GUID to >> form the initiator ID? That would solve this pretty cleanly I think. > This is also Vu's idea. > > There are two issues: > > 1) My patch allows a sophisticated user to have two logical connections on > the same physical solution. He can have different connection parameters > (e.g., MAX_CMD_PER_LUN) according to the application needs. > Do you think there is such need? > > 2) In the current implementation there is a problem when there are two > connections on the same physical connection - when the second connection > sends REQ to the target, the target sends a DREQ to the first connection, > but when someone tries to access the first scsi_host, ib_srp tries to > reconnect the first connection and then the second connection gets a DREQ > - and so the ping pong goes. > And if there is a multipath daemon that checks the status of the > connections this ping pong can be for ever. > We need to find a way to eliminate this behavior. > Ishai Silverstorm's native SRP implementation allows for the initiator ID to be the port GUID and the initiator extension to be user-specified.
This approach is taken to initiate multiple connections to a single SRP target from the same host; i.e. the initiator ID is kept the same (port GUID) and a different initiator extension is specified. Btw, could you point me to the latest source code? I didn't see it under gen2/trunk/src/linux-kernel/infiniband/ulp/srp. I'd like to collaborate with you on OFED SRP. Madhu Lakshmanan SilverStorm Technologies _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mst at mellanox.co.il Thu Sep 28 05:53:57 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 15:53:57 +0300 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: References: <1106.89.1.173.135.1159391291.squirrel@dev.mellanox.co.il> Message-ID: <20060928125357.GA28381@mellanox.co.il> Quoting r. Lakshmanan, Madhu : > gen2/trunk/src/linux-kernel/infiniband/ulp/srp. This is deprecated. You can get the exact code used for OFED 1.1 from ofed git tree. The instructions are here: https://openib.org/svn/gen2/branches/1.1/ofed/docs/HOWTO.build_ofed > I'd like to collaborate with you on OFED SRP. Please note that OFED 1.1 is in freeze and only critical and documentation fixes are accepted. Note also that OFED is a distribution testing and packaging, not a development effort. OFED backports kernel.org code to older kernels, so there's no "OFED SRP" as such: to get your work into the next OFED release you should just work against the latest kernel.org tree, and get Roland to accept your patches. -- MST From mst at mellanox.co.il Thu Sep 28 06:00:52 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 28 Sep 2006 16:00:52 +0300 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: References: <20060926144541.GA17938@mellanox.co.il> <4519E86D.9030508@mellanox.com> <20060927071059.GA21509@mellanox.co.il> <451A95D5.7060409@mellanox.com> <451AA4ED.7010501@mellanox.com> Message-ID: <20060928130052.GB28381@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/SRP: Enable multichannel > > Maybe we should just use the port GUID instead of the node GUID to > form the initiator ID? That would solve this pretty cleanly I think. Sounds good. I think we should also stick the pkey into the identifier extension - I think it's nice for each partition to be able to act as a separate virtual network, not affecting others. What do you think? -- MST From mst at mellanox.co.il Thu Sep 28 06:40:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 16:40:30 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: References: Message-ID: <20060928134029.GA25913@mellanox.co.il> Quoting r. Moshe Kazir : > > Quoting r. Moshe Kazir : > > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is not > > loaded on AMD > > > > > > # ls /sys/class/infiniband/mthca0/device/resource0 > > /sys/class/infiniband/mthca0/device/resource0 > > OK, so can you try this please: > > strace -f -v -o log mstflint -d > /sys/class/infiniband/mthca0/device/resource0 q > > cat log > > -- > MST > 30463 open("/sys/class/infiniband/mthca0/device/resource0", O_RDWR|O_SYNC|O_LARGEFILE) = 3 > 30463 mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 EINVAL (Invalid argument) So we see that mmap is failing with EINVAL. But why? We seem to be passing all valid parameters to it. I'm looking at arch/ppc/kernel/pci.c at the moment. It seems that EINVAL is returned if __pci_mmap_make_offset fails, and that seems to be only looking for a valid resource size. 
Are you up to finding the root cause of the problem in arch/ppc/kernel/pci.c? Maybe the resource offsets are wrong? What does cat /sys/class/infiniband/mthca0/device/resource show? Maybe there's some problem to map a full megabyte? Here's a test that only maps 4K. Could you strace it please?
>>>>>>>>>>>
#define _XOPEN_SOURCE 500
#define _FILE_OFFSET_BITS 64
/* The header names were lost in the archived copy; these are the ones this test needs. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
/* To enable the ioctl below, also include <sys/ioctl.h> and <linux/pci.h>. */

int main()
{
	int fd;
	unsigned value = 0;
	volatile void *ptr;

	fd = open("/proc/bus/pci/00/00.0", O_RDWR | O_SYNC);
	/* ioctl(fd, PCIIOC_MMAP_IS_MEM); */
	ptr = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0xf0000);
	if (ptr == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* Read the 32-bit word at offset 0x14 inside the mapped 4K window. */
	memcpy(&value, (void *)((volatile char *)ptr + 0x14), sizeof value);
	printf("0x%x\n", value);
	return 0;
}

-- MST From moshek at voltaire.com Thu Sep 28 06:59:14 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 28 Sep 2006 16:59:14 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Message-ID: Michael, Frank found the cause of the problem in the implementation of arch/ppc/kernel/pci.c, and asked the IBM kernel group to send a bug fix to the Linux kernel group. The problem is: 1. This bug fix will not enter SLES10 as it is closed. 2. It also will not enter SLES9 :-) or Red Hat AS4 U4. So we need a bug fix that will enable the use of mstflint on js21 PPC64 + backport to old systems. Frank's fix is based on two points (if I understand the code correctly) - 1. It opens /proc/bus/pci... and not /sys/bus/pci/... 2. It performs an ioctl(fd, PCIIOC_MMAP_IS_MEM); Frank - am I right? Can we add these two small changes to mstflint to have it working on the PPC64 js21? Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S.
Tsirkin [mailto:mst at mellanox.co.il] Sent: Thursday, September 28, 2006 4:41 PM To: Moshe Kazir Cc: Tseng-Hui (Frank) Lin; openfabrics-ewg at openib.org; openib-general at openib.org Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Quoting r. Moshe Kazir : > > Quoting r. Moshe Kazir : > > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is > > not > > loaded on AMD > > > > > > # ls /sys/class/infiniband/mthca0/device/resource0 > > /sys/class/infiniband/mthca0/device/resource0 > > OK, so can you try this please: > > strace -f -v -o log mstflint -d > /sys/class/infiniband/mthca0/device/resource0 q > > cat log > > -- > MST > 30463 open("/sys/class/infiniband/mthca0/device/resource0", O_RDWR|O_SYNC|O_LARGEFILE) = 3 > 30463 mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 EINVAL (Invalid argument) So we see that mmap is failing with EINVAL. But why? We seem to be passing all valid parameters to it. I'm looking at arch/ppc/kernel/pci.c at the moment. It seems that EINVAL is returned if __pci_mmap_make_offset fails, and that seems to be only looking for a valid resource size. Are you up to finding the root cause of the problem in arch/ppc/kernel/pci.c? Maybe the resource offsets are wrong? What does cat /sys/class/infiniband/mthca0/device/resource show? Maybe there's some problem to map a full megabyte? Here's a test that only maps 4K. Could you strace it please? 
>>>>>>>>>>>
#define _XOPEN_SOURCE 500
#define _FILE_OFFSET_BITS 64
/* The header names were lost in the archived copy; these are the ones this test needs. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
/* To enable the ioctl below, also include <sys/ioctl.h> and <linux/pci.h>. */

int main()
{
	int fd;
	unsigned value = 0;
	volatile void *ptr;

	fd = open("/proc/bus/pci/00/00.0", O_RDWR | O_SYNC);
	/* ioctl(fd, PCIIOC_MMAP_IS_MEM); */
	ptr = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0xf0000);
	if (ptr == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* Read the 32-bit word at offset 0x14 inside the mapped 4K window. */
	memcpy(&value, (void *)((volatile char *)ptr + 0x14), sizeof value);
	printf("0x%x\n", value);
	return 0;
}

-- MST From mst at mellanox.co.il Thu Sep 28 07:17:15 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 17:17:15 +0300 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: References: Message-ID: <20060928141715.GB28790@mellanox.co.il> Quoting r. Moshe Kazir : > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD > > Michael, > > Frank found the cause of the problem in the implementation of > arch/ppc/kernel/pci.c, > and asked the IBM kernel group to send a bug fix to the Linux kernel > group. > > The problem is: > > 1. This bug fix will not enter SLES10 as it is closed. > 2. It also will not enter SLES9 :-) or Red Hat AS4 U4. > > So we need a bug fix that will enable the use of mstflint on js21 PPC64 > + backport to old systems. OK, cool, but could I see this discussion/patch please, to understand the solution? Just googling for Frank's name only gets me something related to SIOCGIFCONF ioctl. > Frank's fix is based on two points (if I understand the code > correctly) - > > 1. It opens /proc/bus/pci... and not /sys/bus/pci/... > 2. It performs an ioctl(fd, PCIIOC_MMAP_IS_MEM); > > Frank - am I right? > > Can we add these two small changes to mstflint to have it working > on the PPC64 js21? Oh, I was under the impression that we were falling back on pread/pwrite from /proc, which is not safe without locking.
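A minimal sketch of the "try the fast way, fall back to a slower one" idea discussed in this thread. read_reg32() is a hypothetical helper, not mstflint code: it tries mmap() on the given file first and falls back to pread() on the same file if the mapping fails. It takes no lock, so it does nothing about the safety concern raised above, and whether pread() works at all on a given resource file depends on the kernel.

```c
#define _XOPEN_SOURCE 500
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not part of mstflint):
 * read the 32-bit word at `offset` in the file `path`.
 * Returns 'm' if the value came from an mmap() window, 'p' if it came from
 * the pread() fallback, and -1 if neither worked.
 */
int read_reg32(const char *path, off_t offset, uint32_t *out)
{
	long page = sysconf(_SC_PAGESIZE);
	off_t base, delta;
	void *map;
	int fd, how = -1;

	/* Only 4-byte aligned offsets, so the word never crosses a page. */
	if (page <= 0 || offset < 0 || offset % 4 != 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* mmap() needs a page-aligned file offset; map the enclosing page. */
	base = offset - (offset % page);
	delta = offset - base;
	map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, base);
	if (map != MAP_FAILED) {
		memcpy(out, (const char *)map + delta, sizeof *out);
		munmap(map, (size_t)page);
		how = 'm';
	} else if (pread(fd, out, sizeof *out, offset) == (ssize_t)sizeof *out) {
		/* Slower path: one system call per access. */
		how = 'p';
	}

	close(fd);
	return how;
}
```

A caller would use it as read_reg32("/sys/class/infiniband/mthca0/device/resource0", 0xf0014, &value) and could log the return code to see which access method ended up being used on a given machine.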
-- MST From rdreier at cisco.com Thu Sep 28 07:52:35 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 07:52:35 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060928060817.GD23828@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 28 Sep 2006 09:08:17 +0300") References: <20060928060817.GD23828@mellanox.co.il> Message-ID: Michael> Fair enough, let's start simple. BTW, are you going to Michael> post the rewritten NAPI patch for testing soon? Yes. I need to finish the driver changes first. From rdreier at cisco.com Thu Sep 28 07:53:03 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 07:53:03 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060928083909.GF25010@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 28 Sep 2006 11:39:09 +0300") References: <20060927062316.GO24009@mellanox.co.il> <20060928083909.GF25010@mellanox.co.il> Message-ID: Michael> Sounds good. Fancy taking it up now, or should I look Michael> into this? Go ahead and work on it -- I've been meaning to for a year or so, and I haven't started yet. From rdreier at cisco.com Thu Sep 28 07:53:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 07:53:24 -0700 Subject: [openib-general] Compile warnings (cross build) In-Reply-To: <20060928062919.GH23828@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 28 Sep 2006 09:29:19 +0300") References: <20060926135114.1da96c1b@freekitty> <20060928062919.GH23828@mellanox.co.il> Message-ID: Michael> BTW, is there some printk format to print u64 type? Not that I know of. From bos at pathscale.com Thu Sep 28 07:58:26 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 07:58:26 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. 
In-Reply-To: <20060928062723.GG23828@mellanox.co.il> References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> Message-ID: <1159455506.11976.1.camel@chalcedony.pathscale.com> On Thu, 2006-09-28 at 09:27 +0300, Michael S. Tsirkin wrote: > Me too. Roland and I (following his example) checked in changes to the mthca and ipath drivers in SVN yesterday that add a #warning to a core driver source file saying "don't look here, look over there!" That's a first step towards dropping the drivers from SVN trunk altogether. References: <20060928060817.GD23828@mellanox.co.il> Message-ID: <20060928151549.GG28790@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: NAPI > > Michael> Fair enough, let's start simple. BTW, are you going to > Michael> post the rewritten NAPI patch for testing soon? > > Yes. I need to finish the driver changes first. Looked pretty simple on the outset, but oh well. Keep us posted. -- MST From mst at mellanox.co.il Thu Sep 28 08:18:14 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 28 Sep 2006 18:18:14 +0300 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159455506.11976.1.camel@chalcedony.pathscale.com> References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> Message-ID: <20060928151814.GH28790@mellanox.co.il> Quoting r. Bryan O'Sullivan : > Subject: Re: 2.6.18 kernel support in the main trunk. > > On Thu, 2006-09-28 at 09:27 +0300, Michael S. Tsirkin wrote: > > > Me too. > > Roland and I (following his example) checked in changes to the mthca and > ipath drivers in SVN yesterday that add a #warning to a core driver > source file saying "don't look here, look over there!" 
That's a first > step towards dropping the drivers from SVN trunk altogether. Good idea. -- MST From kliteyn at dev.mellanox.co.il Thu Sep 28 08:16:36 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 28 Sep 2006 18:16:36 +0300 Subject: [openib-general] [PATCH 1/2] osm: osmtest ignores error status Message-ID: Hi Hal. This patch takes care of several cases where osmtest ignored error status. Yevgeny Signed-off-by: Yevgeny Kliteynik Index: osmt_slvl_vl_arb.c =================================================================== --- osmt_slvl_vl_arb.c (revision 9661) +++ osmt_slvl_vl_arb.c (working copy) @@ -164,12 +164,9 @@ osmt_query_vl_arb( if( status != IB_SUCCESS ) { - if (status != IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_query_vl_arb: ERR 0466: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_query_vl_arb: ERR 0466: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { @@ -385,12 +382,9 @@ osmt_query_slvl_map( if( status != IB_SUCCESS ) { - if (status != IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_query_slvl_map: ERR 0470: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_query_slvl_map: ERR 0470: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { Index: osmt_inform.c =================================================================== --- osmt_inform.c (revision 9661) +++ osmt_inform.c (working copy) @@ -103,6 +103,7 @@ osmt_bind_inform_qp( IN osmtest_t * cons osm_log( p_log, OSM_LOG_ERROR, "osmt_bind_inform_qp: ERR 0109: " "Unable to obtain CA and port (%d).\n" ); + status = IB_ERROR; goto Exit; } @@ -579,6 +580,7 @@ osmt_send_trap_wait_for_forward( IN osmt "Did not receive a Report(Notice) but attr:%d\n", cl_ntoh16(p_sa_mad->attr_id) ); + status = IB_ERROR; } } else @@ -588,6 
+590,7 @@ osmt_send_trap_wait_for_forward( IN osmt
               "Received an Unexpected Method:%d\n",
               p_smp->method );
+      status = IB_ERROR;
   }
 Exit:
@@ -666,6 +669,7 @@ osmt_trap_wait( IN osmtest_t * const
               "Did not receive a Report(Notice) but attr:%d\n",
               cl_ntoh16(p_sa_mad->attr_id) );
+      status = IB_ERROR;
     }
   }
   else
@@ -675,6 +679,7 @@ osmt_trap_wait( IN osmtest_t * const
               "Received an Unexpected Method:%d\n",
               p_smp->method );
+      status = IB_ERROR;
   }
 Exit:

From or.gerlitz at gmail.com Thu Sep 28 08:20:57 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Thu, 28 Sep 2006 17:20:57 +0200
Subject: [openib-general] [RFC] determining which changes in svn to merge upstream or remove
In-Reply-To: <000001c6e0ff$37474de0$a440e984@amr.corp.intel.com>
References: <000001c6e0ff$37474de0$a440e984@amr.corp.intel.com>
Message-ID: <15ddcffd0609280820r6e36d834q2c6e8802b180ae25@mail.gmail.com>

On 9/26/06, Sean Hefty wrote:
> Specifically, the following features are in svn only:
> * RDMA CM:
> - userspace support
> - multicast support
> - UD QP support (required for multicast)

I think all three of the above should be prioritized for pushing into
2.6.20, and it would be good to have them in -mm first so people can gain
experience with them beforehand (the latter two are not part of OFED).

The user space support for writing IB/RC ULPs is now in OFED and is used
by uDAPL, which is in turn used by Intel MPI and, I hope, more products to
come.

Exposing the IB/UD/Mcast RDMA CM API in user space would allow offloading
UDP-based ULPs which use IP multicast, which is also being discussed here
and there.

It makes sense to use the IB multicast module in the kernel for keeping
references and managing the SA Join/Leave interaction of user space
processes. It also makes a lot of sense to port IPoIB to this module; I
saw the patch and it basically seems straightforward.

Or.
From kliteyn at dev.mellanox.co.il Thu Sep 28 08:20:28 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 28 Sep 2006 18:20:28 +0300 Subject: [openib-general] [PATCH 2/2] osm: osmtest ignores error status Message-ID: Hi Hal. This patch takes care of several cases where osmtest ignored error status (plus some cosmetics). Yevgeny Signed-off-by: Yevgeny Kliteynik Index: osmt_service.c =================================================================== --- osmt_service.c (revision 9661) +++ osmt_service.c (working copy) @@ -60,6 +60,9 @@ #include #include "osmtest.h" +/********************************************************************** + **********************************************************************/ + ib_api_status_t osmt_register_service( IN osmtest_t * const p_osmt, IN ib_net64_t service_id, @@ -174,6 +177,9 @@ osmt_register_service( IN osmtest_t * co return status; } +/********************************************************************** + **********************************************************************/ + ib_api_status_t osmt_register_service_with_full_key ( IN osmtest_t * const p_osmt, IN ib_net64_t service_id, @@ -260,6 +266,23 @@ osmt_register_service_with_full_key ( IN } status = context.result.status; + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_register_service_with_full_key: ERR 4A04: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); + + if( status == IB_REMOTE_ERROR ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_register_service_with_full_key: " + "Remote error = %s\n", + ib_get_mad_status_str( osm_madw_get_mad_ptr + ( context.result. 
+ p_result_madw ) ) ); + } + goto Exit; + } /* Check service key on context to see if match */ p_rec = osmv_get_query_svc_rec( context.result.p_result_madw, 0 ); @@ -277,30 +300,12 @@ osmt_register_service_with_full_key ( IN { status = IB_REMOTE_ERROR; osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_register_service_with_full_key:" + "osmt_register_service_with_full_key: ERR 4A34: " "Data mismatch in service_key\n" ); goto Exit; } - if( status != IB_SUCCESS ) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_register_service_with_full_key: ERR 4A04: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - - if( status == IB_REMOTE_ERROR ) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_register_service_with_full_key: " - "Remote error = %s\n", - ib_get_mad_status_str( osm_madw_get_mad_ptr - ( context.result. - p_result_madw ) ) ); - } - goto Exit; - } - Exit: if( context.result.p_result_madw != NULL ) { @@ -312,6 +317,9 @@ osmt_register_service_with_full_key ( IN return status; } +/********************************************************************** + **********************************************************************/ + ib_api_status_t osmt_register_service_with_data( IN osmtest_t * const p_osmt, IN ib_net64_t service_id, @@ -478,6 +486,9 @@ osmt_register_service_with_data( IN osmt return status; } +/********************************************************************** + **********************************************************************/ + ib_api_status_t osmt_get_service_by_id_and_name ( IN osmtest_t * const p_osmt, IN uint32_t rec_num, @@ -618,6 +629,9 @@ osmt_get_service_by_id_and_name ( IN osm return status; } +/********************************************************************** + **********************************************************************/ + ib_api_status_t osmt_get_service_by_id ( IN osmtest_t * const p_osmt, IN uint32_t rec_num, @@ -755,6 +769,9 @@ osmt_get_service_by_id ( IN osmtest_t * return status; } 
+/********************************************************************** + **********************************************************************/ + ib_api_status_t osmt_get_service_by_name_and_key ( IN osmtest_t * const p_osmt, IN char * sr_name, @@ -907,6 +924,9 @@ osmt_get_service_by_name_and_key ( IN os return status; } +/********************************************************************** + **********************************************************************/ + ib_api_status_t osmt_get_service_by_name( IN osmtest_t * const p_osmt, IN char * sr_name, @@ -1036,6 +1056,9 @@ osmt_get_service_by_name( IN osmtest_t * return status; } +/********************************************************************** + **********************************************************************/ + #ifdef VENDOR_RMPP_SUPPORT ib_api_status_t osmt_get_all_services_and_check_names( IN osmtest_t * const p_osmt, @@ -1170,6 +1193,9 @@ osmt_get_all_services_and_check_names( I } #endif +/********************************************************************** + **********************************************************************/ + ib_api_status_t osmt_delete_service_by_name(IN osmtest_t * const p_osmt, IN uint8_t IsServiceExist, @@ -1293,6 +1319,9 @@ osmt_delete_service_by_name(IN osmtest_t return status; } +/********************************************************************** + **********************************************************************/ + /* * Run a complete service records flow: * - register a service From jlentini at netapp.com Thu Sep 28 08:48:10 2006 From: jlentini at netapp.com (James Lentini) Date: Thu, 28 Sep 2006 11:48:10 -0400 (EDT) Subject: [openib-general] Compile warnings (cross build) In-Reply-To: References: <20060926135114.1da96c1b@freekitty> <20060928062919.GH23828@mellanox.co.il> Message-ID: On Thu, 28 Sep 2006, Roland Dreier wrote: > Michael> BTW, is there some printk format to print u64 type? Try "%Lu", That will print a long long unsigned value. 
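For reference, the common way to sidestep the question is to cast to unsigned long long and print with "%llu", because the type behind a 64-bit integer differs between architectures. In ISO C the "L" modifier belongs to long double, so "%Lu" for integers relies on an extension. The sketch below uses user-space snprintf rather than printk so it can be compiled anywhere; format_u64 is a made-up name for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a 64-bit value with "%llu" plus an explicit
 * cast, so the format string matches the argument type regardless of
 * whether uint64_t is unsigned long or unsigned long long on this target.
 * Returns what snprintf returns (length of the formatted number).
 */
static int format_u64(char *buf, size_t len, uint64_t val)
{
	return snprintf(buf, len, "%llu", (unsigned long long)val);
}
```

The same cast-and-"%llu" pattern applies to a printk call on a u64; only the output function changes.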
From bos at pathscale.com Thu Sep 28 08:59:58 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 08:59:58 -0700 Subject: [openib-general] [PATCH 2 of 28] IB/ipath - fix memory leak if allocation fails In-Reply-To: Message-ID: <45079acba20851290d1f.1159459198@eng-12.pathscale.com> If the second allocation failed, the first structure allocated in this routine was not freed. Signed-off-by: Bryan O'Sullivan diff -r c46292ccb0f5 -r 45079acba208 drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 @@ -1326,6 +1326,9 @@ int ipath_create_rcvhdrq(struct ipath_de "for port %u rcvhdrqtailaddr failed\n", pd->port_port); ret = -ENOMEM; + dma_free_coherent(&dd->pcidev->dev, amt, + pd->port_rcvhdrq, pd->port_rcvhdrq_phys); + pd->port_rcvhdrq = NULL; goto bail; } pd->port_rcvhdrqtailaddr_phys = phys_hdrqtail; From bos at pathscale.com Thu Sep 28 08:59:56 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 08:59:56 -0700 Subject: [openib-general] [PATCH 0 of 28] ipath patches for 2.6.19 Message-ID: Hi, Roland - This patch series brings the ipath driver almost up to date with what's in our internal tree. The only substantial thing missing is the memcpy_cachebypass patch that I sent out a while back and haven't had time to rework. These patches have seen a lot of testing, including on a git snapshot as of yesterday afternoon. Please apply. Thanks, Message-ID: The sender requests an ACK every 1/2 MB to avoid retransmit timeouts that were causing MVAPICH mod_bw to fail after a predictable number of sends. 
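The credit check this patch relies on can be tried outside the driver. The sketch below is an assumption-laden illustration, not the driver's code: the patch only calls ipath_cmp24() and does not show it, so psn_cmp24() here is one plausible way to compare 24-bit PSNs in circular order, and the helper names psn_must_stall and psn_wants_ack are invented. Only the constant 2048 and the two comparisons are taken from the patch.

```c
#include <stdint.h>

#define PSN_CREDIT 2048u	/* same value as IPATH_PSN_CREDIT in the patch */

/*
 * Compare two 24-bit PSNs in circular order (assumed implementation).
 * Shifting the difference left by 8 moves bit 23 into the sign bit, so the
 * result is negative, zero or positive as a is behind, equal to, or ahead
 * of b. The unsigned-to-signed conversion assumes two's complement.
 */
static int32_t psn_cmp24(uint32_t a, uint32_t b)
{
	return (int32_t)((a - b) << 8);
}

/* Stall once psn has run more than PSN_CREDIT ahead of the last ACKed PSN. */
static int psn_must_stall(uint32_t psn, uint32_t last_acked)
{
	return psn_cmp24(psn, last_acked + PSN_CREDIT) > 0;
}

/* Request an ACK once psn is at least PSN_CREDIT - 1 ahead of the last ACKed PSN. */
static int psn_wants_ack(uint32_t psn, uint32_t last_acked)
{
	return psn_cmp24(psn, last_acked + PSN_CREDIT - 1) >= 0;
}
```

With these definitions a sender that has seen no ACK since PSN 0 starts asking for one at PSN 2047 and stops sending new packets at PSN 2049, and the same holds across the 24-bit wrap.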
Signed-off-by: Bryan O'Sullivan diff -r f1b431dca1f9 -r c46292ccb0f5 drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Thu Sep 28 08:57:12 2006 -0700 @@ -342,6 +342,7 @@ static void ipath_reset_qp(struct ipath_ qp->s_last = 0; qp->s_ssn = 1; qp->s_lsn = 0; + qp->s_wait_credit = 0; if (qp->r_rq.wq) { qp->r_rq.wq->head = 0; qp->r_rq.wq->tail = 0; @@ -516,7 +517,7 @@ int ipath_modify_qp(struct ib_qp *ibqp, qp->remote_qpn = attr->dest_qp_num; if (attr_mask & IB_QP_SQ_PSN) { - qp->s_next_psn = attr->sq_psn; + qp->s_psn = qp->s_next_psn = attr->sq_psn; qp->s_last_psn = qp->s_next_psn - 1; } diff -r f1b431dca1f9 -r c46292ccb0f5 drivers/infiniband/hw/ipath/ipath_rc.c --- a/drivers/infiniband/hw/ipath/ipath_rc.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_rc.c Thu Sep 28 08:57:12 2006 -0700 @@ -201,6 +201,18 @@ int ipath_make_rc_req(struct ipath_qp *q qp->s_rnr_timeout) goto done; + /* Limit the number of packets sent without an ACK. */ + if (ipath_cmp24(qp->s_psn, qp->s_last_psn + IPATH_PSN_CREDIT) > 0) { + qp->s_wait_credit = 1; + dev->n_rc_stalls++; + spin_lock(&dev->pending_lock); + if (list_empty(&qp->timerwait)) + list_add_tail(&qp->timerwait, + &dev->pending[dev->pending_index]); + spin_unlock(&dev->pending_lock); + goto done; + } + /* header size in 32-bit words LRH+BTH = (8+12)/4. */ hwords = 5; bth0 = 0; @@ -221,7 +233,7 @@ int ipath_make_rc_req(struct ipath_qp *q /* Check if send work queue is empty. */ if (qp->s_tail == qp->s_head) goto done; - qp->s_psn = wqe->psn = qp->s_next_psn; + wqe->psn = qp->s_next_psn; newreq = 1; } /* @@ -393,12 +405,6 @@ int ipath_make_rc_req(struct ipath_qp *q ss = &qp->s_sge; len = qp->s_len; if (len > pmtu) { - /* - * Request an ACK every 1/2 MB to avoid retransmit - * timeouts. 
- */ - if (((wqe->length - len) % (512 * 1024)) == 0) - bth2 |= 1 << 31; len = pmtu; break; } @@ -435,12 +441,6 @@ int ipath_make_rc_req(struct ipath_qp *q ss = &qp->s_sge; len = qp->s_len; if (len > pmtu) { - /* - * Request an ACK every 1/2 MB to avoid retransmit - * timeouts. - */ - if (((wqe->length - len) % (512 * 1024)) == 0) - bth2 |= 1 << 31; len = pmtu; break; } @@ -498,6 +498,8 @@ int ipath_make_rc_req(struct ipath_qp *q */ goto done; } + if (ipath_cmp24(qp->s_psn, qp->s_last_psn + IPATH_PSN_CREDIT - 1) >= 0) + bth2 |= 1 << 31; /* Request ACK. */ qp->s_len -= len; qp->s_hdrwords = hwords; qp->s_cur_sge = ss; @@ -737,6 +739,15 @@ bail: return; } +static inline void update_last_psn(struct ipath_qp *qp, u32 psn) +{ + if (qp->s_wait_credit) { + qp->s_wait_credit = 0; + tasklet_hi_schedule(&qp->s_task); + } + qp->s_last_psn = psn; +} + /** * do_rc_ack - process an incoming RC ACK * @qp: the QP the ACK came in on @@ -805,7 +816,7 @@ static int do_rc_ack(struct ipath_qp *qp * The last valid PSN seen is the previous * request's. */ - qp->s_last_psn = wqe->psn - 1; + update_last_psn(qp, wqe->psn - 1); /* Retry this request. */ ipath_restart_rc(qp, wqe->psn, &wc); /* @@ -864,7 +875,7 @@ static int do_rc_ack(struct ipath_qp *qp ipath_get_credit(qp, aeth); qp->s_rnr_retry = qp->s_rnr_retry_cnt; qp->s_retry = qp->s_retry_cnt; - qp->s_last_psn = psn; + update_last_psn(qp, psn); ret = 1; goto bail; @@ -883,7 +894,7 @@ static int do_rc_ack(struct ipath_qp *qp goto bail; /* The last valid PSN is the previous PSN. */ - qp->s_last_psn = psn - 1; + update_last_psn(qp, psn - 1); dev->n_rc_resends += (int)qp->s_psn - (int)psn; @@ -898,7 +909,7 @@ static int do_rc_ack(struct ipath_qp *qp case 3: /* NAK */ /* The last valid PSN seen is the previous request's. 
*/ if (qp->s_last != qp->s_tail) - qp->s_last_psn = wqe->psn - 1; + update_last_psn(qp, wqe->psn - 1); switch ((aeth >> IPATH_AETH_CREDIT_SHIFT) & IPATH_AETH_CREDIT_MASK) { case 0: /* PSN sequence error */ @@ -1071,7 +1082,7 @@ static inline void ipath_rc_rcv_resp(str * since we don't want s_sge modified. */ qp->s_len -= pmtu; - qp->s_last_psn = psn; + update_last_psn(qp, psn); spin_unlock_irqrestore(&qp->s_lock, flags); ipath_copy_sge(&qp->s_sge, data, pmtu); goto bail; diff -r f1b431dca1f9 -r c46292ccb0f5 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Thu Sep 28 08:57:12 2006 -0700 @@ -1683,6 +1683,7 @@ static ssize_t show_stats(struct class_d "RC OTH NAKs %d\n" "RC timeouts %d\n" "RC RDMA dup %d\n" + "RC stalls %d\n" "piobuf wait %d\n" "no piobuf %d\n" "PKT drops %d\n" @@ -1690,7 +1691,7 @@ static ssize_t show_stats(struct class_d dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks, dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks, dev->n_other_naks, dev->n_timeouts, - dev->n_rdma_dup_busy, dev->n_piowait, + dev->n_rdma_dup_busy, dev->n_rc_stalls, dev->n_piowait, dev->n_no_piobuf, dev->n_pkt_drops, dev->n_wqe_errs); for (i = 0; i < ARRAY_SIZE(dev->opstats); i++) { const struct ipath_opcode_stats *si = &dev->opstats[i]; diff -r f1b431dca1f9 -r c46292ccb0f5 drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Sep 28 08:57:12 2006 -0700 @@ -370,6 +370,7 @@ struct ipath_qp { u8 s_rnr_retry_cnt; u8 s_retry; /* requester retry counter */ u8 s_rnr_retry; /* requester RNR retry counter */ + u8 s_wait_credit; /* limit number of unacked packets sent */ u8 s_pkey_index; /* PKEY index to use */ u8 timeout; /* Timeout for this QP */ enum ib_mtu path_mtu; @@ -392,6 +393,8 @@ struct ipath_qp { */ #define IPATH_S_BUSY 0 #define 
IPATH_S_SIGNAL_REQ_WR 1 + +#define IPATH_PSN_CREDIT 2048 /* * Since struct ipath_swqe is not a fixed size, we can't simply index into @@ -521,6 +524,7 @@ struct ipath_ibdev { u32 n_rnr_naks; u32 n_other_naks; u32 n_timeouts; + u32 n_rc_stalls; u32 n_pkt_drops; u32 n_vl15_dropped; u32 n_wqe_errs; From bos at pathscale.com Thu Sep 28 09:00:00 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:00 -0700 Subject: [openib-general] [PATCH 4 of 28] IB/ipath - support revision 2 InfiniPath PCIE devices In-Reply-To: Message-ID: This also entailed a little GPIO-interrupt general cleanup. Signed-off-by: Bryan O'Sullivan diff -r 7f5b6127be15 -r a69f8b7a8a04 drivers/infiniband/hw/ipath/ipath_common.h --- a/drivers/infiniband/hw/ipath/ipath_common.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_common.h Thu Sep 28 08:57:12 2006 -0700 @@ -186,6 +186,8 @@ typedef enum _ipath_ureg { #define IPATH_RUNTIME_FORCE_WC_ORDER 0x4 #define IPATH_RUNTIME_RCVHDR_COPY 0x8 #define IPATH_RUNTIME_MASTER 0x10 +#define IPATH_RUNTIME_PBC_REWRITE 0x20 +#define IPATH_RUNTIME_LOOSE_DMA_ALIGN 0x40 /* * This structure is returned by ipath_userinit() immediately after diff -r 7f5b6127be15 -r a69f8b7a8a04 drivers/infiniband/hw/ipath/ipath_iba6120.c --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:12 2006 -0700 @@ -294,6 +294,13 @@ static const struct ipath_cregs ipath_pe #define IPATH_GPIO_SCL (1ULL << \ (_IPATH_GPIO_SCL_NUM+INFINIPATH_EXTC_GPIOOE_SHIFT)) +/* + * Rev2 silicon allows suppressing check for ArmLaunch errors. + * this can speed up short packet sends on systems that do + * not guaranteee write-order. + */ +#define INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR (1ULL<<63) + /** * ipath_pe_handle_hwerrors - display hardware errors. 
* @dd: the infinipath device @@ -571,9 +578,12 @@ static void ipath_pe_init_hwerrors(struc if (!dd->ipath_boardrev) // no PLL for Emulator val &= ~INFINIPATH_HWE_SERDESPLLFAILED; - /* workaround bug 9460 in internal interface bus parity checking */ - val &= ~INFINIPATH_HWE_PCIEBUSPARITYRADM; - + if (dd->ipath_minrev < 2) { + /* workaround bug 9460 in internal interface bus parity + * checking. Fixed (HW bug 9490) in Rev2. + */ + val &= ~INFINIPATH_HWE_PCIEBUSPARITYRADM; + } dd->ipath_hwerrmask = val; } @@ -583,8 +593,8 @@ static void ipath_pe_init_hwerrors(struc */ static int ipath_pe_bringup_serdes(struct ipath_devdata *dd) { - u64 val, tmp, config1; - int ret = 0, change = 0; + u64 val, tmp, config1, prev_val; + int ret = 0; ipath_dbg("Trying to bringup serdes\n"); @@ -641,6 +651,7 @@ static int ipath_pe_bringup_serdes(struc val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig); + prev_val = val; if (((val >> INFINIPATH_XGXS_MDIOADDR_SHIFT) & INFINIPATH_XGXS_MDIOADDR_MASK) != 3) { val &= @@ -648,11 +659,9 @@ static int ipath_pe_bringup_serdes(struc INFINIPATH_XGXS_MDIOADDR_SHIFT); /* MDIO address 3 */ val |= 3ULL << INFINIPATH_XGXS_MDIOADDR_SHIFT; - change = 1; } if (val & INFINIPATH_XGXS_RESET) { val &= ~INFINIPATH_XGXS_RESET; - change = 1; } if (((val >> INFINIPATH_XGXS_RX_POL_SHIFT) & INFINIPATH_XGXS_RX_POL_MASK) != dd->ipath_rx_pol_inv ) { @@ -661,9 +670,19 @@ static int ipath_pe_bringup_serdes(struc INFINIPATH_XGXS_RX_POL_SHIFT); val |= dd->ipath_rx_pol_inv << INFINIPATH_XGXS_RX_POL_SHIFT; - change = 1; - } - if (change) + } + if (dd->ipath_minrev >= 2) { + /* Rev 2. can tolerate multiple writes to PBC, and + * allowing them can provide lower latency on some + * CPUs, but this feature is off by default, only + * turned on by setting D63 of XGXSconfig reg. + * May want to make this conditional more + * fine-grained in future. This is not exactly + * related to XGXS, but where the bit ended up. 
+ */ + val |= INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR; + } + if (val != prev_val) ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val); val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_serdesconfig0); @@ -717,9 +736,25 @@ static void ipath_pe_quiet_serdes(struct ipath_write_kreg(dd, dd->ipath_kregs->kr_serdesconfig0, val); } -/* this is not yet needed on this chip, so just return 0. */ static int ipath_pe_intconfig(struct ipath_devdata *dd) { + u64 val; + u32 chiprev; + + /* + * If the chip supports added error indication via GPIO pins, + * enable interrupts on those bits so the interrupt routine + * can count the events. Also set flag so interrupt routine + * can know they are expected. + */ + chiprev = dd->ipath_revision >> INFINIPATH_R_CHIPREVMINOR_SHIFT; + if ((chiprev & INFINIPATH_R_CHIPREVMINOR_MASK) > 1) { + /* Rev2+ reports extra errors via internal GPIO pins */ + dd->ipath_flags |= IPATH_GPIO_ERRINTRS; + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_gpio_mask); + val |= IPATH_GPIO_ERRINTR_MASK; + ipath_write_kreg( dd, dd->ipath_kregs->kr_gpio_mask, val); + } return 0; } @@ -1082,6 +1117,45 @@ static void ipath_pe_put_tid(struct ipat mmiowb(); spin_unlock_irqrestore(&dd->ipath_tid_lock, flags); } +/** + * ipath_pe_put_tid_2 - write a TID in chip, Revision 2 or higher + * @dd: the infinipath device + * @tidptr: pointer to the expected TID (in chip) to udpate + * @tidtype: 0 for eager, 1 for expected + * @pa: physical address of in memory buffer; ipath_tidinvalid if freeing + * + * This exists as a separate routine to allow for selection of the + * appropriate "flavor". The static calls in cleanup just use the + * revision-agnostic form, as they are not performance critical. 
+ */ +static void ipath_pe_put_tid_2(struct ipath_devdata *dd, u64 __iomem *tidptr, + u32 type, unsigned long pa) +{ + u32 __iomem *tidp32 = (u32 __iomem *)tidptr; + + if (pa != dd->ipath_tidinvalid) { + if (pa & ((1U << 11) - 1)) { + dev_info(&dd->pcidev->dev, "BUG: physaddr %lx " + "not 4KB aligned!\n", pa); + return; + } + pa >>= 11; + /* paranoia check */ + if (pa & (7<<29)) + ipath_dev_err(dd, + "BUG: Physical page address 0x%lx " + "has bits set in 31-29\n", pa); + + if (type == 0) + pa |= dd->ipath_tidtemplate; + else /* for now, always full 4KB page */ + pa |= 2 << 29; + } + if (dd->ipath_kregbase) + writel(pa, tidp32); + mmiowb(); +} + /** * ipath_pe_clear_tid - clear all TID entries for a port, expected and eager @@ -1203,7 +1277,7 @@ int __attribute__((weak)) ipath_unordere /** * ipath_init_pe_get_base_info - set chip-specific flags for user code - * @dd: the infinipath device + * @pd: the infinipath port * @kbase: ipath_base_info pointer * * We set the PCIE flag because the lower bandwidth on PCIe vs @@ -1212,6 +1286,7 @@ static int ipath_pe_get_base_info(struct static int ipath_pe_get_base_info(struct ipath_portdata *pd, void *kbase) { struct ipath_base_info *kinfo = kbase; + struct ipath_devdata *dd; if (ipath_unordered_wc()) { kinfo->spi_runtime_flags |= IPATH_RUNTIME_FORCE_WC_ORDER; @@ -1220,8 +1295,20 @@ static int ipath_pe_get_base_info(struct else ipath_cdbg(PROC, "Not Intel processor, WC ordered\n"); + if (pd == NULL) + goto done; + + dd = pd->port_dd; + + if (dd != NULL && dd->ipath_minrev >= 2) { + ipath_cdbg(PROC, "IBA6120 Rev2, allow multiple PBC write\n"); + kinfo->spi_runtime_flags |= IPATH_RUNTIME_PBC_REWRITE; + ipath_cdbg(PROC, "IBA6120 Rev2, allow loose DMA alignment\n"); + kinfo->spi_runtime_flags |= IPATH_RUNTIME_LOOSE_DMA_ALIGN; + } + +done: kinfo->spi_runtime_flags |= IPATH_RUNTIME_PCIE; - return 0; } @@ -1244,7 +1331,10 @@ void ipath_init_iba6120_funcs(struct ipa dd->ipath_f_quiet_serdes = ipath_pe_quiet_serdes; 
dd->ipath_f_bringup_serdes = ipath_pe_bringup_serdes; dd->ipath_f_clear_tids = ipath_pe_clear_tids; - dd->ipath_f_put_tid = ipath_pe_put_tid; + if (dd->ipath_minrev >= 2) + dd->ipath_f_put_tid = ipath_pe_put_tid_2; + else + dd->ipath_f_put_tid = ipath_pe_put_tid; dd->ipath_f_cleanup = ipath_setup_pe_cleanup; dd->ipath_f_setextled = ipath_setup_pe_setextled; dd->ipath_f_get_base_info = ipath_pe_get_base_info; diff -r 7f5b6127be15 -r a69f8b7a8a04 drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Thu Sep 28 08:57:12 2006 -0700 @@ -808,7 +808,7 @@ irqreturn_t ipath_intr(int irq, void *da if (oldhead != curtail) { if (dd->ipath_flags & IPATH_GPIO_INTR) { ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear, - (u64) (1 << 2)); + (u64) (1 << IPATH_GPIO_PORT0_BIT)); istat = port0rbits | INFINIPATH_I_GPIO; } else @@ -867,25 +867,79 @@ irqreturn_t ipath_intr(int irq, void *da if (istat & INFINIPATH_I_GPIO) { /* - * Packets are available in the port 0 rcv queue. - * Eventually this needs to be generalized to check - * IPATH_GPIO_INTR, and the specific GPIO bit, if - * GPIO interrupts are used for anything else. - */ - if (unlikely(!(dd->ipath_flags & IPATH_GPIO_INTR))) { - u32 gpiostatus; - gpiostatus = ipath_read_kreg32( - dd, dd->ipath_kregs->kr_gpio_status); - ipath_dbg("Unexpected GPIO interrupt bits %x\n", - gpiostatus); + * GPIO interrupts fall in two broad classes: + * GPIO_2 indicates (on some HT4xx boards) that a packet + * has arrived for Port 0. Checking for this + * is controlled by flag IPATH_GPIO_INTR. + * GPIO_3..5 on IBA6120 Rev2 chips indicate errors + * that we need to count. Checking for this + * is controlled by flag IPATH_GPIO_ERRINTRS. + */ + u32 gpiostatus; + u32 to_clear = 0; + + gpiostatus = ipath_read_kreg32( + dd, dd->ipath_kregs->kr_gpio_status); + /* First the error-counter case. 
+ */ + if ((gpiostatus & IPATH_GPIO_ERRINTR_MASK) && + (dd->ipath_flags & IPATH_GPIO_ERRINTRS)) { + /* want to clear the bits we see asserted. */ + to_clear |= (gpiostatus & IPATH_GPIO_ERRINTR_MASK); + + /* + * Count appropriately, clear bits out of our copy, + * as they have been "handled". + */ + if (gpiostatus & (1 << IPATH_GPIO_RXUVL_BIT)) { + ipath_dbg("FlowCtl on UnsupVL\n"); + dd->ipath_rxfc_unsupvl_errs++; + } + if (gpiostatus & (1 << IPATH_GPIO_OVRUN_BIT)) { + ipath_dbg("Overrun Threshold exceeded\n"); + dd->ipath_overrun_thresh_errs++; + } + if (gpiostatus & (1 << IPATH_GPIO_LLI_BIT)) { + ipath_dbg("Local Link Integrity error\n"); + dd->ipath_lli_errs++; + } + gpiostatus &= ~IPATH_GPIO_ERRINTR_MASK; + } + /* Now the Port0 Receive case */ + if ((gpiostatus & (1 << IPATH_GPIO_PORT0_BIT)) && + (dd->ipath_flags & IPATH_GPIO_INTR)) { + /* + * GPIO status bit 2 is set, and we expected it. + * clear it and indicate in p0bits. + * This probably only happens if a Port0 pkt + * arrives at _just_ the wrong time, and we + * handle that by seting chk0rcv; + */ + to_clear |= (1 << IPATH_GPIO_PORT0_BIT); + gpiostatus &= ~(1 << IPATH_GPIO_PORT0_BIT); + chk0rcv = 1; + } + if (unlikely(gpiostatus)) { + /* + * Some unexpected bits remain. If they could have + * caused the interrupt, complain and clear. + * MEA: this is almost certainly non-ideal. + * we should look into auto-disable of unexpected + * GPIO interrupts, possibly on a "three strikes" + * basis. 
+ */ + u32 mask; + mask = ipath_read_kreg32( + dd, dd->ipath_kregs->kr_gpio_mask); + if (mask & gpiostatus) { + ipath_dbg("Unexpected GPIO IRQ bits %x\n", + gpiostatus & mask); + to_clear |= (gpiostatus & mask); + } + } + if (to_clear) { ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear, - gpiostatus); - } - else { - /* Clear GPIO status bit 2 */ - ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear, - (u64) (1 << 2)); - chk0rcv = 1; + (u64) to_clear); } } chk0rcv |= istat & port0rbits; diff -r 7f5b6127be15 -r a69f8b7a8a04 drivers/infiniband/hw/ipath/ipath_kernel.h --- a/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:12 2006 -0700 @@ -524,6 +524,15 @@ struct ipath_devdata { u32 ipath_lli_counter; /* local link integrity errors */ u32 ipath_lli_errors; + /* + * Above counts only cases where _successive_ LocalLinkIntegrity + * errors were seen in the receive headers of kern-packets. + * Below are the three (monotonically increasing) counters + * maintained via GPIO interrupts on iba6120-rev2. 
+ */ + u32 ipath_rxfc_unsupvl_errs; + u32 ipath_overrun_thresh_errs; + u32 ipath_lli_errs; }; /* Private data for file operations */ @@ -636,6 +645,15 @@ int ipath_set_rx_pol_inv(struct ipath_de /* can miss port0 rx interrupts */ #define IPATH_POLL_RX_INTR 0x40000 #define IPATH_DISABLED 0x80000 /* administratively disabled */ + /* Use GPIO interrupts for new counters */ +#define IPATH_GPIO_ERRINTRS 0x100000 + +/* Bits in GPIO for the added interrupts */ +#define IPATH_GPIO_PORT0_BIT 2 +#define IPATH_GPIO_RXUVL_BIT 3 +#define IPATH_GPIO_OVRUN_BIT 4 +#define IPATH_GPIO_LLI_BIT 5 +#define IPATH_GPIO_ERRINTR_MASK 0x38 /* portdata flag bit offsets */ /* waiting for a packet to arrive */ diff -r 7f5b6127be15 -r a69f8b7a8a04 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Thu Sep 28 08:57:12 2006 -0700 @@ -898,7 +898,8 @@ int ipath_get_counters(struct ipath_devd ipath_snap_cntr(dd, dd->ipath_cregs->cr_erricrccnt) + ipath_snap_cntr(dd, dd->ipath_cregs->cr_errvcrccnt) + ipath_snap_cntr(dd, dd->ipath_cregs->cr_errlpcrccnt) + - ipath_snap_cntr(dd, dd->ipath_cregs->cr_badformatcnt); + ipath_snap_cntr(dd, dd->ipath_cregs->cr_badformatcnt) + + dd->ipath_rxfc_unsupvl_errs; cntrs->port_rcv_remphys_errors = ipath_snap_cntr(dd, dd->ipath_cregs->cr_rcvebpcnt); cntrs->port_xmit_discards = @@ -911,8 +912,10 @@ int ipath_get_counters(struct ipath_devd ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktsendcnt); cntrs->port_rcv_packets = ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktrcvcnt); - cntrs->local_link_integrity_errors = dd->ipath_lli_errors; - cntrs->excessive_buffer_overrun_errors = 0; /* XXX */ + cntrs->local_link_integrity_errors = + (dd->ipath_flags & IPATH_GPIO_ERRINTRS) ? 
+ dd->ipath_lli_errs : dd->ipath_lli_errors; + cntrs->excessive_buffer_overrun_errors = dd->ipath_overrun_thresh_errs; ret = 0; @@ -1380,11 +1383,13 @@ static int enable_timer(struct ipath_dev * processing. */ if (dd->ipath_flags & IPATH_GPIO_INTR) { + u64 val; ipath_write_kreg(dd, dd->ipath_kregs->kr_debugportselect, 0x2074076542310ULL); /* Enable GPIO bit 2 interrupt */ - ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_mask, - (u64) (1 << 2)); + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_gpio_mask); + val |= (u64) (1 << IPATH_GPIO_PORT0_BIT); + ipath_write_kreg( dd, dd->ipath_kregs->kr_gpio_mask, val); } init_timer(&dd->verbs_timer); @@ -1399,8 +1404,17 @@ static int disable_timer(struct ipath_de static int disable_timer(struct ipath_devdata *dd) { /* Disable GPIO bit 2 interrupt */ - if (dd->ipath_flags & IPATH_GPIO_INTR) - ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_mask, 0); + if (dd->ipath_flags & IPATH_GPIO_INTR) { + u64 val; + /* Disable GPIO bit 2 interrupt */ + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_gpio_mask); + val &= ~((u64) (1 << IPATH_GPIO_PORT0_BIT)); + ipath_write_kreg( dd, dd->ipath_kregs->kr_gpio_mask, val); + /* + * We might want to undo changes to debugportselect, + * but how? + */ + } del_timer_sync(&dd->verbs_timer); From bos at pathscale.com Thu Sep 28 09:00:01 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:01 -0700 Subject: [openib-general] [PATCH 5 of 28] IB/ipath - unregister from IB core early In-Reply-To: Message-ID: This gives upper-level protocols a chance to unregister while the device is still usable. 
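
For readers skimming the archive, here is a minimal sketch of the guard this patch relies on: unregister only when verbs_dev is still set, then clear the pointer, so that the remove path and the module cleanup path can both attempt it without unregistering twice. The demo_* names and the call counter are hypothetical stand-ins for illustration and are not part of the ipath driver.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures (illustration only). */
struct demo_verbs_dev {
	int registered;
};

struct demo_devdata {
	struct demo_verbs_dev *verbs_dev;
	int unregister_calls;	/* demo bookkeeping, not in the real driver */
};

/* Stand-in for ipath_unregister_ib_device(); only records the call. */
static void demo_unregister_ib_device(struct demo_devdata *dd,
				      struct demo_verbs_dev *dev)
{
	dev->registered = 0;
	dd->unregister_calls++;
}

/*
 * Unregister at most once: the pointer is cleared after the first
 * call, so a second caller finds NULL and does nothing.
 */
static void demo_unregister_once(struct demo_devdata *dd)
{
	if (dd->verbs_dev) {
		demo_unregister_ib_device(dd, dd->verbs_dev);
		dd->verbs_dev = NULL;
	}
}
```

Calling demo_unregister_once() from two teardown paths leaves the device unregistered exactly once, which is the property the two hunks in the patch appear to be after.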

Signed-off-by: Bryan O'Sullivan

diff -r a69f8b7a8a04 -r e2916bbf09ed drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
@@ -536,7 +536,12 @@ static void __devexit ipath_remove_one(s
 		return;
 
 	dd = pci_get_drvdata(pdev);
-	ipath_unregister_ib_device(dd->verbs_dev);
+
+	if (dd->verbs_dev) {
+		ipath_unregister_ib_device(dd->verbs_dev);
+		dd->verbs_dev = NULL;
+	}
+
 	ipath_diag_remove(dd);
 	ipath_user_remove(dd);
 	ipathfs_remove_device(dd);
@@ -2027,6 +2032,11 @@ static void __exit infinipath_cleanup(vo
 	list_for_each_entry_safe(dd, tmp, &ipath_dev_list, ipath_list) {
 		spin_unlock_irqrestore(&ipath_devs_lock, flags);
 
+		if (dd->verbs_dev) {
+			ipath_unregister_ib_device(dd->verbs_dev);
+			dd->verbs_dev = NULL;
+		}
+
 		if (dd->ipath_kregbase)
 			cleanup_device(dd);
 

From bos at pathscale.com Thu Sep 28 08:59:59 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 08:59:59 -0700
Subject: [openib-general] [PATCH 3 of 28] IB/ipath - driver support for userspace sharing of HW contexts
In-Reply-To:
Message-ID: <7f5b6127be15cded56e1.1159459199@eng-12.pathscale.com>

This allows multiple userspace processes to share a single hardware
context in a master/slave arrangement. It is backwards binary
compatible with existing userspace.
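
As a reading aid for the diff in this message, the following sketch shows the arithmetic the patch appears to use when dividing a port's TIDs and PIO buffers among subports: each slave receives total / subport_cnt entries laid out from the start of the range, and the master (subport 0) receives the same share plus the remainder, placed after all the slaves. The struct and function names are hypothetical; only the arithmetic mirrors the hunks in ipath_get_base_info(), ipath_tid_update() and ipath_mmap().

```c
/* Hypothetical helper type: a contiguous slice of a port's resources. */
struct demo_share {
	unsigned start;	/* index of the first entry in the slice */
	unsigned count;	/* number of entries in the slice */
};

/*
 * Split 'total' entries among 'subport_cnt' sharers.
 * subport_cnt == 0 means the port is not shared.
 * subport == 0 is the master; subport >= 1 is a slave.
 */
static struct demo_share demo_split(unsigned total, unsigned subport_cnt,
				    unsigned subport)
{
	struct demo_share s;

	if (!subport_cnt) {
		/* not shared: the single process owns everything */
		s.start = 0;
		s.count = total;
	} else if (!subport) {
		/* master: an even share plus the remainder, at the end */
		s.count = total / subport_cnt + total % subport_cnt;
		s.start = total - s.count;
	} else {
		/* slave: an even share, packed from the beginning */
		s.count = total / subport_cnt;
		s.start = s.count * (subport - 1);
	}
	return s;
}
```

With 10 entries and a subport count of 4, for example, slaves 1 to 3 get entries 0-1, 2-3 and 4-5, and the master gets the four entries 6-9.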
Signed-off-by: Bryan O'Sullivan diff -r 45079acba208 -r 7f5b6127be15 drivers/infiniband/hw/ipath/ipath_common.h --- a/drivers/infiniband/hw/ipath/ipath_common.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_common.h Thu Sep 28 08:57:12 2006 -0700 @@ -185,6 +185,7 @@ typedef enum _ipath_ureg { #define IPATH_RUNTIME_PCIE 0x2 #define IPATH_RUNTIME_FORCE_WC_ORDER 0x4 #define IPATH_RUNTIME_RCVHDR_COPY 0x8 +#define IPATH_RUNTIME_MASTER 0x10 /* * This structure is returned by ipath_userinit() immediately after @@ -202,7 +203,8 @@ struct ipath_base_info { /* version of software, for feature checking. */ __u32 spi_sw_version; /* InfiniPath port assigned, goes into sent packets */ - __u32 spi_port; + __u16 spi_port; + __u16 spi_subport; /* * IB MTU, packets IB data must be less than this. * The MTU is in bytes, and will be a multiple of 4 bytes. @@ -218,7 +220,7 @@ struct ipath_base_info { __u32 spi_tidcnt; /* size of the TID Eager list in infinipath, in entries */ __u32 spi_tidegrcnt; - /* size of a single receive header queue entry. */ + /* size of a single receive header queue entry in words. */ __u32 spi_rcvhdrent_size; /* * Count of receive header queue entries allocated. @@ -310,6 +312,12 @@ struct ipath_base_info { __u32 spi_filler_for_align; /* address of readonly memory copy of the rcvhdrq tail register. 
*/ __u64 spi_rcvhdr_tailaddr; + + /* shared memory pages for subports if IPATH_RUNTIME_MASTER is set */ + __u64 spi_subport_uregbase; + __u64 spi_subport_rcvegrbuf; + __u64 spi_subport_rcvhdr_base; + } __attribute__ ((aligned(8))); @@ -328,12 +336,12 @@ struct ipath_base_info { /* * Minor version differences are always compatible - * a within a major version, however if if user software is larger + * a within a major version, however if user software is larger * than driver software, some new features and/or structure fields * may not be implemented; the user code must deal with this if it - * cares, or it must abort after initialization reports the difference - */ -#define IPATH_USER_SWMINOR 2 + * cares, or it must abort after initialization reports the difference. + */ +#define IPATH_USER_SWMINOR 3 #define IPATH_USER_SWVERSION ((IPATH_USER_SWMAJOR<<16) | IPATH_USER_SWMINOR) @@ -379,7 +387,16 @@ struct ipath_user_info { */ __u32 spu_rcvhdrsize; - __u64 spu_unused; /* kept for compatible layout */ + /* + * If two or more processes wish to share a port, each process + * must set the spu_subport_cnt and spu_subport_id to the same + * values. The only restriction on the spu_subport_id is that + * it be unique for a given node. 
+ */ + __u16 spu_subport_cnt; + __u16 spu_subport_id; + + __u32 spu_unused; /* kept for compatible layout */ /* * address of struct base_info to write to @@ -398,13 +415,17 @@ struct ipath_user_info { #define IPATH_CMD_TID_UPDATE 19 /* update expected TID entries */ #define IPATH_CMD_TID_FREE 20 /* free expected TID entries */ #define IPATH_CMD_SET_PART_KEY 21 /* add partition key */ - -#define IPATH_CMD_MAX 21 +#define IPATH_CMD_SLAVE_INFO 22 /* return info on slave processes */ + +#define IPATH_CMD_MAX 22 struct ipath_port_info { __u32 num_active; /* number of active units */ __u32 unit; /* unit (chip) assigned to caller */ - __u32 port; /* port on unit assigned to caller */ + __u16 port; /* port on unit assigned to caller */ + __u16 subport; /* subport on unit assigned to caller */ + __u16 num_ports; /* number of ports available on unit */ + __u16 num_subports; /* number of subport slaves opened on port */ }; struct ipath_tid_info { @@ -435,6 +456,8 @@ struct ipath_cmd { __u32 recv_ctrl; /* partition key to set */ __u16 part_key; + /* user address of __u32 bitmask of active slaves */ + __u64 slave_mask_addr; } cmd; }; @@ -596,6 +619,10 @@ struct infinipath_counters { /* K_PktFlags bits */ #define INFINIPATH_KPF_INTR 0x1 +#define INFINIPATH_KPF_SUBPORT_MASK 0x3 +#define INFINIPATH_KPF_SUBPORT_SHIFT 1 + +#define INFINIPATH_MAX_SUBPORT 4 /* SendPIO per-buffer control */ #define INFINIPATH_SP_TEST 0x40 @@ -610,7 +637,7 @@ struct ipath_header { /* * Version - 4 bits, Port - 4 bits, TID - 10 bits and Offset - * 14 bits before ECO change ~28 Dec 03. After that, Vers 4, - * Port 3, TID 11, offset 14. + * Port 4, TID 11, offset 13. 
*/ __le32 ver_port_tid_offset; __le16 chksum; diff -r 45079acba208 -r 7f5b6127be15 drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 @@ -1827,9 +1827,9 @@ void ipath_free_pddata(struct ipath_devd dma_free_coherent(&dd->pcidev->dev, size, base, pd->port_rcvegrbuf_phys[e]); } - vfree(pd->port_rcvegrbuf); + kfree(pd->port_rcvegrbuf); pd->port_rcvegrbuf = NULL; - vfree(pd->port_rcvegrbuf_phys); + kfree(pd->port_rcvegrbuf_phys); pd->port_rcvegrbuf_phys = NULL; pd->port_rcvegrbuf_chunks = 0; } else if (pd->port_port == 0 && dd->ipath_port0_skbs) { @@ -1845,6 +1845,9 @@ void ipath_free_pddata(struct ipath_devd vfree(skbs); } kfree(pd->port_tid_pg_list); + vfree(pd->subport_uregbase); + vfree(pd->subport_rcvegrbuf); + vfree(pd->subport_rcvhdr_base); kfree(pd); } diff -r 45079acba208 -r 7f5b6127be15 drivers/infiniband/hw/ipath/ipath_file_ops.c --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Thu Sep 28 08:57:12 2006 -0700 @@ -41,6 +41,12 @@ #include "ipath_kernel.h" #include "ipath_common.h" +/* + * mmap64 doesn't allow all 64 bits for 32-bit applications + * so only use the low 43 bits. 
+ */ +#define MMAP64_MASK 0x7FFFFFFFFFFUL + static int ipath_open(struct inode *, struct file *); static int ipath_close(struct inode *, struct file *); static ssize_t ipath_write(struct file *, const char __user *, size_t, @@ -57,18 +63,35 @@ static struct file_operations ipath_file .mmap = ipath_mmap }; -static int ipath_get_base_info(struct ipath_portdata *pd, +static int ipath_get_base_info(struct file *fp, void __user *ubase, size_t ubase_size) { + struct ipath_portdata *pd = port_fp(fp); int ret = 0; struct ipath_base_info *kinfo = NULL; struct ipath_devdata *dd = pd->port_dd; - - if (ubase_size < sizeof(*kinfo)) { + unsigned subport_cnt; + int shared, master; + size_t sz; + + subport_cnt = pd->port_subport_cnt; + if (!subport_cnt) { + shared = 0; + master = 0; + subport_cnt = 1; + } else { + shared = 1; + master = !subport_fp(fp); + } + + sz = sizeof(*kinfo); + /* If port sharing is not requested, allow the old size structure */ + if (!shared) + sz -= 3 * sizeof(u64); + if (ubase_size < sz) { ipath_cdbg(PROC, - "Base size %lu, need %lu (version mismatch?)\n", - (unsigned long) ubase_size, - (unsigned long) sizeof(*kinfo)); + "Base size %zu, need %zu (version mismatch?)\n", + ubase_size, sz); ret = -EINVAL; goto bail; } @@ -95,7 +118,9 @@ static int ipath_get_base_info(struct ip kinfo->spi_rcv_egrperchunk = pd->port_rcvegrbufs_perchunk; kinfo->spi_rcv_egrchunksize = kinfo->spi_rcv_egrbuftotlen / pd->port_rcvegrbuf_chunks; - kinfo->spi_tidcnt = dd->ipath_rcvtidcnt; + kinfo->spi_tidcnt = dd->ipath_rcvtidcnt / subport_cnt; + if (master) + kinfo->spi_tidcnt += dd->ipath_rcvtidcnt % subport_cnt; /* * for this use, may be ipath_cfgports summed over all chips that * are are configured and present @@ -118,30 +143,75 @@ static int ipath_get_base_info(struct ip * page_address() macro worked, but in 2.6.11, even that returns the * full 64 bit address (upper bits all 1's). 
So far, using the * physical addresses (or chip offsets, for chip mapping) works, but - * no doubt some future kernel release will chang that, and we'll be - * on to yet another method of dealing with this + * no doubt some future kernel release will change that, and we'll be + * on to yet another method of dealing with this. */ kinfo->spi_rcvhdr_base = (u64) pd->port_rcvhdrq_phys; - kinfo->spi_rcvhdr_tailaddr = (u64)pd->port_rcvhdrqtailaddr_phys; + kinfo->spi_rcvhdr_tailaddr = (u64) pd->port_rcvhdrqtailaddr_phys; kinfo->spi_rcv_egrbufs = (u64) pd->port_rcvegr_phys; kinfo->spi_pioavailaddr = (u64) dd->ipath_pioavailregs_phys; kinfo->spi_status = (u64) kinfo->spi_pioavailaddr + (void *) dd->ipath_statusp - (void *) dd->ipath_pioavailregs_dma; - kinfo->spi_piobufbase = (u64) pd->port_piobufs; - kinfo->__spi_uregbase = - dd->ipath_uregbase + dd->ipath_palign * pd->port_port; - - kinfo->spi_pioindex = dd->ipath_pbufsport * (pd->port_port - 1); - kinfo->spi_piocnt = dd->ipath_pbufsport; + if (!shared) { + kinfo->spi_piocnt = dd->ipath_pbufsport; + kinfo->spi_piobufbase = (u64) pd->port_piobufs; + kinfo->__spi_uregbase = (u64) dd->ipath_uregbase + + dd->ipath_palign * pd->port_port; + } else if (master) { + kinfo->spi_piocnt = (dd->ipath_pbufsport / subport_cnt) + + (dd->ipath_pbufsport % subport_cnt); + /* Master's PIO buffers are after all the slave's */ + kinfo->spi_piobufbase = (u64) pd->port_piobufs + + dd->ipath_palign * + (dd->ipath_pbufsport - kinfo->spi_piocnt); + kinfo->__spi_uregbase = (u64) dd->ipath_uregbase + + dd->ipath_palign * pd->port_port; + } else { + unsigned slave = subport_fp(fp) - 1; + + kinfo->spi_piocnt = dd->ipath_pbufsport / subport_cnt; + kinfo->spi_piobufbase = (u64) pd->port_piobufs + + dd->ipath_palign * kinfo->spi_piocnt * slave; + kinfo->__spi_uregbase = ((u64) pd->subport_uregbase + + PAGE_SIZE * slave) & MMAP64_MASK; + + kinfo->spi_rcvhdr_base = ((u64) pd->subport_rcvhdr_base + + pd->port_rcvhdrq_size * slave) & MMAP64_MASK; + 
kinfo->spi_rcvhdr_tailaddr = + (u64) pd->port_rcvhdrqtailaddr_phys & MMAP64_MASK; + kinfo->spi_rcv_egrbufs = ((u64) pd->subport_rcvegrbuf + + dd->ipath_rcvegrcnt * dd->ipath_rcvegrbufsize * slave) & + MMAP64_MASK; + } + + kinfo->spi_pioindex = (kinfo->spi_piobufbase - dd->ipath_piobufbase) / + dd->ipath_palign; kinfo->spi_pioalign = dd->ipath_palign; kinfo->spi_qpair = IPATH_KD_QP; kinfo->spi_piosize = dd->ipath_ibmaxlen; kinfo->spi_mtu = dd->ipath_ibmaxlen; /* maxlen, not ibmtu */ kinfo->spi_port = pd->port_port; + kinfo->spi_subport = subport_fp(fp); kinfo->spi_sw_version = IPATH_KERN_SWVERSION; kinfo->spi_hw_version = dd->ipath_revision; + + if (master) { + kinfo->spi_runtime_flags |= IPATH_RUNTIME_MASTER; + kinfo->spi_subport_uregbase = + (u64) pd->subport_uregbase & MMAP64_MASK; + kinfo->spi_subport_rcvegrbuf = + (u64) pd->subport_rcvegrbuf & MMAP64_MASK; + kinfo->spi_subport_rcvhdr_base = + (u64) pd->subport_rcvhdr_base & MMAP64_MASK; + ipath_cdbg(PROC, "port %u flags %x %llx %llx %llx\n", + kinfo->spi_port, + kinfo->spi_runtime_flags, + kinfo->spi_subport_uregbase, + kinfo->spi_subport_rcvegrbuf, + kinfo->spi_subport_rcvhdr_base); + } if (copy_to_user(ubase, kinfo, sizeof(*kinfo))) ret = -EFAULT; @@ -154,6 +224,7 @@ bail: /** * ipath_tid_update - update a port TID * @pd: the port + * @fp: the ipath device file * @ti: the TID information * * The new implementation as of Oct 2004 is that the driver assigns @@ -176,11 +247,11 @@ bail: * virtually contiguous pages, that should change to improve * performance. 
*/ -static int ipath_tid_update(struct ipath_portdata *pd, +static int ipath_tid_update(struct ipath_portdata *pd, struct file *fp, const struct ipath_tid_info *ti) { int ret = 0, ntids; - u32 tid, porttid, cnt, i, tidcnt; + u32 tid, porttid, cnt, i, tidcnt, tidoff; u16 *tidlist; struct ipath_devdata *dd = pd->port_dd; u64 physaddr; @@ -188,6 +259,7 @@ static int ipath_tid_update(struct ipath u64 __iomem *tidbase; unsigned long tidmap[8]; struct page **pagep = NULL; + unsigned subport = subport_fp(fp); if (!dd->ipath_pageshadow) { ret = -ENOMEM; @@ -204,20 +276,34 @@ static int ipath_tid_update(struct ipath ret = -EFAULT; goto done; } - tidcnt = dd->ipath_rcvtidcnt; - if (cnt >= tidcnt) { + porttid = pd->port_port * dd->ipath_rcvtidcnt; + if (!pd->port_subport_cnt) { + tidcnt = dd->ipath_rcvtidcnt; + tid = pd->port_tidcursor; + tidoff = 0; + } else if (!subport) { + tidcnt = (dd->ipath_rcvtidcnt / pd->port_subport_cnt) + + (dd->ipath_rcvtidcnt % pd->port_subport_cnt); + tidoff = dd->ipath_rcvtidcnt - tidcnt; + porttid += tidoff; + tid = tidcursor_fp(fp); + } else { + tidcnt = dd->ipath_rcvtidcnt / pd->port_subport_cnt; + tidoff = tidcnt * (subport - 1); + porttid += tidoff; + tid = tidcursor_fp(fp); + } + if (cnt > tidcnt) { /* make sure it all fits in port_tid_pg_list */ dev_info(&dd->pcidev->dev, "Process tried to allocate %u " "TIDs, only trying max (%u)\n", cnt, tidcnt); cnt = tidcnt; } - pagep = (struct page **)pd->port_tid_pg_list; - tidlist = (u16 *) (&pagep[cnt]); + pagep = &((struct page **) pd->port_tid_pg_list)[tidoff]; + tidlist = &((u16 *) &pagep[dd->ipath_rcvtidcnt])[tidoff]; memset(tidmap, 0, sizeof(tidmap)); - tid = pd->port_tidcursor; /* before decrement; chip actual # */ - porttid = pd->port_port * tidcnt; ntids = tidcnt; tidbase = (u64 __iomem *) (((char __iomem *) dd->ipath_kregbase) + dd->ipath_rcvtidbase + @@ -274,9 +360,9 @@ static int ipath_tid_update(struct ipath ret = -ENOMEM; break; } - tidlist[i] = tid; + tidlist[i] = tid + tidoff; 
ipath_cdbg(VERBOSE, "Updating idx %u to TID %u, " - "vaddr %lx\n", i, tid, vaddr); + "vaddr %lx\n", i, tid + tidoff, vaddr); /* we "know" system pages and TID pages are same size */ dd->ipath_pageshadow[porttid + tid] = pagep[i]; /* @@ -341,7 +427,10 @@ static int ipath_tid_update(struct ipath } if (tid == tidcnt) tid = 0; - pd->port_tidcursor = tid; + if (!pd->port_subport_cnt) + pd->port_tidcursor = tid; + else + tidcursor_fp(fp) = tid; } done: @@ -354,6 +443,7 @@ done: /** * ipath_tid_free - free a port TID * @pd: the port + * @subport: the subport * @ti: the TID info * * right now we are unlocking one page at a time, but since @@ -367,7 +457,7 @@ done: * they pass in to us. */ -static int ipath_tid_free(struct ipath_portdata *pd, +static int ipath_tid_free(struct ipath_portdata *pd, unsigned subport, const struct ipath_tid_info *ti) { int ret = 0; @@ -388,11 +478,20 @@ static int ipath_tid_free(struct ipath_p } porttid = pd->port_port * dd->ipath_rcvtidcnt; + if (!pd->port_subport_cnt) + tidcnt = dd->ipath_rcvtidcnt; + else if (!subport) { + tidcnt = (dd->ipath_rcvtidcnt / pd->port_subport_cnt) + + (dd->ipath_rcvtidcnt % pd->port_subport_cnt); + porttid += dd->ipath_rcvtidcnt - tidcnt; + } else { + tidcnt = dd->ipath_rcvtidcnt / pd->port_subport_cnt; + porttid += tidcnt * (subport - 1); + } tidbase = (u64 __iomem *) ((char __iomem *)(dd->ipath_kregbase) + dd->ipath_rcvtidbase + porttid * sizeof(*tidbase)); - tidcnt = dd->ipath_rcvtidcnt; limit = sizeof(tidmap) * BITS_PER_BYTE; if (limit > tidcnt) /* just in case size changes in future */ @@ -581,20 +680,24 @@ bail: /** * ipath_manage_rcvq - manage a port's receive queue * @pd: the port + * @subport: the subport * @start_stop: action to carry out * * start_stop == 0 disables receive on the port, for use in queue * overflow conditions. 
start_stop==1 re-enables, to be used to * re-init the software copy of the head register */ -static int ipath_manage_rcvq(struct ipath_portdata *pd, int start_stop) +static int ipath_manage_rcvq(struct ipath_portdata *pd, unsigned subport, + int start_stop) { struct ipath_devdata *dd = pd->port_dd; u64 tval; - ipath_cdbg(PROC, "%sabling rcv for unit %u port %u\n", + ipath_cdbg(PROC, "%sabling rcv for unit %u port %u:%u\n", start_stop ? "en" : "dis", dd->ipath_unit, - pd->port_port); + pd->port_port, subport); + if (subport) + goto bail; /* atomically clear receive enable port. */ if (start_stop) { /* @@ -630,6 +733,7 @@ static int ipath_manage_rcvq(struct ipat tval = ipath_read_ureg32(dd, ur_rcvhdrtail, pd->port_port); } /* always; new head should be equal to new tail; see above */ +bail: return 0; } @@ -687,6 +791,36 @@ static void ipath_clean_part_key(struct } } +/* + * Initialize the port data with the receive buffer sizes + * so this can be done while the master port is locked. + * Otherwise, there is a race with a slave opening the port + * and seeing these fields uninitialized. + */ +static void init_user_egr_sizes(struct ipath_portdata *pd) +{ + struct ipath_devdata *dd = pd->port_dd; + unsigned egrperchunk, egrcnt, size; + + /* + * to avoid wasting a lot of memory, we allocate 32KB chunks of + * physically contiguous memory, advance through it until used up + * and then allocate more. Of course, we need memory to store those + * extra pointers, now. Started out with 256KB, but under heavy + * memory pressure (creating large files and then copying them over + * NFS while doing lots of MPI jobs), we hit some allocation + * failures, even though we can sleep... (2.6.10) Still get + * failures at 64K. 32K is the lowest we can go without wasting + * additional memory. 
+ */ + size = 0x8000; + egrperchunk = size / dd->ipath_rcvegrbufsize; + egrcnt = dd->ipath_rcvegrcnt; + pd->port_rcvegrbuf_chunks = (egrcnt + egrperchunk - 1) / egrperchunk; + pd->port_rcvegrbufs_perchunk = egrperchunk; + pd->port_rcvegrbuf_size = size; +} + /** * ipath_create_user_egr - allocate eager TID buffers * @pd: the port to allocate TID buffers for @@ -702,7 +836,7 @@ static int ipath_create_user_egr(struct static int ipath_create_user_egr(struct ipath_portdata *pd) { struct ipath_devdata *dd = pd->port_dd; - unsigned e, egrcnt, alloced, egrperchunk, chunk, egrsize, egroff; + unsigned e, egrcnt, egrperchunk, chunk, egrsize, egroff; size_t size; int ret; gfp_t gfp_flags; @@ -722,31 +856,18 @@ static int ipath_create_user_egr(struct ipath_cdbg(VERBOSE, "Allocating %d egr buffers, at egrtid " "offset %x, egrsize %u\n", egrcnt, egroff, egrsize); - /* - * to avoid wasting a lot of memory, we allocate 32KB chunks of - * physically contiguous memory, advance through it until used up - * and then allocate more. Of course, we need memory to store those - * extra pointers, now. Started out with 256KB, but under heavy - * memory pressure (creating large files and then copying them over - * NFS while doing lots of MPI jobs), we hit some allocation - * failures, even though we can sleep... (2.6.10) Still get - * failures at 64K. 32K is the lowest we can go without wasting - * additional memory. 
- */ - size = 0x8000; - alloced = ALIGN(egrsize * egrcnt, size); - egrperchunk = size / egrsize; - chunk = (egrcnt + egrperchunk - 1) / egrperchunk; - pd->port_rcvegrbuf_chunks = chunk; - pd->port_rcvegrbufs_perchunk = egrperchunk; - pd->port_rcvegrbuf_size = size; - pd->port_rcvegrbuf = vmalloc(chunk * sizeof(pd->port_rcvegrbuf[0])); + chunk = pd->port_rcvegrbuf_chunks; + egrperchunk = pd->port_rcvegrbufs_perchunk; + size = pd->port_rcvegrbuf_size; + pd->port_rcvegrbuf = kmalloc(chunk * sizeof(pd->port_rcvegrbuf[0]), + GFP_KERNEL); if (!pd->port_rcvegrbuf) { ret = -ENOMEM; goto bail; } pd->port_rcvegrbuf_phys = - vmalloc(chunk * sizeof(pd->port_rcvegrbuf_phys[0])); + kmalloc(chunk * sizeof(pd->port_rcvegrbuf_phys[0]), + GFP_KERNEL); if (!pd->port_rcvegrbuf_phys) { ret = -ENOMEM; goto bail_rcvegrbuf; @@ -791,94 +912,12 @@ bail_rcvegrbuf_phys: pd->port_rcvegrbuf_phys[e]); } - vfree(pd->port_rcvegrbuf_phys); + kfree(pd->port_rcvegrbuf_phys); pd->port_rcvegrbuf_phys = NULL; bail_rcvegrbuf: - vfree(pd->port_rcvegrbuf); + kfree(pd->port_rcvegrbuf); pd->port_rcvegrbuf = NULL; bail: - return ret; -} - -static int ipath_do_user_init(struct ipath_portdata *pd, - const struct ipath_user_info *uinfo) -{ - int ret = 0; - struct ipath_devdata *dd = pd->port_dd; - u32 head32; - - /* for now, if major version is different, bail */ - if ((uinfo->spu_userversion >> 16) != IPATH_USER_SWMAJOR) { - dev_info(&dd->pcidev->dev, - "User major version %d not same as driver " - "major %d\n", uinfo->spu_userversion >> 16, - IPATH_USER_SWMAJOR); - ret = -ENODEV; - goto done; - } - - if ((uinfo->spu_userversion & 0xffff) != IPATH_USER_SWMINOR) - ipath_dbg("User minor version %d not same as driver " - "minor %d\n", uinfo->spu_userversion & 0xffff, - IPATH_USER_SWMINOR); - - if (uinfo->spu_rcvhdrsize) { - ret = ipath_setrcvhdrsize(dd, uinfo->spu_rcvhdrsize); - if (ret) - goto done; - } - - /* for now we do nothing with rcvhdrcnt: uinfo->spu_rcvhdrcnt */ - - /* for right now, kernel piobufs are 
at end, so port 1 is at 0 */ - pd->port_piobufs = dd->ipath_piobufbase + - dd->ipath_pbufsport * (pd->port_port - - 1) * dd->ipath_palign; - ipath_cdbg(VERBOSE, "Set base of piobufs for port %u to 0x%x\n", - pd->port_port, pd->port_piobufs); - - /* - * Now allocate the rcvhdr Q and eager TIDs; skip the TID - * array for time being. If pd->port_port > chip-supported, - * we need to do extra stuff here to handle by handling overflow - * through port 0, someday - */ - ret = ipath_create_rcvhdrq(dd, pd); - if (!ret) - ret = ipath_create_user_egr(pd); - if (ret) - goto done; - - /* - * set the eager head register for this port to the current values - * of the tail pointers, since we don't know if they were - * updated on last use of the port. - */ - head32 = ipath_read_ureg32(dd, ur_rcvegrindextail, pd->port_port); - ipath_write_ureg(dd, ur_rcvegrindexhead, head32, pd->port_port); - dd->ipath_lastegrheads[pd->port_port] = -1; - dd->ipath_lastrcvhdrqtails[pd->port_port] = -1; - ipath_cdbg(VERBOSE, "Wrote port%d egrhead %x from tail regs\n", - pd->port_port, head32); - pd->port_tidcursor = 0; /* start at beginning after open */ - /* - * now enable the port; the tail registers will be written to memory - * by the chip as soon as it sees the write to - * dd->ipath_kregs->kr_rcvctrl. The update only happens on - * transition from 0 to 1, so clear it first, then set it as part of - * enabling the port. This will (very briefly) affect any other - * open ports, but it shouldn't be long enough to be an issue. - * We explictly set the in-memory copy to 0 beforehand, so we don't - * have to wait to be sure the DMA update has happened. 
- */ - *pd->port_rcvhdrtail_kvaddr = 0ULL; - set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port, - &dd->ipath_rcvctrl); - ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, - dd->ipath_rcvctrl & ~INFINIPATH_R_TAILUPD); - ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, - dd->ipath_rcvctrl); -done: return ret; } @@ -957,7 +996,8 @@ static int mmap_ureg(struct vm_area_stru static int mmap_piobufs(struct vm_area_struct *vma, struct ipath_devdata *dd, - struct ipath_portdata *pd) + struct ipath_portdata *pd, + unsigned piobufs, unsigned piocnt) { unsigned long phys; int ret; @@ -968,16 +1008,15 @@ static int mmap_piobufs(struct vm_area_s * process data, and catches users who might try to read the i/o * space due to a bug. */ - if ((vma->vm_end - vma->vm_start) > - (dd->ipath_pbufsport * dd->ipath_palign)) { + if ((vma->vm_end - vma->vm_start) > (piocnt * dd->ipath_palign)) { dev_info(&dd->pcidev->dev, "FAIL mmap piobufs: " "reqlen %lx > PAGE\n", vma->vm_end - vma->vm_start); - ret = -EFAULT; - goto bail; - } - - phys = dd->ipath_physaddr + pd->port_piobufs; + ret = -EINVAL; + goto bail; + } + + phys = dd->ipath_physaddr + piobufs; /* * Don't mark this as non-cached, or we don't get the @@ -1021,7 +1060,7 @@ static int mmap_rcvegrbufs(struct vm_are "reqlen %lx > actual %lx\n", vma->vm_end - vma->vm_start, (unsigned long) total_size); - ret = -EFAULT; + ret = -EINVAL; goto bail; } @@ -1043,6 +1082,122 @@ static int mmap_rcvegrbufs(struct vm_are if (ret < 0) goto bail; } + ret = 0; + +bail: + return ret; +} + +/* + * ipath_file_vma_nopage - handle a VMA page fault. + */ +static struct page *ipath_file_vma_nopage(struct vm_area_struct *vma, + unsigned long address, int *type) +{ + unsigned long offset = address - vma->vm_start; + struct page *page = NOPAGE_SIGBUS; + void *pageptr; + + /* + * Convert the vmalloc address into a struct page. 
+ */ + pageptr = (void *)(offset + (vma->vm_pgoff << PAGE_SHIFT)); + page = vmalloc_to_page(pageptr); + if (!page) + goto out; + + /* Increment the reference count. */ + get_page(page); + if (type) + *type = VM_FAULT_MINOR; +out: + return page; +} + +static struct vm_operations_struct ipath_file_vm_ops = { + .nopage = ipath_file_vma_nopage, +}; + +static int mmap_kvaddr(struct vm_area_struct *vma, u64 pgaddr, + struct ipath_portdata *pd, unsigned subport) +{ + unsigned long len; + struct ipath_devdata *dd; + void *addr; + size_t size; + int ret; + + /* If the port is not shared, all addresses should be physical */ + if (!pd->port_subport_cnt) { + ret = -EINVAL; + goto bail; + } + + dd = pd->port_dd; + size = pd->port_rcvegrbuf_chunks * pd->port_rcvegrbuf_size; + + /* + * Master has all the slave uregbase, rcvhdrq, and + * rcvegrbufs mmapped. + */ + if (subport == 0) { + unsigned num_slaves = pd->port_subport_cnt - 1; + + if (pgaddr == ((u64) pd->subport_uregbase & MMAP64_MASK)) { + addr = pd->subport_uregbase; + size = PAGE_SIZE * num_slaves; + } else if (pgaddr == ((u64) pd->subport_rcvhdr_base & + MMAP64_MASK)) { + addr = pd->subport_rcvhdr_base; + size = pd->port_rcvhdrq_size * num_slaves; + } else if (pgaddr == ((u64) pd->subport_rcvegrbuf & + MMAP64_MASK)) { + addr = pd->subport_rcvegrbuf; + size *= num_slaves; + } else { + ret = -EINVAL; + goto bail; + } + } else if (pgaddr == (((u64) pd->subport_uregbase + + PAGE_SIZE * (subport - 1)) & MMAP64_MASK)) { + addr = pd->subport_uregbase + PAGE_SIZE * (subport - 1); + size = PAGE_SIZE; + } else if (pgaddr == (((u64) pd->subport_rcvhdr_base + + pd->port_rcvhdrq_size * (subport - 1)) & + MMAP64_MASK)) { + addr = pd->subport_rcvhdr_base + + pd->port_rcvhdrq_size * (subport - 1); + size = pd->port_rcvhdrq_size; + } else if (pgaddr == (((u64) pd->subport_rcvegrbuf + + size * (subport - 1)) & MMAP64_MASK)) { + addr = pd->subport_rcvegrbuf + size * (subport - 1); + /* rcvegrbufs are read-only on the slave */ + if 
(vma->vm_flags & VM_WRITE) { + dev_info(&dd->pcidev->dev, + "Can't map eager buffers as " + "writable (flags=%lx)\n", vma->vm_flags); + ret = -EPERM; + goto bail; + } + /* + * Don't allow permission to later change to writeable + * with mprotect. + */ + vma->vm_flags &= ~VM_MAYWRITE; + } else { + ret = -EINVAL; + goto bail; + } + len = vma->vm_end - vma->vm_start; + if (len > size) { + ipath_cdbg(MM, "FAIL: reqlen %lx > %zx\n", len, size); + ret = -EINVAL; + goto bail; + } + + vma->vm_pgoff = (unsigned long) addr >> PAGE_SHIFT; + vma->vm_ops = &ipath_file_vm_ops; + vma->vm_flags |= VM_RESERVED | VM_DONTEXPAND; ret = 0; bail: @@ -1064,73 +1219,99 @@ static int ipath_mmap(struct file *fp, s struct ipath_portdata *pd; struct ipath_devdata *dd; u64 pgaddr, ureg; + unsigned piobufs, piocnt; int ret; pd = port_fp(fp); + if (!pd) { + ret = -EINVAL; + goto bail; + } dd = pd->port_dd; /* * This is the ipath_do_user_init() code, mapping the shared buffers * into the user process. The address referred to by vm_pgoff is the - * virtual, not physical, address; we only do one mmap for each - * space mapped. + * file offset passed via mmap(). For shared ports, this is the + * kernel vmalloc() address of the pages to share with the master. + * For non-shared or master ports, this is a physical address. + * We only do one mmap for each space mapped. */ pgaddr = vma->vm_pgoff << PAGE_SHIFT; /* - * Must fit in 40 bits for our hardware; some checked elsewhere, - * but we'll be paranoid. Check for 0 is mostly in case one of the - * allocations failed, but user called mmap anyway. We want to catch - * that before it can match. + * Check for 0 in case one of the allocations failed, but user + * called mmap anyway. 
*/ - if (!pgaddr || pgaddr >= (1ULL<<40)) { - ipath_dev_err(dd, "Bad phys addr %llx, start %lx, end %lx\n", - (unsigned long long)pgaddr, vma->vm_start, vma->vm_end); - return -EINVAL; - } - - /* just the offset of the port user registers, not physical addr */ - ureg = dd->ipath_uregbase + dd->ipath_palign * pd->port_port; - - ipath_cdbg(MM, "ushare: pgaddr %llx vm_start=%lx, vmlen %lx\n", + if (!pgaddr) { + ret = -EINVAL; + goto bail; + } + + ipath_cdbg(MM, "pgaddr %llx vm_start=%lx len %lx port %u:%u:%u\n", (unsigned long long) pgaddr, vma->vm_start, - vma->vm_end - vma->vm_start); - - if (vma->vm_start & (PAGE_SIZE-1)) { - ipath_dev_err(dd, - "vm_start not aligned: %lx, end=%lx phys %lx\n", - vma->vm_start, vma->vm_end, (unsigned long)pgaddr); + vma->vm_end - vma->vm_start, dd->ipath_unit, + pd->port_port, subport_fp(fp)); + + /* + * Physical addresses must fit in 40 bits for our hardware. + * Check for kernel virtual addresses first, anything else must + * match a HW or memory address. 
+ */ + if (pgaddr >= (1ULL<<40)) { + ret = mmap_kvaddr(vma, pgaddr, pd, subport_fp(fp)); + goto bail; + } + + if (!pd->port_subport_cnt) { + /* port is not shared */ + ureg = dd->ipath_uregbase + dd->ipath_palign * pd->port_port; + piocnt = dd->ipath_pbufsport; + piobufs = pd->port_piobufs; + } else if (!subport_fp(fp)) { + /* caller is the master */ + ureg = dd->ipath_uregbase + dd->ipath_palign * pd->port_port; + piocnt = (dd->ipath_pbufsport / pd->port_subport_cnt) + + (dd->ipath_pbufsport % pd->port_subport_cnt); + piobufs = pd->port_piobufs + + dd->ipath_palign * (dd->ipath_pbufsport - piocnt); + } else { + unsigned slave = subport_fp(fp) - 1; + + /* caller is a slave */ + ureg = 0; + piocnt = dd->ipath_pbufsport / pd->port_subport_cnt; + piobufs = pd->port_piobufs + dd->ipath_palign * piocnt * slave; + } + + if (pgaddr == ureg) + ret = mmap_ureg(vma, dd, ureg); + else if (pgaddr == piobufs) + ret = mmap_piobufs(vma, dd, pd, piobufs, piocnt); + else if (pgaddr == dd->ipath_pioavailregs_phys) + /* in-memory copy of pioavail registers */ + ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0, + dd->ipath_pioavailregs_phys, + "pioavail registers"); + else if (subport_fp(fp)) + /* Subports don't mmap the physical receive buffers */ ret = -EINVAL; - } - else if (pgaddr == ureg) - ret = mmap_ureg(vma, dd, ureg); - else if (pgaddr == pd->port_piobufs) - ret = mmap_piobufs(vma, dd, pd); - else if (pgaddr == (u64) pd->port_rcvegr_phys) + else if (pgaddr == pd->port_rcvegr_phys) ret = mmap_rcvegrbufs(vma, pd); - else if (pgaddr == (u64) pd->port_rcvhdrq_phys) { + else if (pgaddr == (u64) pd->port_rcvhdrq_phys) /* * The rcvhdrq itself; readonly except on HT (so have * to allow writable mapping), multiple pages, contiguous * from an i/o perspective. 
*/ - unsigned total_size = - ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize - * sizeof(u32), PAGE_SIZE); - ret = ipath_mmap_mem(vma, pd, total_size, 1, + ret = ipath_mmap_mem(vma, pd, pd->port_rcvhdrq_size, 1, pd->port_rcvhdrq_phys, "rcvhdrq"); - } - else if (pgaddr == (u64)pd->port_rcvhdrqtailaddr_phys) + else if (pgaddr == (u64) pd->port_rcvhdrqtailaddr_phys) /* in-memory copy of rcvhdrq tail register */ ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0, pd->port_rcvhdrqtailaddr_phys, "rcvhdrq tail"); - else if (pgaddr == dd->ipath_pioavailregs_phys) - /* in-memory copy of pioavail registers */ - ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0, - dd->ipath_pioavailregs_phys, - "pioavail registers"); else ret = -EINVAL; @@ -1138,9 +1319,10 @@ static int ipath_mmap(struct file *fp, s if (ret < 0) dev_info(&dd->pcidev->dev, - "Failure %d on addr %lx, off %lx\n", - -ret, vma->vm_start, vma->vm_pgoff); - + "Failure %d on off %llx len %lx\n", + -ret, (unsigned long long)pgaddr, + vma->vm_end - vma->vm_start); +bail: return ret; } @@ -1154,6 +1336,8 @@ static unsigned int ipath_poll(struct fi struct ipath_devdata *dd; pd = port_fp(fp); + if (!pd) + goto bail; dd = pd->port_dd; bit = pd->port_port + INFINIPATH_R_INTRAVAIL_SHIFT; @@ -1176,7 +1360,7 @@ static unsigned int ipath_poll(struct fi if (tail == head) { set_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag); - if(dd->ipath_rhdrhead_intr_off) /* arm rcv interrupt */ + if (dd->ipath_rhdrhead_intr_off) /* arm rcv interrupt */ (void)ipath_write_ureg(dd, ur_rcvhdrhead, dd->ipath_rhdrhead_intr_off | head, pd->port_port); @@ -1200,18 +1384,80 @@ static unsigned int ipath_poll(struct fi ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); +bail: return pollflag; } +static int init_subports(struct ipath_devdata *dd, + struct ipath_portdata *pd, + const struct ipath_user_info *uinfo) +{ + int ret = 0; + unsigned num_slaves; + size_t size; + + /* Old user binaries don't know about subports */ + if ((uinfo->spu_userversion 
& 0xffff) != IPATH_USER_SWMINOR) + goto bail; + /* + * If the user is requesting zero or one port, + * skip the subport allocation. + */ + if (uinfo->spu_subport_cnt <= 1) + goto bail; + if (uinfo->spu_subport_cnt > 4) { + ret = -EINVAL; + goto bail; + } + + num_slaves = uinfo->spu_subport_cnt - 1; + pd->subport_uregbase = vmalloc(PAGE_SIZE * num_slaves); + if (!pd->subport_uregbase) { + ret = -ENOMEM; + goto bail; + } + /* Note: pd->port_rcvhdrq_size isn't initialized yet. */ + size = ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize * + sizeof(u32), PAGE_SIZE) * num_slaves; + pd->subport_rcvhdr_base = vmalloc(size); + if (!pd->subport_rcvhdr_base) { + ret = -ENOMEM; + goto bail_ureg; + } + + pd->subport_rcvegrbuf = vmalloc(pd->port_rcvegrbuf_chunks * + pd->port_rcvegrbuf_size * + num_slaves); + if (!pd->subport_rcvegrbuf) { + ret = -ENOMEM; + goto bail_rhdr; + } + + pd->port_subport_cnt = uinfo->spu_subport_cnt; + pd->port_subport_id = uinfo->spu_subport_id; + pd->active_slaves = 1; + goto bail; + +bail_rhdr: + vfree(pd->subport_rcvhdr_base); +bail_ureg: + vfree(pd->subport_uregbase); + pd->subport_uregbase = NULL; +bail: + return ret; +} + static int try_alloc_port(struct ipath_devdata *dd, int port, - struct file *fp) -{ + struct file *fp, + const struct ipath_user_info *uinfo) +{ + struct ipath_portdata *pd; int ret; - if (!dd->ipath_pd[port]) { - void *p, *ptmp; - - p = kzalloc(sizeof(struct ipath_portdata), GFP_KERNEL); + if (!(pd = dd->ipath_pd[port])) { + void *ptmp; + + pd = kzalloc(sizeof(struct ipath_portdata), GFP_KERNEL); /* * Allocate memory for use in ipath_tid_update() just once @@ -1221,34 +1467,36 @@ static int try_alloc_port(struct ipath_d ptmp = kmalloc(dd->ipath_rcvtidcnt * sizeof(u16) + dd->ipath_rcvtidcnt * sizeof(struct page **), GFP_KERNEL); - if (!p || !ptmp) { + if (!pd || !ptmp) { ipath_dev_err(dd, "Unable to allocate portdata " "memory, failing open\n"); ret = -ENOMEM; - kfree(p); + kfree(pd); kfree(ptmp); goto bail; } - 
dd->ipath_pd[port] = p; + dd->ipath_pd[port] = pd; dd->ipath_pd[port]->port_port = port; dd->ipath_pd[port]->port_dd = dd; dd->ipath_pd[port]->port_tid_pg_list = ptmp; init_waitqueue_head(&dd->ipath_pd[port]->port_wait); } - if (!dd->ipath_pd[port]->port_cnt) { - dd->ipath_pd[port]->port_cnt = 1; - fp->private_data = (void *) dd->ipath_pd[port]; + if (!pd->port_cnt) { + pd->userversion = uinfo->spu_userversion; + init_user_egr_sizes(pd); + if ((ret = init_subports(dd, pd, uinfo)) != 0) + goto bail; ipath_cdbg(PROC, "%s[%u] opened unit:port %u:%u\n", current->comm, current->pid, dd->ipath_unit, port); - dd->ipath_pd[port]->port_pid = current->pid; - strncpy(dd->ipath_pd[port]->port_comm, current->comm, - sizeof(dd->ipath_pd[port]->port_comm)); + pd->port_cnt = 1; + port_fp(fp) = pd; + pd->port_pid = current->pid; + strncpy(pd->port_comm, current->comm, sizeof(pd->port_comm)); ipath_stats.sps_ports++; ret = 0; - goto bail; - } - ret = -EBUSY; + } else + ret = -EBUSY; bail: return ret; @@ -1264,7 +1512,8 @@ static inline int usable(struct ipath_de | IPATH_LINKUNK)); } -static int find_free_port(int unit, struct file *fp) +static int find_free_port(int unit, struct file *fp, + const struct ipath_user_info *uinfo) { struct ipath_devdata *dd = ipath_lookup(unit); int ret, i; @@ -1279,8 +1528,8 @@ static int find_free_port(int unit, stru goto bail; } - for (i = 0; i < dd->ipath_cfgports; i++) { - ret = try_alloc_port(dd, i, fp); + for (i = 1; i < dd->ipath_cfgports; i++) { + ret = try_alloc_port(dd, i, fp, uinfo); if (ret != -EBUSY) goto bail; } @@ -1290,13 +1539,14 @@ bail: return ret; } -static int find_best_unit(struct file *fp) +static int find_best_unit(struct file *fp, + const struct ipath_user_info *uinfo) { int ret = 0, i, prefunit = -1, devmax; int maxofallports, npresent, nup; int ndev; - (void) ipath_count_units(&npresent, &nup, &maxofallports); + devmax = ipath_count_units(&npresent, &nup, &maxofallports); /* * This code is present to allow a knowledgeable 
person to @@ -1343,8 +1593,6 @@ static int find_best_unit(struct file *f if (prefunit != -1) devmax = prefunit + 1; - else - devmax = ipath_count_units(NULL, NULL, NULL); recheck: for (i = 1; i < maxofallports; i++) { for (ndev = prefunit != -1 ? prefunit : 0; ndev < devmax; @@ -1359,7 +1607,7 @@ recheck: * next. */ continue; - ret = try_alloc_port(dd, i, fp); + ret = try_alloc_port(dd, i, fp, uinfo); if (!ret) goto done; } @@ -1395,22 +1643,174 @@ done: return ret; } +static int find_shared_port(struct file *fp, + const struct ipath_user_info *uinfo) +{ + int devmax, ndev, i; + int ret = 0; + + devmax = ipath_count_units(NULL, NULL, NULL); + + for (ndev = 0; ndev < devmax; ndev++) { + struct ipath_devdata *dd = ipath_lookup(ndev); + + if (!dd) + continue; + for (i = 1; i < dd->ipath_cfgports; i++) { + struct ipath_portdata *pd = dd->ipath_pd[i]; + + /* Skip ports which are not yet open */ + if (!pd || !pd->port_cnt) + continue; + /* Skip port if it doesn't match the requested one */ + if (pd->port_subport_id != uinfo->spu_subport_id) + continue; + /* Verify the sharing process matches the master */ + if (pd->port_subport_cnt != uinfo->spu_subport_cnt || + pd->userversion != uinfo->spu_userversion || + pd->port_cnt >= pd->port_subport_cnt) { + ret = -EINVAL; + goto done; + } + port_fp(fp) = pd; + subport_fp(fp) = pd->port_cnt++; + tidcursor_fp(fp) = 0; + pd->active_slaves |= 1 << subport_fp(fp); + ipath_cdbg(PROC, + "%s[%u] %u sharing %s[%u] unit:port %u:%u\n", + current->comm, current->pid, + subport_fp(fp), + pd->port_comm, pd->port_pid, + dd->ipath_unit, pd->port_port); + ret = 1; + goto done; + } + } + +done: + return ret; +} + static int ipath_open(struct inode *in, struct file *fp) { - int ret, user_minor; + /* The real work is performed later in ipath_do_user_init() */ + fp->private_data = kzalloc(sizeof(struct ipath_filedata), GFP_KERNEL); + return fp->private_data ? 
0 : -ENOMEM; +} + +static int ipath_do_user_init(struct file *fp, + const struct ipath_user_info *uinfo) +{ + int ret; + struct ipath_portdata *pd; + struct ipath_devdata *dd; + u32 head32; + int i_minor; + unsigned swminor; + + /* Check to be sure we haven't already initialized this file */ + if (port_fp(fp)) { + ret = -EINVAL; + goto done; + } + + /* for now, if major version is different, bail */ + if ((uinfo->spu_userversion >> 16) != IPATH_USER_SWMAJOR) { + ipath_dbg("User major version %d not same as driver " + "major %d\n", uinfo->spu_userversion >> 16, + IPATH_USER_SWMAJOR); + ret = -ENODEV; + goto done; + } + + swminor = uinfo->spu_userversion & 0xffff; + if (swminor != IPATH_USER_SWMINOR) + ipath_dbg("User minor version %d not same as driver " + "minor %d\n", swminor, IPATH_USER_SWMINOR); mutex_lock(&ipath_mutex); - user_minor = iminor(in) - IPATH_USER_MINOR_BASE; + if (swminor == IPATH_USER_SWMINOR && uinfo->spu_subport_cnt && + (ret = find_shared_port(fp, uinfo))) { + mutex_unlock(&ipath_mutex); + if (ret > 0) + ret = 0; + goto done; + } + + i_minor = iminor(fp->f_dentry->d_inode) - IPATH_USER_MINOR_BASE; ipath_cdbg(VERBOSE, "open on dev %lx (minor %d)\n", - (long)in->i_rdev, user_minor); - - if (user_minor) - ret = find_free_port(user_minor - 1, fp); + (long)fp->f_dentry->d_inode->i_rdev, i_minor); + + if (i_minor) + ret = find_free_port(i_minor - 1, fp, uinfo); else - ret = find_best_unit(fp); + ret = find_best_unit(fp, uinfo); mutex_unlock(&ipath_mutex); + + if (ret) + goto done; + + pd = port_fp(fp); + dd = pd->port_dd; + + if (uinfo->spu_rcvhdrsize) { + ret = ipath_setrcvhdrsize(dd, uinfo->spu_rcvhdrsize); + if (ret) + goto done; + } + + /* for now we do nothing with rcvhdrcnt: uinfo->spu_rcvhdrcnt */ + + /* for right now, kernel piobufs are at end, so port 1 is at 0 */ + pd->port_piobufs = dd->ipath_piobufbase + + dd->ipath_pbufsport * (pd->port_port - 1) * dd->ipath_palign; + ipath_cdbg(VERBOSE, "Set base of piobufs for port %u to 0x%x\n", + 
pd->port_port, pd->port_piobufs); + + /* + * Now allocate the rcvhdr Q and eager TIDs; skip the TID + * array for time being. If pd->port_port > chip-supported, + * we need to do extra stuff here to handle by handling overflow + * through port 0, someday + */ + ret = ipath_create_rcvhdrq(dd, pd); + if (!ret) + ret = ipath_create_user_egr(pd); + if (ret) + goto done; + + /* + * set the eager head register for this port to the current values + * of the tail pointers, since we don't know if they were + * updated on last use of the port. + */ + head32 = ipath_read_ureg32(dd, ur_rcvegrindextail, pd->port_port); + ipath_write_ureg(dd, ur_rcvegrindexhead, head32, pd->port_port); + dd->ipath_lastegrheads[pd->port_port] = -1; + dd->ipath_lastrcvhdrqtails[pd->port_port] = -1; + ipath_cdbg(VERBOSE, "Wrote port%d egrhead %x from tail regs\n", + pd->port_port, head32); + pd->port_tidcursor = 0; /* start at beginning after open */ + /* + * now enable the port; the tail registers will be written to memory + * by the chip as soon as it sees the write to + * dd->ipath_kregs->kr_rcvctrl. The update only happens on + * transition from 0 to 1, so clear it first, then set it as part of + * enabling the port. This will (very briefly) affect any other + * open ports, but it shouldn't be long enough to be an issue. + * We explictly set the in-memory copy to 0 beforehand, so we don't + * have to wait to be sure the DMA update has happened. 
+ */ + *pd->port_rcvhdrtail_kvaddr = 0ULL; + set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port, + &dd->ipath_rcvctrl); + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, + dd->ipath_rcvctrl & ~INFINIPATH_R_TAILUPD); + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, + dd->ipath_rcvctrl); +done: return ret; } @@ -1453,6 +1853,7 @@ static int ipath_close(struct inode *in, static int ipath_close(struct inode *in, struct file *fp) { int ret = 0; + struct ipath_filedata *fd; struct ipath_portdata *pd; struct ipath_devdata *dd; unsigned port; @@ -1462,9 +1863,24 @@ static int ipath_close(struct inode *in, mutex_lock(&ipath_mutex); - pd = port_fp(fp); + fd = (struct ipath_filedata *) fp->private_data; + fp->private_data = NULL; + pd = fd->pd; + if (!pd) { + mutex_unlock(&ipath_mutex); + goto bail; + } + if (--pd->port_cnt) { + /* + * XXX If the master closes the port before the slave(s), + * revoke the mmap for the eager receive queue so + * the slave(s) don't wait for receive data forever. + */ + pd->active_slaves &= ~(1 << fd->subport); + mutex_unlock(&ipath_mutex); + goto bail; + } port = pd->port_port; - fp->private_data = NULL; dd = pd->port_dd; if (pd->port_hdrqfull) { @@ -1503,8 +1919,6 @@ static int ipath_close(struct inode *in, /* clean up the pkeys for this port user */ ipath_clean_part_key(pd, dd); - - /* * be paranoid, and never write 0's to these, just use an * unused part of the port 0 tail page. 
Of course, @@ -1533,35 +1947,55 @@ static int ipath_close(struct inode *in, dd->ipath_f_clear_tids(dd, pd->port_port); } - pd->port_cnt = 0; pd->port_pid = 0; - dd->ipath_pd[pd->port_port] = NULL; /* before releasing mutex */ mutex_unlock(&ipath_mutex); ipath_free_pddata(dd, pd); /* after releasing the mutex */ - return ret; -} - -static int ipath_port_info(struct ipath_portdata *pd, +bail: + kfree(fd); + return ret; +} + +static int ipath_port_info(struct ipath_portdata *pd, u16 subport, struct ipath_port_info __user *uinfo) { struct ipath_port_info info; int nup; int ret; + size_t sz; (void) ipath_count_units(NULL, &nup, NULL); info.num_active = nup; info.unit = pd->port_dd->ipath_unit; info.port = pd->port_port; - - if (copy_to_user(uinfo, &info, sizeof(info))) { + info.subport = subport; + /* Don't return new fields if old library opened the port. */ + if ((pd->userversion & 0xffff) == IPATH_USER_SWMINOR) { + /* Number of user ports available for this device. */ + info.num_ports = pd->port_dd->ipath_cfgports - 1; + info.num_subports = pd->port_subport_cnt; + sz = sizeof(info); + } else + sz = sizeof(info) - 2 * sizeof(u16); + + if (copy_to_user(uinfo, &info, sz)) { ret = -EFAULT; goto bail; } ret = 0; bail: + return ret; +} + +static int ipath_get_slave_info(struct ipath_portdata *pd, + void __user *slave_mask_addr) +{ + int ret = 0; + + if (copy_to_user(slave_mask_addr, &pd->active_slaves, sizeof(u32))) + ret = -EFAULT; return ret; } @@ -1617,6 +2051,11 @@ static ssize_t ipath_write(struct file * dest = &cmd.cmd.part_key; src = &ucmd->cmd.part_key; break; + case IPATH_CMD_SLAVE_INFO: + copy = sizeof(cmd.cmd.slave_mask_addr); + dest = &cmd.cmd.slave_mask_addr; + src = &ucmd->cmd.slave_mask_addr; + break; default: ret = -EINVAL; goto bail; @@ -1634,33 +2073,42 @@ static ssize_t ipath_write(struct file * consumed += copy; pd = port_fp(fp); + if (!pd && cmd.type != IPATH_CMD_USER_INIT) { + ret = -EINVAL; + goto bail; + } switch (cmd.type) { case 
IPATH_CMD_USER_INIT: - ret = ipath_do_user_init(pd, &cmd.cmd.user_info); - if (ret < 0) + ret = ipath_do_user_init(fp, &cmd.cmd.user_info); + if (ret) goto bail; ret = ipath_get_base_info( - pd, (void __user *) (unsigned long) + fp, (void __user *) (unsigned long) cmd.cmd.user_info.spu_base_info, cmd.cmd.user_info.spu_base_info_size); break; case IPATH_CMD_RECV_CTRL: - ret = ipath_manage_rcvq(pd, cmd.cmd.recv_ctrl); + ret = ipath_manage_rcvq(pd, subport_fp(fp), cmd.cmd.recv_ctrl); break; case IPATH_CMD_PORT_INFO: - ret = ipath_port_info(pd, + ret = ipath_port_info(pd, subport_fp(fp), (struct ipath_port_info __user *) (unsigned long) cmd.cmd.port_info); break; case IPATH_CMD_TID_UPDATE: - ret = ipath_tid_update(pd, &cmd.cmd.tid_info); + ret = ipath_tid_update(pd, fp, &cmd.cmd.tid_info); break; case IPATH_CMD_TID_FREE: - ret = ipath_tid_free(pd, &cmd.cmd.tid_info); + ret = ipath_tid_free(pd, subport_fp(fp), &cmd.cmd.tid_info); break; case IPATH_CMD_SET_PART_KEY: ret = ipath_set_part_key(pd, cmd.cmd.part_key); + break; + case IPATH_CMD_SLAVE_INFO: + ret = ipath_get_slave_info(pd, + (void __user *) (unsigned long) + cmd.cmd.slave_mask_addr); break; } @@ -1858,4 +2306,3 @@ bail: bail: return; } - diff -r 45079acba208 -r 7f5b6127be15 drivers/infiniband/hw/ipath/ipath_kernel.h --- a/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:12 2006 -0700 @@ -79,8 +79,8 @@ struct ipath_portdata { dma_addr_t port_rcvhdrq_phys; dma_addr_t port_rcvhdrqtailaddr_phys; /* - * number of opens on this instance (0 or 1; ignoring forks, dup, - * etc. for now) + * number of opens (including slave subports) on this instance + * (ignoring forks, dup, etc. for now) */ int port_cnt; /* @@ -89,6 +89,10 @@ struct ipath_portdata { */ /* instead of calculating it */ unsigned port_port; + /* non-zero if port is being shared. */ + u16 port_subport_cnt; + /* non-zero if port is being shared. 
*/ + u16 port_subport_id; /* chip offset of PIO buffers for this port */ u32 port_piobufs; /* how many alloc_pages() chunks in port_rcvegrbuf_pages */ @@ -121,6 +125,16 @@ struct ipath_portdata { u16 port_pkeys[4]; /* so file ops can get at unit */ struct ipath_devdata *port_dd; + /* A page of memory for rcvhdrhead, rcvegrhead, rcvegrtail * N */ + void *subport_uregbase; + /* An array of pages for the eager receive buffers * N */ + void *subport_rcvegrbuf; + /* An array of pages for the eager header queue entries * N */ + void *subport_rcvhdr_base; + /* The version of the library which opened this port */ + u32 userversion; + /* Bitmask of active slaves */ + u32 active_slaves; }; struct sk_buff; @@ -512,6 +526,12 @@ struct ipath_devdata { u32 ipath_lli_errors; }; +/* Private data for file operations */ +struct ipath_filedata { + struct ipath_portdata *pd; + unsigned subport; + unsigned tidcursor; +}; extern struct list_head ipath_dev_list; extern spinlock_t ipath_devs_lock; extern struct ipath_devdata *ipath_lookup(int unit); @@ -572,7 +592,11 @@ int ipath_set_rx_pol_inv(struct ipath_de int ipath_set_rx_pol_inv(struct ipath_devdata *dd, u8 new_pol_inv); /* for use in system calls, where we want to know device type, etc. 
*/ -#define port_fp(fp) ((struct ipath_portdata *) (fp)->private_data) +#define port_fp(fp) ((struct ipath_filedata *)(fp)->private_data)->pd +#define subport_fp(fp) \ + ((struct ipath_filedata *)(fp)->private_data)->subport +#define tidcursor_fp(fp) \ + ((struct ipath_filedata *)(fp)->private_data)->tidcursor /* * values for ipath_flags diff -r 45079acba208 -r 7f5b6127be15 drivers/infiniband/hw/ipath/ipath_sysfs.c --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c Thu Sep 28 08:57:12 2006 -0700 @@ -295,6 +295,16 @@ static ssize_t show_nguid(struct device struct ipath_devdata *dd = dev_get_drvdata(dev); return scnprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_nguid); +} + +static ssize_t show_nports(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + + /* Return the number of user ports available. */ + return scnprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_cfgports - 1); } static ssize_t show_serial(struct device *dev, @@ -608,6 +618,7 @@ static DEVICE_ATTR(mtu, S_IWUSR | S_IRUG static DEVICE_ATTR(mtu, S_IWUSR | S_IRUGO, show_mtu, store_mtu); static DEVICE_ATTR(enabled, S_IWUSR | S_IRUGO, show_enabled, store_enabled); static DEVICE_ATTR(nguid, S_IRUGO, show_nguid, NULL); +static DEVICE_ATTR(nports, S_IRUGO, show_nports, NULL); static DEVICE_ATTR(reset, S_IWUSR, NULL, store_reset); static DEVICE_ATTR(serial, S_IRUGO, show_serial, NULL); static DEVICE_ATTR(status, S_IRUGO, show_status, NULL); @@ -623,6 +634,7 @@ static struct attribute *dev_attributes[ &dev_attr_mlid.attr, &dev_attr_mtu.attr, &dev_attr_nguid.attr, + &dev_attr_nports.attr, &dev_attr_serial.attr, &dev_attr_status.attr, &dev_attr_status_str.attr, From bos at pathscale.com Thu Sep 28 09:00:05 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:05 -0700 Subject: [openib-general] [PATCH 9 of 28] IB/ipath - only allow complete writes to flash 
In-Reply-To: Message-ID: <934e5c1d6adecef606f8.1159459205@eng-12.pathscale.com> Don't allow a write to the eeprom from ipathfs unless the write is exactly 128 bytes and starts at offset 0. Signed-off-by: Bryan O'Sullivan diff -r cc3350eeb557 -r 934e5c1d6ade drivers/infiniband/hw/ipath/ipath_fs.c --- a/drivers/infiniband/hw/ipath/ipath_fs.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_fs.c Thu Sep 28 08:57:12 2006 -0700 @@ -357,18 +357,15 @@ static ssize_t flash_write(struct file * pos = *ppos; - if ( pos < 0) { + if (pos != 0) { ret = -EINVAL; goto bail; } - if (pos >= sizeof(struct ipath_flash)) { - ret = 0; - goto bail; - } - - if (count > sizeof(struct ipath_flash) - pos) - count = sizeof(struct ipath_flash) - pos; + if (count != sizeof(struct ipath_flash)) { + ret = -EINVAL; + goto bail; + } tmp = kmalloc(count, GFP_KERNEL); if (!tmp) { From bos at pathscale.com Thu Sep 28 09:00:04 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:04 -0700 Subject: [openib-general] [PATCH 8 of 28] IB/ipath - count SRQs properly In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r fcd3e3bc98d8 -r cc3350eeb557 drivers/infiniband/hw/ipath/ipath_srq.c --- a/drivers/infiniband/hw/ipath/ipath_srq.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_srq.c Thu Sep 28 08:57:12 2006 -0700 @@ -103,11 +103,6 @@ struct ib_srq *ipath_create_srq(struct i struct ipath_srq *srq; u32 sz; struct ib_srq *ret; - - if (dev->n_srqs_allocated == ib_ipath_max_srqs) { - ret = ERR_PTR(-ENOMEM); - goto done; - } if (srq_init_attr->attr.max_wr == 0) { ret = ERR_PTR(-EINVAL); @@ -180,10 +175,17 @@ struct ib_srq *ipath_create_srq(struct i spin_lock_init(&srq->rq.lock); srq->rq.wq->head = 0; srq->rq.wq->tail = 0; - srq->rq.max_sge = srq_init_attr->attr.max_sge; srq->limit = srq_init_attr->attr.srq_limit; - dev->n_srqs_allocated++; + spin_lock(&dev->n_srqs_lock); + if (dev->n_srqs_allocated == ib_ipath_max_srqs) { + 
spin_unlock(&dev->n_srqs_lock); + ret = ERR_PTR(-ENOMEM); + goto bail_wq; + } + + dev->n_srqs_allocated++; + spin_unlock(&dev->n_srqs_lock); ret = &srq->ibsrq; goto done; @@ -351,8 +353,13 @@ int ipath_destroy_srq(struct ib_srq *ibs struct ipath_srq *srq = to_isrq(ibsrq); struct ipath_ibdev *dev = to_idev(ibsrq->device); + spin_lock(&dev->n_srqs_lock); dev->n_srqs_allocated--; - vfree(srq->rq.wq); + spin_unlock(&dev->n_srqs_lock); + if (srq->ip) + kref_put(&srq->ip->ref, ipath_release_mmap_info); + else + vfree(srq->rq.wq); kfree(srq); return 0; From bos at pathscale.com Thu Sep 28 09:00:02 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:02 -0700 Subject: [openib-general] [PATCH 6 of 28] IB/ipath - clean up handling of GUID 0 In-Reply-To: Message-ID: <0fe847c544586f6f74d0.1159459202@eng-12.pathscale.com> Respond with an error to the SM if our GUID is 0, and don't allow the user to set our GUID to 0. Signed-off-by: Bryan O'Sullivan diff -r e2916bbf09ed -r 0fe847c54458 drivers/infiniband/hw/ipath/ipath_mad.c --- a/drivers/infiniband/hw/ipath/ipath_mad.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_mad.c Thu Sep 28 08:57:12 2006 -0700 @@ -87,7 +87,8 @@ static int recv_subn_get_nodeinfo(struct struct ipath_devdata *dd = to_idev(ibdev)->dd; u32 vendor, majrev, minrev; - if (smp->attr_mod) + /* GUID 0 is illegal */ + if (smp->attr_mod || (dd->ipath_guid == 0)) smp->status |= IB_SMP_INVALID_FIELD; nip->base_version = 1; @@ -131,10 +132,15 @@ static int recv_subn_get_guidinfo(struct * We only support one GUID for now. If this changes, the * portinfo.guid_cap field needs to be updated too. */ - if (startgx == 0) - /* The first is a copy of the read-only HW GUID. */ - *p = to_idev(ibdev)->dd->ipath_guid; - else + if (startgx == 0) { + __be64 g = to_idev(ibdev)->dd->ipath_guid; + if (g == 0) + /* GUID 0 is illegal */ + smp->status |= IB_SMP_INVALID_FIELD; + else + /* The first is a copy of the read-only HW GUID. 
*/ + *p = g; + } else smp->status |= IB_SMP_INVALID_FIELD; return reply(smp); diff -r e2916bbf09ed -r 0fe847c54458 drivers/infiniband/hw/ipath/ipath_sysfs.c --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c Thu Sep 28 08:57:12 2006 -0700 @@ -257,7 +257,7 @@ static ssize_t store_guid(struct device struct ipath_devdata *dd = dev_get_drvdata(dev); ssize_t ret; unsigned short guid[8]; - __be64 nguid; + __be64 new_guid; u8 *ng; int i; @@ -266,7 +266,7 @@ static ssize_t store_guid(struct device &guid[4], &guid[5], &guid[6], &guid[7]) != 8) goto invalid; - ng = (u8 *) &nguid; + ng = (u8 *) &new_guid; for (i = 0; i < 8; i++) { if (guid[i] > 0xff) @@ -274,7 +274,10 @@ static ssize_t store_guid(struct device ng[i] = guid[i]; } - dd->ipath_guid = nguid; + if (new_guid == 0) + goto invalid; + + dd->ipath_guid = new_guid; dd->ipath_nguid = 1; ret = strlen(buf); From bos at pathscale.com Thu Sep 28 09:00:06 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:06 -0700 Subject: [openib-general] [PATCH 10 of 28] IB/ipath - RC and UC should validate SLID and DLID In-Reply-To: Message-ID: This is required for IB conformance (spec ch. 9.6.1.5). Signed-off-by: Bryan O'Sullivan diff -r 934e5c1d6ade -r f8c0eb9dc3b8 drivers/infiniband/hw/ipath/ipath_rc.c --- a/drivers/infiniband/hw/ipath/ipath_rc.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_rc.c Thu Sep 28 08:57:12 2006 -0700 @@ -1319,6 +1319,10 @@ void ipath_rc_rcv(struct ipath_ibdev *de int diff; struct ib_reth *reth; int header_in_data; + + /* Validate the SLID. See Ch. 
9.6.1.5 */ + if (unlikely(be16_to_cpu(hdr->lrh[3]) != qp->remote_ah_attr.dlid)) + goto done; /* Check for GRH */ if (!has_grh) { diff -r 934e5c1d6ade -r f8c0eb9dc3b8 drivers/infiniband/hw/ipath/ipath_uc.c --- a/drivers/infiniband/hw/ipath/ipath_uc.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_uc.c Thu Sep 28 08:57:12 2006 -0700 @@ -246,6 +246,10 @@ void ipath_uc_rcv(struct ipath_ibdev *de struct ib_reth *reth; int header_in_data; + /* Validate the SLID. See Ch. 9.6.1.5 */ + if (unlikely(be16_to_cpu(hdr->lrh[3]) != qp->remote_ah_attr.dlid)) + goto done; + /* Check for GRH */ if (!has_grh) { ohdr = &hdr->u.oth; From bos at pathscale.com Thu Sep 28 09:00:10 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:10 -0700 Subject: [openib-general] [PATCH 14 of 28] IB/ipath - Fix mismatch in shifts and masks for printing debug info In-Reply-To: Message-ID: <42f82d2c62bce5aa8ae0.1159459210@eng-12.pathscale.com> Fixed mismatch in linkstate/trainingstate shifts and masks in the IPATH_IBSTATE_MASK macro. It kept some linktrainingstates from being printed correctly in debug; no functionality issue unless I misread the code. 
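For readers unfamiliar with this class of bug, here is a minimal standalone sketch of what a shift/mask mismatch does to a combined state mask. The field widths and shift values below are invented for illustration and are not the real INFINIPATH_IBCS_* definitions; the helper names are hypothetical as well. It models only the part of the change that is visible in the diff below, where the training-state mask was being shifted by the link-state shift.

```c
#include <stdint.h>

/*
 * Hypothetical register layout, chosen only for this example
 * (NOT the real INFINIPATH_IBCS_* values):
 *   bits 0-3: link training state
 *   bits 4-6: link state
 */
#define LINKTRAININGSTATE_MASK  0xFu
#define LINKTRAININGSTATE_SHIFT 0
#define LINKSTATE_MASK          0x7u
#define LINKSTATE_SHIFT         4

/* Mismatched: the training-state mask is shifted by the link-state shift. */
#define IBSTATE_MASK_BUGGY \
	((LINKTRAININGSTATE_MASK << LINKSTATE_SHIFT) | \
	 (LINKSTATE_MASK << LINKSTATE_SHIFT))

/* Matched: each mask is shifted by its own field's shift. */
#define IBSTATE_MASK_FIXED \
	((LINKTRAININGSTATE_MASK << LINKTRAININGSTATE_SHIFT) | \
	 (LINKSTATE_MASK << LINKSTATE_SHIFT))

/* Extract the combined state bits using the mismatched mask. */
static inline uint32_t ibstate_buggy(uint32_t status)
{
	return status & IBSTATE_MASK_BUGGY;
}

/* Extract the combined state bits using the corrected mask. */
static inline uint32_t ibstate_fixed(uint32_t status)
{
	return status & IBSTATE_MASK_FIXED;
}
```

With this made-up layout, a status word of 0x35 (link state 3, training state 5) keeps both fields under the matched mask, while the mismatched mask drops the training-state bits. That is consistent with the symptom described above: some training states not showing up correctly in debug output.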
Signed-off-by: Bryan O'Sullivan diff -r 2a328f7db58f -r 42f82d2c62bc drivers/infiniband/hw/ipath/ipath_registers.h --- a/drivers/infiniband/hw/ipath/ipath_registers.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_registers.h Thu Sep 28 08:57:12 2006 -0700 @@ -223,9 +223,9 @@ /* combination link status states that we use with some frequency */ #define IPATH_IBSTATE_MASK ((INFINIPATH_IBCS_LINKTRAININGSTATE_MASK \ - << INFINIPATH_IBCS_LINKSTATE_SHIFT) | \ + << INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) | \ (INFINIPATH_IBCS_LINKSTATE_MASK \ - < Message-ID: Signed-off-by: Bryan O'Sullivan diff -r 0fe847c54458 -r fcd3e3bc98d8 drivers/infiniband/hw/ipath/ipath_cq.c --- a/drivers/infiniband/hw/ipath/ipath_cq.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_cq.c Thu Sep 28 08:57:12 2006 -0700 @@ -174,11 +174,6 @@ struct ib_cq *ipath_create_cq(struct ib_ if (entries < 1 || entries > ib_ipath_max_cqes) { ret = ERR_PTR(-EINVAL); - goto done; - } - - if (dev->n_cqs_allocated == ib_ipath_max_cqs) { - ret = ERR_PTR(-ENOMEM); goto done; } @@ -237,6 +232,16 @@ struct ib_cq *ipath_create_cq(struct ib_ } else cq->ip = NULL; + spin_lock(&dev->n_cqs_lock); + if (dev->n_cqs_allocated == ib_ipath_max_cqs) { + spin_unlock(&dev->n_cqs_lock); + ret = ERR_PTR(-ENOMEM); + goto bail_wc; + } + + dev->n_cqs_allocated++; + spin_unlock(&dev->n_cqs_lock); + /* * ib_create_cq() will initialize cq->ibcq except for cq->ibcq.cqe. 
* The number of entries should be >= the number requested or return @@ -253,7 +258,6 @@ struct ib_cq *ipath_create_cq(struct ib_ ret = &cq->ibcq; - dev->n_cqs_allocated++; goto done; bail_wc: @@ -280,7 +284,9 @@ int ipath_destroy_cq(struct ib_cq *ibcq) struct ipath_cq *cq = to_icq(ibcq); tasklet_kill(&cq->comptask); + spin_lock(&dev->n_cqs_lock); dev->n_cqs_allocated--; + spin_unlock(&dev->n_cqs_lock); if (cq->ip) kref_put(&cq->ip->ref, ipath_release_mmap_info); else From bos at pathscale.com Thu Sep 28 09:00:09 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:09 -0700 Subject: [openib-general] [PATCH 13 of 28] IB/ipath - fix compiler warnings and errors on non-x86_64 systems In-Reply-To: Message-ID: <2a328f7db58fad9a19ff.1159459209@eng-12.pathscale.com> Signed-off-by: Bryan O'Sullivan diff -r a7ba4b73f972 -r 2a328f7db58f drivers/infiniband/hw/ipath/ipath_file_ops.c --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Thu Sep 28 08:57:12 2006 -0700 @@ -206,11 +206,10 @@ static int ipath_get_base_info(struct fi kinfo->spi_subport_rcvhdr_base = (u64) pd->subport_rcvhdr_base & MMAP64_MASK; ipath_cdbg(PROC, "port %u flags %x %llx %llx %llx\n", - kinfo->spi_port, - kinfo->spi_runtime_flags, - kinfo->spi_subport_uregbase, - kinfo->spi_subport_rcvegrbuf, - kinfo->spi_subport_rcvhdr_base); + kinfo->spi_port, kinfo->spi_runtime_flags, + (unsigned long long) kinfo->spi_subport_uregbase, + (unsigned long long) kinfo->spi_subport_rcvegrbuf, + (unsigned long long) kinfo->spi_subport_rcvhdr_base); } if (copy_to_user(ubase, kinfo, sizeof(*kinfo))) From bos at pathscale.com Thu Sep 28 09:00:07 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:07 -0700 Subject: [openib-general] [PATCH 11 of 28] IB/ipath - ensure that PD of MR matches PD of QP checking the Rkey In-Reply-To: Message-ID: <4dbe5e686c780530dd04.1159459207@eng-12.pathscale.com> 
Signed-off-by: Bryan O'Sullivan diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_keys.c --- a/drivers/infiniband/hw/ipath/ipath_keys.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_keys.c Thu Sep 28 08:57:12 2006 -0700 @@ -118,9 +118,10 @@ void ipath_free_lkey(struct ipath_lkey_t * Check the IB SGE for validity and initialize our internal version * of it. */ -int ipath_lkey_ok(struct ipath_lkey_table *rkt, struct ipath_sge *isge, +int ipath_lkey_ok(struct ipath_qp *qp, struct ipath_sge *isge, struct ib_sge *sge, int acc) { + struct ipath_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table; struct ipath_mregion *mr; unsigned n, m; size_t off; @@ -140,7 +141,8 @@ int ipath_lkey_ok(struct ipath_lkey_tabl goto bail; } mr = rkt->table[(sge->lkey >> (32 - ib_ipath_lkey_table_size))]; - if (unlikely(mr == NULL || mr->lkey != sge->lkey)) { + if (unlikely(mr == NULL || mr->lkey != sge->lkey || + qp->ibqp.pd != mr->pd)) { ret = 0; goto bail; } @@ -188,9 +190,10 @@ bail: * * Return 1 if successful, otherwise 0. 
*/ -int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss, +int ipath_rkey_ok(struct ipath_qp *qp, struct ipath_sge_state *ss, u32 len, u64 vaddr, u32 rkey, int acc) { + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); struct ipath_lkey_table *rkt = &dev->lk_table; struct ipath_sge *sge = &ss->sge; struct ipath_mregion *mr; @@ -214,7 +217,8 @@ int ipath_rkey_ok(struct ipath_ibdev *de } mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))]; - if (unlikely(mr == NULL || mr->lkey != rkey)) { + if (unlikely(mr == NULL || mr->lkey != rkey || + qp->ibqp.pd != mr->pd)) { ret = 0; goto bail; } diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_mr.c --- a/drivers/infiniband/hw/ipath/ipath_mr.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_mr.c Thu Sep 28 08:57:12 2006 -0700 @@ -138,6 +138,7 @@ struct ib_mr *ipath_reg_phys_mr(struct i goto bail; } + mr->mr.pd = pd; mr->mr.user_base = *iova_start; mr->mr.iova = *iova_start; mr->mr.length = 0; @@ -197,6 +198,7 @@ struct ib_mr *ipath_reg_user_mr(struct i goto bail; } + mr->mr.pd = pd; mr->mr.user_base = region->user_base; mr->mr.iova = region->virt_base; mr->mr.length = region->length; @@ -289,6 +291,7 @@ struct ib_fmr *ipath_alloc_fmr(struct ib * Resources are allocated but no valid mapping (RKEY can't be * used). */ + fmr->mr.pd = pd; fmr->mr.user_base = 0; fmr->mr.iova = 0; fmr->mr.length = 0; diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_rc.c --- a/drivers/infiniband/hw/ipath/ipath_rc.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_rc.c Thu Sep 28 08:57:12 2006 -0700 @@ -1234,7 +1234,7 @@ static inline int ipath_rc_rcv_error(str * Address range must be a subset of the original * request and start on pmtu boundaries. 
*/ - ok = ipath_rkey_ok(dev, &qp->s_rdma_sge, + ok = ipath_rkey_ok(qp, &qp->s_rdma_sge, qp->s_rdma_len, vaddr, rkey, IB_ACCESS_REMOTE_READ); if (unlikely(!ok)) { @@ -1532,7 +1532,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de int ok; /* Check rkey & NAK */ - ok = ipath_rkey_ok(dev, &qp->r_sge, + ok = ipath_rkey_ok(qp, &qp->r_sge, qp->r_len, vaddr, rkey, IB_ACCESS_REMOTE_WRITE); if (unlikely(!ok)) @@ -1574,7 +1574,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de int ok; /* Check rkey & NAK */ - ok = ipath_rkey_ok(dev, &qp->s_rdma_sge, + ok = ipath_rkey_ok(qp, &qp->s_rdma_sge, qp->s_rdma_len, vaddr, rkey, IB_ACCESS_REMOTE_READ); if (unlikely(!ok)) { @@ -1633,7 +1633,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de goto nack_inv; rkey = be32_to_cpu(ateth->rkey); /* Check rkey & NAK */ - if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, + if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, sizeof(u64), vaddr, rkey, IB_ACCESS_REMOTE_ATOMIC))) goto nack_acc; diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_ruc.c --- a/drivers/infiniband/hw/ipath/ipath_ruc.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c Thu Sep 28 08:57:12 2006 -0700 @@ -108,7 +108,6 @@ void ipath_insert_rnr_queue(struct ipath static int init_sge(struct ipath_qp *qp, struct ipath_rwqe *wqe) { - struct ipath_ibdev *dev = to_idev(qp->ibqp.device); int user = to_ipd(qp->ibqp.pd)->user; int i, j, ret; struct ib_wc wc; @@ -119,8 +118,7 @@ static int init_sge(struct ipath_qp *qp, continue; /* Check LKEY */ if ((user && wqe->sg_list[i].lkey == 0) || - !ipath_lkey_ok(&dev->lk_table, - &qp->r_sg_list[j], &wqe->sg_list[i], + !ipath_lkey_ok(qp, &qp->r_sg_list[j], &wqe->sg_list[i], IB_ACCESS_LOCAL_WRITE)) goto bad_lkey; qp->r_len += wqe->sg_list[i].length; @@ -326,7 +324,7 @@ again: case IB_WR_RDMA_WRITE: if (wqe->length == 0) break; - if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, wqe->length, + if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, wqe->length, wqe->wr.wr.rdma.remote_addr, 
wqe->wr.wr.rdma.rkey, IB_ACCESS_REMOTE_WRITE))) { @@ -350,7 +348,7 @@ again: break; case IB_WR_RDMA_READ: - if (unlikely(!ipath_rkey_ok(dev, &sqp->s_sge, wqe->length, + if (unlikely(!ipath_rkey_ok(qp, &sqp->s_sge, wqe->length, wqe->wr.wr.rdma.remote_addr, wqe->wr.wr.rdma.rkey, IB_ACCESS_REMOTE_READ))) @@ -365,7 +363,7 @@ again: case IB_WR_ATOMIC_CMP_AND_SWP: case IB_WR_ATOMIC_FETCH_AND_ADD: - if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, sizeof(u64), + if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, sizeof(u64), wqe->wr.wr.rdma.remote_addr, wqe->wr.wr.rdma.rkey, IB_ACCESS_REMOTE_ATOMIC))) @@ -575,8 +573,7 @@ int ipath_post_ruc_send(struct ipath_qp } if (wr->sg_list[i].length == 0) continue; - if (!ipath_lkey_ok(&to_idev(qp->ibqp.device)->lk_table, - &wqe->sg_list[j], &wr->sg_list[i], + if (!ipath_lkey_ok(qp, &wqe->sg_list[j], &wr->sg_list[i], acc)) { spin_unlock_irqrestore(&qp->s_lock, flags); ret = -EINVAL; diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_uc.c --- a/drivers/infiniband/hw/ipath/ipath_uc.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_uc.c Thu Sep 28 08:57:12 2006 -0700 @@ -444,7 +444,7 @@ void ipath_uc_rcv(struct ipath_ibdev *de int ok; /* Check rkey */ - ok = ipath_rkey_ok(dev, &qp->r_sge, qp->r_len, + ok = ipath_rkey_ok(qp, &qp->r_sge, qp->r_len, vaddr, rkey, IB_ACCESS_REMOTE_WRITE); if (unlikely(!ok)) { diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_ud.c --- a/drivers/infiniband/hw/ipath/ipath_ud.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_ud.c Thu Sep 28 08:57:12 2006 -0700 @@ -39,7 +39,6 @@ static int init_sge(struct ipath_qp *qp, static int init_sge(struct ipath_qp *qp, struct ipath_rwqe *wqe, u32 *lengthp, struct ipath_sge_state *ss) { - struct ipath_ibdev *dev = to_idev(qp->ibqp.device); int user = to_ipd(qp->ibqp.pd)->user; int i, j, ret; struct ib_wc wc; @@ -50,8 +49,7 @@ static int init_sge(struct ipath_qp *qp, continue; /* Check LKEY */ if 
((user && wqe->sg_list[i].lkey == 0) || - !ipath_lkey_ok(&dev->lk_table, - j ? &ss->sg_list[j - 1] : &ss->sge, + !ipath_lkey_ok(qp, j ? &ss->sg_list[j - 1] : &ss->sge, &wqe->sg_list[i], IB_ACCESS_LOCAL_WRITE)) goto bad_lkey; *lengthp += wqe->sg_list[i].length; @@ -343,7 +341,7 @@ int ipath_post_ud_send(struct ipath_qp * if (wr->sg_list[i].length == 0) continue; - if (!ipath_lkey_ok(&dev->lk_table, ss.num_sge ? + if (!ipath_lkey_ok(qp, ss.num_sge ? sg_list + ss.num_sge - 1 : &ss.sge, &wr->sg_list[i], 0)) { ret = -EINVAL; diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Sep 28 08:57:12 2006 -0700 @@ -220,6 +220,7 @@ struct ipath_segarray { }; struct ipath_mregion { + struct ib_pd *pd; /* shares refcnt of ibmr.pd */ u64 user_base; /* User's address for this region */ u64 iova; /* IB start address of this region */ size_t length; @@ -657,12 +658,6 @@ int ipath_verbs_send(struct ipath_devdat void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig); -int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss, - u32 len, u64 vaddr, u32 rkey, int acc); - -int ipath_lkey_ok(struct ipath_lkey_table *rkt, struct ipath_sge *isge, - struct ib_sge *sge, int acc); - void ipath_copy_sge(struct ipath_sge_state *ss, void *data, u32 length); void ipath_skip_sge(struct ipath_sge_state *ss, u32 length); @@ -687,10 +682,10 @@ int ipath_alloc_lkey(struct ipath_lkey_t void ipath_free_lkey(struct ipath_lkey_table *rkt, u32 lkey); -int ipath_lkey_ok(struct ipath_lkey_table *rkt, struct ipath_sge *isge, +int ipath_lkey_ok(struct ipath_qp *qp, struct ipath_sge *isge, struct ib_sge *sge, int acc); -int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss, +int ipath_rkey_ok(struct ipath_qp *qp, struct ipath_sge_state *ss, u32 len, u64 vaddr, u32 rkey, int acc); int ipath_post_srq_receive(struct 
ib_srq *ibsrq, struct ib_recv_wr *wr, From bos at pathscale.com Thu Sep 28 09:00:08 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:08 -0700 Subject: [openib-general] [PATCH 12 of 28] IB/ipath - print more informative parity error messages In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r 4dbe5e686c78 -r a7ba4b73f972 drivers/infiniband/hw/ipath/ipath_iba6110.c --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c Thu Sep 28 08:57:12 2006 -0700 @@ -389,17 +389,28 @@ static void hwerr_crcbits(struct ipath_d _IPATH_HTLINK1_CRCBITS))); } +/* 6110 specific hardware errors... */ +static const struct ipath_hwerror_msgs ipath_6110_hwerror_msgs[] = { + INFINIPATH_HWE_MSG(HTCBUSIREQPARITYERR, "HTC Ireq Parity"), + INFINIPATH_HWE_MSG(HTCBUSTREQPARITYERR, "HTC Treq Parity"), + INFINIPATH_HWE_MSG(HTCBUSTRESPPARITYERR, "HTC Tresp Parity"), + INFINIPATH_HWE_MSG(HTCMISCERR5, "HT core Misc5"), + INFINIPATH_HWE_MSG(HTCMISCERR6, "HT core Misc6"), + INFINIPATH_HWE_MSG(HTCMISCERR7, "HT core Misc7"), + INFINIPATH_HWE_MSG(RXDSYNCMEMPARITYERR, "Rx Dsync"), + INFINIPATH_HWE_MSG(SERDESPLLFAILED, "SerDes PLL"), +}; + /** - * ipath_ht_handle_hwerrors - display hardware errors + * ipath_ht_handle_hwerrors - display hardware errors. * @dd: the infinipath device * @msg: the output buffer * @msgl: the size of the output buffer * - * Use same msg buffer as regular errors to avoid - * excessive stack use. Most hardware errors are catastrophic, but for - * right now, we'll print them and continue. - * We reuse the same message buffer as ipath_handle_errors() to avoid - * excessive stack usage. + * Use same msg buffer as regular errors to avoid excessive stack + * use. Most hardware errors are catastrophic, but for right now, + * we'll print them and continue. We reuse the same message buffer as + * ipath_handle_errors() to avoid excessive stack usage. 
*/ static void ipath_ht_handle_hwerrors(struct ipath_devdata *dd, char *msg, size_t msgl) @@ -499,44 +510,16 @@ static void ipath_ht_handle_hwerrors(str bits); strlcat(msg, bitsmsg, msgl); } - if (hwerrs & (INFINIPATH_HWE_RXEMEMPARITYERR_MASK - << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT)) { - bits = (u32) ((hwerrs >> - INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) & - INFINIPATH_HWE_RXEMEMPARITYERR_MASK); - snprintf(bitsmsg, sizeof bitsmsg, "[RXE Parity Errs %x] ", - bits); - strlcat(msg, bitsmsg, msgl); - } - if (hwerrs & (INFINIPATH_HWE_TXEMEMPARITYERR_MASK - << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) { - bits = (u32) ((hwerrs >> - INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) & - INFINIPATH_HWE_TXEMEMPARITYERR_MASK); - snprintf(bitsmsg, sizeof bitsmsg, "[TXE Parity Errs %x] ", - bits); - strlcat(msg, bitsmsg, msgl); - } - if (hwerrs & INFINIPATH_HWE_IBCBUSTOSPCPARITYERR) - strlcat(msg, "[IB2IPATH Parity]", msgl); - if (hwerrs & INFINIPATH_HWE_IBCBUSFRSPCPARITYERR) - strlcat(msg, "[IPATH2IB Parity]", msgl); - if (hwerrs & INFINIPATH_HWE_HTCBUSIREQPARITYERR) - strlcat(msg, "[HTC Ireq Parity]", msgl); - if (hwerrs & INFINIPATH_HWE_HTCBUSTREQPARITYERR) - strlcat(msg, "[HTC Treq Parity]", msgl); - if (hwerrs & INFINIPATH_HWE_HTCBUSTRESPPARITYERR) - strlcat(msg, "[HTC Tresp Parity]", msgl); + + ipath_format_hwerrors(hwerrs, + ipath_6110_hwerror_msgs, + sizeof(ipath_6110_hwerror_msgs) / + sizeof(ipath_6110_hwerror_msgs[0]), + msg, msgl); if (hwerrs & (_IPATH_HTLINK0_CRCBITS | _IPATH_HTLINK1_CRCBITS)) hwerr_crcbits(dd, hwerrs, msg, msgl); - if (hwerrs & INFINIPATH_HWE_HTCMISCERR5) - strlcat(msg, "[HT core Misc5]", msgl); - if (hwerrs & INFINIPATH_HWE_HTCMISCERR6) - strlcat(msg, "[HT core Misc6]", msgl); - if (hwerrs & INFINIPATH_HWE_HTCMISCERR7) - strlcat(msg, "[HT core Misc7]", msgl); if (hwerrs & INFINIPATH_HWE_MEMBISTFAILED) { strlcat(msg, "[Memory BIST test failed, InfiniPath hardware unusable]", msgl); @@ -572,11 +555,6 @@ static void ipath_ht_handle_hwerrors(str ipath_write_kreg(dd, 
dd->ipath_kregs->kr_hwerrmask, dd->ipath_hwerrmask); } - - if (hwerrs & INFINIPATH_HWE_RXDSYNCMEMPARITYERR) - strlcat(msg, "[Rx Dsync]", msgl); - if (hwerrs & INFINIPATH_HWE_SERDESPLLFAILED) - strlcat(msg, "[SerDes PLL]", msgl); ipath_dev_err(dd, "%s hardware error\n", msg); if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg) diff -r 4dbe5e686c78 -r a7ba4b73f972 drivers/infiniband/hw/ipath/ipath_iba6120.c --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:12 2006 -0700 @@ -301,6 +301,26 @@ static const struct ipath_cregs ipath_pe */ #define INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR (1ULL<<63) +/* 6120 specific hardware errors... */ +static const struct ipath_hwerror_msgs ipath_6120_hwerror_msgs[] = { + INFINIPATH_HWE_MSG(PCIEPOISONEDTLP, "PCIe Poisoned TLP"), + INFINIPATH_HWE_MSG(PCIECPLTIMEOUT, "PCIe completion timeout"), + /* + * In practice, it's unlikely wthat we'll see PCIe PLL, or bus + * parity or memory parity error failures, because most likely we + * won't be able to talk to the core of the chip. Nonetheless, we + * might see them, if they are in parts of the PCIe core that aren't + * essential. + */ + INFINIPATH_HWE_MSG(PCIE1PLLFAILED, "PCIePLL1"), + INFINIPATH_HWE_MSG(PCIE0PLLFAILED, "PCIePLL0"), + INFINIPATH_HWE_MSG(PCIEBUSPARITYXTLH, "PCIe XTLH core parity"), + INFINIPATH_HWE_MSG(PCIEBUSPARITYXADM, "PCIe ADM TX core parity"), + INFINIPATH_HWE_MSG(PCIEBUSPARITYRADM, "PCIe ADM RX core parity"), + INFINIPATH_HWE_MSG(RXDSYNCMEMPARITYERR, "Rx Dsync"), + INFINIPATH_HWE_MSG(SERDESPLLFAILED, "SerDes PLL"), +}; + /** * ipath_pe_handle_hwerrors - display hardware errors. 
* @dd: the infinipath device @@ -403,24 +423,13 @@ static void ipath_pe_handle_hwerrors(str ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, dd->ipath_hwerrmask); } - if (hwerrs & (INFINIPATH_HWE_RXEMEMPARITYERR_MASK - << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT)) { - bits = (u32) ((hwerrs >> - INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) & - INFINIPATH_HWE_RXEMEMPARITYERR_MASK); - snprintf(bitsmsg, sizeof bitsmsg, "[RXE Parity Errs %x] ", - bits); - strlcat(msg, bitsmsg, msgl); - } - if (hwerrs & (INFINIPATH_HWE_TXEMEMPARITYERR_MASK - << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) { - bits = (u32) ((hwerrs >> - INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) & - INFINIPATH_HWE_TXEMEMPARITYERR_MASK); - snprintf(bitsmsg, sizeof bitsmsg, "[TXE Parity Errs %x] ", - bits); - strlcat(msg, bitsmsg, msgl); - } + + ipath_format_hwerrors(hwerrs, + ipath_6120_hwerror_msgs, + sizeof(ipath_6120_hwerror_msgs)/ + sizeof(ipath_6120_hwerror_msgs[0]), + msg, msgl); + if (hwerrs & (INFINIPATH_HWE_PCIEMEMPARITYERR_MASK << INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT)) { bits = (u32) ((hwerrs >> @@ -430,10 +439,6 @@ static void ipath_pe_handle_hwerrors(str "[PCIe Mem Parity Errs %x] ", bits); strlcat(msg, bitsmsg, msgl); } - if (hwerrs & INFINIPATH_HWE_IBCBUSTOSPCPARITYERR) - strlcat(msg, "[IB2IPATH Parity]", msgl); - if (hwerrs & INFINIPATH_HWE_IBCBUSFRSPCPARITYERR) - strlcat(msg, "[IPATH2IB Parity]", msgl); #define _IPATH_PLL_FAIL (INFINIPATH_HWE_COREPLL_FBSLIP | \ INFINIPATH_HWE_COREPLL_RFSLIP ) @@ -458,34 +463,6 @@ static void ipath_pe_handle_hwerrors(str ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, dd->ipath_hwerrmask); } - - if (hwerrs & INFINIPATH_HWE_PCIEPOISONEDTLP) - strlcat(msg, "[PCIe Poisoned TLP]", msgl); - if (hwerrs & INFINIPATH_HWE_PCIECPLTIMEOUT) - strlcat(msg, "[PCIe completion timeout]", msgl); - - /* - * In practice, it's unlikely wthat we'll see PCIe PLL, or bus - * parity or memory parity error failures, because most likely we - * won't be able to talk to the core of the chip. 
Nonetheless, we - * might see them, if they are in parts of the PCIe core that aren't - * essential. - */ - if (hwerrs & INFINIPATH_HWE_PCIE1PLLFAILED) - strlcat(msg, "[PCIePLL1]", msgl); - if (hwerrs & INFINIPATH_HWE_PCIE0PLLFAILED) - strlcat(msg, "[PCIePLL0]", msgl); - if (hwerrs & INFINIPATH_HWE_PCIEBUSPARITYXTLH) - strlcat(msg, "[PCIe XTLH core parity]", msgl); - if (hwerrs & INFINIPATH_HWE_PCIEBUSPARITYXADM) - strlcat(msg, "[PCIe ADM TX core parity]", msgl); - if (hwerrs & INFINIPATH_HWE_PCIEBUSPARITYRADM) - strlcat(msg, "[PCIe ADM RX core parity]", msgl); - - if (hwerrs & INFINIPATH_HWE_RXDSYNCMEMPARITYERR) - strlcat(msg, "[Rx Dsync]", msgl); - if (hwerrs & INFINIPATH_HWE_SERDESPLLFAILED) - strlcat(msg, "[SerDes PLL]", msgl); ipath_dev_err(dd, "%s hardware error\n", msg); if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg) { diff -r 4dbe5e686c78 -r a7ba4b73f972 drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Thu Sep 28 08:57:12 2006 -0700 @@ -132,6 +132,82 @@ static u64 handle_e_sum_errs(struct ipat return ignore_this_time; } +/* generic hw error messages... 
*/ +#define INFINIPATH_HWE_TXEMEMPARITYERR_MSG(a) \ + { \ + .mask = ( INFINIPATH_HWE_TXEMEMPARITYERR_##a << \ + INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT ), \ + .msg = "TXE " #a " Memory Parity" \ + } +#define INFINIPATH_HWE_RXEMEMPARITYERR_MSG(a) \ + { \ + .mask = ( INFINIPATH_HWE_RXEMEMPARITYERR_##a << \ + INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT ), \ + .msg = "RXE " #a " Memory Parity" \ + } + +static const struct ipath_hwerror_msgs ipath_generic_hwerror_msgs[] = { + INFINIPATH_HWE_MSG(IBCBUSFRSPCPARITYERR, "IPATH2IB Parity"), + INFINIPATH_HWE_MSG(IBCBUSTOSPCPARITYERR, "IB2IPATH Parity"), + + INFINIPATH_HWE_TXEMEMPARITYERR_MSG(PIOBUF), + INFINIPATH_HWE_TXEMEMPARITYERR_MSG(PIOPBC), + INFINIPATH_HWE_TXEMEMPARITYERR_MSG(PIOLAUNCHFIFO), + + INFINIPATH_HWE_RXEMEMPARITYERR_MSG(RCVBUF), + INFINIPATH_HWE_RXEMEMPARITYERR_MSG(LOOKUPQ), + INFINIPATH_HWE_RXEMEMPARITYERR_MSG(EAGERTID), + INFINIPATH_HWE_RXEMEMPARITYERR_MSG(EXPTID), + INFINIPATH_HWE_RXEMEMPARITYERR_MSG(FLAGBUF), + INFINIPATH_HWE_RXEMEMPARITYERR_MSG(DATAINFO), + INFINIPATH_HWE_RXEMEMPARITYERR_MSG(HDRINFO), +}; + +/** + * ipath_format_hwmsg - format a single hwerror message + * @msg message buffer + * @msgl length of message buffer + * @hwmsg message to add to message buffer + */ +static void ipath_format_hwmsg(char *msg, size_t msgl, const char *hwmsg) +{ + strlcat(msg, "[", msgl); + strlcat(msg, hwmsg, msgl); + strlcat(msg, "]", msgl); +} + +/** + * ipath_format_hwerrors - format hardware error messages for display + * @hwerrs hardware errors bit vector + * @hwerrmsgs hardware error descriptions + * @nhwerrmsgs number of hwerrmsgs + * @msg message buffer + * @msgl message buffer length + */ +void ipath_format_hwerrors(u64 hwerrs, + const struct ipath_hwerror_msgs *hwerrmsgs, + size_t nhwerrmsgs, + char *msg, size_t msgl) +{ + int i; + const int glen = + sizeof(ipath_generic_hwerror_msgs) / + sizeof(ipath_generic_hwerror_msgs[0]); + + for (i=0; i Message-ID: <9fa624c592af68f7a851.1159459220@eng-12.pathscale.com> It's 
possible (from the compiler perspective) that f0 is uninitialized in two functions (shows up with gcc4.0.2 on fc4, for example). Initialize to zero to fix the warning. Signed-off-by: Bryan O'Sullivan diff -r 6a9a67c2b35a -r 9fa624c592af drivers/infiniband/hw/mthca/mthca_qp.c --- a/drivers/infiniband/hw/mthca/mthca_qp.c Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/mthca/mthca_qp.c Thu Sep 28 08:57:13 2006 -0700 @@ -1527,7 +1527,7 @@ int mthca_tavor_post_send(struct ib_qp * int i; int size; int size0 = 0; - u32 f0; + u32 f0 = 0; int ind; u8 op0 = 0; @@ -1870,7 +1870,7 @@ int mthca_arbel_post_send(struct ib_qp * int i; int size; int size0 = 0; - u32 f0; + u32 f0 = 0; int ind; u8 op0 = 0; From bos at pathscale.com Thu Sep 28 09:00:12 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:12 -0700 Subject: [openib-general] [PATCH 16 of 28] IB/ipath - drop unnecessary "(void *)" casts In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r dcf5ac390abd -r cdbbf110848d drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 @@ -1350,7 +1350,7 @@ int ipath_create_rcvhdrq(struct ipath_de /* clear for security and sanity on each use */ memset(pd->port_rcvhdrq, 0, pd->port_rcvhdrq_size); - memset((void *)pd->port_rcvhdrtail_kvaddr, 0, PAGE_SIZE); + memset(pd->port_rcvhdrtail_kvaddr, 0, PAGE_SIZE); /* * tell chip each time we init it, even if we are re-using previous @@ -1803,7 +1803,7 @@ void ipath_free_pddata(struct ipath_devd pd->port_rcvhdrq = NULL; if (pd->port_rcvhdrtail_kvaddr) { dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE, - (void *)pd->port_rcvhdrtail_kvaddr, + pd->port_rcvhdrtail_kvaddr, pd->port_rcvhdrqtailaddr_phys); pd->port_rcvhdrtail_kvaddr = NULL; } @@ -1934,7 +1934,7 @@ static void cleanup_device(struct ipath_ if (dd->ipath_pioavailregs_dma) { 
dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE, - (void *) dd->ipath_pioavailregs_dma, + dd->ipath_pioavailregs_dma, dd->ipath_pioavailregs_phys); dd->ipath_pioavailregs_dma = NULL; } From bos at pathscale.com Thu Sep 28 09:00:19 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:19 -0700 Subject: [openib-general] [PATCH 23 of 28] IB/ipath - fix EEPROM read when driver is compiled with -Os In-Reply-To: Message-ID: <6a9a67c2b35aa7f6636f.1159459219@eng-12.pathscale.com> The EEPROM is read via programmable I/O pins. When the driver is compiled with -Os, the CPU can speculatively read the I/O value before it is valid. This patch fixes the problem by adding an rmb() read barrier after the scratch register read in i2c_wait_for_writes(). Signed-off-by: Bryan O'Sullivan diff -r 5aea5f31529d -r 6a9a67c2b35a drivers/infiniband/hw/ipath/ipath_eeprom.c --- a/drivers/infiniband/hw/ipath/ipath_eeprom.c Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c Thu Sep 28 08:57:13 2006 -0700 @@ -187,6 +187,7 @@ static void i2c_wait_for_writes(struct i static void i2c_wait_for_writes(struct ipath_devdata *dd) { (void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + rmb(); } static void scl_out(struct ipath_devdata *dd, u8 bit) From bos at pathscale.com Thu Sep 28 09:00:15 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:15 -0700 Subject: [openib-general] [PATCH 19 of 28] IB/ipath - call mtrr_del with correct arguments In-Reply-To: Message-ID: <858280e8cbab089eab00.1159459215@eng-12.pathscale.com> We were passing 0 for base and length, which worked on older kernels but no longer seems to. Record the base and length when write combining is enabled, and pass them to mtrr_del(). 
Signed-off-by: Bryan O'Sullivan diff -r de99d6fb2d1d -r 858280e8cbab drivers/infiniband/hw/ipath/ipath_kernel.h --- a/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:12 2006 -0700 @@ -336,6 +336,8 @@ struct ipath_devdata { u8 ipath_ht_slave_off; /* for write combining settings */ unsigned long ipath_wc_cookie; + unsigned long ipath_wc_base; + unsigned long ipath_wc_len; /* ref count for each pkey */ atomic_t ipath_pkeyrefs[4]; /* shadow copy of all exptids physaddr; used only by funcsim */ diff -r de99d6fb2d1d -r 858280e8cbab drivers/infiniband/hw/ipath/ipath_wc_x86_64.c --- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c Thu Sep 28 08:57:12 2006 -0700 @@ -123,6 +123,8 @@ int ipath_enable_wc(struct ipath_devdata ipath_cdbg(VERBOSE, "Set mtrr for chip to WC, " "cookie is %d\n", cookie); dd->ipath_wc_cookie = cookie; + dd->ipath_wc_base = (unsigned long) pioaddr; + dd->ipath_wc_len = (unsigned long) piolen; } } @@ -136,9 +138,16 @@ void ipath_disable_wc(struct ipath_devda void ipath_disable_wc(struct ipath_devdata *dd) { if (dd->ipath_wc_cookie) { + int r; ipath_cdbg(VERBOSE, "undoing WCCOMB on pio buffers\n"); - mtrr_del(dd->ipath_wc_cookie, 0, 0); - dd->ipath_wc_cookie = 0; + r = mtrr_del(dd->ipath_wc_cookie, dd->ipath_wc_base, + dd->ipath_wc_len); + if (r < 0) + dev_info(&dd->pcidev->dev, + "mtrr_del(%lx, %lx, %lx) failed: %d\n", + dd->ipath_wc_cookie, dd->ipath_wc_base, + dd->ipath_wc_len, r); + dd->ipath_wc_cookie = 0; // even on failure } } From bos at pathscale.com Thu Sep 28 09:00:14 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:14 -0700 Subject: [openib-general] [PATCH 18 of 28] IB/ipath - flush RWQEs if access error or invalid error seen In-Reply-To: Message-ID: If the receiver goes into the error state, we need to flush the posted receive WQEs. 
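As a rough illustration only, and not part of this patch, the flushing rule can be modelled in user space as below. All demo_* names, the ring layout and the status enum are hypothetical stand-ins rather than the driver's real types. The sketch assumes that a receive WQE already in progress completes with the triggering error, and that every remaining posted WQE completes with a flush error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion statuses, loosely mirroring enum ib_wc_status. */
enum demo_wc_status {
	DEMO_WC_SUCCESS,
	DEMO_WC_FLUSH_ERR,
	DEMO_WC_REM_ACCESS_ERR
};

/* Hypothetical work completion: which request, and how it ended. */
struct demo_wc {
	uint64_t wr_id;
	enum demo_wc_status status;
};

#define DEMO_RQ_SIZE 8

/* Hypothetical receive queue: a ring of posted work request IDs. */
struct demo_rq {
	uint64_t wr_id[DEMO_RQ_SIZE];
	unsigned head;		/* next slot to post into */
	unsigned tail;		/* next slot to consume */
	int cur_valid;		/* a WQE was taken but not yet completed */
	uint64_t cur_wr_id;	/* ID of that in-progress WQE */
};

/*
 * Flush the receive queue after an error.  The in-progress WQE (if any)
 * is completed with 'err'; all still-posted WQEs are completed with
 * DEMO_WC_FLUSH_ERR.  Returns the number of completions written to 'out'.
 */
static size_t demo_flush_rq(struct demo_rq *rq, enum demo_wc_status err,
			    struct demo_wc *out, size_t max)
{
	size_t n = 0;

	assert(rq != NULL && out != NULL);

	if (rq->cur_valid && n < max) {
		rq->cur_valid = 0;
		out[n].wr_id = rq->cur_wr_id;
		out[n].status = err;
		n++;
	}
	while (rq->tail != rq->head && n < max) {
		out[n].wr_id = rq->wr_id[rq->tail];
		out[n].status = DEMO_WC_FLUSH_ERR;
		n++;
		rq->tail = (rq->tail + 1) % DEMO_RQ_SIZE;
	}
	return n;
}
```

A caller would pass the error that pushed the queue into the error state, for example demo_flush_rq(&rq, DEMO_WC_REM_ACCESS_ERR, wc, 4), and then report the returned completions to the consumer.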
Signed-off-by: Bryan O'Sullivan diff -r f6794c8289ab -r de99d6fb2d1d drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Thu Sep 28 08:57:12 2006 -0700 @@ -335,6 +335,7 @@ static void ipath_reset_qp(struct ipath_ qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; qp->r_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; qp->r_nak_state = 0; + qp->r_wrid_valid = 0; qp->s_rnr_timeout = 0; qp->s_head = 0; qp->s_tail = 0; @@ -353,12 +354,13 @@ static void ipath_reset_qp(struct ipath_ /** * ipath_error_qp - put a QP into an error state * @qp: the QP to put into an error state + * @err: the receive completion error to signal if a RWQE is active * * Flushes both send and receive work queues. * QP s_lock should be held and interrupts disabled. */ -void ipath_error_qp(struct ipath_qp *qp) +void ipath_error_qp(struct ipath_qp *qp, enum ib_wc_status err) { struct ipath_ibdev *dev = to_idev(qp->ibqp.device); struct ib_wc wc; @@ -374,7 +376,6 @@ void ipath_error_qp(struct ipath_qp *qp) list_del_init(&qp->piowait); spin_unlock(&dev->pending_lock); - wc.status = IB_WC_WR_FLUSH_ERR; wc.vendor_err = 0; wc.byte_len = 0; wc.imm_data = 0; @@ -386,6 +387,12 @@ void ipath_error_qp(struct ipath_qp *qp) wc.sl = 0; wc.dlid_path_bits = 0; wc.port_num = 0; + if (qp->r_wrid_valid) { + qp->r_wrid_valid = 0; + wc.status = err; + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 1); + } + wc.status = IB_WC_WR_FLUSH_ERR; while (qp->s_last != qp->s_head) { struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); @@ -502,7 +509,7 @@ int ipath_modify_qp(struct ib_qp *ibqp, break; case IB_QPS_ERR: - ipath_error_qp(qp); + ipath_error_qp(qp, IB_WC_GENERAL_ERR); break; default: diff -r f6794c8289ab -r de99d6fb2d1d drivers/infiniband/hw/ipath/ipath_rc.c --- a/drivers/infiniband/hw/ipath/ipath_rc.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_rc.c Thu Sep 28 08:57:12 2006 -0700 @@ -1293,6 +1293,14 @@ 
done: return 1; } +static void ipath_rc_error(struct ipath_qp *qp, enum ib_wc_status err) +{ + spin_lock_irq(&qp->s_lock); + qp->state = IB_QPS_ERR; + ipath_error_qp(qp, err); + spin_unlock_irq(&qp->s_lock); +} + /** * ipath_rc_rcv - process an incoming RC packet * @dev: the device this packet came in on @@ -1385,8 +1393,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de */ if (qp->r_ack_state >= OP(COMPARE_SWAP)) goto send_ack; - /* XXX Flush WQEs */ - qp->state = IB_QPS_ERR; + ipath_rc_error(qp, IB_WC_REM_INV_REQ_ERR); qp->r_ack_state = OP(SEND_ONLY); qp->r_nak_state = IB_NAK_INVALID_REQUEST; qp->r_ack_psn = qp->r_psn; @@ -1492,9 +1499,9 @@ void ipath_rc_rcv(struct ipath_ibdev *de goto nack_inv; ipath_copy_sge(&qp->r_sge, data, tlen); qp->r_msn++; - if (opcode == OP(RDMA_WRITE_LAST) || - opcode == OP(RDMA_WRITE_ONLY)) + if (!qp->r_wrid_valid) break; + qp->r_wrid_valid = 0; wc.wr_id = qp->r_wr_id; wc.status = IB_WC_SUCCESS; wc.opcode = IB_WC_RECV; @@ -1685,8 +1692,7 @@ nack_acc: * is pending though. 
*/ if (qp->r_ack_state < OP(COMPARE_SWAP)) { - /* XXX Flush WQEs */ - qp->state = IB_QPS_ERR; + ipath_rc_error(qp, IB_WC_REM_ACCESS_ERR); qp->r_ack_state = OP(RDMA_WRITE_ONLY); qp->r_nak_state = IB_NAK_REMOTE_ACCESS_ERROR; qp->r_ack_psn = qp->r_psn; diff -r f6794c8289ab -r de99d6fb2d1d drivers/infiniband/hw/ipath/ipath_ruc.c --- a/drivers/infiniband/hw/ipath/ipath_ruc.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c Thu Sep 28 08:57:12 2006 -0700 @@ -229,6 +229,7 @@ int ipath_get_rwqe(struct ipath_qp *qp, } } spin_unlock_irqrestore(&rq->lock, flags); + qp->r_wrid_valid = 1; bail: return ret; diff -r f6794c8289ab -r de99d6fb2d1d drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Sep 28 08:57:12 2006 -0700 @@ -365,6 +365,7 @@ struct ipath_qp { u8 r_min_rnr_timer; /* retry timeout value for RNR NAKs */ u8 r_reuse_sge; /* for UC receive errors */ u8 r_sge_inx; /* current index into sg_list */ + u8 r_wrid_valid; /* r_wrid set but CQ entry not yet made */ u8 qp_access_flags; u8 s_max_sge; /* size of s_wq->sg_list */ u8 s_retry_cnt; /* number of times to retry */ @@ -639,6 +640,8 @@ struct ib_qp *ipath_create_qp(struct ib_ int ipath_destroy_qp(struct ib_qp *ibqp); +void ipath_error_qp(struct ipath_qp *qp, enum ib_wc_status err); + int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, struct ib_udata *udata); From bos at pathscale.com Thu Sep 28 09:00:11 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:11 -0700 Subject: [openib-general] [PATCH 15 of 28] IB/ipath - support multiple simultaneous devices of different types In-Reply-To: Message-ID: Prior to this change, the driver could not support an HT card and a PCIe card present in the same machine at the same time. 
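A minimal sketch of the idea, not taken from the driver: chip-specific values that used to live in file-scope globals move into each device's own structure, so two chips with different receive mask widths (9 bits for HT, 5 bits for PCIe in the defines this series touches) can be initialized side by side. The demo_* names are hypothetical, and the interrupt status shift is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's per-device data. */
struct demo_devdata {
	uint32_t i_rcvavail_mask;	/* per device, no longer a global */
	uint32_t i_rcvurg_mask;
};

/* Mask widths as they appear in the chip-specific files of this series. */
#define DEMO_HT_RCV_MASK ((1U << 9) - 1)	/* HT chip: 9 bits */
#define DEMO_PE_RCV_MASK 0x1FU			/* PCIe chip: 5 bits */

/* Chip-specific init writes into the device it is handed, not a global. */
static void demo_init_ht(struct demo_devdata *dd)
{
	assert(dd != NULL);
	dd->i_rcvavail_mask = DEMO_HT_RCV_MASK;
	dd->i_rcvurg_mask = DEMO_HT_RCV_MASK;
}

static void demo_init_pe(struct demo_devdata *dd)
{
	assert(dd != NULL);
	dd->i_rcvavail_mask = DEMO_PE_RCV_MASK;
	dd->i_rcvurg_mask = DEMO_PE_RCV_MASK;
}

/* Shared code extracts the receive-available bits with this device's mask. */
static uint32_t demo_rcvavail(const struct demo_devdata *dd, uint32_t istat)
{
	assert(dd != NULL);
	return istat & dd->i_rcvavail_mask;
}
```

With globals, whichever chip initialized last would overwrite the other's masks. With per-device fields, demo_rcvavail() picks the right width for whichever device raised the interrupt.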
Signed-off-by: Bryan O'Sullivan diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 @@ -94,16 +94,6 @@ const char *ipath_ibcstatus_str[] = { "RecovWaitRmt", "RecovIdle", }; - -/* - * These variables are initialized in the chip-specific files - * but are defined here. - */ -u16 ipath_gpio_sda_num, ipath_gpio_scl_num; -u64 ipath_gpio_sda, ipath_gpio_scl; -u64 infinipath_i_bitsextant; -ipath_err_t infinipath_e_bitsextant, infinipath_hwe_bitsextant; -u32 infinipath_i_rcvavail_mask, infinipath_i_rcvurg_mask; static void __devexit ipath_remove_one(struct pci_dev *); static int __devinit ipath_init_one(struct pci_dev *, diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_eeprom.c --- a/drivers/infiniband/hw/ipath/ipath_eeprom.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c Thu Sep 28 08:57:12 2006 -0700 @@ -100,9 +100,9 @@ static int i2c_gpio_set(struct ipath_dev gpioval = &dd->ipath_gpio_out; read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extctrl); if (line == i2c_line_scl) - mask = ipath_gpio_scl; + mask = dd->ipath_gpio_scl; else - mask = ipath_gpio_sda; + mask = dd->ipath_gpio_sda; if (new_line_state == i2c_line_high) /* tri-state the output rather than force high */ @@ -119,12 +119,12 @@ static int i2c_gpio_set(struct ipath_dev write_val = 0x0UL; if (line == i2c_line_scl) { - write_val <<= ipath_gpio_scl_num; - *gpioval = *gpioval & ~(1UL << ipath_gpio_scl_num); + write_val <<= dd->ipath_gpio_scl_num; + *gpioval = *gpioval & ~(1UL << dd->ipath_gpio_scl_num); *gpioval |= write_val; } else { - write_val <<= ipath_gpio_sda_num; - *gpioval = *gpioval & ~(1UL << ipath_gpio_sda_num); + write_val <<= dd->ipath_gpio_sda_num; + *gpioval = *gpioval & ~(1UL << dd->ipath_gpio_sda_num); *gpioval |= write_val; } ipath_write_kreg(dd, 
dd->ipath_kregs->kr_gpio_out, *gpioval); @@ -157,9 +157,9 @@ static int i2c_gpio_get(struct ipath_dev read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extctrl); /* config line to be an input */ if (line == i2c_line_scl) - mask = ipath_gpio_scl; + mask = dd->ipath_gpio_scl; else - mask = ipath_gpio_sda; + mask = dd->ipath_gpio_sda; write_val = read_val & ~mask; ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, write_val); read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extstatus); diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_iba6110.c --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c Thu Sep 28 08:57:12 2006 -0700 @@ -252,8 +252,8 @@ static const struct ipath_cregs ipath_ht }; /* kr_intstatus, kr_intclear, kr_intmask bits */ -#define INFINIPATH_I_RCVURG_MASK 0x1FF -#define INFINIPATH_I_RCVAVAIL_MASK 0x1FF +#define INFINIPATH_I_RCVURG_MASK ((1U<<9)-1) +#define INFINIPATH_I_RCVAVAIL_MASK ((1U<<9)-1) /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ #define INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT 0 @@ -457,10 +457,10 @@ static void ipath_ht_handle_hwerrors(str "(cleared)\n", (unsigned long long) hwerrs); dd->ipath_lasthwerror |= hwerrs; - if (hwerrs & ~infinipath_hwe_bitsextant) + if (hwerrs & ~dd->ipath_hwe_bitsextant) ipath_dev_err(dd, "hwerror interrupt with unknown errors " "%llx set\n", (unsigned long long) - (hwerrs & ~infinipath_hwe_bitsextant)); + (hwerrs & ~dd->ipath_hwe_bitsextant)); ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control); if (ctrl & INFINIPATH_C_FREEZEMODE) { @@ -1059,21 +1059,21 @@ static void ipath_setup_ht_setextled(str ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, extctl); } -static void ipath_init_ht_variables(void) -{ - ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM; - ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM; - ipath_gpio_sda = IPATH_GPIO_SDA; - ipath_gpio_scl = IPATH_GPIO_SCL; - - infinipath_i_bitsextant = +static 
void ipath_init_ht_variables(struct ipath_devdata *dd) +{ + dd->ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM; + dd->ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM; + dd->ipath_gpio_sda = IPATH_GPIO_SDA; + dd->ipath_gpio_scl = IPATH_GPIO_SCL; + + dd->ipath_i_bitsextant = (INFINIPATH_I_RCVURG_MASK << INFINIPATH_I_RCVURG_SHIFT) | (INFINIPATH_I_RCVAVAIL_MASK << INFINIPATH_I_RCVAVAIL_SHIFT) | INFINIPATH_I_ERROR | INFINIPATH_I_SPIOSENT | INFINIPATH_I_SPIOBUFAVAIL | INFINIPATH_I_GPIO; - infinipath_e_bitsextant = + dd->ipath_e_bitsextant = INFINIPATH_E_RFORMATERR | INFINIPATH_E_RVCRC | INFINIPATH_E_RICRC | INFINIPATH_E_RMINPKTLEN | INFINIPATH_E_RMAXPKTLEN | INFINIPATH_E_RLONGPKTLEN | @@ -1091,7 +1091,7 @@ static void ipath_init_ht_variables(void INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET | INFINIPATH_E_HARDWARE; - infinipath_hwe_bitsextant = + dd->ipath_hwe_bitsextant = (INFINIPATH_HWE_HTCMEMPARITYERR_MASK << INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT) | (INFINIPATH_HWE_TXEMEMPARITYERR_MASK << @@ -1120,8 +1120,8 @@ static void ipath_init_ht_variables(void INFINIPATH_HWE_IBCBUSTOSPCPARITYERR | INFINIPATH_HWE_IBCBUSFRSPCPARITYERR; - infinipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK; - infinipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK; + dd->ipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK; + dd->ipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK; } /** @@ -1586,5 +1586,5 @@ void ipath_init_iba6110_funcs(struct ipa * do very early init that is needed before ipath_f_bus is * called */ - ipath_init_ht_variables(); -} + ipath_init_ht_variables(dd); +} diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_iba6120.c --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:12 2006 -0700 @@ -263,8 +263,8 @@ static const struct ipath_cregs ipath_pe }; /* kr_intstatus, kr_intclear, kr_intmask bits */ -#define INFINIPATH_I_RCVURG_MASK 0x1F -#define INFINIPATH_I_RCVAVAIL_MASK 0x1F 
+#define INFINIPATH_I_RCVURG_MASK ((1U<<5)-1) +#define INFINIPATH_I_RCVAVAIL_MASK ((1U<<5)-1) /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ #define INFINIPATH_HWE_PCIEMEMPARITYERR_MASK 0x000000000000003fULL @@ -376,10 +376,10 @@ static void ipath_pe_handle_hwerrors(str "(cleared)\n", (unsigned long long) hwerrs); dd->ipath_lasthwerror |= hwerrs; - if (hwerrs & ~infinipath_hwe_bitsextant) + if (hwerrs & ~dd->ipath_hwe_bitsextant) ipath_dev_err(dd, "hwerror interrupt with unknown errors " "%llx set\n", (unsigned long long) - (hwerrs & ~infinipath_hwe_bitsextant)); + (hwerrs & ~dd->ipath_hwe_bitsextant)); ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control); if (ctrl & INFINIPATH_C_FREEZEMODE) { @@ -865,19 +865,19 @@ static int ipath_setup_pe_config(struct return 0; } -static void ipath_init_pe_variables(void) +static void ipath_init_pe_variables(struct ipath_devdata *dd) { /* * bits for selecting i2c direction and values, * used for I2C serial flash */ - ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM; - ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM; - ipath_gpio_sda = IPATH_GPIO_SDA; - ipath_gpio_scl = IPATH_GPIO_SCL; + dd->ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM; + dd->ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM; + dd->ipath_gpio_sda = IPATH_GPIO_SDA; + dd->ipath_gpio_scl = IPATH_GPIO_SCL; /* variables for sanity checking interrupt and errors */ - infinipath_hwe_bitsextant = + dd->ipath_hwe_bitsextant = (INFINIPATH_HWE_RXEMEMPARITYERR_MASK << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) | (INFINIPATH_HWE_PCIEMEMPARITYERR_MASK << @@ -895,13 +895,13 @@ static void ipath_init_pe_variables(void INFINIPATH_HWE_SERDESPLLFAILED | INFINIPATH_HWE_IBCBUSTOSPCPARITYERR | INFINIPATH_HWE_IBCBUSFRSPCPARITYERR; - infinipath_i_bitsextant = + dd->ipath_i_bitsextant = (INFINIPATH_I_RCVURG_MASK << INFINIPATH_I_RCVURG_SHIFT) | (INFINIPATH_I_RCVAVAIL_MASK << INFINIPATH_I_RCVAVAIL_SHIFT) | INFINIPATH_I_ERROR | INFINIPATH_I_SPIOSENT | INFINIPATH_I_SPIOBUFAVAIL | INFINIPATH_I_GPIO; - 
infinipath_e_bitsextant = + dd->ipath_e_bitsextant = INFINIPATH_E_RFORMATERR | INFINIPATH_E_RVCRC | INFINIPATH_E_RICRC | INFINIPATH_E_RMINPKTLEN | INFINIPATH_E_RMAXPKTLEN | INFINIPATH_E_RLONGPKTLEN | @@ -919,8 +919,8 @@ static void ipath_init_pe_variables(void INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET | INFINIPATH_E_HARDWARE; - infinipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK; - infinipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK; + dd->ipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK; + dd->ipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK; } /* setup the MSI stuff again after a reset. I'd like to just call @@ -1326,6 +1326,6 @@ void ipath_init_iba6120_funcs(struct ipa dd->ipath_kregs = &ipath_pe_kregs; dd->ipath_cregs = &ipath_pe_cregs; - ipath_init_pe_variables(); -} - + ipath_init_pe_variables(dd); +} + diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Thu Sep 28 08:57:12 2006 -0700 @@ -480,10 +480,10 @@ static int handle_errors(struct ipath_de dd->ipath_f_handle_hwerrors(dd, msg, sizeof msg); } - if (!noprint && (errs & ~infinipath_e_bitsextant)) + if (!noprint && (errs & ~dd->ipath_e_bitsextant)) ipath_dev_err(dd, "error interrupt with unknown errors " "%llx set\n", (unsigned long long) - (errs & ~infinipath_e_bitsextant)); + (errs & ~dd->ipath_e_bitsextant)); if (errs & E_SUM_ERRS) ignore_this_time = handle_e_sum_errs(dd, errs); @@ -805,9 +805,9 @@ static void handle_urcv(struct ipath_dev int rcvdint = 0; portr = ((istat >> INFINIPATH_I_RCVAVAIL_SHIFT) & - infinipath_i_rcvavail_mask) + dd->ipath_i_rcvavail_mask) | ((istat >> INFINIPATH_I_RCVURG_SHIFT) & - infinipath_i_rcvurg_mask); + dd->ipath_i_rcvurg_mask); for (i = 1; i < dd->ipath_cfgports; i++) { struct ipath_portdata *pd = dd->ipath_pd[i]; if (portr & (1 << i) && pd && pd->port_cnt && @@ -914,10 +914,10 @@ irqreturn_t ipath_intr(int irq, void 
*da if (unexpected) unexpected = 0; - if (unlikely(istat & ~infinipath_i_bitsextant)) + if (unlikely(istat & ~dd->ipath_i_bitsextant)) ipath_dev_err(dd, "interrupt with unknown interrupts %x set\n", - istat & (u32) ~ infinipath_i_bitsextant); + istat & (u32) ~ dd->ipath_i_bitsextant); else ipath_cdbg(VERBOSE, "intr stat=0x%x\n", istat); @@ -1041,9 +1041,9 @@ irqreturn_t ipath_intr(int irq, void *da istat &= ~port0rbits; } - if (istat & ((infinipath_i_rcvavail_mask << + if (istat & ((dd->ipath_i_rcvavail_mask << INFINIPATH_I_RCVAVAIL_SHIFT) - | (infinipath_i_rcvurg_mask << + | (dd->ipath_i_rcvurg_mask << INFINIPATH_I_RCVURG_SHIFT))) handle_urcv(dd, istat); diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_kernel.h --- a/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:12 2006 -0700 @@ -533,6 +533,30 @@ struct ipath_devdata { u32 ipath_rxfc_unsupvl_errs; u32 ipath_overrun_thresh_errs; u32 ipath_lli_errs; + + /* + * Not all devices managed by a driver instance are the same + * type, so these fields must be per-device. + */ + u64 ipath_i_bitsextant; + ipath_err_t ipath_e_bitsextant; + ipath_err_t ipath_hwe_bitsextant; + + /* + * Below should be computable from number of ports, + * since they are never modified. + */ + u32 ipath_i_rcvavail_mask; + u32 ipath_i_rcvurg_mask; + + /* + * Register bits for selecting i2c direction and values, used for + * I2C serial flash. 
+ */ + u16 ipath_gpio_sda_num; + u16 ipath_gpio_scl_num; + u64 ipath_gpio_sda; + u64 ipath_gpio_scl; }; /* Private data for file operations */ diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_registers.h --- a/drivers/infiniband/hw/ipath/ipath_registers.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_registers.h Thu Sep 28 08:57:12 2006 -0700 @@ -316,19 +316,23 @@ typedef u64 ipath_err_t; +/* The following change with the type of device, so + * need to be part of the ipath_devdata struct, or + * we could have problems plugging in devices of + * different types (e.g. one HT, one PCIE) + * in one system, to be managed by one driver. + * On the other hand, this file is may also be included + * by other code, so leave the declarations here + * temporarily. Minor footprint issue if common-model + * linker used, none if C89+ linker used. + */ + /* mask of defined bits for various registers */ extern u64 infinipath_i_bitsextant; extern ipath_err_t infinipath_e_bitsextant, infinipath_hwe_bitsextant; /* masks that are different in various chips, or only exist in some chips */ extern u32 infinipath_i_rcvavail_mask, infinipath_i_rcvurg_mask; - -/* - * register bits for selecting i2c direction and values, used for I2C serial - * flash - */ -extern u16 ipath_gpio_sda_num, ipath_gpio_scl_num; -extern u64 ipath_gpio_sda, ipath_gpio_scl; /* * These are the infinipath general register numbers (not offsets). 
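For readers skimming the archive, the idea behind patch 15 of 28 above can be shown outside the driver. The following is only a minimal sketch, not the driver code: `struct demo_dev`, `demo_init_ht`, `demo_init_pe`, `demo_port_bits` and the shift parameters are hypothetical names made up for illustration. Only the two mask widths, `(1U<<9)-1` for the HT chip and `(1U<<5)-1` for the PCIE chip, are taken from the diff. Because each device carries its own masks, initializing a second device of a different type no longer overwrites the values the first one relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ipath_devdata; only the mask fields. */
struct demo_dev {
	uint32_t rcvavail_mask;
	uint32_t rcvurg_mask;
};

/* Mask widths as they appear in the diff: 9 bits (HT), 5 bits (PCIE). */
#define DEMO_HT_PORT_MASK ((1U << 9) - 1)
#define DEMO_PE_PORT_MASK ((1U << 5) - 1)

/* Chip-specific init writes into the device, not into file-scope globals. */
static void demo_init_ht(struct demo_dev *dd)
{
	assert(dd != NULL);
	dd->rcvavail_mask = DEMO_HT_PORT_MASK;
	dd->rcvurg_mask = DEMO_HT_PORT_MASK;
}

static void demo_init_pe(struct demo_dev *dd)
{
	assert(dd != NULL);
	dd->rcvavail_mask = DEMO_PE_PORT_MASK;
	dd->rcvurg_mask = DEMO_PE_PORT_MASK;
}

/*
 * Combine the "available" and "urgent" port bits from an interrupt
 * status word using this device's own masks. The shift values are
 * parameters here because the real shift constants are not shown in
 * the patch; treat them as assumptions.
 */
static uint32_t demo_port_bits(const struct demo_dev *dd, uint32_t istat,
			       unsigned int avail_shift,
			       unsigned int urg_shift)
{
	assert(dd != NULL);
	return ((istat >> avail_shift) & dd->rcvavail_mask) |
	       ((istat >> urg_shift) & dd->rcvurg_mask);
}
```

With one device of each type initialized, the same status word decodes differently per device: bit 8 is a valid port bit under the 9-bit mask and is dropped under the 5-bit mask. With shared globals, whichever chip initialized last would have decided the result for both.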
From bos at pathscale.com Thu Sep 28 09:00:22 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:22 -0700 Subject: [openib-general] [PATCH 26 of 28] IB/ipath - support new PCIE device, QLE7142 In-Reply-To: Message-ID: <8b45b43df5adb4ea7dec.1159459222@eng-12.pathscale.com> Signed-off-by: Bryan O'Sullivan diff -r 4269068599c2 -r 8b45b43df5ad drivers/infiniband/hw/ipath/ipath_iba6120.c --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:13 2006 -0700 @@ -538,6 +538,9 @@ static int ipath_pe_boardname(struct ipa case 5: n = "InfiniPath_QMH7140"; break; + case 6: + n = "InfiniPath_QLE7142"; + break; default: ipath_dev_err(dd, "Don't yet know about board with ID %u\n", From bos at pathscale.com Thu Sep 28 09:00:17 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:17 -0700 Subject: [openib-general] [PATCH 21 of 28] IB/ipath - change HT CRC message to indicate how to resolve problem In-Reply-To: Message-ID: The system must be powercycled to clear a HT CRC error; reloading the driver is not enough. Signed-off-by: Bryan O'Sullivan diff -r e3158e62d6bf -r a78c7b475df6 drivers/infiniband/hw/ipath/ipath_iba6110.c --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c Thu Sep 28 08:57:13 2006 -0700 @@ -338,7 +338,7 @@ static void hwerr_crcbits(struct ipath_d if (crcbits) { u16 ctrl0, ctrl1; snprintf(bitsmsg, sizeof bitsmsg, - "[HT%s lane %s CRC (%llx); ignore till reload]", + "[HT%s lane %s CRC (%llx); powercycle to completely clear]", !(crcbits & _IPATH_HTLINK1_CRCBITS) ? "0 (A)" : (!(crcbits & _IPATH_HTLINK0_CRCBITS) ? 
"1 (B)" : "0+1 (A+B)"), From bos at pathscale.com Thu Sep 28 09:00:23 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:23 -0700 Subject: [openib-general] [PATCH 27 of 28] IB/ipath - fix races with ib_resize_cq() In-Reply-To: Message-ID: <944d7e53a04937d73513.1159459223@eng-12.pathscale.com> The resize CQ function changes the memory used to store the queue. Other routines need to honor the lock before accessing the pointer to the queue and verify that the head and tail are in range. Signed-off-by: Bryan O'Sullivan diff -r 8b45b43df5ad -r 944d7e53a049 drivers/infiniband/hw/ipath/ipath_cq.c --- a/drivers/infiniband/hw/ipath/ipath_cq.c Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_cq.c Thu Sep 28 08:57:13 2006 -0700 @@ -46,7 +46,7 @@ */ void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int solicited) { - struct ipath_cq_wc *wc = cq->queue; + struct ipath_cq_wc *wc; unsigned long flags; u32 head; u32 next; @@ -57,6 +57,7 @@ void ipath_cq_enter(struct ipath_cq *cq, * Note that the head pointer might be writable by user processes. * Take care to verify it is a sane value. 
*/ + wc = cq->queue; head = wc->head; if (head >= (unsigned) cq->ibcq.cqe) { head = cq->ibcq.cqe; @@ -109,21 +110,27 @@ int ipath_poll_cq(struct ib_cq *ibcq, in int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry) { struct ipath_cq *cq = to_icq(ibcq); - struct ipath_cq_wc *wc = cq->queue; + struct ipath_cq_wc *wc; unsigned long flags; int npolled; + u32 tail; spin_lock_irqsave(&cq->lock, flags); + wc = cq->queue; + tail = wc->tail; + if (tail > (u32) cq->ibcq.cqe) + tail = (u32) cq->ibcq.cqe; for (npolled = 0; npolled < num_entries; ++npolled, ++entry) { - if (wc->tail == wc->head) + if (tail == wc->head) break; - *entry = wc->queue[wc->tail]; - if (wc->tail >= cq->ibcq.cqe) - wc->tail = 0; + *entry = wc->queue[tail]; + if (tail >= cq->ibcq.cqe) + tail = 0; else - wc->tail++; - } + tail++; + } + wc->tail = tail; spin_unlock_irqrestore(&cq->lock, flags); @@ -322,10 +329,16 @@ int ipath_req_notify_cq(struct ib_cq *ib return 0; } +/** + * ipath_resize_cq - change the size of the CQ + * @ibcq: the completion queue + * + * Returns 0 for success. + */ int ipath_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata) { struct ipath_cq *cq = to_icq(ibcq); - struct ipath_cq_wc *old_wc = cq->queue; + struct ipath_cq_wc *old_wc; struct ipath_cq_wc *wc; u32 head, tail, n; int ret; @@ -361,6 +374,7 @@ int ipath_resize_cq(struct ib_cq *ibcq, * Make sure head and tail are sane since they * might be user writable. 
*/ + old_wc = cq->queue; head = old_wc->head; if (head > (u32) cq->ibcq.cqe) head = (u32) cq->ibcq.cqe; From bos at pathscale.com Thu Sep 28 09:00:24 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:24 -0700 Subject: [openib-general] [PATCH 28 of 28] IB/ipath - fix lockdep error upon "ifconfig ibN down" In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r 944d7e53a049 -r c61b17b5602f drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Thu Sep 28 08:57:13 2006 -0700 @@ -1202,6 +1202,7 @@ static struct ib_ah *ipath_create_ah(str struct ipath_ah *ah; struct ib_ah *ret; struct ipath_ibdev *dev = to_idev(pd->device); + unsigned long flags; /* A multicast address requires a GRH (see ch. 8.4.1). */ if (ah_attr->dlid >= IPATH_MULTICAST_LID_BASE && @@ -1228,16 +1229,16 @@ static struct ib_ah *ipath_create_ah(str goto bail; } - spin_lock(&dev->n_ahs_lock); + spin_lock_irqsave(&dev->n_ahs_lock, flags); if (dev->n_ahs_allocated == ib_ipath_max_ahs) { - spin_unlock(&dev->n_ahs_lock); + spin_unlock_irqrestore(&dev->n_ahs_lock, flags); kfree(ah); ret = ERR_PTR(-ENOMEM); goto bail; } dev->n_ahs_allocated++; - spin_unlock(&dev->n_ahs_lock); + spin_unlock_irqrestore(&dev->n_ahs_lock, flags); /* ib_create_ah() will initialize ah->ibah. 
*/ ah->attr = *ah_attr; @@ -1258,10 +1259,11 @@ static int ipath_destroy_ah(struct ib_ah { struct ipath_ibdev *dev = to_idev(ibah->device); struct ipath_ah *ah = to_iah(ibah); - - spin_lock(&dev->n_ahs_lock); + unsigned long flags; + + spin_lock_irqsave(&dev->n_ahs_lock, flags); dev->n_ahs_allocated--; - spin_unlock(&dev->n_ahs_lock); + spin_unlock_irqrestore(&dev->n_ahs_lock, flags); kfree(ah); From bos at pathscale.com Thu Sep 28 09:00:16 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:16 -0700 Subject: [openib-general] [PATCH 20 of 28] IB/ipath - clean up module exit code In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r 858280e8cbab -r e3158e62d6bf drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:13 2006 -0700 @@ -517,33 +517,146 @@ bail: return ret; } +static void __devexit cleanup_device(struct ipath_devdata *dd) +{ + int port; + + ipath_shutdown_device(dd); + + if (*dd->ipath_statusp & IPATH_STATUS_CHIP_PRESENT) { + /* can't do anything more with chip; needs re-init */ + *dd->ipath_statusp &= ~IPATH_STATUS_CHIP_PRESENT; + if (dd->ipath_kregbase) { + /* + * if we haven't already cleaned up before these are + * to ensure any register reads/writes "fail" until + * re-init + */ + dd->ipath_kregbase = NULL; + dd->ipath_uregbase = 0; + dd->ipath_sregbase = 0; + dd->ipath_cregbase = 0; + dd->ipath_kregsize = 0; + } + ipath_disable_wc(dd); + } + + if (dd->ipath_pioavailregs_dma) { + dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE, + (void *) dd->ipath_pioavailregs_dma, + dd->ipath_pioavailregs_phys); + dd->ipath_pioavailregs_dma = NULL; + } + if (dd->ipath_dummy_hdrq) { + dma_free_coherent(&dd->pcidev->dev, + dd->ipath_pd[0]->port_rcvhdrq_size, + dd->ipath_dummy_hdrq, dd->ipath_dummy_hdrq_phys); + dd->ipath_dummy_hdrq = NULL; + } + + if (dd->ipath_pageshadow) { + struct page **tmpp = 
dd->ipath_pageshadow; + dma_addr_t *tmpd = dd->ipath_physshadow; + int i, cnt = 0; + + ipath_cdbg(VERBOSE, "Unlocking any expTID pages still " + "locked\n"); + for (port = 0; port < dd->ipath_cfgports; port++) { + int port_tidbase = port * dd->ipath_rcvtidcnt; + int maxtid = port_tidbase + dd->ipath_rcvtidcnt; + for (i = port_tidbase; i < maxtid; i++) { + if (!tmpp[i]) + continue; + pci_unmap_page(dd->pcidev, tmpd[i], + PAGE_SIZE, PCI_DMA_FROMDEVICE); + ipath_release_user_pages(&tmpp[i], 1); + tmpp[i] = NULL; + cnt++; + } + } + if (cnt) { + ipath_stats.sps_pageunlocks += cnt; + ipath_cdbg(VERBOSE, "There were still %u expTID " + "entries locked\n", cnt); + } + if (ipath_stats.sps_pagelocks || + ipath_stats.sps_pageunlocks) + ipath_cdbg(VERBOSE, "%llu pages locked, %llu " + "unlocked via ipath_m{un}lock\n", + (unsigned long long) + ipath_stats.sps_pagelocks, + (unsigned long long) + ipath_stats.sps_pageunlocks); + + ipath_cdbg(VERBOSE, "Free shadow page tid array at %p\n", + dd->ipath_pageshadow); + vfree(dd->ipath_pageshadow); + dd->ipath_pageshadow = NULL; + } + + /* + * free any resources still in use (usually just kernel ports) + * at unload; we do for portcnt, not cfgports, because cfgports + * could have changed while we were loaded. 
+ */ + for (port = 0; port < dd->ipath_portcnt; port++) { + struct ipath_portdata *pd = dd->ipath_pd[port]; + dd->ipath_pd[port] = NULL; + ipath_free_pddata(dd, pd); + } + kfree(dd->ipath_pd); + /* + * debuggability, in case some cleanup path tries to use it + * after this + */ + dd->ipath_pd = NULL; +} + static void __devexit ipath_remove_one(struct pci_dev *pdev) { - struct ipath_devdata *dd; - - ipath_cdbg(VERBOSE, "removing, pdev=%p\n", pdev); - if (!pdev) - return; - - dd = pci_get_drvdata(pdev); - - if (dd->verbs_dev) { + struct ipath_devdata *dd = pci_get_drvdata(pdev); + + ipath_cdbg(VERBOSE, "removing, pdev=%p, dd=%p\n", pdev, dd); + + if (dd->verbs_dev) ipath_unregister_ib_device(dd->verbs_dev); - dd->verbs_dev = NULL; - } ipath_diag_remove(dd); ipath_user_remove(dd); ipathfs_remove_device(dd); ipath_device_remove_group(&pdev->dev, dd); + ipath_cdbg(VERBOSE, "Releasing pci memory regions, dd %p, " "unit %u\n", dd, (u32) dd->ipath_unit); - if (dd->ipath_kregbase) { - ipath_cdbg(VERBOSE, "Unmapping kregbase %p\n", - dd->ipath_kregbase); - iounmap((volatile void __iomem *) dd->ipath_kregbase); - dd->ipath_kregbase = NULL; - } + + cleanup_device(dd); + + /* + * turn off rcv, send, and interrupts for all ports, all drivers + * should also hard reset the chip here? + * free up port 0 (kernel) rcvhdr, egr bufs, and eventually tid bufs + * for all versions of the driver, if they were allocated + */ + if (pdev->irq) { + ipath_cdbg(VERBOSE, + "unit %u free_irq of irq %x\n", + dd->ipath_unit, pdev->irq); + free_irq(pdev->irq, dd); + } else + ipath_dbg("irq is 0, not doing free_irq " + "for unit %u\n", dd->ipath_unit); + /* + * we check for NULL here, because it's outside + * the kregbase check, and we need to call it + * after the free_irq. Thus it's possible that + * the function pointers were never initialized. 
+ */ + if (dd->ipath_f_cleanup) + /* clean up chip-specific stuff */ + dd->ipath_f_cleanup(dd); + + ipath_cdbg(VERBOSE, "Unmapping kregbase %p\n", dd->ipath_kregbase); + iounmap((volatile void __iomem *) dd->ipath_kregbase); pci_release_regions(pdev); ipath_cdbg(VERBOSE, "calling pci_disable_device\n"); pci_disable_device(pdev); @@ -1917,157 +2030,11 @@ bail: return ret; } -static void cleanup_device(struct ipath_devdata *dd) -{ - int port; - - ipath_shutdown_device(dd); - - if (*dd->ipath_statusp & IPATH_STATUS_CHIP_PRESENT) { - /* can't do anything more with chip; needs re-init */ - *dd->ipath_statusp &= ~IPATH_STATUS_CHIP_PRESENT; - if (dd->ipath_kregbase) { - /* - * if we haven't already cleaned up before these are - * to ensure any register reads/writes "fail" until - * re-init - */ - dd->ipath_kregbase = NULL; - dd->ipath_uregbase = 0; - dd->ipath_sregbase = 0; - dd->ipath_cregbase = 0; - dd->ipath_kregsize = 0; - } - ipath_disable_wc(dd); - } - - if (dd->ipath_pioavailregs_dma) { - dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE, - (void *) dd->ipath_pioavailregs_dma, - dd->ipath_pioavailregs_phys); - dd->ipath_pioavailregs_dma = NULL; - } - if (dd->ipath_dummy_hdrq) { - dma_free_coherent(&dd->pcidev->dev, - dd->ipath_pd[0]->port_rcvhdrq_size, - dd->ipath_dummy_hdrq, dd->ipath_dummy_hdrq_phys); - dd->ipath_dummy_hdrq = NULL; - } - - if (dd->ipath_pageshadow) { - struct page **tmpp = dd->ipath_pageshadow; - dma_addr_t *tmpd = dd->ipath_physshadow; - int i, cnt = 0; - - ipath_cdbg(VERBOSE, "Unlocking any expTID pages still " - "locked\n"); - for (port = 0; port < dd->ipath_cfgports; port++) { - int port_tidbase = port * dd->ipath_rcvtidcnt; - int maxtid = port_tidbase + dd->ipath_rcvtidcnt; - for (i = port_tidbase; i < maxtid; i++) { - if (!tmpp[i]) - continue; - pci_unmap_page(dd->pcidev, tmpd[i], - PAGE_SIZE, PCI_DMA_FROMDEVICE); - ipath_release_user_pages(&tmpp[i], 1); - tmpp[i] = NULL; - cnt++; - } - } - if (cnt) { - ipath_stats.sps_pageunlocks += cnt; - 
ipath_cdbg(VERBOSE, "There were still %u expTID " - "entries locked\n", cnt); - } - if (ipath_stats.sps_pagelocks || - ipath_stats.sps_pageunlocks) - ipath_cdbg(VERBOSE, "%llu pages locked, %llu " - "unlocked via ipath_m{un}lock\n", - (unsigned long long) - ipath_stats.sps_pagelocks, - (unsigned long long) - ipath_stats.sps_pageunlocks); - - ipath_cdbg(VERBOSE, "Free shadow page tid array at %p\n", - dd->ipath_pageshadow); - vfree(dd->ipath_pageshadow); - dd->ipath_pageshadow = NULL; - } - - /* - * free any resources still in use (usually just kernel ports) - * at unload; we do for portcnt, not cfgports, because cfgports - * could have changed while we were loaded. - */ - for (port = 0; port < dd->ipath_portcnt; port++) { - struct ipath_portdata *pd = dd->ipath_pd[port]; - dd->ipath_pd[port] = NULL; - ipath_free_pddata(dd, pd); - } - kfree(dd->ipath_pd); - /* - * debuggability, in case some cleanup path tries to use it - * after this - */ - dd->ipath_pd = NULL; -} - static void __exit infinipath_cleanup(void) { - struct ipath_devdata *dd, *tmp; - unsigned long flags; - - ipath_diagpkt_remove(); - ipath_exit_ipathfs(); ipath_driver_remove_group(&ipath_driver.driver); - - spin_lock_irqsave(&ipath_devs_lock, flags); - - /* - * turn off rcv, send, and interrupts for all ports, all drivers - * should also hard reset the chip here? 
- * free up port 0 (kernel) rcvhdr, egr bufs, and eventually tid bufs - * for all versions of the driver, if they were allocated - */ - list_for_each_entry_safe(dd, tmp, &ipath_dev_list, ipath_list) { - spin_unlock_irqrestore(&ipath_devs_lock, flags); - - if (dd->verbs_dev) { - ipath_unregister_ib_device(dd->verbs_dev); - dd->verbs_dev = NULL; - } - - if (dd->ipath_kregbase) - cleanup_device(dd); - - if (dd->pcidev) { - if (dd->pcidev->irq) { - ipath_cdbg(VERBOSE, - "unit %u free_irq of irq %x\n", - dd->ipath_unit, dd->pcidev->irq); - free_irq(dd->pcidev->irq, dd); - } else - ipath_dbg("irq is 0, not doing free_irq " - "for unit %u\n", dd->ipath_unit); - - /* - * we check for NULL here, because it's outside - * the kregbase check, and we need to call it - * after the free_irq. Thus it's possible that - * the function pointers were never initialized. - */ - if (dd->ipath_f_cleanup) - /* clean up chip-specific stuff */ - dd->ipath_f_cleanup(dd); - - dd->pcidev = NULL; - } - spin_lock_irqsave(&ipath_devs_lock, flags); - } - - spin_unlock_irqrestore(&ipath_devs_lock, flags); ipath_cdbg(VERBOSE, "Unregistering pci driver\n"); pci_unregister_driver(&ipath_driver); From bos at pathscale.com Thu Sep 28 09:00:18 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:18 -0700 Subject: [openib-general] [PATCH 22 of 28] IB/ipath - fix and recover TXE piobuf and PBC parity errors In-Reply-To: Message-ID: <5aea5f31529d9b8ff214.1159459218@eng-12.pathscale.com> We can sometimes trigger parity errors due to processor speculative reads to our write-combined memory (mostly seen on Woodcrest). Add a stats counter for these. Factored out the sendbuffererror buffer cancellation code so it can be used in the new handling; suppress likely subsequent error messages if within two jiffies of the cancellation. Also restore 2 dropped TXE lines on hwe_bitsextant noticed while debugging. 
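The recovery flow described above can be sketched in isolation. This is a simplified illustration under stated assumptions, not the handler from the patch: the bit positions, `struct demo_stats` and `demo_recover_txe` are hypothetical, and the step where the real driver cancels the affected send buffers is reduced to a comment. The sketch shows only the decision logic: count the event, mask out the two recoverable TXE parity bits, and report whether any other hardware error bits remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions; the real values live in the chip headers. */
#define DEMO_TXE_PIOBUF (1ULL << 0)
#define DEMO_TXE_PIOPBC (1ULL << 1)
#define DEMO_TXE_RECOVERABLE (DEMO_TXE_PIOBUF | DEMO_TXE_PIOPBC)

/* Hypothetical stand-in for the stats counter added by the patch. */
struct demo_stats {
	uint64_t txeparity;
};

/*
 * If a recoverable TXE parity bit is set, count it and clear it from
 * *hwerrs. Returns true when no error bits remain afterwards, i.e. the
 * caller could take the chip out of freeze mode; returns false when
 * other errors are still pending and freeze mode should be kept.
 */
static bool demo_recover_txe(uint64_t *hwerrs, struct demo_stats *st)
{
	assert(hwerrs != NULL && st != NULL);

	if (*hwerrs & DEMO_TXE_RECOVERABLE) {
		st->txeparity++;
		/* The real driver cancels the affected send buffers here. */
		*hwerrs &= ~DEMO_TXE_RECOVERABLE;
	}
	return *hwerrs == 0;
}
```

A status word holding only a PIO buffer parity bit comes back cleared with the counter incremented, so the caller may unfreeze. If an unrelated error bit is also set, the parity bit is still counted and cleared, but the function returns false and the remaining bit is left for the normal hardware error path.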
Signed-off-by: Bryan O'Sullivan diff -r a78c7b475df6 -r 5aea5f31529d drivers/infiniband/hw/ipath/ipath_common.h --- a/drivers/infiniband/hw/ipath/ipath_common.h Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_common.h Thu Sep 28 08:57:13 2006 -0700 @@ -141,8 +141,9 @@ struct infinipath_stats { * packets if ipath not configured, etc.) */ __u64 sps_krdrops; + __u64 sps_txeparity; // PIO buffer parity error, recovered /* pad for future growth */ - __u64 __sps_pad[46]; + __u64 __sps_pad[45]; }; /* diff -r a78c7b475df6 -r 5aea5f31529d drivers/infiniband/hw/ipath/ipath_iba6110.c --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c Thu Sep 28 08:57:13 2006 -0700 @@ -451,7 +451,10 @@ static void ipath_ht_handle_hwerrors(str * make sure we get this much out, unless told to be quiet, * or it's occurred within the last 5 seconds */ - if ((hwerrs & ~dd->ipath_lasthwerror) || + if ((hwerrs & ~(dd->ipath_lasthwerror | + ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT))) || (ipath_debug & __IPATH_VERBDBG)) dev_info(&dd->pcidev->dev, "Hardware error: hwerr=0x%llx " "(cleared)\n", (unsigned long long) hwerrs); @@ -464,6 +467,33 @@ static void ipath_ht_handle_hwerrors(str ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control); if (ctrl & INFINIPATH_C_FREEZEMODE) { + /* + * parity errors in send memory are recoverable, + * just cancel the send (if indicated in * sendbuffererror), + * count the occurrence, unfreeze (if no other handled + * hardware error bits are set), and continue. They can + * occur if a processor speculative read is done to the PIO + * buffer while we are sending a packet, for example. 
+ */ + if (hwerrs & ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) { + ipath_stats.sps_txeparity++; + ipath_dbg("Recovering from TXE parity error (%llu), " + "hwerrstatus=%llx\n", + (unsigned long long) ipath_stats.sps_txeparity, + (unsigned long long) hwerrs); + ipath_disarm_senderrbufs(dd); + hwerrs &= ~((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT); + if (!hwerrs) { // else leave in freeze mode + ipath_write_kreg(dd, + dd->ipath_kregs->kr_control, + dd->ipath_control); + return; + } + } if (hwerrs) { /* * if any set that we aren't ignoring; only diff -r a78c7b475df6 -r 5aea5f31529d drivers/infiniband/hw/ipath/ipath_iba6120.c --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:13 2006 -0700 @@ -370,7 +370,10 @@ static void ipath_pe_handle_hwerrors(str * make sure we get this much out, unless told to be quiet, * or it's occurred within the last 5 seconds */ - if ((hwerrs & ~dd->ipath_lasthwerror) || + if ((hwerrs & ~(dd->ipath_lasthwerror | + ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT))) || (ipath_debug & __IPATH_VERBDBG)) dev_info(&dd->pcidev->dev, "Hardware error: hwerr=0x%llx " "(cleared)\n", (unsigned long long) hwerrs); @@ -383,6 +386,33 @@ static void ipath_pe_handle_hwerrors(str ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control); if (ctrl & INFINIPATH_C_FREEZEMODE) { + /* + * parity errors in send memory are recoverable, + * just cancel the send (if indicated in * sendbuffererror), + * count the occurrence, unfreeze (if no other handled + * hardware error bits are set), and continue. They can + * occur if a processor speculative read is done to the PIO + * buffer while we are sending a packet, for example. 
+ */ + if (hwerrs & ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) { + ipath_stats.sps_txeparity++; + ipath_dbg("Recovering from TXE parity error (%llu), " + "hwerrstatus=%llx\n", + (unsigned long long) ipath_stats.sps_txeparity, + (unsigned long long) hwerrs); + ipath_disarm_senderrbufs(dd); + hwerrs &= ~((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT); + if (!hwerrs) { // else leave in freeze mode + ipath_write_kreg(dd, + dd->ipath_kregs->kr_control, + dd->ipath_control); + return; + } + } if (hwerrs) { /* * if any set that we aren't ignoring only make the @@ -406,9 +436,8 @@ static void ipath_pe_handle_hwerrors(str } else { ipath_dbg("Clearing freezemode on ignored hardware " "error\n"); - ctrl &= ~INFINIPATH_C_FREEZEMODE; ipath_write_kreg(dd, dd->ipath_kregs->kr_control, - ctrl); + dd->ipath_control); } } @@ -880,6 +909,8 @@ static void ipath_init_pe_variables(stru dd->ipath_hwe_bitsextant = (INFINIPATH_HWE_RXEMEMPARITYERR_MASK << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) | + (INFINIPATH_HWE_TXEMEMPARITYERR_MASK << + INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) | (INFINIPATH_HWE_PCIEMEMPARITYERR_MASK << INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT) | INFINIPATH_HWE_PCIE1PLLFAILED | diff -r a78c7b475df6 -r 5aea5f31529d drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Thu Sep 28 08:57:13 2006 -0700 @@ -37,6 +37,50 @@ #include "ipath_verbs.h" #include "ipath_common.h" +/* + * Called when we might have an error that is specific to a particular + * PIO buffer, and may need to cancel that buffer, so it can be re-used. 
+ */ +void ipath_disarm_senderrbufs(struct ipath_devdata *dd) +{ + u32 piobcnt; + unsigned long sbuf[4]; + /* + * it's possible that sendbuffererror could have bits set; might + * have already done this as a result of hardware error handling + */ + piobcnt = dd->ipath_piobcnt2k + dd->ipath_piobcnt4k; + /* read these before writing errorclear */ + sbuf[0] = ipath_read_kreg64( + dd, dd->ipath_kregs->kr_sendbuffererror); + sbuf[1] = ipath_read_kreg64( + dd, dd->ipath_kregs->kr_sendbuffererror + 1); + if (piobcnt > 128) { + sbuf[2] = ipath_read_kreg64( + dd, dd->ipath_kregs->kr_sendbuffererror + 2); + sbuf[3] = ipath_read_kreg64( + dd, dd->ipath_kregs->kr_sendbuffererror + 3); + } + + if (sbuf[0] || sbuf[1] || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) { + int i; + if (ipath_debug & (__IPATH_PKTDBG|__IPATH_DBG)) { + __IPATH_DBG_WHICH(__IPATH_PKTDBG|__IPATH_DBG, + "SendbufErrs %lx %lx", sbuf[0], + sbuf[1]); + if (ipath_debug & __IPATH_PKTDBG && piobcnt > 128) + printk(" %lx %lx ", sbuf[2], sbuf[3]); + printk("\n"); + } + + for (i = 0; i < piobcnt; i++) + if (test_bit(i, sbuf)) + ipath_disarm_piobufs(dd, i, 1); + dd->ipath_lastcancel = jiffies+3; // no armlaunch for a bit + } +} + + /* These are all rcv-related errors which we want to count for stats */ #define E_SUM_PKTERRS \ (INFINIPATH_E_RHDRLEN | INFINIPATH_E_RBADTID | \ @@ -68,53 +112,9 @@ static u64 handle_e_sum_errs(struct ipath_devdata *dd, ipath_err_t errs) { - unsigned long sbuf[4]; u64 ignore_this_time = 0; - u32 piobcnt; - - /* if possible that sendbuffererror could be valid */ - piobcnt = dd->ipath_piobcnt2k + dd->ipath_piobcnt4k; - /* read these before writing errorclear */ - sbuf[0] = ipath_read_kreg64( - dd, dd->ipath_kregs->kr_sendbuffererror); - sbuf[1] = ipath_read_kreg64( - dd, dd->ipath_kregs->kr_sendbuffererror + 1); - if (piobcnt > 128) { - sbuf[2] = ipath_read_kreg64( - dd, dd->ipath_kregs->kr_sendbuffererror + 2); - sbuf[3] = ipath_read_kreg64( - dd, dd->ipath_kregs->kr_sendbuffererror + 3); - } - 
- if (sbuf[0] || sbuf[1] || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) { - int i; - - ipath_cdbg(PKT, "SendbufErrs %lx %lx ", sbuf[0], sbuf[1]); - if (ipath_debug & __IPATH_PKTDBG && piobcnt > 128) - printk("%lx %lx ", sbuf[2], sbuf[3]); - for (i = 0; i < piobcnt; i++) { - if (test_bit(i, sbuf)) { - u32 __iomem *piobuf; - if (i < dd->ipath_piobcnt2k) - piobuf = (u32 __iomem *) - (dd->ipath_pio2kbase + - i * dd->ipath_palign); - else - piobuf = (u32 __iomem *) - (dd->ipath_pio4kbase + - (i - dd->ipath_piobcnt2k) * - dd->ipath_4kalign); - - ipath_cdbg(PKT, - "PIObuf[%u] @%p pbc is %x; ", - i, piobuf, readl(piobuf)); - - ipath_disarm_piobufs(dd, i, 1); - } - } - if (ipath_debug & __IPATH_PKTDBG) - printk("\n"); - } + + ipath_disarm_senderrbufs(dd); if ((errs & E_SUM_LINK_PKTERRS) && !(dd->ipath_flags & IPATH_LINKACTIVE)) { /* @@ -554,6 +554,14 @@ static int handle_errors(struct ipath_de ~(INFINIPATH_E_HARDWARE | INFINIPATH_E_IBSTATUSCHANGED); } + + // likely due to cancel, so suppress + if ((errs & (INFINIPATH_E_SPKTLEN | INFINIPATH_E_SPIOARMLAUNCH)) && + dd->ipath_lastcancel > jiffies) { + ipath_dbg("Suppressed armlaunch/spktlen after error send cancel\n"); + errs &= ~(INFINIPATH_E_SPIOARMLAUNCH | INFINIPATH_E_SPKTLEN); + } + if (!errs) return 0; diff -r a78c7b475df6 -r 5aea5f31529d drivers/infiniband/hw/ipath/ipath_kernel.h --- a/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:13 2006 -0700 @@ -427,6 +427,9 @@ struct ipath_devdata { unsigned long ipath_rcvctrl; /* shadow kr_sendctrl */ unsigned long ipath_sendctrl; + /* ports waiting for PIOavail intr */ + unsigned long ipath_portpiowait; + unsigned long ipath_lastcancel; // to not count armlaunch after cancel /* value we put in kr_rcvhdrcnt */ u32 ipath_rcvhdrcnt; @@ -490,8 +493,6 @@ struct ipath_devdata { u32 ipath_htwidth; /* HT speed (200,400,800,1000) from HT config */ u32 ipath_htspeed; - /* ports waiting for PIOavail 
intr */ - unsigned long ipath_portpiowait; /* * number of sequential ibcstatus change for polling active/quiet * (i.e., link not coming up). @@ -585,6 +586,7 @@ void ipath_disable_wc(struct ipath_devda void ipath_disable_wc(struct ipath_devdata *dd); int ipath_count_units(int *npresentp, int *nupp, u32 *maxportsp); void ipath_shutdown_device(struct ipath_devdata *); +void ipath_disarm_senderrbufs(struct ipath_devdata *); struct file_operations; int ipath_cdev_init(int minor, char *name, struct file_operations *fops, From bos at pathscale.com Thu Sep 28 09:00:13 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:13 -0700 Subject: [openib-general] [PATCH 17 of 28] IB/ipath - improved support for powerpc In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Thu Sep 28 08:57:12 2006 -0700 @@ -755,8 +755,8 @@ static inline void *ipath_get_egrbuf(str static inline void *ipath_get_egrbuf(struct ipath_devdata *dd, u32 bufnum, int err) { - return dd->ipath_port0_skbs ? - (void *)dd->ipath_port0_skbs[bufnum]->data : NULL; + return dd->ipath_port0_skbinfo ? + (void *) dd->ipath_port0_skbinfo[bufnum].skb->data : NULL; } /** @@ -778,31 +778,34 @@ struct sk_buff *ipath_alloc_skb(struct i */ /* - * We need 4 extra bytes for unaligned transfer copying + * We need 2 extra bytes for ipath_ether data sent in the + * key header. In order to keep everything dword aligned, + * we'll reserve 4 bytes. */ + len = dd->ipath_ibmaxlen + 4; + if (dd->ipath_flags & IPATH_4BYTE_TID) { - /* we need a 4KB multiple alignment, and there is no way + /* We need a 2KB multiple alignment, and there is no way * to do it except to allocate extra and then skb_reserve * enough to bring it up to the right alignment. 
*/ - len = dd->ipath_ibmaxlen + 4 + (1 << 11) - 1; - } - else - len = dd->ipath_ibmaxlen + 4; + len += 2047; + } + skb = __dev_alloc_skb(len, gfp_mask); if (!skb) { ipath_dev_err(dd, "Failed to allocate skbuff, length %u\n", len); goto bail; } + + skb_reserve(skb, 4); + if (dd->ipath_flags & IPATH_4BYTE_TID) { - u32 una = ((1 << 11) - 1) & (unsigned long)(skb->data + 4); + u32 una = (unsigned long)skb->data & 2047; if (una) - skb_reserve(skb, 4 + (1 << 11) - una); - else - skb_reserve(skb, 4); - } else - skb_reserve(skb, 4); + skb_reserve(skb, 2048 - una); + } bail: return skb; @@ -1345,8 +1348,9 @@ int ipath_create_rcvhdrq(struct ipath_de ipath_cdbg(VERBOSE, "reuse port %d rcvhdrq @%p %llx phys; " "hdrtailaddr@%p %llx physical\n", pd->port_port, pd->port_rcvhdrq, - pd->port_rcvhdrq_phys, pd->port_rcvhdrtail_kvaddr, - (unsigned long long)pd->port_rcvhdrqtailaddr_phys); + (unsigned long long) pd->port_rcvhdrq_phys, + pd->port_rcvhdrtail_kvaddr, (unsigned long long) + pd->port_rcvhdrqtailaddr_phys); /* clear for security and sanity on each use */ memset(pd->port_rcvhdrq, 0, pd->port_rcvhdrq_size); @@ -1827,17 +1831,22 @@ void ipath_free_pddata(struct ipath_devd kfree(pd->port_rcvegrbuf_phys); pd->port_rcvegrbuf_phys = NULL; pd->port_rcvegrbuf_chunks = 0; - } else if (pd->port_port == 0 && dd->ipath_port0_skbs) { + } else if (pd->port_port == 0 && dd->ipath_port0_skbinfo) { unsigned e; - struct sk_buff **skbs = dd->ipath_port0_skbs; - - dd->ipath_port0_skbs = NULL; - ipath_cdbg(VERBOSE, "free closed port %d ipath_port0_skbs " - "@ %p\n", pd->port_port, skbs); + struct ipath_skbinfo *skbinfo = dd->ipath_port0_skbinfo; + + dd->ipath_port0_skbinfo = NULL; + ipath_cdbg(VERBOSE, "free closed port %d " + "ipath_port0_skbinfo @ %p\n", pd->port_port, + skbinfo); for (e = 0; e < dd->ipath_rcvegrcnt; e++) - if (skbs[e]) - dev_kfree_skb(skbs[e]); - vfree(skbs); + if (skbinfo[e].skb) { + pci_unmap_single(dd->pcidev, skbinfo[e].phys, + dd->ipath_ibmaxlen, + PCI_DMA_FROMDEVICE); + 
dev_kfree_skb(skbinfo[e].skb); + } + vfree(skbinfo); } kfree(pd->port_tid_pg_list); vfree(pd->subport_uregbase); @@ -1934,7 +1943,7 @@ static void cleanup_device(struct ipath_ if (dd->ipath_pioavailregs_dma) { dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE, - dd->ipath_pioavailregs_dma, + (void *) dd->ipath_pioavailregs_dma, dd->ipath_pioavailregs_phys); dd->ipath_pioavailregs_dma = NULL; } @@ -1947,6 +1956,7 @@ static void cleanup_device(struct ipath_ if (dd->ipath_pageshadow) { struct page **tmpp = dd->ipath_pageshadow; + dma_addr_t *tmpd = dd->ipath_physshadow; int i, cnt = 0; ipath_cdbg(VERBOSE, "Unlocking any expTID pages still " @@ -1957,6 +1967,8 @@ static void cleanup_device(struct ipath_ for (i = port_tidbase; i < maxtid; i++) { if (!tmpp[i]) continue; + pci_unmap_page(dd->pcidev, tmpd[i], + PAGE_SIZE, PCI_DMA_FROMDEVICE); ipath_release_user_pages(&tmpp[i], 1); tmpp[i] = NULL; cnt++; diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_file_ops.c --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Thu Sep 28 08:57:12 2006 -0700 @@ -364,11 +364,14 @@ static int ipath_tid_update(struct ipath "vaddr %lx\n", i, tid + tidoff, vaddr); /* we "know" system pages and TID pages are same size */ dd->ipath_pageshadow[porttid + tid] = pagep[i]; + dd->ipath_physshadow[porttid + tid] = ipath_map_page( + dd->pcidev, pagep[i], 0, PAGE_SIZE, + PCI_DMA_FROMDEVICE); /* * don't need atomic or it's overhead */ __set_bit(tid, tidmap); - physaddr = page_to_phys(pagep[i]); + physaddr = dd->ipath_physshadow[porttid + tid]; ipath_stats.sps_pagelocks++; ipath_cdbg(VERBOSE, "TID %u, vaddr %lx, physaddr %llx pgp %p\n", @@ -402,6 +405,9 @@ static int ipath_tid_update(struct ipath tid); dd->ipath_f_put_tid(dd, &tidbase[tid], 1, dd->ipath_tidinvalid); + pci_unmap_page(dd->pcidev, + dd->ipath_physshadow[porttid + tid], + PAGE_SIZE, PCI_DMA_FROMDEVICE); dd->ipath_pageshadow[porttid + tid] = 
NULL; ipath_stats.sps_pageunlocks++; } @@ -515,6 +521,9 @@ static int ipath_tid_free(struct ipath_p pd->port_pid, tid); dd->ipath_f_put_tid(dd, &tidbase[tid], 1, dd->ipath_tidinvalid); + pci_unmap_page(dd->pcidev, + dd->ipath_physshadow[porttid + tid], + PAGE_SIZE, PCI_DMA_FROMDEVICE); ipath_release_user_pages( &dd->ipath_pageshadow[porttid + tid], 1); dd->ipath_pageshadow[porttid + tid] = NULL; @@ -711,7 +720,7 @@ static int ipath_manage_rcvq(struct ipat * updated and correct itself, even in the face of software * bugs. */ - *pd->port_rcvhdrtail_kvaddr = 0; + *(volatile u64 *)pd->port_rcvhdrtail_kvaddr = 0; set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port, &dd->ipath_rcvctrl); } else @@ -923,11 +932,11 @@ bail: /* common code for the mappings on dma_alloc_coherent mem */ static int ipath_mmap_mem(struct vm_area_struct *vma, - struct ipath_portdata *pd, unsigned len, - int write_ok, dma_addr_t addr, char *what) + struct ipath_portdata *pd, unsigned len, int write_ok, + void *kvaddr, char *what) { struct ipath_devdata *dd = pd->port_dd; - unsigned pfn = (unsigned long)addr >> PAGE_SHIFT; + unsigned long pfn; int ret; if ((vma->vm_end - vma->vm_start) > len) { @@ -950,17 +959,17 @@ static int ipath_mmap_mem(struct vm_area vma->vm_flags &= ~VM_MAYWRITE; } + pfn = virt_to_phys(kvaddr) >> PAGE_SHIFT; ret = remap_pfn_range(vma, vma->vm_start, pfn, len, vma->vm_page_prot); if (ret) - dev_info(&dd->pcidev->dev, - "%s port%u mmap of %lx, %x bytes r%c failed: %d\n", - what, pd->port_port, (unsigned long)addr, len, - write_ok?'w':'o', ret); + dev_info(&dd->pcidev->dev, "%s port%u mmap of %lx, %x " + "bytes r%c failed: %d\n", what, pd->port_port, + pfn, len, write_ok?'w':'o', ret); else - ipath_cdbg(VERBOSE, "%s port%u mmaped %lx, %x bytes r%c\n", - what, pd->port_port, (unsigned long)addr, len, - write_ok?'w':'o'); + ipath_cdbg(VERBOSE, "%s port%u mmaped %lx, %x bytes " + "r%c\n", what, pd->port_port, pfn, len, + write_ok?'w':'o'); bail: return ret; } @@ -1049,7 +1058,7 
@@ static int mmap_rcvegrbufs(struct vm_are struct ipath_devdata *dd = pd->port_dd; unsigned long start, size; size_t total_size, i; - dma_addr_t *phys; + unsigned long pfn; int ret; size = pd->port_rcvegrbuf_size; @@ -1073,11 +1082,11 @@ static int mmap_rcvegrbufs(struct vm_are vma->vm_flags &= ~VM_MAYWRITE; start = vma->vm_start; - phys = pd->port_rcvegrbuf_phys; for (i = 0; i < pd->port_rcvegrbuf_chunks; i++, start += size) { - ret = remap_pfn_range(vma, start, phys[i] >> PAGE_SHIFT, - size, vma->vm_page_prot); + pfn = virt_to_phys(pd->port_rcvegrbuf[i]) >> PAGE_SHIFT; + ret = remap_pfn_range(vma, start, pfn, size, + vma->vm_page_prot); if (ret < 0) goto bail; } @@ -1290,7 +1299,7 @@ static int ipath_mmap(struct file *fp, s else if (pgaddr == dd->ipath_pioavailregs_phys) /* in-memory copy of pioavail registers */ ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0, - dd->ipath_pioavailregs_phys, + (void *) dd->ipath_pioavailregs_dma, "pioavail registers"); else if (subport_fp(fp)) /* Subports don't mmap the physical receive buffers */ @@ -1304,12 +1313,12 @@ static int ipath_mmap(struct file *fp, s * from an i/o perspective. */ ret = ipath_mmap_mem(vma, pd, pd->port_rcvhdrq_size, 1, - pd->port_rcvhdrq_phys, + pd->port_rcvhdrq, "rcvhdrq"); else if (pgaddr == (u64) pd->port_rcvhdrqtailaddr_phys) /* in-memory copy of rcvhdrq tail register */ ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0, - pd->port_rcvhdrqtailaddr_phys, + pd->port_rcvhdrtail_kvaddr, "rcvhdrq tail"); else ret = -EINVAL; @@ -1802,7 +1811,7 @@ static int ipath_do_user_init(struct fil * We explictly set the in-memory copy to 0 beforehand, so we don't * have to wait to be sure the DMA update has happened. 
*/ - *pd->port_rcvhdrtail_kvaddr = 0ULL; + *(volatile u64 *)pd->port_rcvhdrtail_kvaddr = 0ULL; set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port, &dd->ipath_rcvctrl); ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, @@ -1832,6 +1841,8 @@ static void unlock_expected_tids(struct if (!dd->ipath_pageshadow[i]) continue; + pci_unmap_page(dd->pcidev, dd->ipath_physshadow[i], + PAGE_SIZE, PCI_DMA_FROMDEVICE); ipath_release_user_pages_on_close(&dd->ipath_pageshadow[i], 1); dd->ipath_pageshadow[i] = NULL; @@ -1936,14 +1947,14 @@ static int ipath_close(struct inode *in, i = dd->ipath_pbufsport * (port - 1); ipath_disarm_piobufs(dd, i, dd->ipath_pbufsport); + dd->ipath_f_clear_tids(dd, pd->port_port); + if (dd->ipath_pageshadow) unlock_expected_tids(pd); ipath_stats.sps_ports--; ipath_cdbg(PROC, "%s[%u] closed port %u:%u\n", pd->port_comm, pd->port_pid, dd->ipath_unit, port); - - dd->ipath_f_clear_tids(dd, pd->port_port); } pd->port_pid = 0; diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_iba6120.c --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c Thu Sep 28 08:57:12 2006 -0700 @@ -1113,7 +1113,7 @@ static void ipath_pe_put_tid_2(struct ip if (pa != dd->ipath_tidinvalid) { if (pa & ((1U << 11) - 1)) { dev_info(&dd->pcidev->dev, "BUG: physaddr %lx " - "not 4KB aligned!\n", pa); + "not 2KB aligned!\n", pa); return; } pa >>= 11; diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_init_chip.c --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Thu Sep 28 08:57:12 2006 -0700 @@ -88,13 +88,13 @@ static int create_port0_egr(struct ipath static int create_port0_egr(struct ipath_devdata *dd) { unsigned e, egrcnt; - struct sk_buff **skbs; + struct ipath_skbinfo *skbinfo; int ret; egrcnt = dd->ipath_rcvegrcnt; - skbs = vmalloc(sizeof(*dd->ipath_port0_skbs) * egrcnt); - if (skbs 
== NULL) { + skbinfo = vmalloc(sizeof(*dd->ipath_port0_skbinfo) * egrcnt); + if (skbinfo == NULL) { ipath_dev_err(dd, "allocation error for eager TID " "skb array\n"); ret = -ENOMEM; @@ -109,13 +109,13 @@ static int create_port0_egr(struct ipath * 4 bytes so that the data buffer stays word aligned. * See ipath_kreceive() for more details. */ - skbs[e] = ipath_alloc_skb(dd, GFP_KERNEL); - if (!skbs[e]) { + skbinfo[e].skb = ipath_alloc_skb(dd, GFP_KERNEL); + if (!skbinfo[e].skb) { ipath_dev_err(dd, "SKB allocation error for " "eager TID %u\n", e); while (e != 0) - dev_kfree_skb(skbs[--e]); - vfree(skbs); + dev_kfree_skb(skbinfo[--e].skb); + vfree(skbinfo); ret = -ENOMEM; goto bail; } @@ -124,14 +124,17 @@ static int create_port0_egr(struct ipath * After loop above, so we can test non-NULL to see if ready * to use at receive, etc. */ - dd->ipath_port0_skbs = skbs; + dd->ipath_port0_skbinfo = skbinfo; for (e = 0; e < egrcnt; e++) { - unsigned long phys = - virt_to_phys(dd->ipath_port0_skbs[e]->data); + dd->ipath_port0_skbinfo[e].phys = + ipath_map_single(dd->pcidev, + dd->ipath_port0_skbinfo[e].skb->data, + dd->ipath_ibmaxlen, PCI_DMA_FROMDEVICE); dd->ipath_f_put_tid(dd, e + (u64 __iomem *) ((char __iomem *) dd->ipath_kregbase + - dd->ipath_rcvegrbase), 0, phys); + dd->ipath_rcvegrbase), 0, + dd->ipath_port0_skbinfo[e].phys); } ret = 0; @@ -432,16 +435,33 @@ done: */ static void init_shadow_tids(struct ipath_devdata *dd) { - dd->ipath_pageshadow = (struct page **) - vmalloc(dd->ipath_cfgports * dd->ipath_rcvtidcnt * + struct page **pages; + dma_addr_t *addrs; + + pages = vmalloc(dd->ipath_cfgports * dd->ipath_rcvtidcnt * sizeof(struct page *)); - if (!dd->ipath_pageshadow) + if (!pages) { ipath_dev_err(dd, "failed to allocate shadow page * " "array, no expected sends!\n"); - else - memset(dd->ipath_pageshadow, 0, - dd->ipath_cfgports * dd->ipath_rcvtidcnt * - sizeof(struct page *)); + dd->ipath_pageshadow = NULL; + return; + } + + addrs = vmalloc(dd->ipath_cfgports * 
dd->ipath_rcvtidcnt * + sizeof(dma_addr_t)); + if (!addrs) { + ipath_dev_err(dd, "failed to allocate shadow dma handle " + "array, no expected sends!\n"); + vfree(dd->ipath_pageshadow); + dd->ipath_pageshadow = NULL; + return; + } + + memset(pages, 0, dd->ipath_cfgports * dd->ipath_rcvtidcnt * + sizeof(struct page *)); + + dd->ipath_pageshadow = pages; + dd->ipath_physshadow = addrs; } static void enable_chip(struct ipath_devdata *dd, diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Thu Sep 28 08:57:12 2006 -0700 @@ -605,7 +605,7 @@ static int handle_errors(struct ipath_de * don't report same point multiple times, * except kernel */ - tl = (u32) * pd->port_rcvhdrtail_kvaddr; + tl = *(u64 *) pd->port_rcvhdrtail_kvaddr; if (tl == dd->ipath_lastrcvhdrqtails[i]) continue; hd = ipath_read_ureg32(dd, ur_rcvhdrhead, diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_kernel.h --- a/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Thu Sep 28 08:57:12 2006 -0700 @@ -39,6 +39,8 @@ */ #include +#include +#include #include #include "ipath_common.h" @@ -62,7 +64,7 @@ struct ipath_portdata { /* rcvhdrq base, needs mmap before useful */ void *port_rcvhdrq; /* kernel virtual address where hdrqtail is updated */ - volatile __le64 *port_rcvhdrtail_kvaddr; + void *port_rcvhdrtail_kvaddr; /* * temp buffer for expected send setup, allocated at open, instead * of each setup call @@ -146,6 +148,11 @@ struct _ipath_layer { void *l_arg; }; +struct ipath_skbinfo { + struct sk_buff *skb; + dma_addr_t phys; +}; + struct ipath_devdata { struct list_head ipath_list; @@ -168,7 +175,7 @@ struct ipath_devdata { /* ipath_cfgports pointers */ struct ipath_portdata **ipath_pd; /* sk_buffs used by port 0 eager receive queue */ - struct sk_buff 
**ipath_port0_skbs; + struct ipath_skbinfo *ipath_port0_skbinfo; /* kvirt address of 1st 2k pio buffer */ void __iomem *ipath_pio2kbase; /* kvirt address of 1st 4k pio buffer */ @@ -335,6 +342,8 @@ struct ipath_devdata { u64 *ipath_tidsimshadow; /* shadow copy of struct page *'s for exp tid pages */ struct page **ipath_pageshadow; + /* shadow copy of dma handles for exp tid pages */ + dma_addr_t *ipath_physshadow; /* lock to workaround chip bug 9437 */ spinlock_t ipath_tid_lock; @@ -865,6 +874,13 @@ int ipathfs_remove_device(struct ipath_d int ipathfs_remove_device(struct ipath_devdata *); /* + * dma_addr wrappers - all 0's invalid for hw + */ +dma_addr_t ipath_map_page(struct pci_dev *, struct page *, unsigned long, + size_t, int); +dma_addr_t ipath_map_single(struct pci_dev *, void *, size_t, int); + +/* * Flush write combining store buffers (if present) and perform a write * barrier. */ diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_user_pages.c --- a/drivers/infiniband/hw/ipath/ipath_user_pages.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c Thu Sep 28 08:57:12 2006 -0700 @@ -90,6 +90,62 @@ bail: } /** + * ipath_map_page - a safety wrapper around pci_map_page() + * + * A dma_addr of all 0's is interpreted by the chip as "disabled". + * Unfortunately, it can also be a valid dma_addr returned on some + * architectures. + * + * The powerpc iommu assigns dma_addrs in ascending order, so we don't + * have to bother with retries or mapping a dummy page to insure we + * don't just get the same mapping again. + * + * I'm sure we won't be so lucky with other iommu's, so FIXME. 
+ */ +dma_addr_t ipath_map_page(struct pci_dev *hwdev, struct page *page, + unsigned long offset, size_t size, int direction) +{ + dma_addr_t phys; + + phys = pci_map_page(hwdev, page, offset, size, direction); + + if (phys == 0) { + pci_unmap_page(hwdev, phys, size, direction); + phys = pci_map_page(hwdev, page, offset, size, direction); + /* + * FIXME: If we get 0 again, we should keep this page, + * map another, then free the 0 page. + */ + } + + return phys; +} + +/** + * ipath_map_single - a safety wrapper around pci_map_single() + * + * Same idea as ipath_map_page(). + */ +dma_addr_t ipath_map_single(struct pci_dev *hwdev, void *ptr, size_t size, + int direction) +{ + dma_addr_t phys; + + phys = pci_map_single(hwdev, ptr, size, direction); + + if (phys == 0) { + pci_unmap_single(hwdev, phys, size, direction); + phys = pci_map_single(hwdev, ptr, size, direction); + /* + * FIXME: If we get 0 again, we should keep this page, + * map another, then free the 0 page. + */ + } + + return phys; +} + +/** * ipath_get_user_pages - lock user pages into memory * @start_page: the start page * @num_pages: the number of pages diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_wc_ppc64.c --- a/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c Thu Sep 28 08:57:12 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c Thu Sep 28 08:57:12 2006 -0700 @@ -38,13 +38,23 @@ #include "ipath_kernel.h" /** - * ipath_unordered_wc - indicate whether write combining is ordered + * ipath_enable_wc - enable write combining for MMIO writes to the device + * @dd: infinipath device * - * PowerPC systems (at least those in the 970 processor family) - * write partially filled store buffers in address order, but will write - * completely filled store buffers in "random" order, and therefore must - * have serialization for correctness with current InfiniPath chips. + * Nothing to do on PowerPC, so just return without error. 
+ */ +int ipath_enable_wc(struct ipath_devdata *dd) +{ + return 0; +} + +/** + * ipath_unordered_wc - indicate whether write combining is unordered * + * Because our performance depends on our ability to do write + * combining mmio writes in the most efficient way, we need to + * know if we are on a processor that may reorder stores when + * write combining. */ int ipath_unordered_wc(void) { From bos at pathscale.com Thu Sep 28 09:00:21 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:00:21 -0700 Subject: [openib-general] [PATCH 25 of 28] IB/ipath - Set CPU affinity early In-Reply-To: Message-ID: <4269068599c270538c2e.1159459221@eng-12.pathscale.com> This change moves around port assignment so that it happens before any memory is allocated. This allows memory to be allocated on an appropriate CPU, which improves performance for users of /dev/ipath. Signed-off-by: Bryan O'Sullivan diff -r 9fa624c592af -r 4269068599c2 drivers/infiniband/hw/ipath/ipath_common.h --- a/drivers/infiniband/hw/ipath/ipath_common.h Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_common.h Thu Sep 28 08:57:13 2006 -0700 @@ -412,15 +412,17 @@ struct ipath_user_info { #define IPATH_CMD_MIN 16 -#define IPATH_CMD_USER_INIT 16 /* set up userspace */ +#define __IPATH_CMD_USER_INIT 16 /* old set up userspace (for old user code) */ #define IPATH_CMD_PORT_INFO 17 /* find out what resources we got */ #define IPATH_CMD_RECV_CTRL 18 /* control receipt of packets */ #define IPATH_CMD_TID_UPDATE 19 /* update expected TID entries */ #define IPATH_CMD_TID_FREE 20 /* free expected TID entries */ #define IPATH_CMD_SET_PART_KEY 21 /* add partition key */ #define IPATH_CMD_SLAVE_INFO 22 /* return info on slave processes */ - -#define IPATH_CMD_MAX 22 +#define IPATH_CMD_ASSIGN_PORT 23 /* allocate HCA and port */ +#define IPATH_CMD_USER_INIT 24 /* set up userspace */ + +#define IPATH_CMD_MAX 24 struct ipath_port_info { __u32 num_active; /* number of active units 
*/ diff -r 9fa624c592af -r 4269068599c2 drivers/infiniband/hw/ipath/ipath_file_ops.c --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Thu Sep 28 08:57:13 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Thu Sep 28 08:57:13 2006 -0700 @@ -1701,18 +1701,17 @@ done: static int ipath_open(struct inode *in, struct file *fp) { - /* The real work is performed later in ipath_do_user_init() */ + /* The real work is performed later in ipath_assign_port() */ fp->private_data = kzalloc(sizeof(struct ipath_filedata), GFP_KERNEL); return fp->private_data ? 0 : -ENOMEM; } -static int ipath_do_user_init(struct file *fp, + +// Get port early, so can set affinity prior to memory allocation +static int ipath_assign_port(struct file *fp, const struct ipath_user_info *uinfo) { int ret; - struct ipath_portdata *pd; - struct ipath_devdata *dd; - u32 head32; int i_minor; unsigned swminor; @@ -1757,8 +1756,18 @@ static int ipath_do_user_init(struct fil mutex_unlock(&ipath_mutex); - if (ret) - goto done; +done: + return ret; +} + + +static int ipath_do_user_init(struct file *fp, + const struct ipath_user_info *uinfo) +{ + int ret; + struct ipath_portdata *pd; + struct ipath_devdata *dd; + u32 head32; pd = port_fp(fp); dd = pd->port_dd; @@ -2035,6 +2044,8 @@ static ssize_t ipath_write(struct file * consumed = sizeof(cmd.type); switch (cmd.type) { + case IPATH_CMD_ASSIGN_PORT: + case __IPATH_CMD_USER_INIT: case IPATH_CMD_USER_INIT: copy = sizeof(cmd.cmd.user_info); dest = &cmd.cmd.user_info; @@ -2083,12 +2094,24 @@ static ssize_t ipath_write(struct file * consumed += copy; pd = port_fp(fp); - if (!pd && cmd.type != IPATH_CMD_USER_INIT) { + if (!pd && cmd.type != __IPATH_CMD_USER_INIT && + cmd.type != IPATH_CMD_ASSIGN_PORT) { ret = -EINVAL; goto bail; } switch (cmd.type) { + case IPATH_CMD_ASSIGN_PORT: + ret = ipath_assign_port(fp, &cmd.cmd.user_info); + if (ret) + goto bail; + break; + case __IPATH_CMD_USER_INIT: + // backwards compatibility, get port first + ret = 
ipath_assign_port(fp, &cmd.cmd.user_info); + if (ret) + goto bail; + // and fall through to current version. case IPATH_CMD_USER_INIT: ret = ipath_do_user_init(fp, &cmd.cmd.user_info); if (ret) From sean.hefty at intel.com Thu Sep 28 09:02:14 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 28 Sep 2006 09:02:14 -0700 Subject: [openib-general] RDMA CM callback status In-Reply-To: <20060928063133.GI23828@mellanox.co.il> Message-ID: <000601c6e317$76ca4c00$8698070a@amr.corp.intel.com> >Can you post a patch pls? This was the patch committed to svn. I'm creating a patch set for review for 2.6.19/2.6.20 to merge the svn code upstream. I will post those patches against the 2.6.19 code tree when they are ready. Signed-off-by: Sean Hefty Index: core/cma.c =================================================================== --- core/cma.c (revision 9652) +++ core/cma.c (revision 9653) @@ -1245,6 +1245,7 @@ work->old_state = CMA_ROUTE_QUERY; work->new_state = CMA_ADDR_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_ERROR; + work->event.status = status; } queue_work(cma_wq, &work->work); From rdreier at cisco.com Thu Sep 28 09:11:11 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 09:11:11 -0700 Subject: [openib-general] Compile warnings (cross build) In-Reply-To: (James Lentini's message of "Thu, 28 Sep 2006 11:48:10 -0400 (EDT)") References: <20060926135114.1da96c1b@freekitty> <20060928062919.GH23828@mellanox.co.il> Message-ID: Michael> BTW, is there some printk format to print u64 type? James> Try "%Lu", That will print a long long unsigned value. That's the problem -- u64 is not always unsigned long long. For example on ppc64, u64 is just unsigned long. - R. From rdreier at cisco.com Thu Sep 28 09:11:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 09:11:47 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: <20060928151549.GG28790@mellanox.co.il> (Michael S. 
Tsirkin's message of "Thu, 28 Sep 2006 18:15:49 +0300") References: <20060928060817.GD23828@mellanox.co.il> <20060928151549.GG28790@mellanox.co.il> Message-ID: Michael> Looked pretty simple on the outset, but oh well. Keep us Michael> posted. I just work slowly. Anyway I don't think this is that urgent -- we've dumped enough stuff into 2.6.19, so I think this should wait for 2.6.20 at the earliest anyway. From mshefty at ichips.intel.com Thu Sep 28 09:09:58 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Sep 2006 09:09:58 -0700 Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ In-Reply-To: <451B6945.1050707@voltaire.com> References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com> <451ABF0C.90607@ichips.intel.com> <451B6945.1050707@voltaire.com> Message-ID: <451BF3D6.7080403@ichips.intel.com> Or Gerlitz wrote: > My understanding is that without this patch the side that sends the DREQ > would do few DREQ resends as of the "firsts" DREPs being lost and no > DREPs sent once the id at the peer side left the timewait state, correct? This is correct. Note that the number of DREQ retries was changed to 15 now. > Can you please share what were the implications with intel MPI running a > 64 nodes (128 ranks?) job? was the issue here just making the ***job > termination time*** bigger? The job termination time was taking about a minute waiting for the DREQ to timeout. When running a series of tests, this becomes a fairly large issue. - Sean From jlentini at netapp.com Thu Sep 28 09:21:52 2006 From: jlentini at netapp.com (James Lentini) Date: Thu, 28 Sep 2006 12:21:52 -0400 (EDT) Subject: [openib-general] 2.6.18 kernel support in the main trunk. 
In-Reply-To: <1159455506.11976.1.camel@chalcedony.pathscale.com> References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> Message-ID: On Thu, 28 Sep 2006, Bryan O'Sullivan wrote: > On Thu, 2006-09-28 at 09:27 +0300, Michael S. Tsirkin wrote: > > > Me too. > > Roland and I (following his example) checked in changes to the mthca and > ipath drivers in SVN yesterday that add a #warning to a core driver > source file saying "don't look here, look over there!" That's a first > step towards dropping the drivers from SVN trunk altogether. As a user of the SVN repository, I'm confused about what this means going forward. Are you going to completely remove the mthca and ipath code from SVN or just stop updating the code that is there? Will the other components that are upstream (SRP, iSER, IPoIB, CM, RDMA CM, SA, MAD, CORE, ...) be removed? What rules are you using to determine if the SVN version will be kept up to date? To date, the process for using and testing new OFA features has been very simple. Users simply downloaded the latest stable kernel release and replaced the drivers/infiniband directory with the sources out of SVN. This worked for the development of new components (e.g. eHCA, ipath, RDMA cm, iSER, SRP, etc.). In the future, how will users work with new features that are not yet upstream? From shemminger at osdl.org Thu Sep 28 08:39:02 2006 From: shemminger at osdl.org (Stephen Hemminger) Date: Thu, 28 Sep 2006 08:39:02 -0700 Subject: [openib-general] Compile warnings (cross build) In-Reply-To: References: <20060926135114.1da96c1b@freekitty> <20060928062919.GH23828@mellanox.co.il> Message-ID: <20060928083902.62850820@freekitty> On Thu, 28 Sep 2006 09:11:11 -0700 Roland Dreier wrote: > Michael> BTW, is there some printk format to print u64 type?
> > James> Try "%Lu", That will print a long long unsigned value. > > That's the problem -- u64 is not always unsigned long long. For > example on ppc64, u64 is just unsigned long. > > - R. The only safe way is to cast u64 to long long unsigned and then use either %Lu or %llu as the format string. It means that on 64bit platforms the u64 will end up getting extended, but it's harmless. -- Stephen Hemminger From bos at pathscale.com Thu Sep 28 09:31:33 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 09:31:33 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> Message-ID: <1159461093.5010.8.camel@chalcedony.pathscale.com> On Thu, 2006-09-28 at 12:21 -0400, James Lentini wrote: > As a user of the SVN repository, I'm confused about what this means > going forward. > > Are you going to completely remove the mthca and ipath code from SVN > or just stop updating the code that is there? I will let Roland speak for the mthca driver, but we have stopped maintaining the ipath driver in the SVN tree, and I expect that we will remove it entirely in perhaps a month or so. > Will the other components that are upstream (SRP, iSER, IPoIB, CM, > RDMA CM, SA, MAD, CORE, ...) be removed? What rules are you using to > determine if the SVN version will be kept up to date? I have no stake in what happens to those components, but I would not personally mind if they moved into Roland's git tree. I don't care for git, but I vastly prefer using it to waiting for SVN. > In the future, how will users work > with new features that are not yet upstream? One possibility would be to pull the same components out of a branch of a git tree; same procedure, different source.
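For reference, the cast Stephen Hemminger describes earlier in this digest (cast the u64 to unsigned long long, then print with %llu) can be sketched in plain userspace C. This is only an illustration under stated assumptions: the `u64` typedef below is a stand-in for the kernel type, and `format_u64` is a hypothetical helper name, not something from the thread or the kernel tree.

```c
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's u64 (assumption for this sketch). */
typedef uint64_t u64;

/*
 * Hypothetical helper: format a u64 into buf as decimal text.
 * The explicit cast makes the argument match %llu regardless of
 * whether the underlying type is unsigned long or unsigned long long.
 * Returns the number of characters snprintf would have written.
 */
int format_u64(char *buf, size_t len, u64 value)
{
    return snprintf(buf, len, "%llu", (unsigned long long)value);
}
```

In kernel code the same idea would presumably read printk("%llu\n", (unsigned long long)val); the cast costs nothing where the two types already have the same width.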
James wrote, >As a user of the SVN repository, I'm confused about what this means >going forward. >Are you going to completely remove the mthca and ipath code from SVN >or just stop updating the code that is there? I have said this before, but I will repeat myself once again. I really do not care where the latest code is, but there needs to be ONE place where we can get all the latest code for development and testing. Right now there are three branches, SVN which has some of Sean's latest changes, Rolands git tree, and the OFED git tree. All three of these have slightly different code bases and thus there is no one "latest" code base anymore and that is really confusing for people trying to use and test with the latest code to make sure their components work properly. It also multiplies the testing efforts, do we test with the SVN version, the OFED version, Roland's version. As an ISV of a 3rd party ULP (Intel MPI) this is making my life much more difficult than it should be. Please get your act together and lets get back to ONE database for the trunk code. I can live with having a branch for OFED releases as I see the need to branch and stabilize periodically for releases, but having 2 different development trees (Rolands git tree and SVN) for development is not working very well. my 2 cents. woody From mlleinin at hpcn.ca.sandia.gov Thu Sep 28 10:19:59 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Thu, 28 Sep 2006 10:19:59 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. 
In-Reply-To: <1159461093.5010.8.camel@chalcedony.pathscale.com> References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> Message-ID: <1159463999.15009.207.camel@localhost> If we move forward with a git repository then we should move all kernel code into git. I don't want to get into a situation where kernel components are spread out over various repositories and servers. I'm all for making your development lives easier. The entire development tree has gotten very confusing over the past few months. The ipath driver is never up to date (therefore it's always broken). Iwarp is upstream but not in the main line development tree. If a simpler process can fix this then I'm all for it. So what it your proposal (Roland and Bryan)? Do you want to move all kernel development into Roland's git tree, and have the user space code stay in svn (at least for the time being)? This would allow OFED releases to be pulled direct from Roland's git tree (kernel) and the openfabrics svn (user space). BTW if it is useful we can set up a git repository on openfabrics once we move the server to its new provider. Thanks, - Matt On Thu, 2006-09-28 at 09:31 -0700, Bryan O'Sullivan wrote: > On Thu, 2006-09-28 at 12:21 -0400, James Lentini wrote: > > > As a user of the SVN repository, I'm confused about what this means > > going forward. > > > > Are you going to completely remove the mthca and ipath code from SVN > > or just stop updating the code that is there? > > I will let Roland speak for the mthca driver, but we have stopped > maintaining the ipath driver in the SVN tree, and I expect that we will > remove it entirely in perhaps a month or so. > > > Will the other components that are upstream (SRP, iSER, IPoIB, CM, > > RDMA CM, SA, MAD, CORE, ...) be removed? 
What rules are you using to > > determine if the SVN version will be kept up to date? > > I have no stake in what happens to those components, but I would not > personally mind if they moved into Roland's git tree. I don't care for > git, but I vastly prefer using it to waiting for SVN. > > > In the future, how will users work > > with new features that are not yet upstream? > > One possibility would be to pull the same components out of a branch of > a git tree; same procedure, different source. > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Thu Sep 28 10:33:17 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 10:33:17 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159463999.15009.207.camel@localhost> (Matt Leininger's message of "Thu, 28 Sep 2006 10:19:59 -0700") References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> Message-ID: Matt> So what it your proposal (Roland and Bryan)? Do you want Matt> to move all kernel development into Roland's git tree, and Matt> have the user space code stay in svn (at least for the time Matt> being)? My proposal would be to leave userspace in svn, and make Linus's git tree the definitive source for Linux kernel code. My git tree may be useful for people who want to try things that haven't been merged upstream yet, but other developers of Linux kernel code may want to host their work too (either as a git tree, a patch set, or however else they want). 
This would match existing practice for other subsystems pretty closely. - R. From rdreier at cisco.com Thu Sep 28 10:31:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 10:31:07 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: (Robert J. Woodruff's message of "Thu, 28 Sep 2006 09:58:06 -0700") References: Message-ID: > I have said this before, but I will repeat myself once again. > I really do not care where the latest code is, but there needs > to be ONE place where we can get all the latest code for development > and testing. I'll repeat my usual response: the notion of a single "latest" tree doesn't match reality, and any attempt to coerce things into that mold just causes problems. There's not necessarily any correlation between the newest ipath code and Sean's RDMA CM. git (or any other true distributed SCM system) makes this easier to handle: you can easily merge the branches you're interested in trying into your local tree. - R. From bos at pathscale.com Thu Sep 28 10:43:20 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 10:43:20 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: Message-ID: <1159465400.5010.49.camel@chalcedony.pathscale.com> On Thu, 2006-09-28 at 09:58 -0700, Woodruff, Robert J wrote: > I have said this before, but I will repeat myself once again. > I really do not care where the latest code is, but there needs > to be ONE place where we can get all the latest code for development > and testing. Right now there are three branches, SVN which has some > of Sean's latest changes, Rolands git tree, and the OFED git tree. If you want to focus on one thing to test, use Linus's current git tree or a release candidate tarball from it. That way, all of the extraneous cruft that sits in SVN doesn't matter until someone actually submits it, and everyone has a shared understanding of what bits to bang on. 
The OFED tree gets built from what's in Linus's tree, so if something gets fixed in Linus's tree, the fix will percolate into OFED. This might lose you the ability to look in a single place to test stuff like SDP that's not yet upstream, but as far as I'm concerned, that's *good*. At least you'll then know for sure "I'm testing something that is different from what other people are working with". References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> Message-ID: <1159465954.15009.223.camel@localhost> On Thu, 2006-09-28 at 10:33 -0700, Roland Dreier wrote: > Matt> So what it your proposal (Roland and Bryan)? Do you want > Matt> to move all kernel development into Roland's git tree, and > Matt> have the user space code stay in svn (at least for the time > Matt> being)? > > My proposal would be to leave userspace in svn, and make Linus's git > tree the definitive source for Linux kernel code. My git tree may be > useful for people who want to try things that haven't been merged > upstream yet, but other developers of Linux kernel code may want to > host their work too (either as a git tree, a patch set, or however > else they want). This would match existing practice for other > subsystems pretty closely. > That sounds reasonable to me. I'd add one more thing. To make the OFED release process go more smoothly I'd like to see the maintainers for each stack component spin out releases from time to time. Roland has been doing this with libmthca and libibverbs. If we had the development releases for other kernel and all user space components then OFED could simple combine the latest development releases and start more through testing. Thoughts? 
- Matt From robert.j.woodruff at intel.com Thu Sep 28 10:53:45 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 28 Sep 2006 10:53:45 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. Message-ID: Bryan wrote, >If you want to focus on one thing to test, use Linus's current git tree >or a release candidate tarball from it. That way, all of the extraneous >cruft that sits in SVN doesn't matter until someone actually submits it, >and everyone has a shared understanding of what bits to bang on. The >OFED tree gets built from what's in Linus's tree, so if something gets >fixed in Linus's tree, the fix will percolate into OFED. The problem I have is that I need to test new code that is not yet ready to be merged upstream and thus is not yet in Linus's tree. We have deliverables to the National labs as part of the Pathforward work and we need a tree where we can put the code, they can pull it and test it and provide feedback prior to us submitting it upstream. In the past, all of the latest code has been put into SVN which allowed us and our customers the ability to try it out and provide feedback so we know it is what they want/need before it is submitted to Linus via Roland. Perhaps we need our own Pathforward tree for this, but we would rather not have to maintain a separate tree for this work and would prefer the model that we used over the last couple of years, where there was one main trunk development branch. woody From rdreier at cisco.com Thu Sep 28 10:56:18 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 10:56:18 -0700 Subject: [openib-general] [PATCH 0/3] IB/iser: bug fixes for 2.6.19 rc1 In-Reply-To: (Erez Zilber's message of "Wed, 27 Sep 2006 15:21:35 +0300 (IDT)") References: Message-ID: Thanks, applied although I had to fix up patch 3/3 by hand, since it did not apply to my tree I merge > 100 patches every kernel release. 
If I have to spend an extra 5 minutes for each one fixing a patch or pulling it out of svn, then I end up burning an extra 9 hours of stupid work. If 20+ people who contribute patches sent me clean patches, then everyone will be happier because I'll be able to merge things quicker and focus on productive work. From rdreier at cisco.com Thu Sep 28 10:59:12 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 10:59:12 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159465954.15009.223.camel@localhost> (Matt Leininger's message of "Thu, 28 Sep 2006 10:52:34 -0700") References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> <1159465954.15009.223.camel@localhost> Message-ID: Matt> I'd add one more thing. To make the OFED release process Matt> go more smoothly I'd like to see the maintainers for each Matt> stack component spin out releases from time to time. Roland Matt> has been doing this with libmthca and libibverbs. If we had Matt> the development releases for other kernel and all user space Matt> components then OFED could simple combine the latest Matt> development releases and start more through testing. Yes, I strongly support that, although the OFED benefits are just a side effect to me. The real reason to have these releases is to support distributions other than OFED -- for example having tarball releases of all the components makes it possible to get this stuff further upstream into real Linux distros. eg I have gotten libibverbs/libmthca into Fedora Extras and Debian/Ubuntu, so users of those distros can install them natively using standard distro tools. - R. 
From mshefty at ichips.intel.com Thu Sep 28 10:59:41 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Sep 2006 10:59:41 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159465954.15009.223.camel@localhost> References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> <1159465954.15009.223.camel@localhost> Message-ID: <451C0D8D.5030601@ichips.intel.com> Matt Leininger wrote: > I'd add one more thing. To make the OFED release process go more > smoothly I'd like to see the maintainers for each stack component spin > out releases from time to time. Roland has been doing this with > libmthca and libibverbs. If we had the development releases for other > kernel and all user space components then OFED could simple combine the > latest development releases and start more through testing. I agree, but to clarify: The rdma_cm does not have kernel support for userspace upstream yet. A release of librdmacm before that seems premature. Likewise, the libibcm is, for practical purposes, useless without a userspace SA solution. - Sean From halr at voltaire.com Thu Sep 28 10:59:06 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2006 13:59:06 -0400 Subject: [openib-general] [PATCH TRIVIAL] opensm: libibumad: show open()'s errno string. In-Reply-To: <20060926205130.GB23096@sashak.voltaire.com> References: <20060926205130.GB23096@sashak.voltaire.com> Message-ID: <1159466342.4353.317825.camel@hal.voltaire.com> On Tue, 2006-09-26 at 16:51, Sasha Khapyorsky wrote: > Show errno string then open() fails. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. 
-- Hal From bos at pathscale.com Thu Sep 28 11:00:41 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 11:00:41 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: Message-ID: <1159466441.5010.58.camel@chalcedony.pathscale.com> On Thu, 2006-09-28 at 10:53 -0700, Woodruff, Robert J wrote: > Perhaps we need our own Pathforward tree for this, but we would > rather not have to maintain a separate tree for this work > and would prefer the model that we used over the last couple of > years, where there was one main trunk development branch. I understand your desire to have a single tree, but it's just not feasible. For Pathforward, you have presumably got a bunch of features to deal with that as a non-Pathforward participant I don't want to be troubled by as I try to assure that the ipath driver is in reasonable shape. And really, I think that if you give it a try, you will not find maintaining a git or whatever tree to be much work; in fact, it's vastly easier in my experience than having a rat's nest of unrelated things all artificially crammed into a single branch. Bryan wrote, >I understand your desire to have a single tree, but it's just not >feasible. For Pathforward, you have presumably got a bunch of features >to deal with that as a non-Pathforward participant I don't want to be >troubled by as I try to assure that the ipath driver is in reasonable >shape. Given that people like the Labs are the customers that buy your hardware, you should be concerned that your driver works with the features that they want, even if you are not a pathforward participant. If not, and your driver does not work, they will just buy someone else's hardware that does work with those features. So it is really in your best interest to have a working driver in the same development tree that is being used for the Pathforward development, right now that is SVN. 
woody From parks at lanl.gov Thu Sep 28 11:16:14 2006 From: parks at lanl.gov (Parks Fields) Date: Thu, 28 Sep 2006 12:16:14 -0600 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159463999.15009.207.camel@localhost> References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> Message-ID: <7.0.1.0.2.20060928121458.025b0310@lanl.gov> At 11:19 AM 9/28/2006, Matt Leininger wrote: > If we move forward with a git repository then we should move all >kernel code into git. I don't want to get into a situation where kernel >components are spread out over various repositories and servers. I'm >all for making your development lives easier. The entire development >tree has gotten very confusing over the past few months. The ipath >driver is never up to date (therefore it's always broken). Iwarp is >upstream but not in the main line development tree. If a simpler >process can fix this then I'm all for it. I agree. We need to make this useable by the major of the people. I know you can't please all the people all the time. ***** Correspondence ***** This email contains no programmatic content that requires independent ADC review From rdreier at cisco.com Thu Sep 28 11:14:28 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 11:14:28 -0700 Subject: [openib-general] [PATCH 24 of 28] IB/mthca - Fix compiler warnings with gcc4 on possible unitialized variables In-Reply-To: <9fa624c592af68f7a851.1159459220@eng-12.pathscale.com> ( Bryan O'Sullivan's message of "Thu, 28 Sep 2006 09:00:20 -0700") References: <9fa624c592af68f7a851.1159459220@eng-12.pathscale.com> Message-ID: NAK -- I don't want to generate worse code to fix a compiler warning false positive. - R. 
From rdreier at cisco.com Thu Sep 28 11:15:03 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 11:15:03 -0700 Subject: [openib-general] [PATCH 4 of 28] IB/ipath - support revision 2 InfiniPath PCIE devices In-Reply-To: ( Bryan O'Sullivan's message of "Thu, 28 Sep 2006 09:00:00 -0700") References: Message-ID: > + /* > + /* Use GPIO interrupts for new counters */ trailing whitespace... From rdreier at cisco.com Thu Sep 28 11:15:45 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 11:15:45 -0700 Subject: [openib-general] [PATCH 25 of 28] IB/ipath - Set CPU affinity early In-Reply-To: <4269068599c270538c2e.1159459221@eng-12.pathscale.com> ( Bryan O'Sullivan's message of "Thu, 28 Sep 2006 09:00:21 -0700") References: <4269068599c270538c2e.1159459221@eng-12.pathscale.com> Message-ID: > +// Get port early, so can set affinity prior to memory allocation C++ style comments are frowned on in the kernel. I fixed all the new ones up to "/* */" style when applying the patches. From rdreier at cisco.com Thu Sep 28 11:16:57 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 11:16:57 -0700 Subject: [openib-general] [PATCH 1 of 28] IB/ipath - limit # of packets sent without an ACK received In-Reply-To: ( Bryan O'Sullivan's message of "Thu, 28 Sep 2006 08:59:57 -0700") References: Message-ID: I applied all except #24 with minor comments as sent separately. - R. 
From rdreier at cisco.com Thu Sep 28 11:20:16 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 11:20:16 -0700 Subject: [openib-general] [GIT PULL] Please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will merge: - ipath updates - iSER updates - a few of amso1100 Coverity fixes and warning cleanups Bryan O'Sullivan: IB/ipath: Limit # of packets sent without an ACK received IB/ipath: Fix memory leak if allocation fails IB/ipath: Driver support for userspace sharing of HW contexts IB/ipath: Support revision 2 InfiniPath PCIE devices IB/ipath: Unregister from IB core early IB/ipath: Clean up handling of GUID 0 IB/ipath: Lock and count allocated CQs properly IB/ipath: Count SRQs properly IB/ipath: Only allow complete writes to flash IB/ipath: RC and UC should validate SLID and DLID IB/ipath: Ensure that PD of MR matches PD of QP checking the Rkey IB/ipath: Print more informative parity error messages IB/ipath: Fix compiler warnings and errors on non-x86_64 systems IB/ipath: Fix mismatch in shifts and masks for printing debug info IB/ipath: Support multiple simultaneous devices of different types IB/ipath: Drop unnecessary "(void *)" casts IB/ipath: Improved support for PowerPC IB/ipath: Flush RWQEs if access error or invalid error seen IB/ipath: Call mtrr_del with correct arguments IB/ipath: Clean up module exit code IB/ipath: Change HT CRC message to indicate how to resolve problem IB/ipath: Fix and recover TXE piobuf and PBC parity errors IB/ipath: Fix EEPROM read when driver is compiled with -Os IB/ipath: Set CPU affinity early IB/ipath: Support new PCIE device, QLE7142 IB/ipath: Fix races with ib_resize_cq() IB/ipath: Fix lockdep error upon "ifconfig ibN down" Erez Zilber: IB/iser: Have iSER data transaction object point to 
iSER conn IB/iser: DMA unmap unaligned for RDMA data before touching it IB/iser: Fix the description of iSER in Kconfig Eric Sesterhenn: RDMA/amso1100: Fix error path in c2_llp_accept() Roland Dreier: RDMA/amso1100: Fix compile warnings RDMA/amso1100: Fix memory leak in c2_reg_phys_mr() drivers/infiniband/hw/amso1100/c2_ae.c | 2 drivers/infiniband/hw/amso1100/c2_alloc.c | 2 drivers/infiniband/hw/amso1100/c2_cm.c | 15 drivers/infiniband/hw/amso1100/c2_provider.c | 8 drivers/infiniband/hw/amso1100/c2_rnic.c | 4 drivers/infiniband/hw/ipath/ipath_common.h | 54 + drivers/infiniband/hw/ipath/ipath_cq.c | 48 + drivers/infiniband/hw/ipath/ipath_driver.c | 359 ++++----- drivers/infiniband/hw/ipath/ipath_eeprom.c | 17 drivers/infiniband/hw/ipath/ipath_file_ops.c | 974 ++++++++++++++++++------ drivers/infiniband/hw/ipath/ipath_fs.c | 9 drivers/infiniband/hw/ipath/ipath_iba6110.c | 132 ++- drivers/infiniband/hw/ipath/ipath_iba6120.c | 263 ++++-- drivers/infiniband/hw/ipath/ipath_init_chip.c | 56 + drivers/infiniband/hw/ipath/ipath_intr.c | 280 +++++-- drivers/infiniband/hw/ipath/ipath_kernel.h | 116 +++ drivers/infiniband/hw/ipath/ipath_keys.c | 12 drivers/infiniband/hw/ipath/ipath_mad.c | 16 drivers/infiniband/hw/ipath/ipath_mr.c | 3 drivers/infiniband/hw/ipath/ipath_qp.c | 16 drivers/infiniband/hw/ipath/ipath_rc.c | 77 +- drivers/infiniband/hw/ipath/ipath_registers.h | 40 + drivers/infiniband/hw/ipath/ipath_ruc.c | 14 drivers/infiniband/hw/ipath/ipath_srq.c | 23 - drivers/infiniband/hw/ipath/ipath_sysfs.c | 21 - drivers/infiniband/hw/ipath/ipath_uc.c | 6 drivers/infiniband/hw/ipath/ipath_ud.c | 6 drivers/infiniband/hw/ipath/ipath_user_pages.c | 56 + drivers/infiniband/hw/ipath/ipath_verbs.c | 43 + drivers/infiniband/hw/ipath/ipath_verbs.h | 18 drivers/infiniband/hw/ipath/ipath_wc_ppc64.c | 20 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 13 drivers/infiniband/ulp/iser/Kconfig | 13 drivers/infiniband/ulp/iser/iscsi_iser.c | 2 drivers/infiniband/ulp/iser/iscsi_iser.h | 9 
drivers/infiniband/ulp/iser/iser_initiator.c | 60 - drivers/infiniband/ulp/iser/iser_memory.c | 42 + drivers/infiniband/ulp/iser/iser_verbs.c | 8 38 files changed, 1973 insertions(+), 884 deletions(-) From halr at voltaire.com Thu Sep 28 11:26:52 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2006 14:26:52 -0400 Subject: [openib-general] [PATCH 3/3] IB/iser: fix the description of iSER in Kconfig In-Reply-To: References: Message-ID: <1159468012.4353.318794.camel@hal.voltaire.com> On Wed, 2006-09-27 at 09:48, Erez Zilber wrote: > fix the description of iSER in Kconfig. It is not accurate. > > Signed-off-by: Erez Zilber > > --- > > drivers/infiniband/ulp/iser/Kconfig | 11 ++++++----- > 1 files changed, 6 insertions(+), 5 deletions(-) > > e6a8887cad4e2270c5173451e8b706b907b88133 > diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig > index fead87d..80f6716 100644 > --- a/drivers/infiniband/ulp/iser/Kconfig > +++ b/drivers/infiniband/ulp/iser/Kconfig > @@ -1,11 +1,12 @@ > config INFINIBAND_ISER > - tristate "ISCSI RDMA Protocol" > + tristate "iSCSI Extensions for RDMA (iSER)" > depends on INFINIBAND && SCSI > select SCSI_ISCSI_ATTRS > ---help--- > - Support for the ISCSI RDMA Protocol over InfiniBand. This > - allows you to access storage devices that speak ISER/ISCSI > + Support for the iSCSI Extensions for RDMA (iSER) Protocol over InfiniBand. This > + allows you to access storage devices that speak iSCSI over iSER > over InfiniBand. > > - The ISER protocol is defined by IETF. > - See . > + The iSER protocol is defined by IETF. > + See > + and This spec is now officially released from IBTA as an annex and the URL is different. 
-- Hal From bos at pathscale.com Thu Sep 28 11:33:52 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 28 Sep 2006 11:33:52 -0700 Subject: [openib-general] [PATCH 1 of 28] IB/ipath - limit # of packets sent without an ACK received In-Reply-To: References: Message-ID: <1159468432.5010.60.camel@chalcedony.pathscale.com> On Thu, 2006-09-28 at 11:16 -0700, Roland Dreier wrote: > I applied all except #24 with minor comments as sent separately. Thanks! References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> <1159465954.15009.223.camel@localhost> Message-ID: <1159471350.15009.237.camel@localhost> On Thu, 2006-09-28 at 10:59 -0700, Roland Dreier wrote: > Matt> I'd add one more thing. To make the OFED release process > Matt> go more smoothly I'd like to see the maintainers for each > Matt> stack component spin out releases from time to time. Roland > Matt> has been doing this with libmthca and libibverbs. If we had > Matt> the development releases for other kernel and all user space > Matt> components then OFED could simple combine the latest > Matt> development releases and start more through testing. > > Yes, I strongly support that, although the OFED benefits are just a > side effect to me. The real reason to have these releases is to > support distributions other than OFED -- for example having tarball > releases of all the components makes it possible to get this stuff > further upstream into real Linux distros. > RedHat and SuSE have stated several times that they want an OFED like process that takes the OF code and runs it through a rigorous suite of regression and performance tests. The purpose of OFED is to get into the commercially supported distros (e.g RHEL and SLES). 
That is what the majority of end customers want/need. That said spinning out "pre-OFED" releases of each component would help to get the code into the other distros (FC, Debian, Ubuntu, Gentoo, etc.) which, of course, is a very good thing to do. Thanks, - Matt From halr at voltaire.com Thu Sep 28 12:20:49 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2006 15:20:49 -0400 Subject: [openib-general] [PATCH] osm_vendor_mlx_sa.c - missing status on timeout SA query In-Reply-To: <868xk4zjro.fsf@mtl066.yok.mtl.com> References: <868xk4zjro.fsf@mtl066.yok.mtl.com> Message-ID: <1159471248.4353.320594.camel@hal.voltaire.com> On Thu, 2006-09-28 at 00:40, Eitan Zahavi wrote: > Hi Hal > > Similar to the bug discovered by Yevgeny on the osm_vendor_ibumad_sa.c > the very same bug happens on osm_vendor_mlx_sa.c which fails osmtest. > The issue is that the status of the result of the query is not returned > as the result of the SA query. > > Eitan > > Signed-off-by: Eitan Zahavi Thanks. Applied to trunk only. -- Hal From thlin at us.ibm.com Thu Sep 28 12:43:15 2006 From: thlin at us.ibm.com (Tseng-Hui (Frank) Lin) Date: Thu, 28 Sep 2006 14:43:15 -0500 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: References: Message-ID: <1159472595.21249.79.camel@flin.austin.ibm.com> The ppc64 problem is actually in pci_64.c. 
Here is the patch: ============ cut here ============= diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c index 4c4449b..490403c 100644 --- a/arch/powerpc/kernel/pci_64.c +++ b/arch/powerpc/kernel/pci_64.c @@ -734,9 +734,7 @@ static struct resource *__pci_mmap_make_ if (hose == 0) return NULL; /* should never happen */ - /* If memory, add on the PCI bridge address offset */ if (mmap_state == pci_mmap_mem) { - *offset += hose->pci_mem_offset; res_bit = IORESOURCE_MEM; } else { io_offset = (unsigned long)hose->io_base_virt - pci_io_base; ============= end cut ============= The mmap() system call on resource0 does not work on ppc64 without this patch. PowerMAC G5 got away with this because its hose->pci_mem_offset was set to 0. The fix was made on 8/21. It may be able to make it into 2.6.19. But it certainly won't get into SLES10, SLES9-SP3, or RHEL4-U4, which have already been released. To cover both cases, with and without the fix, my patch tries to mmap /sys/bus/pci/..../resource0 first. If that fails, it tries to mmap /proc/bus/pci/.... If that fails too, we have no choice but to fall back to using PCI config space. On Thu, 2006-09-28 at 16:59 +0300, Moshe Kazir wrote: > Michael, > > Frank found the cause of the problem in the implementation of > arch/ppc/kernel/pci.c , > and asked the IBM kernel group to send a bug fix to the Linux kernel > group. > > The problem is : > > 1. This bug fix will not enter SLES10 as it is closed. > 2. It also will not enter SLES9 :-) or Red Hat AS4 U4 . > > So we need a bug fix that will enable the use of mstflint on js21 PPC64 > + backport to old systems . > > Frank's fix is based on two points (if I understand the code with no > errors) - > > 1. It opens /proc/bus/pci... And not /sys/bus/pci/... > 2. It performs an ioctl(fd, PCIIOC_MMAP_IS_MEM) ; > > Frank - am I right ? > > Can we enter these two small changes to the mstflint to have it working > on the PPC64 js21 ?
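The fallback order Frank describes (sysfs resource0 first, then /proc/bus/pci, then config space) might be sketched roughly as below. This is not the mstflint patch itself: `map_first_available` is a hypothetical helper, the caller supplies whatever candidate paths apply, and the ioctl(fd, PCIIOC_MMAP_IS_MEM) step that Moshe mentions for the /proc/bus/pci case is left out to keep the sketch generic.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to map `len` bytes from each candidate
 * file in order and return the first mapping that succeeds.
 * On success *used holds the index of the path that worked.
 * On failure MAP_FAILED is returned and the caller would fall back
 * to PCI config-space access (not shown here).
 */
void *map_first_available(const char *const paths[], size_t n,
                          size_t len, size_t *used)
{
    for (size_t i = 0; i < n; i++) {
        int fd = open(paths[i], O_RDWR | O_SYNC);
        if (fd < 0)
            continue;        /* file missing or not accessible */

        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);           /* an established mapping outlives the fd */
        if (p != MAP_FAILED) {
            *used = i;
            return p;
        }
        /* mmap failed (for example with EINVAL): try the next path */
    }
    return MAP_FAILED;
}
```

A caller would pass the sysfs path first and the /proc path second, and only drop to config-space reads when MAP_FAILED comes back, so kernels with and without the pci_64.c fix go through the same code.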
> > Moshe > > > > > > ____________________________________________________________ > Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) > > Voltaire - The Grid Backbone > > www.voltaire.com > > > > > -----Original Message----- > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Thursday, September 28, 2006 4:41 PM > To: Moshe Kazir > Cc: Tseng-Hui (Frank) Lin; openfabrics-ewg at openib.org; > openib-general at openib.org > Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not > loaded on AMD > > > Quoting r. Moshe Kazir : > > > > Quoting r. Moshe Kazir : > > > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is > > > not > > > loaded on AMD > > > > > > > > > # ls /sys/class/infiniband/mthca0/device/resource0 > > > /sys/class/infiniband/mthca0/device/resource0 > > > > OK, so can you try this please: > > > > strace -f -v -o log mstflint -d > > /sys/class/infiniband/mthca0/device/resource0 q > > > > cat log > > > > -- > > MST > > > > > 30463 open("/sys/class/infiniband/mthca0/device/resource0", > O_RDWR|O_SYNC|O_LARGEFILE) = 3 > > 30463 mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = > -1 EINVAL (Invalid argument) > > So we see that mmap is failing with EINVAL. > But why? We seem to be passing all valid parameters to it. > > I'm looking at arch/ppc/kernel/pci.c at the moment. > It seems that EINVAL is returned if __pci_mmap_make_offset > fails, and that seems to be only looking for a valid resource size. > > Are you up to finding the root cause of the problem in > arch/ppc/kernel/pci.c? > > Maybe the resource offsets are wrong? What does > cat /sys/class/infiniband/mthca0/device/resource > show? > > Maybe there's some problem to map a full megabyte? > Here's a test that only maps 4K. Could you strace it please? 
>
> >>>>>>>>>>>
>
> #define _XOPEN_SOURCE 500
> #define _FILE_OFFSET_BITS 64
>
> /* Header names were lost in the archive; these are the ones the code needs. */
> #include <stdio.h>
> #include <string.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/mman.h>
> /* PCIIOC_MMAP_IS_MEM, if enabled below, needs <sys/ioctl.h> and <linux/pci.h>. */
>
> int main()
> {
> 	int fd;
> 	unsigned value;
> 	volatile void *ptr;
>
> 	fd = open("/proc/bus/pci/00/00.0", O_RDWR | O_SYNC);
> 	if (fd < 0) {
> 		perror("open");
> 		return 1;
> 	}
>
> 	/* ioctl(fd, PCIIOC_MMAP_IS_MEM); */
> 	/* Map a single 4K page instead of the full megabyte. */
> 	ptr = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
> 		   0xf0000);
> 	if (ptr == MAP_FAILED) {
> 		perror("mmap");
> 		return 1;
> 	}
>
> 	/* Read the 32-bit word at offset 0x14 of the mapping and print it. */
> 	memcpy(&value, (const char *)ptr + 0x14, sizeof value);
> 	printf("0x%x\n", value);
> 	return 0;
> }
>
>
>

From halr at voltaire.com Thu Sep 28 13:21:10 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Sep 2006 16:21:10 -0400
Subject: [openib-general] [PATCH 1/2] osm: osmtest ignores error status
In-Reply-To:
References:
Message-ID: <1159474864.4353.322640.camel@hal.voltaire.com>

On Thu, 2006-09-28 at 11:16, Yevgeny Kliteynik wrote:
> Hi Hal.
>
> This patch takes care of several cases where osmtest
> ignored error status.
>
> Yevgeny
>
> Signed-off-by: Yevgeny Kliteynik

Thanks. Applied to trunk only.

-- Hal

From halr at voltaire.com Thu Sep 28 13:44:11 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Sep 2006 16:44:11 -0400
Subject: [openib-general] [PATCH 2/2] osm: osmtest ignores error status
In-Reply-To:
References:
Message-ID: <1159476246.4353.323381.camel@hal.voltaire.com>

On Thu, 2006-09-28 at 11:20, Yevgeny Kliteynik wrote:
> Hi Hal.
>
> This patch takes care of several cases where osmtest
> ignored error status (plus some cosmetics).
>
> Yevgeny
>
> Signed-off-by: Yevgeny Kliteynik

Thanks. Applied to trunk only.
-- Hal From jeremy at goop.org Thu Sep 28 13:46:19 2006 From: jeremy at goop.org (Jeremy Fitzhardinge) Date: Thu, 28 Sep 2006 13:46:19 -0700 Subject: [openib-general] [PATCH 24 of 28] IB/mthca - Fix compiler warnings with gcc4 on possible unitialized variables In-Reply-To: References: <9fa624c592af68f7a851.1159459220@eng-12.pathscale.com> Message-ID: <451C349B.9020102@goop.org> Roland Dreier wrote: > NAK -- I don't want to generate worse code to fix a compiler warning > false positive. > Maybe we should have a "make defined" operation for this kind of thing: #define DEFVALUE(x) asm("" : "=rm" (x)) Which is pretty ugly, I admit... J From swise at opengridcomputing.com Thu Sep 28 13:49:45 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 28 Sep 2006 15:49:45 -0500 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: Message-ID: <1159476585.30153.80.camel@stevo-desktop> On Thu, 2006-09-28 at 10:31 -0700, Roland Dreier wrote: > > I have said this before, but I will repeat myself once again. > > I really do not care where the latest code is, but there needs > > to be ONE place where we can get all the latest code for development > > and testing. > > I'll repeat my usual response: the notion of a single "latest" tree > doesn't match reality, and any attempt to coerce things into that mold > just causes problems. There's not necessarily any correlation between > the newest ipath code and Sean's RDMA CM. > > git (or any other true distributed SCM system) makes this easier to > handle: you can easily merge the branches you're interested in trying > into your local tree. > > - R. So I think we all agree on the need for a way to get a "latest" snapshot of the kernel code (we argue a LOT about how this is done :). And at this point in time its definitely _not_ the svn trunk for some kernel components. 
Like infiniband/core, which is behind linus's git tree for some things
(eg iwarp), and ahead of linus's git tree for others (eg ucma). This is
bad. There's no way to get the latest code with all features (eg iwarp +
user cma).

The model we should adopt IMO is: linus's git tree + some set of patches
that compose the latest open fabrics kernel code. The patches are all
in-process for going into linus's tree at some point. And the
maintainer of that technology (eg sean for ucma) will keep that patch
set up to date for folks to pull until it gets pulled into an upstream
git tree (like linus's or roland's). With git and stg this is pretty
easy IMO.

So the kernel developers all adopt git and maintain their latest changes
that are always on top of linus's git tree, or roland's infiniband git
tree. And we document where each component's patches or git tree is
located. Perhaps on the openib wiki.

OFED and others who build snapshots will have to pull from these
different components, merge them, test them, then release the result as
a "snapshot".

Here is what we did for iwarp, which is an example of one of these
components:

Set up the initial patch set:

- clone linus's git tree
- clone roland's git tree and reference linus's tree
- create your stacked patchset

Whenever you want to get upstream updates from roland's tree and/or
linus's tree you do this:

- pop your patchset
- git pull from linus's tree to update your clone of his git tree.
- re-clone roland's tree again referencing linus's tree (this makes the
  operation quicker).
- push (and merge) your patchset back on top

Dunno if all this helps the conversation, but we've argued about it for
a long time, and I thought maybe getting more specific on process and
how to maintain patches might help move things along...

Steve.

From rdreier at cisco.com Thu Sep 28 13:55:31 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 13:55:31 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159471350.15009.237.camel@localhost> (Matt Leininger's message of "Thu, 28 Sep 2006 12:22:30 -0700") References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> <1159465954.15009.223.camel@localhost> <1159471350.15009.237.camel@localhost> Message-ID: Matt> RedHat and SuSE have stated several times that they want Matt> an OFED like process that takes the OF code and runs it Matt> through a rigorous suite of regression and performance Matt> tests. The purpose of OFED is to get into the commercially Matt> supported distros (e.g RHEL and SLES). That is what the Matt> majority of end customers want/need. That said spinning out Matt> "pre-OFED" releases of each component would help to get the Matt> code into the other distros (FC, Debian, Ubuntu, Gentoo, Matt> etc.) which, of course, is a very good thing to do. I think we've gotten mixed up about "release" vs. "distribution" again. I would say that all the packaging crap, which OFED does as a short-term thing to make it possible for naive users to install, is actually a big negative for RH and Novell -- they would rather package and build software themselves. What is missing is the tested, coordinated tarball release of OF userspace stuff -- http://www.gnome.org/start/2.16/ might be a useful model, particularly the "Getting GNOME 2.16" section. Then if the OFED group wants to build a distribution, that's fine and healthy. - R. From rdreier at cisco.com Thu Sep 28 14:00:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 14:00:59 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. 
In-Reply-To: <1159476585.30153.80.camel@stevo-desktop> (Steve Wise's message of "Thu, 28 Sep 2006 15:49:45 -0500") References: <1159476585.30153.80.camel@stevo-desktop> Message-ID: Steve> So I think we all agree on the need for a way to get a Steve> "latest" snapshot of the kernel code (we argue a LOT about Steve> how this is done :). Not to be difficult -- but I disagree. I think this statement doesn't actually make sense, because: ** what does "latest" mean?? ** Does someone who wants to check if the new ipath tree fixed a bug really want to run my bleeding-edge IPoIB NAPI stuff? Does someone who wants to try IPoIB NAPI want to run possibly-broken bleeding edge RDMA CM code? etc. etc. Steve> The model we should adopt IMO is: linus's git tree + some Steve> set of patches that compose the latest open fabrics kernel Steve> code. The patches are all in-process for going into Steve> linus's tree at some point. And the maintainer of that Steve> technology, (eg sean for ucma) will keep that patch set up Steve> to date for folks to pull until it gets pulled into an Steve> upstream git tree (like linus's or roland's). With git and Steve> stg this is pretty easy IMO. Well, I think that's sort of reasonable, except that it has to be more than one git branch. All the in-process stuff should be on logically separate "topic branches". I'm happy to maintain for-2.6.x trees that represent stuff queued for the current and next kernel release, but stuff that hasn't been fully stabilized and reviewed should be kept separate, and I'm happy to create branches in my git tree for any other patch sets for developers who don't want to use git or don't have a place to host things. - R. From rdreier at cisco.com Thu Sep 28 14:03:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 14:03:32 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. 
In-Reply-To: (Roland Dreier's message of "Thu, 28 Sep 2006 13:55:31 -0700") References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> <1159465954.15009.223.camel@localhost> <1159471350.15009.237.camel@localhost> Message-ID: BTW, http://www.kernel.org/git/?p=linux/kernel/git/jgarzik/libata-dev.git;a=summary is an example of what I'm talking about: notice how Jeff has branches for specific changesets. - R. From swise at opengridcomputing.com Thu Sep 28 14:13:24 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 28 Sep 2006 16:13:24 -0500 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: <1159476585.30153.80.camel@stevo-desktop> Message-ID: <1159478004.30153.98.camel@stevo-desktop> On Thu, 2006-09-28 at 14:00 -0700, Roland Dreier wrote: > Steve> So I think we all agree on the need for a way to get a > Steve> "latest" snapshot of the kernel code (we argue a LOT about > Steve> how this is done :). > > Not to be difficult -- but I disagree. I think this statement doesn't > actually make sense, because: ** what does "latest" mean?? ** > Perhaps "latest" was a bad word. > Does someone who wants to check if the new ipath tree fixed a bug > really want to run my bleeding-edge IPoIB NAPI stuff? No, they just want to test a bug in the ipath code. They don't care about iwarp or rdma cm probably either. > Does someone > who wants to try IPoIB NAPI want to run possibly-broken bleeding edge > RDMA CM code? etc. etc. > Right, there are users who DONT want that. But there are users who want: dapl + user mode rdma cm + user mode iwarp + rdma cm kernel + iwarp kernel + chelsio driver + ipath driver, for example. 
They should be able to pull these into a single tree and build it somehow. Previously it was easy because everyone pushed their code into the svn repos. With the changing focus on feeding things into kernel.org I think we need a new process. > Steve> The model we should adopt IMO is: linus's git tree + some > Steve> set of patches that compose the latest open fabrics kernel > Steve> code. The patches are all in-process for going into > Steve> linus's tree at some point. And the maintainer of that > Steve> technology, (eg sean for ucma) will keep that patch set up > Steve> to date for folks to pull until it gets pulled into an > Steve> upstream git tree (like linus's or roland's). With git and > Steve> stg this is pretty easy IMO. > > Well, I think that's sort of reasonable, except that it has to be more > than one git branch. All the in-process stuff should be on logically > separate "topic branches". I'm happy to maintain for-2.6.x trees that > represent stuff queued for the current and next kernel release, but > stuff that hasn't been fully stabilized and reviewed should be kept > separate, and I'm happy to create branches in my git tree for any > other patch sets for developers who don't want to use git or don't > have a place to host things. > ok. topic branches in your git tree or a set of git trees sounds reasonable. But to facilitate those trying to assemble bits and pieces, we should provide documentation on where they get this stuff. This _might_ help convince those who are hanging on to the svn idea to adopt this new scheme... Steve. From robert.j.woodruff at intel.com Thu Sep 28 14:25:35 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 28 Sep 2006 14:25:35 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. Message-ID: Steve wrote, >ok. topic branches in your git tree or a set of git trees sounds >reasonable. 
But to facilitate those trying to assemble bits and
>pieces, we should provide documentation on where they get this stuff.
>This _might_ help convince those who are hanging on to the svn idea to
>adopt this new scheme...

>Steve.

Perhaps we need something similar to the concept of an MM tree
where new, more experimental patches can be applied and tested
together before going into Roland's mainline git tree that is queued for
kernel.org.

Again, some sort of development branch like what we used to have
with SVN. Does not matter to me if this is git or SVN, but
a central database is desirable so that people don't have to
get things from all over the place.

There are definitely going to be early adopters that want to
try out several of the new things, iWarp, rdma_cm, SDP, etc.
all at once from one code base, so having a way for them to
get versions of all the various components that are still under
development is what is needed.

What I don't want to see is what we have now. Things like iWarp that
are submitted upstream but do not work with some of the latest
development code (in SVN) for the rdma_cm, SDP, etc. In that model,
there is no easy way for someone to get a version of all of the
different pieces that all work together.

woody

From swise at opengridcomputing.com Thu Sep 28 14:30:52 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 28 Sep 2006 16:30:52 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> <1159465954.15009.223.camel@localhost> <1159471350.15009.237.camel@localhost> Message-ID: <1159479052.30153.103.camel@stevo-desktop> On Thu, 2006-09-28 at 14:03 -0700, Roland Dreier wrote: > BTW, http://www.kernel.org/git/?p=linux/kernel/git/jgarzik/libata-dev.git;a=summary > is an example of what I'm talking about: notice how Jeff has branches > for specific changesets. > I see. What might the branch layout look like today for openib? This might help clarify the idea. Steve. From mshefty at ichips.intel.com Thu Sep 28 14:30:42 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Sep 2006 14:30:42 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: <1159476585.30153.80.camel@stevo-desktop> Message-ID: <451C3F02.3000907@ichips.intel.com> Roland Dreier wrote: > Not to be difficult -- but I disagree. I think this statement doesn't > actually make sense, because: ** what does "latest" mean?? ** I think this is more a matter of whether there's a single, "main" development branch somewhere, or if one even needs to exist. > Well, I think that's sort of reasonable, except that it has to be more > than one git branch. All the in-process stuff should be on logically > separate "topic branches". agreed > I'm happy to create branches in my git tree for any > other patch sets for developers who don't want to use git or don't > have a place to host things. Someday soon I hear, OFA will be able to host git repositories, so my preference is to delay any svn to git transition until then. 
(I cannot host git from inside Intel's firewall, nor can I access a git
repository which isn't hosted at kernel.org.)

How would you handle merging in changes from the main branch to side
branches?

- Sean

From robert.j.woodruff at intel.com Thu Sep 28 14:32:53 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 28 Sep 2006 14:32:53 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
Message-ID:

Steve wrote

>The model we should adopt IMO is: linus's git tree + some set of patches
>that compose the latest open fabrics kernel code. The patches are all
>in-process for going into linus's tree at some point. And the
>maintainer of that technology, (eg sean for ucma) will keep that patch
>set up to date for folks to pull until it gets pulled into an upstream
>git tree (like linus's or roland's). With git and stg this is pretty
>easy IMO.

>So the kernel developers all adopt git and maintain their latest changes
>that are always on top of linus's git tree, or roland's infiniband git
>tree. And we document where each component's patches or git tree is
>located. Perhaps on the openib wiki.

Perhaps if we used an MM tree model, the initial MM tree would be
cloned from Linus's git tree or Roland's tree that is queued for
Linus, then people that want to test their new code with other
more experimental code would submit a patch for the MM tree.
When a component is thought to be stable enough to go upstream,
a patch is then submitted to Roland for his git tree.

If there are changes made to Roland's tree for non-experimental
components, the MM tree maintainer would periodically sync
the MM tree to the mainline (Roland's) tree.

Would something like this work?

woody

From tom at opengridcomputing.com Thu Sep 28 14:40:15 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Thu, 28 Sep 2006 16:40:15 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: Message-ID: On 9/28/06 4:25 PM, "Woodruff, Robert J" wrote: > Steve wrote, >> ok. topic branches in your git tree or a set of git trees sounds >> reasonable. But to facilitate those trying to assemble bits and >> pieces, we should provide documentation on where they get this stuff. >> This _might_ help convince those who are hanging on to the svn idea to >> adopt this new scheme... > >> Steve. > > Perhaps we need something similar to the concept of an MM tree > where new, more experimental patches, can be applied and tested > together before going into Roland's mainline git tree that is queued for > kernel.org. > > Again, some sort of development branch like what we use to have > with SVN. Does not matter to me if this is git or SVN, but > a central data base is desirable so that people don't have to > get things from all over the place. I think that there is an elephant in this room that everyone seems to be ignoring -- no one is signed up to select and merge the relevant topic branches together to create a unified, working "release candidate" and then posting it in a convenient place for you to pull from. Unless this developer resource problem is solved, you will be left with a well defined (but empty) branch to pull from. > > There are definitely going to be early adopters that want to > try out several of the new things, iWarp, rdma_cm, SDP, etc. > all at once from one code base, so having a way for them to > get versions of all the various components that are still under > development is what is needed. > > What I don't want to see is what we have now. Things like iWarp that > are submitted upstream but do not work with some of the latest > development code (in SVN) for the rdma_cm, SDP, etc. In that model, > there is no easy way for someone to get a version of all of the > different pieces that all work together. 
> > woody > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Thu Sep 28 14:40:59 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 28 Sep 2006 16:40:59 -0500 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: Message-ID: <1159479659.30153.108.camel@stevo-desktop> On Thu, 2006-09-28 at 14:32 -0700, Woodruff, Robert J wrote: > Steve wrote > > >The model we should adopt IMO is: linus's git tree + some set of > patches > >that compose the latest open fabrics kernel code. The patches are all > >in-process for going into linus's tree at some point. And the > >maintainer of that technology, (eg sean for ucma) will keep that patch > >set up to date for folks to pull until it gets pulled into an upstream > >git tree (like linus's or roland's). With git and stg this is pretty > >easy IMO. > > >So the kernel developers all adopt git and maintain their latest > changes > >that are always on top of linus's git tree, or roland's infiniband git > >tree. And we document where each component's patches or git tree is > >located. Perhaps on the openib wiki. > > Perhaps if we used an MM tree model, the initial MM tree would be > cloned from Linus's git tree or Rolands tree that is queued for > Linus, then people that want to test their new code with other > more experimental code would sumbit a patch for the MM tree. > When a component is thought to be stable enough to go up stream, > a patch is then submitted to Roland for his git tree. > > If there are changes made to Rolands tree for non-experimental > components, the MM tree maintainer would periodically sink > the MM tree to the mainline (Roland's) tree. > > Would something like this work ? > Yes, that seems like it would work. 
We need an Andrew Morton. ;-) Seriously, I think part of the issue here is getting the warm body that will do that work... Steve. From robert.j.woodruff at intel.com Thu Sep 28 14:59:32 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 28 Sep 2006 14:59:32 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. Message-ID: Steve Wise wrote, >Yes, that seems like it would work. We need an Andrew Morton. ;-) >Seriously, I think part of the issue here is getting the warm body that >will do that work... >Steve. Perhaps we could (Sean, Hal, and/or myself) could maintain such a development (MM) tree once OpenFabrics is able to host git as we already have to maintain separate clones anyway to use for development of new features for the Labs. I'll talk offline with Sean and Hal and see if we have the time to maintain an MM-like development branch. But until OpenFabrics can host git, I think we are stuck with SVN and the current mess unless we asked to host the MM tree branch at kernel.org, and I am not sure what it takes to get kernel.org to host a git tree. Thoughts ? woody From rdreier at cisco.com Thu Sep 28 16:16:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 16:16:37 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159479052.30153.103.camel@stevo-desktop> (Steve Wise's message of "Thu, 28 Sep 2006 16:30:52 -0500") References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> <1159465954.15009.223.camel@localhost> <1159471350.15009.237.camel@localhost> <1159479052.30153.103.camel@stevo-desktop> Message-ID: Steve> What might the branch layout look like today for openib? 
Steve> This might help clarify the idea. I don't really know everything people are working on. But we might have ipoib-napi, ipath, ehca, ucma branches at least. From rdreier at cisco.com Thu Sep 28 16:18:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 16:18:24 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159479659.30153.108.camel@stevo-desktop> (Steve Wise's message of "Thu, 28 Sep 2006 16:40:59 -0500") References: <1159479659.30153.108.camel@stevo-desktop> Message-ID: Steve> Yes, that seems like it would work. We need an Andrew Steve> Morton. ;-) Steve> Seriously, I think part of the issue here is getting the Steve> warm body that will do that work... It would be fairly easy to create a "union of all git development branches" git branch, as long as I can use native git to get to all the branches. So I'm happy to maintain that, with updates once or twice a week say. - R. From rdreier at cisco.com Thu Sep 28 16:19:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Sep 2006 16:19:37 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: (Robert J. Woodruff's message of "Thu, 28 Sep 2006 14:59:32 -0700") References: Message-ID: Robert> such a development (MM) tree once OpenFabrics is able to Robert> host git as we already have to maintain separate clones Robert> anyway to use for development of new features for the Robert> Labs. I'll talk offline with Sean and Hal and see if we Robert> have the time to maintain an MM-like development branch. Robert> But until OpenFabrics can host git, I think we are stuck Robert> with SVN and the current mess unless we asked to host the Robert> MM tree branch at kernel.org, and I am not sure what it Robert> takes to get kernel.org to host a git tree. It's really easy to host git trees at kernel.org. I don't really know what the criteria are for getting a kernel.org account but I don't think they're that stringent. - R. 
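
A minimal sketch of the topic-branch layout and the "union of all git
development branches" branch discussed above, run in a throwaway
repository. The names ipoib-napi, ipath and ucma are borrowed from the
thread purely as labels; the all-topics branch name, the file contents
and the base commit are assumptions made for illustration and do not
describe anyone's actual tree.

```shell
#!/bin/sh
# Sketch only: every branch, file and commit here is a stand-in created
# in a temporary repository; none of it mirrors a real tree.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.org"

# Stand-in for the upstream base (for instance a clone of mainline).
echo base > README
git add README
git commit -q -m "base"
base=$(git rev-parse --abbrev-ref HEAD)

# One topic branch per logically separate piece of in-progress work.
for topic in ipoib-napi ipath ucma; do
    git checkout -q -b "$topic" "$base"
    echo "$topic work" > "$topic.txt"
    git add "$topic.txt"
    git commit -q -m "$topic: in-progress work"
done

# Recreate the union branch from the base each time, then merge every
# topic branch into it, so the union stays disposable and reproducible.
git checkout -q -B all-topics "$base"
for topic in ipoib-napi ipath ucma; do
    git merge -q --no-ff -m "Merge $topic into all-topics" "$topic"
done

# The union branch now carries the files from all topic branches.
ls
```

Rebuilding the union branch from the base on every update, rather than
merging into a long-lived branch, is one possible design choice: a topic
that gets dropped or rewritten simply disappears from the next rebuild.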
From mlleinin at hpcn.ca.sandia.gov Thu Sep 28 16:29:38 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Thu, 28 Sep 2006 16:29:38 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> <1159465954.15009.223.camel@localhost> <1159471350.15009.237.camel@localhost> Message-ID: <1159486178.15009.302.camel@localhost> On Thu, 2006-09-28 at 13:55 -0700, Roland Dreier wrote: > Matt> RedHat and SuSE have stated several times that they want > Matt> an OFED like process that takes the OF code and runs it > Matt> through a rigorous suite of regression and performance > Matt> tests. The purpose of OFED is to get into the commercially > Matt> supported distros (e.g RHEL and SLES). That is what the > Matt> majority of end customers want/need. That said spinning out > Matt> "pre-OFED" releases of each component would help to get the > Matt> code into the other distros (FC, Debian, Ubuntu, Gentoo, > Matt> etc.) which, of course, is a very good thing to do. > > I think we've gotten mixed up about "release" vs. "distribution" > again. I would say that all the packaging crap, which OFED does as a > short-term thing to make it possible for naive users to install, is > actually a big negative for RH and Novell -- they would rather package > and build software themselves. Fair point. I don't like the way OFED is packaged. It's messy and just causes more problems than it is worth. What I do like about OFED is the rigorous testing that each company does. It would be great if we can include this rigorous testing into the OF release process. 
>
> What is missing is the tested, coordinated tarball release of OF
> userspace stuff -- http://www.gnome.org/start/2.16/ might be a useful
> model, particularly the "Getting GNOME 2.16" section.
>

Yes, we need something like this.

- Matt

From rdreier at cisco.com Thu Sep 28 16:53:18 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 16:53:18 -0700
Subject: [openib-general] Coverity found iSER bug?
In-Reply-To: (Roland Dreier's message of "Thu, 28 Sep 2006 10:56:18 -0700")
References:
Message-ID:

(This is from the Coverity scanner, CID 1396)

In iser_initiator.c there is suspicious code in
iser_rcv_completion(). We start with

	char *rx_data = NULL;
	int rx_data_len = 0;

and then do

	if (dto_xfer_len > ISER_TOTAL_HEADERS_LEN) { /* we have data */
		rx_data_len = dto_xfer_len - ISER_TOTAL_HEADERS_LEN;
		rx_data = dto->regd[1]->virt_addr;
		rx_data += dto->offset[1];
	}

I see no assignment to rx_data if dto_xfer_len <= ISER_TOTAL_HEADERS_LEN.
Then after a bunch of other stuff, we do

	iscsi_iser_recv(conn->iscsi_conn, hdr, rx_data, rx_data_len);

Coverity eventually follows this path to iscsi_scsi_cmd_rsp(), which
might dereference rx_data directly.

Is this a "can't happen" false positive or is there really a problem
here?

- R.

From swise at opengridcomputing.com Fri Sep 29 06:38:12 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Sep 2006 08:38:12 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To:
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost>
	<1159465954.15009.223.camel@localhost>
	<1159471350.15009.237.camel@localhost>
	<1159479052.30153.103.camel@stevo-desktop>
Message-ID: <1159537092.21613.14.camel@stevo-desktop>

On Thu, 2006-09-28 at 16:16 -0700, Roland Dreier wrote:
> Steve> What might the branch layout look like today for openib?
> Steve> This might help clarify the idea.
>
> I don't really know everything people are working on. But we might
> have ipoib-napi, ipath, ehca, ucma branches at least.
>

Add to that cxgb3 for Chelsio's T3 drivers.

From jlentini at netapp.com Fri Sep 29 07:58:45 2006
From: jlentini at netapp.com (James Lentini)
Date: Fri, 29 Sep 2006 10:58:45 -0400 (EDT)
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To:
References:
Message-ID:

On Thu, 28 Sep 2006, Tom Tucker wrote:

> I think that there is an elephant in this room that everyone seems
> to be ignoring -- no one is signed up to select and merge the
> relevant topic branches together to create a unified, working
> "release candidate" and then posting it in a convenient place for
> you to pull from.
>
> Unless this developer resource problem is solved, you will be left
> with a well defined (but empty) branch to pull from.

I think there are two elephants in the room.

What about the dual license policy that is enforced by the OpenFabrics
Alliance? Currently the OpenFabrics Alliance members require that all
code committed to the OFA repository be dual GPL/BSD licensed. If the
source code is no longer hosted on OFA servers, who is going to
guarantee that?
From jlentini at netapp.com Fri Sep 29 09:26:41 2006 From: jlentini at netapp.com (James Lentini) Date: Fri, 29 Sep 2006 12:26:41 -0400 (EDT) Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <451C3F02.3000907@ichips.intel.com> References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> Message-ID: On Thu, 28 Sep 2006, Sean Hefty wrote: > Someday soon I hear, OFA will be able to host git repositories, so > my preference is to delay any svn to git transition until then. (I > cannot host git from inside Intel's firewall, nor can I access a git > repository which isn't hosted at kernel.org.) Sean's concern brings to mind an important issue. The OFA repository is a common, neutral area to which we can all contribute. It would be a shame if we went back to the "dark ages" before OFA where every vendor had their own slightly different software stack. Balkanizing the OFA repository into corporate repositories would be a mistake. It is likely that companies will restrict developers at HCA vendor X from contributing code to HCA vendor Y's repository. From swise at opengridcomputing.com Fri Sep 29 09:34:14 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Sep 2006 11:34:14 -0500 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> Message-ID: <1159547654.21613.71.camel@stevo-desktop> On Fri, 2006-09-29 at 12:26 -0400, James Lentini wrote: > > On Thu, 28 Sep 2006, Sean Hefty wrote: > > > Someday soon I hear, OFA will be able to host git repositories, so > > my preference is to delay any svn to git transition until then. (I > > cannot host git from inside Intel's firewall, nor can I access a git > > repository which isn't hosted at kernel.org.) > > Sean's concern brings to mind an important issue. The OFA repository > is a common, neutral area to which we can all contribute.
It would be > a shame if we went back to the "dark ages" before OFA were every > vendor had their own slightly different software stack. > > Balkanizing the OFA repository into corporate repositories would be a > mistake. It is likely that companies will restrict developers at HCA > vendor X from contributing code to HCA vendor Y's repository. I don't think anybody is suggesting corporate private git trees... But just to state it clearly: We either host git trees on kernel.org or on openfabrics.org. Steve. From bos at pathscale.com Fri Sep 29 10:03:51 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 29 Sep 2006 10:03:51 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159486178.15009.302.camel@localhost> References: <1159300251.11549.6.camel@stevo-desktop> <1159300894.11549.11.camel@stevo-desktop> <1159393578.21086.16.camel@chalcedony.pathscale.com> <20060928062723.GG23828@mellanox.co.il> <1159455506.11976.1.camel@chalcedony.pathscale.com> <1159461093.5010.8.camel@chalcedony.pathscale.com> <1159463999.15009.207.camel@localhost> <1159465954.15009.223.camel@localhost> <1159471350.15009.237.camel@localhost> <1159486178.15009.302.camel@localhost> Message-ID: <1159549431.17595.21.camel@sardonyx> On Thu, 2006-09-28 at 16:29 -0700, Matt Leininger wrote: > Fair point. I don't like the way OFED is packaged. It's messy and > just causes more problems than it is worth. +10 > What I do like about OFED > is the rigorous testing that each company does. It would be great if we > can include this rigorous testing into the OF release process. +1 References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> Message-ID: <1159550667.17595.29.camel@sardonyx> On Fri, 2006-09-29 at 12:26 -0400, James Lentini wrote: > Balkanizing the OFA repository into corporate repositories would be a > mistake. Nobody is suggesting this. 
However, separating the mess that is the current SVN trunk into a set of well-understood branches, each of which sees some testing by its authors in isolation, can *only* be a good thing for ensuring a higher-quality OF process in general. > It is likely that companies will restrict developers at HCA > vendor X from contributing code to HCA vendor Y's repository. I doubt it. As a practical matter, having your driver in the kernel tree means it's open season for anyone who wants to take a crack at it. Just look at the number of IB/10GbE/iWarp hardware vendors that have fingerprints all over each other's code in drivers/infiniband/hw for an example. References: Message-ID: <1159550757.17595.31.camel@sardonyx> On Fri, 2006-09-29 at 10:58 -0400, James Lentini wrote: > Currently the OpenFabrics Alliance members require that all code > committed to the OFA repository will be dual GPL/BSD licensed. If the > source code is no longer hosted on OFA servers, who is going to > guarantee that? It's been the responsibility of OFA members to ensure that all along. Just because people are using a different revision control tool doesn't have much bearing on that. Message-ID: openib-general-bounces at openib.org wrote on 09/28/2006 09:11:47 AM: > Michael> Looked pretty simple on the outset, but oh well. Keep us > Michael> posted. > > I just work slowly. > > Anyway I don't think this is that urgent -- we've dumped enough stuff > into 2.6.19, so I think this should wait for 2.6.20 at the earliest anyway. Please wait for other device drivers to finish the performance test. This NAPI patch somehow kills ehca performance, extremely bad. Shirley Ma From robert.j.woodruff at intel.com Fri Sep 29 10:55:53 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Fri, 29 Sep 2006 10:55:53 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk.
Message-ID: Steve wrote, >I don't think anybody is suggesting corporate private git trees... >But just to state it clearly: We either host git trees on kernel.org or >on openfabrics.org. >Steve. Right now it looks like the OFED git tree is hosted on the Mellanox site, not kernel.org or openfabrics. From the HOW to build documentation in OFED 1.1... mkdir gitdir cd gitdir git clone -s --bare git://www.mellanox.co.il/~git/infiniband .git git checkout ofed_1_1 `git-ls-tree -r --name-only ofed_1_1 \ include/rdma include/scsi/srp.h drivers/infiniband \ Documentation/infiniband ofed_scripts kernel_patches` From sean.hefty at intel.com Fri Sep 29 11:47:06 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 29 Sep 2006 11:47:06 -0700 Subject: [openib-general] [PATCH 1/5] 2.6.19 rdma_cm: fix leak of cm_id's in case of failures Message-ID: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com> cma_connect_ib and cma_connect_iw leak cm_id's in failure cases. Signed-off-by: Krishna Kumar Signed-off-by: Sean Hefty --- Steve, I modified Krishna's patch to include a fix for iWarp as well. Please verify that it looks okay.
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 1178bd4..69bb089 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1862,6 +1862,11 @@ static int cma_connect_ib(struct rdma_id ret = ib_send_cm_req(id_priv->cm_id.ib, &req); out: + if (ret && !IS_ERR(id_priv->cm_id.ib)) { + ib_destroy_cm_id(id_priv->cm_id.ib); + id_priv->cm_id.ib = NULL; + } + kfree(private_data); return ret; } @@ -1889,10 +1894,8 @@ static int cma_connect_iw(struct rdma_id cm_id->remote_addr = *sin; ret = cma_modify_qp_rtr(&id_priv->id); - if (ret) { - iw_destroy_cm_id(cm_id); - return ret; - } + if (ret) + goto out; iw_param.ord = conn_param->initiator_depth; iw_param.ird = conn_param->responder_resources; @@ -1904,6 +1907,10 @@ static int cma_connect_iw(struct rdma_id iw_param.qpn = conn_param->qp_num; ret = iw_cm_connect(cm_id, &iw_param); out: + if (ret && !IS_ERR(cm_id)) { + iw_destroy_cm_id(cm_id); + id_priv->cm_id.iw = NULL; + } return ret; } From sean.hefty at intel.com Fri Sep 29 11:51:49 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 29 Sep 2006 11:51:49 -0700 Subject: [openib-general] [PATCH 2/5] 2.6.19 rdma_cm: fix device removal race In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com> Message-ID: <000201c6e3f8$51de5e40$ff0da8c0@amr.corp.intel.com> The race is as follows: A process : cma_process_remove() calls cma_remove_id_dev(), which sets id state to CMA_DEVICE_REMOVAL and calls wait_event(dev_remove). B process : cma_req_handler() had incremented dev_remove, and calls cma_acquire_ib_dev() and on failure calls cma_release_remove(), which does a wake_up of cma_process_remove(). Then cma_req_handler() calls rdma_destroy_id(); A Process : cma_remove_id_dev() gets woken and checks the state of id, and since it is still (wrongly) CMA_DEVICE_REMOVAL, it calls notify_user(id) and if that fails, the caller - cma_process_remove() calls rdma_destroy_id(id). 
Two processes can call rdma_destroy_id(), resulting in one de-referencing kfreed id_priv. Fix is for process B to set CMA_DESTROYING in cma_req_handler() so that process A will return instead of doing a rdma_destroy_id(). Signed-off-by: Krishna Kumar Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 69bb089..f383a4f 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -932,6 +932,7 @@ static int cma_req_handler(struct ib_cm_ mutex_unlock(&lock); if (ret) { ret = -ENODEV; + cma_exch(conn_id, CMA_DESTROYING); cma_release_remove(conn_id); rdma_destroy_id(&conn_id->id); goto out; From sean.hefty at intel.com Fri Sep 29 11:57:09 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 29 Sep 2006 11:57:09 -0700 Subject: [openib-general] [PATCH 3/5] 2.6.19 rdma_cm: set status correct on route resolution error In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com> Message-ID: <000301c6e3f9$10eae830$ff0da8c0@amr.corp.intel.com> On reporting a route error, also include the status for the error, rather than indicating a status of 0 when an error has occurred. 
Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index f383a4f..d10fdf1 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1308,6 +1308,7 @@ static void cma_query_handler(int status work->old_state = CMA_ROUTE_QUERY; work->new_state = CMA_ADDR_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_ERROR; + work->event.status = status; } queue_work(cma_wq, &work->work); From sean.hefty at intel.com Fri Sep 29 12:03:35 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 29 Sep 2006 12:03:35 -0700 Subject: [openib-general] [PATCH 4/5] 2.6.19 rdma_cm: eliminate unnecessary remove list In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com> Message-ID: <000401c6e3f9$f718b030$ff0da8c0@amr.corp.intel.com> Eliminate remove_list by using list_del_init instead during device removal handling. Signed-off-by: Krishna Kumar Signed-off-by: Sean Hefty --- This removes a stack variable and simplifies the code, but does not fix any bugs. We can defer this to 2.6.20 if necessary. 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d10fdf1..3982b81 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2151,12 +2151,9 @@ static int cma_remove_id_dev(struct rdma static void cma_process_remove(struct cma_device *cma_dev) { - struct list_head remove_list; struct rdma_id_private *id_priv; int ret; - INIT_LIST_HEAD(&remove_list); - mutex_lock(&lock); while (!list_empty(&cma_dev->id_list)) { id_priv = list_entry(cma_dev->id_list.next, @@ -2167,8 +2164,7 @@ static void cma_process_remove(struct cm continue; } - list_del(&id_priv->list); - list_add_tail(&id_priv->list, &remove_list); + list_del_init(&id_priv->list); atomic_inc(&id_priv->refcount); mutex_unlock(&lock); From sean.hefty at intel.com Fri Sep 29 12:09:51 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 29 Sep 2006 12:09:51 -0700 Subject: [openib-general] [PATCH 5/5] 2.6.19 rdma_cm: optimize error handling In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com> Message-ID: <000501c6e3fa$d6f90150$ff0da8c0@amr.corp.intel.com> Re-organize code relating to cma_get_net_info() and rdma_create_id() to optimize error case handling (no need to alloc memory/etc. as part of rdma_create_id() if input parameters are wrong). Signed-off-by: Krishna Kumar Signed-off-by: Sean Hefty --- This does not fix a bug. We can defer this to 2.6.20 if necessary.
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 3982b81..9ae4f3a 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -874,23 +874,25 @@ static struct rdma_id_private *cma_new_i __u16 port; u8 ip_ver; + if (cma_get_net_info(ib_event->private_data, listen_id->ps, + &ip_ver, &port, &src, &dst)) + goto err; + id = rdma_create_id(listen_id->event_handler, listen_id->context, listen_id->ps); if (IS_ERR(id)) - return NULL; + goto err; + + cma_save_net_info(&id->route.addr, &listen_id->route.addr, + ip_ver, port, src, dst); rt = &id->route; rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 2 : 1; - rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL); + rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, + GFP_KERNEL); if (!rt->path_rec) - goto err; + goto destroy_id; - if (cma_get_net_info(ib_event->private_data, listen_id->ps, - &ip_ver, &port, &src, &dst)) - goto err; - - cma_save_net_info(&id->route.addr, &listen_id->route.addr, - ip_ver, port, src, dst); rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path; if (rt->num_paths == 2) rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path; @@ -903,8 +905,10 @@ static struct rdma_id_private *cma_new_i id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = CMA_CONNECT; return id_priv; -err: + +destroy_id: rdma_destroy_id(id); +err: return NULL; } From rdreier at cisco.com Fri Sep 29 12:45:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 29 Sep 2006 12:45:23 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159547654.21613.71.camel@stevo-desktop> (Steve Wise's message of "Fri, 29 Sep 2006 11:34:14 -0500") References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> <1159547654.21613.71.camel@stevo-desktop> Message-ID: Steve> I don't think anybody is suggesting corporate private git Steve> trees... 
Steve> But just to state it clearly: We either host git trees on Steve> kernel.org or on openfabrics.org. Why? I don't see anything wrong with the git trees that are at www.mellanox.co.il right now. As long as we agree that Linus's tree is the ultimate destination for Linux drivers, I don't think the domain name that people use to publish their work-in-progress trees matters at all. - R. From swise at opengridcomputing.com Fri Sep 29 12:47:50 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Sep 2006 14:47:50 -0500 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> <1159547654.21613.71.camel@stevo-desktop> Message-ID: <1159559270.21613.94.camel@stevo-desktop> On Fri, 2006-09-29 at 12:45 -0700, Roland Dreier wrote: > Steve> I don't think anybody is suggesting corporate private git > Steve> trees... > > Steve> But just to state it clearly: We either host git trees on > Steve> kernel.org or on openfabrics.org. > > Why? I don't see anything wrong with the git trees that are at > www.mellanox.co.il right now. > > As long as we agree that Linus's tree is the ultimate destination for > Linux drivers, I don't think the domain name that people use to > publish their work-in-progress trees matters at all. > > - R. Just trying to simplify things and centralize the technology location... From rdreier at cisco.com Fri Sep 29 12:48:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 29 Sep 2006 12:48:37 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: (James Lentini's message of "Fri, 29 Sep 2006 10:58:45 -0400 (EDT)") References: Message-ID: James> I think there are two elephants in the room. What bout the James> dual license policy that is enforced by the OpenFabrics James> Alliance? 
James> Currently the OpenFabrics Alliance members require that all James> code committed to the OFA repository will be dual GPL/BSD James> licensed. If the source code is no longer hosted on OFA James> servers, who is going to guarantee that? I would call this more of a red herring than an elephant. Right now there is nothing that prevents me or anyone else from writing GPL-only code and getting it merged into Linus's tree. When I pointed this out before, your response was that such code would not be part of the OpenFabrics stack -- and I think that's exactly the answer to the issue you're raising: For better or for worse, the OFA marketing has created a peer pressure situation that all the IB and RDMA vendors feel compelled to play along with. And if GPL-only code doesn't get the OFA stamp of approval, then vendors aren't going to do that. The domain name of a source code repository is pretty irrelevant here. (Not to mention the fact that no one is enforcing the dual license on things that _are_ checked into openib.org anyway...) - R. From rdreier at cisco.com Fri Sep 29 12:49:14 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 29 Sep 2006 12:49:14 -0700 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: (Shirley Ma's message of "Fri, 29 Sep 2006 10:19:15 -0700") References: Message-ID: Shirley> Please wait for other device drivers to finish the Shirley> performance test. This NAPI patch somehow kills ehca Shirley> performance, extremely bad. Which NAPI patch? The last one posted can't work for non-Mellanox devices, since it doesn't handle the "rotting packet" issue. - R.
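For readers unfamiliar with the "rotting packet" race mentioned above, here is a minimal userspace sketch in C. It is not the IPoIB patch under discussion: sim_cq, sim_drain() and sim_poll() are hypothetical names invented for this illustration, and the sketch assumes a device that only raises a completion event for entries that arrive after the CQ has been re-armed. Under that assumption, a poll routine that re-arms without polling once more can go idle while a completion is still sitting in the queue.

```c
/* Hypothetical simulation of a completion queue; not a real verbs API. */
struct sim_cq {
    int pending; /* completions currently waiting in the queue */
    int armed;   /* 1 if the next *new* completion will raise an event */
    int late;    /* completions that land just before the re-arm (the race) */
};

/* Consume up to 'budget' completions and return how many were consumed. */
int sim_drain(struct sim_cq *cq, int budget)
{
    int n = cq->pending < budget ? cq->pending : budget;

    cq->pending -= n;
    return n;
}

/*
 * One poll pass. Returns 1 if the poller must be scheduled again,
 * 0 if it goes idle and waits for the next event.
 * 'recheck' selects whether the queue is looked at again after re-arming.
 */
int sim_poll(struct sim_cq *cq, int budget, int recheck)
{
    int done = sim_drain(cq, budget);

    if (done == budget)
        return 1; /* budget exhausted, more work may remain */

    /* Race window: completions arrive after the drain, before the re-arm. */
    cq->pending += cq->late;
    cq->late = 0;

    cq->armed = 1;

    /* Without this check the late completions would sit there unnoticed. */
    if (recheck && cq->pending > 0) {
        cq->armed = 0;
        return 1;
    }
    return 0;
}
```

With recheck set to 0, sim_poll() can return 0 while pending is still nonzero, which is the situation the term describes; with recheck set to 1 the poller is rescheduled instead.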
From rdreier at cisco.com Fri Sep 29 12:50:42 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 29 Sep 2006 12:50:42 -0700 Subject: [openib-general] [PATCH 1/5] 2.6.19 rdma_cm: fix leak of cm_id's in case of failures In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com> (Sean Hefty's message of "Fri, 29 Sep 2006 11:47:06 -0700") References: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com> Message-ID: Are all the patches here that are 'Signed-off-by: Krishna Kumar ' also 'From: Krishna Kumar ' ? I try to keep the right authorship information when merging stuff... From sean.hefty at intel.com Fri Sep 29 13:37:06 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 29 Sep 2006 13:37:06 -0700 Subject: [openib-general] [PATCH 1/5] 2.6.19 rdma_cm: fix leak of cm_id's in case of failures In-Reply-To: Message-ID: <000601c6e407$07b31a40$ff0da8c0@amr.corp.intel.com> >Are all the patches here that are 'Signed-off-by: Krishna Kumar >' >also 'From: Krishna Kumar ' ? > >I try to keep the right authorship information when merging stuff... Yes. I made additional changes to patch 1 to include iwarp in the fixes. The rest are unmodified patches from Krishna. - Sean From mshefty at ichips.intel.com Fri Sep 29 13:44:28 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 29 Sep 2006 13:44:28 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159559270.21613.94.camel@stevo-desktop> References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> <1159547654.21613.71.camel@stevo-desktop> <1159559270.21613.94.camel@stevo-desktop> Message-ID: <451D85AC.60603@ichips.intel.com> Steve Wise wrote: >>Why? I don't see anything wrong with the git trees that are at >>www.mellanox.co.il right now. > > Just trying to simplify things and centralize the technology location... Well, for myself, I have been unable to access the git trees at mellanox. 
For me to access git repositories, I need access through Intel's firewall. Today I'm restricted to kernel.org only, unless the git repositories are accessible using http. (Yes, it's a personal problem, but not one easily fixed...) - Sean From bos at pathscale.com Fri Sep 29 14:37:51 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 29 Sep 2006 14:37:51 -0700 Subject: [openib-general] [PATCH] IB/ipath - fix RDMA reads Message-ID: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com> The PSN used to generate the request following a RDMA read was incorrect and some state booking wasn't maintained correctly. This patch fixes that. Signed-off-by: Ralph Campbell Signed-off-by: Bryan O'Sullivan diff -r ac3953427dbf -r 7b2b5b33a248 drivers/infiniband/hw/ipath/ipath_rc.c --- a/drivers/infiniband/hw/ipath/ipath_rc.c Fri Sep 29 14:20:17 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_rc.c Fri Sep 29 14:20:40 2006 -0700 @@ -241,10 +241,7 @@ int ipath_make_rc_req(struct ipath_qp *q * original work request since we may need to resend * it. */ - qp->s_sge.sge = wqe->sg_list[0]; - qp->s_sge.sg_list = wqe->sg_list + 1; - qp->s_sge.num_sge = wqe->wr.num_sge; - qp->s_len = len = wqe->length; + len = wqe->length; ss = &qp->s_sge; bth2 = 0; switch (wqe->wr.opcode) { @@ -368,14 +365,23 @@ int ipath_make_rc_req(struct ipath_qp *q default: goto done; } + qp->s_sge.sge = wqe->sg_list[0]; + qp->s_sge.sg_list = wqe->sg_list + 1; + qp->s_sge.num_sge = wqe->wr.num_sge; + qp->s_len = wqe->length; if (newreq) { qp->s_tail++; if (qp->s_tail >= qp->s_size) qp->s_tail = 0; } - bth2 |= qp->s_psn++ & IPATH_PSN_MASK; - if ((int)(qp->s_psn - qp->s_next_psn) > 0) - qp->s_next_psn = qp->s_psn; + bth2 |= qp->s_psn & IPATH_PSN_MASK; + if (wqe->wr.opcode == IB_WR_RDMA_READ) + qp->s_psn = wqe->lpsn + 1; + else { + qp->s_psn++; + if ((int)(qp->s_psn - qp->s_next_psn) > 0) + qp->s_next_psn = qp->s_psn; + } /* * Put the QP on the pending list so lost ACKs will cause * a retry. 
More than one request can be pending so the @@ -690,13 +696,6 @@ void ipath_restart_rc(struct ipath_qp *q struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); struct ipath_ibdev *dev; - /* - * If there are no requests pending, we are done. - */ - if (ipath_cmp24(psn, qp->s_next_psn) >= 0 || - qp->s_last == qp->s_tail) - goto done; - if (qp->s_retry == 0) { wc->wr_id = wqe->wr.wr_id; wc->status = IB_WC_RETRY_EXC_ERR; @@ -731,8 +730,6 @@ void ipath_restart_rc(struct ipath_qp *q dev->n_rc_resends += (int)qp->s_psn - (int)psn; reset_psn(qp, psn); - -done: tasklet_hi_schedule(&qp->s_task); bail: @@ -765,6 +762,7 @@ static int do_rc_ack(struct ipath_qp *qp struct ib_wc wc; struct ipath_swqe *wqe; int ret = 0; + u32 ack_psn; /* * Remove the QP from the timeout queue (or RNR timeout queue). @@ -777,26 +775,26 @@ static int do_rc_ack(struct ipath_qp *qp list_del_init(&qp->timerwait); spin_unlock(&dev->pending_lock); + /* Nothing is pending to ACK/NAK. */ + if (unlikely(qp->s_last == qp->s_tail)) + goto bail; + /* * Note that NAKs implicitly ACK outstanding SEND and RDMA write * requests and implicitly NAK RDMA read and atomic requests issued * before the NAK'ed request. The MSN won't include the NAK'ed * request but will include an ACK'ed request(s). */ + ack_psn = psn; + if (aeth >> 29) + ack_psn--; wqe = get_swqe_ptr(qp, qp->s_last); - - /* Nothing is pending to ACK/NAK. */ - if (qp->s_last == qp->s_tail) - goto bail; /* * The MSN might be for a later WQE than the PSN indicates so * only complete WQEs that the PSN finishes. */ - while (ipath_cmp24(psn, wqe->lpsn) >= 0) { - /* If we are ACKing a WQE, the MSN should be >= the SSN. */ - if (ipath_cmp24(aeth, wqe->ssn) < 0) - break; + while (ipath_cmp24(ack_psn, wqe->lpsn) >= 0) { /* * If this request is a RDMA read or atomic, and the ACK is * for a later operation, this ACK NAKs the RDMA read or @@ -807,7 +805,8 @@ static int do_rc_ack(struct ipath_qp *qp * is sent but before the response is received. 
*/ if ((wqe->wr.opcode == IB_WR_RDMA_READ && - opcode != OP(RDMA_READ_RESPONSE_LAST)) || + (opcode != OP(RDMA_READ_RESPONSE_LAST) || + ipath_cmp24(ack_psn, wqe->lpsn) != 0)) || ((wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP || wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) && (opcode != OP(ATOMIC_ACKNOWLEDGE) || @@ -825,6 +824,10 @@ static int do_rc_ack(struct ipath_qp *qp */ goto bail; } + if (wqe->wr.opcode == IB_WR_RDMA_READ || + wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP || + wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) + tasklet_hi_schedule(&qp->s_task); /* Post a send completion queue entry if requested. */ if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) || (wqe->wr.send_flags & IB_SEND_SIGNALED)) { @@ -1055,7 +1058,8 @@ static inline void ipath_rc_rcv_resp(str /* no AETH, no ACK */ if (unlikely(ipath_cmp24(psn, qp->s_last_psn + 1))) { dev->n_rdma_seq++; - ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); + if (qp->s_last != qp->s_tail) + ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); goto ack_done; } rdma_read: @@ -1091,7 +1095,8 @@ static inline void ipath_rc_rcv_resp(str /* ACKs READ req. */ if (unlikely(ipath_cmp24(psn, qp->s_last_psn + 1))) { dev->n_rdma_seq++; - ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); + if (qp->s_last != qp->s_tail) + ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); goto ack_done; } /* FALLTHROUGH */ From rdreier at cisco.com Fri Sep 29 14:39:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 29 Sep 2006 14:39:32 -0700 Subject: [openib-general] [PATCH] IB/ipath - fix RDMA reads In-Reply-To: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com> ( Bryan O'Sullivan's message of "Fri, 29 Sep 2006 14:37:51 -0700") References: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com> Message-ID: I assume this is 'From: Ralph Campbell ' (based on the sign-off)? 
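The ipath patch above compares and advances 24-bit packet sequence numbers (PSNs). As a rough illustration only, the following C sketch shows a wraparound-aware comparison and the idea that the request following an RDMA read starts one past the read's last PSN. psn_cmp24() and psn_after_request() are hypothetical helpers written for this example; they are not the driver's ipath_cmp24() or its actual state handling, and masking the result to 24 bits here is an assumption made to keep the example self-contained.

```c
#include <stdint.h>

#define PSN_MASK 0xFFFFFFu /* PSNs are 24 bits wide */

/*
 * Compare two 24-bit PSNs with wraparound.
 * Returns <0 if a is before b, 0 if equal, >0 if a is after b,
 * treating distances of less than half the PSN space as "ahead".
 */
int psn_cmp24(uint32_t a, uint32_t b)
{
    uint32_t diff = (a - b) & PSN_MASK;

    if (diff == 0)
        return 0;
    return (diff & 0x800000u) ? -1 : 1;
}

/*
 * PSN to use for the request that follows the current one.
 * An RDMA read covers a range of PSNs ending at last_psn, so the next
 * request starts at last_psn + 1; other requests advance by one.
 */
uint32_t psn_after_request(uint32_t cur_psn, uint32_t last_psn,
                           int is_rdma_read)
{
    uint32_t next = is_rdma_read ? last_psn + 1 : cur_psn + 1;

    return next & PSN_MASK;
}
```

For example, a read that starts at PSN 10 and ends at PSN 13 would be followed by a request at PSN 14 rather than 11, which matches the intent of the s_psn change in the diff.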
From bos at pathscale.com Fri Sep 29 14:41:36 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 29 Sep 2006 14:41:36 -0700 Subject: [openib-general] [PATCH] IB/ipath - fix RDMA reads In-Reply-To: References: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com> Message-ID: <1159566096.4841.13.camel@chalcedony.pathscale.com> On Fri, 2006-09-29 at 14:39 -0700, Roland Dreier wrote: > I assume this is 'From: Ralph Campbell ' > (based on the sign-off)? Yes, my patchbomb script ate the "From:". Thanks, (Sean Hefty's message of "Fri, 29 Sep 2006 11:47:06 -0700") References: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com> Message-ID: Thanks, applied 1-5 to for-2.6.19. The merge worked very smoothly -- my only request would be to track authorship of patches you forward on (by including a "From:" line with the original author). From rdreier at cisco.com Fri Sep 29 14:46:42 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 29 Sep 2006 14:46:42 -0700 Subject: [openib-general] [PATCH] IB/ipath - fix RDMA reads In-Reply-To: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com> ( Bryan O'Sullivan's message of "Fri, 29 Sep 2006 14:37:51 -0700") References: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com> Message-ID: Thanks, applied (I assumed Ralph was the author when merging, please let me know if that was wrong) From ralphc at pathscale.com Fri Sep 29 15:14:48 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Fri, 29 Sep 2006 15:14:48 -0700 Subject: [openib-general] [PATCH] IB/ipath - fix RDMA reads In-Reply-To: References: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com> Message-ID: <1159568088.29948.14.camel@brick.pathscale.com> Yes, I am the author. 
On Fri, 2006-09-29 at 14:46 -0700, Roland Dreier wrote: > Thanks, applied (I assumed Ralph was the author when merging, please > let me know if that was wrong) From sean.hefty at intel.com Fri Sep 29 16:52:26 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 29 Sep 2006 16:52:26 -0700 Subject: [openib-general] [PATCH] ib_cm: fix module unload race with timewait In-Reply-To: <451A2E7E.8050504@voltaire.com> Message-ID: <000701c6e422$51589a60$ff0da8c0@amr.corp.intel.com> If the ib_cm module is unloaded while id's are still in timewait, the CM will destroy the work queue used to process timewait. Once the id's exit timewait, their timers will fire, leading to a crash trying to access the destroyed work queue. We need to track id's that are in timewait, and cancel their deferred work on module unload. Signed-off-by: Sean Hefty --- Erez, can you see if this fixes the crash problem that you're seeing? Index: cm.c =================================================================== --- cm.c (revision 9680) +++ cm.c (working copy) @@ -75,6 +75,7 @@ struct rb_root remote_sidr_table; struct idr local_id_table; __be32 random_id_operand; + struct list_head timewait_list; struct workqueue_struct *wq; } cm; @@ -112,6 +113,7 @@ struct cm_timewait_info { struct cm_work work; /* Must be first.
*/ + struct list_head list; struct rb_node remote_qp_node; struct rb_node remote_id_node; __be64 remote_ca_guid; @@ -648,13 +650,6 @@ static void cm_cleanup_timewait(struct cm_timewait_info *timewait_info) { - unsigned long flags; - - if (!timewait_info->inserted_remote_id && - !timewait_info->inserted_remote_qp) - return; - - spin_lock_irqsave(&cm.lock, flags); if (timewait_info->inserted_remote_id) { rb_erase(&timewait_info->remote_id_node, &cm.remote_id_table); timewait_info->inserted_remote_id = 0; @@ -664,7 +659,6 @@ rb_erase(&timewait_info->remote_qp_node, &cm.remote_qp_table); timewait_info->inserted_remote_qp = 0; } - spin_unlock_irqrestore(&cm.lock, flags); } static struct cm_timewait_info * cm_create_timewait_info(__be32 local_id) @@ -685,8 +679,12 @@ static void cm_enter_timewait(struct cm_id_private *cm_id_priv) { int wait_time; + unsigned long flags; + spin_lock_irqsave(&cm.lock, flags); cm_cleanup_timewait(cm_id_priv->timewait_info); + list_add_tail(&cm_id_priv->timewait_info->list, &cm.timewait_list); + spin_unlock_irqrestore(&cm.lock, flags); /* * The cm_id could be destroyed by the user before we exit timewait. 
@@ -702,9 +700,13 @@ static void cm_reset_to_idle(struct cm_id_private *cm_id_priv) { + unsigned long flags; + cm_id_priv->id.state = IB_CM_IDLE; if (cm_id_priv->timewait_info) { + spin_lock_irqsave(&cm.lock, flags); cm_cleanup_timewait(cm_id_priv->timewait_info); + spin_unlock_irqrestore(&cm.lock, flags); kfree(cm_id_priv->timewait_info); cm_id_priv->timewait_info = NULL; } @@ -1308,6 +1310,7 @@ if (timewait_info) { cur_cm_id_priv = cm_get_id(timewait_info->work.local_id, timewait_info->work.remote_id); + cm_cleanup_timewait(cm_id_priv->timewait_info); spin_unlock_irqrestore(&cm.lock, flags); if (cur_cm_id_priv) { cm_dup_req_handler(work, cur_cm_id_priv); @@ -1316,7 +1319,8 @@ cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ, NULL, 0); - goto error; + listen_cm_id_priv = NULL; + goto out; } /* Find matching listen request. */ @@ -1324,21 +1328,20 @@ req_msg->service_id, req_msg->private_data); if (!listen_cm_id_priv) { + cm_cleanup_timewait(cm_id_priv->timewait_info); spin_unlock_irqrestore(&cm.lock, flags); cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_REJ_INVALID_SERVICE_ID, CM_MSG_RESPONSE_REQ, NULL, 0); - goto error; + goto out; } atomic_inc(&listen_cm_id_priv->refcount); atomic_inc(&cm_id_priv->refcount); cm_id_priv->id.state = IB_CM_REQ_RCVD; atomic_inc(&cm_id_priv->work_count); spin_unlock_irqrestore(&cm.lock, flags); +out: return listen_cm_id_priv; - -error: cm_cleanup_timewait(cm_id_priv->timewait_info); - return NULL; } static int cm_req_handler(struct cm_work *work) @@ -2634,7 +2637,9 @@ int ret; timewait_info = (struct cm_timewait_info *)work; - cm_cleanup_timewait(timewait_info); + spin_lock_irqsave(&cm.lock, flags); + list_del(&timewait_info->list); + spin_unlock_irqrestore(&cm.lock, flags); cm_id_priv = cm_acquire_id(timewait_info->work.local_id, timewait_info->work.remote_id); @@ -3434,6 +3439,7 @@ idr_init(&cm.local_id_table); get_random_bytes(&cm.random_id_operand, sizeof cm.random_id_operand); 
idr_pre_get(&cm.local_id_table, GFP_KERNEL); + INIT_LIST_HEAD(&cm.timewait_list); cm.wq = create_workqueue("ib_cm"); if (!cm.wq) @@ -3451,7 +3457,23 @@ static void __exit ib_cm_cleanup(void) { + struct cm_timewait_info *timewait_info; + unsigned long flags; + + spin_lock_irqsave(&cm.lock, flags); + list_for_each_entry(timewait_info, &cm.timewait_list, list) + cancel_delayed_work(&timewait_info->work.work); + spin_unlock_irqrestore(&cm.lock, flags); + destroy_workqueue(cm.wq); + + while (!list_empty(&cm.timewait_list)) { + timewait_info = container_of(cm.timewait_list.next, + struct cm_timewait_info, list); + list_del(&timewait_info->list); + kfree(timewait_info); + } + ib_unregister_client(&cm_client); idr_destroy(&cm.local_id_table); } From rdreier at cisco.com Fri Sep 29 17:10:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 29 Sep 2006 17:10:05 -0700 Subject: [openib-general] [PATCH] ib_cm: fix module unload race with timewait In-Reply-To: <000701c6e422$51589a60$ff0da8c0@amr.corp.intel.com> (Sean Hefty's message of "Fri, 29 Sep 2006 16:52:26 -0700") References: <000701c6e422$51589a60$ff0da8c0@amr.corp.intel.com> Message-ID: Some of these spin_lock_irqsave()s are too conservative. For example: > static void __exit ib_cm_cleanup(void) > { > + struct cm_timewait_info *timewait_info; > + unsigned long flags; > + > + spin_lock_irqsave(&cm.lock, flags); > + list_for_each_entry(timewait_info, &cm.timewait_list, list) > + cancel_delayed_work(&timewait_info->work.work); > + spin_unlock_irqrestore(&cm.lock, flags); > + > destroy_workqueue(cm.wq); destroy_workqueue() can only be called in process context -- so it is fine to just use spin_lock_irq() above. > + > + while (!list_empty(&cm.timewait_list)) { > + timewait_info = container_of(cm.timewait_list.next, > + struct cm_timewait_info, list); > + list_del(&timewait_info->list); > + kfree(timewait_info); > + } list_for_each_entry_safe() here? I assume nothing is getting added to the list while the loop runs... 
> + > ib_unregister_client(&cm_client); > idr_destroy(&cm.local_id_table); > } From ebiederm at xmission.com Fri Sep 29 17:36:25 2006 From: ebiederm at xmission.com (ebiederm at xmission.com) Date: Fri, 29 Sep 2006 18:36:25 -0600 Subject: [openib-general] [PATCH 0 of 28] ipath patches for 2.6.19 In-Reply-To: (Bryan O'Sullivan's message of "Thu, 28 Sep 2006 08:59:56 -0700") References: Message-ID: "Bryan O'Sullivan" writes: > Hi, Roland - > > This patch series brings the ipath driver almost up to date with what's > in our internal tree. The only substantial thing missing is the > memcpy_cachebypass patch that I sent out a while back and haven't had > time to rework. > > These patches have seen a lot of testing, including on a git snapshot > as of yesterday afternoon. Please apply. Have you tested your driver against the -mm tree? To the best of my knowledge the irq handling of your hypertransport card is a complete and total hack that works only by chance. In the -mm tree I have added a first pass at proper support for the hypertransport interrupt capability. As this code is slated to go into 2.6.19 could you please test against that? I would have tested it myself except when I mentioned this earlier I was told that your card does not actually implement the hypertransport interrupt capability properly. The practical reason for pathscale to work on this is the genirq work in 2.6.19 changes the internal implementation detail your hypertransport card has been relying on to work so your hypertransport card will not work without fixes. 
Thanks, Eric From bugzilla-daemon at openib.org Sat Sep 30 21:14:00 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Sat, 30 Sep 2006 21:14:00 -0700 (PDT) Subject: [openib-general] [Bug 256] New: Missing include in ib_verbs.h Message-ID: <20061001041400.0A8962283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=256 Summary: Missing include in ib_verbs.h Product: OpenFabrics Linux Version: 1.1rc6 Platform: Other OS/Version: Other Status: NEW Severity: normal Priority: P2 Component: Verbs AssignedTo: bugzilla at openib.org ReportedBy: rk at scali.com CC: sp at scali.com ib_verbs.h uses struct kref, but fails to include kref.h. One practical effect of this is that Lustre fails to compile the o2ib module. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Sat Sep 30 21:14:44 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Sat, 30 Sep 2006 21:14:44 -0700 (PDT) Subject: [openib-general] [Bug 256] Missing include in ib_verbs.h Message-ID: <20061001041444.DAF8B2283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=256 ------- Comment #1 from rk at scali.com 2006-09-30 21:14 ------- Created an attachment (id=47) --> (http://openib.org/bugzilla/attachment.cgi?id=47&action=view) Add missing include ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at openib.org Sat Sep 30 23:00:22 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Sat, 30 Sep 2006 23:00:22 -0700 (PDT) Subject: [openib-general] [Bug 256] Missing include in ib_verbs.h Message-ID: <20061001060022.89BF82283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=256 vlad at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org      |jackm at mellanox.co.il

------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at mellanox.co.il Sat Sep 30 23:52:07 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 1 Oct 2006 08:52:07 +0200 Subject: [openib-general] [PATCH] IB/ipoib: NAPI In-Reply-To: References: Message-ID: <20061001065206.GA888@mellanox.co.il> Quoting r. Roland Dreier : > Anyway I don't think this is that urgent -- we've dumped enough stuff > into 2.6.19, so I think this should wait for 2.6.20 at the earliest anyway. Hmm, iwarp went in, ehca went in, OK. Pathscale are dumping out their internal tree at a high rate. But if you look at IPoIB over mthca for example, there were almost no changes. Isn't that true? And isn't the NAPI patch quite small? Maybe, if you are worried about stability, we can make NAPI optional and off by default in 2.6.19? There's precedent for this with e1000. This would also give low level driver maintainers the chance to experiment and select the best APIs, instead of just guessing which one is best. In 2.6.20 we'll be able to remove the non-NAPI path if it works out well. Want to see a patch like this? -- MST